feat(spk): optionally return per-speaker embedding centroids#1

Closed
phoenixray2000 wants to merge 1 commit into main from feat/spk-embedding-center

@phoenixray2000

Owner

What

Opt-in return_spk_center flag for AutoModel.generate (speaker diarization path). When enabled, the result includes spk_embedding_center: a [num_speakers, embedding_dim] array of per-speaker centroid embeddings, indexed by the spk ids in sentence_info.

Why

postprocess() already computes these centroids (mean of clustered chunk embeddings) for diarization but discards them. Surfacing them lets downstream speaker-voiceprint / identity workflows reuse the embeddings without a second extraction pass.
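To make "mean of clustered chunk embeddings" concrete, here is a minimal sketch that averages toy chunk embeddings per cluster label. The function name `speaker_centroids` and the 2-dimensional toy data are hypothetical and only illustrate the idea; this is not the code in postprocess().

```python
from collections import defaultdict


def speaker_centroids(chunk_embeddings, labels):
    """Return {label: mean embedding} for chunks grouped by cluster label.

    Hypothetical helper for illustration; not part of FunASR.
    """
    groups = defaultdict(list)
    for emb, label in zip(chunk_embeddings, labels):
        groups[label].append(emb)

    centroids = {}
    for label, embs in groups.items():
        dim = len(embs[0])
        # Element-wise mean over all chunks assigned to this speaker.
        centroids[label] = [sum(e[d] for e in embs) / len(embs) for d in range(dim)]
    return centroids


# Toy data: three chunks, the first two clustered as speaker 0.
chunks = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
centroids = speaker_centroids(chunks, labels)
```

In the real pipeline the embeddings would be 192-dimensional ERes2NetV2 vectors rather than 2-dimensional toys.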

Compatibility

Opt-in, default off. postprocess() return shape unchanged unless return_spk_center=True; existing callers (auto_model, auto_frontend) unaffected.

Verification

Local run (paraformer-zh + ERes2NetV2, punc_segment): a 2-speaker clip returns spk_embedding_center shape (2, 192), matching the 2 speakers in sentence_info; cross-speaker cosine 0.34 (distinct).
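For readers who want to reproduce a cross-speaker cosine check like the one above, a minimal sketch follows. The `cosine_similarity` helper and the 3-dimensional toy centroids are assumptions for illustration; an actual check would use rows of the returned spk_embedding_center array.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for two rows of spk_embedding_center (hypothetical values).
center_a = [1.0, 0.0, 0.0]
center_b = [0.6, 0.8, 0.0]
sim = cosine_similarity(center_a, center_b)
```

A lower value indicates more distinct centroids. Any decision threshold for "same speaker" would have to be tuned for the embedding model in use.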

🤖 Generated with Claude Code

Add a return_spk_center option so AutoModel.generate surfaces the per-speaker centroid embeddings (mean of clustered chunk embeddings) that diarization already computes in postprocess() but currently discards. Lets downstream speaker voiceprint / identity reuse them without re-embedding. Backward compatible: default off; postprocess return shape is unchanged unless return_spk_center=True.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6fa3a8ad0

Comment on lines +191 to +194
    if return_spk_center:
        # spk_embs[i] is the centroid (mean of clustered chunk embeddings) for
        # corrected speaker label i, aligned with the `spk` ids in sentence_info.
        return distribute_res, spk_embs

P2: Recompute centroids after smoothing speaker labels

For recordings containing diarization regions shorter than smooth()'s 0.7s threshold, smooth() can reassign those regions to neighboring speakers, but spk_embs has already been computed from the pre-smoothing labels. Returning it here has two consequences: spk_embedding_center can include speakers that no longer appear in sentence_info, and the remaining speakers' centroids omit embeddings that the final diarization output assigns to them. Downstream voiceprint matching would then use centroids that do not match the returned spk IDs.
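The mismatch can be illustrated with a simplified sketch. It assumes smoothing can be modeled as relabeling individual chunks; the real smooth() operates on timed regions, and `centroids_by_label` and the toy data are hypothetical.

```python
def centroids_by_label(embeddings, labels):
    """Mean embedding per label (hypothetical helper, not FunASR code)."""
    sums, counts = {}, {}
    for emb, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for d, value in enumerate(emb):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in sums[label]] for label in sums}


embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 4.0], [0.0, 1.0], [0.0, 1.0]]
pre_smoothing = [0, 0, 2, 1, 1]   # chunk 2 forms a short region of its own
post_smoothing = [0, 0, 1, 1, 1]  # that region is merged into speaker 1

stale = centroids_by_label(embeddings, pre_smoothing)   # computed too early
fresh = centroids_by_label(embeddings, post_smoothing)  # matches final labels
```

Here `stale` still contains a centroid for speaker 2, who no longer appears in the output, and its centroid for speaker 1 omits the merged chunk. `fresh` reflects the final labels.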

@phoenixray2000

Owner Author

Resubmitted to upstream modelscope/FunASR instead: modelscope#2967