Feature request: Reuse speaker embeddings across multiple audio files #1156

Open
@bianchilo

Hi, thank you for the fantastic work on WhisperX!

I'd say speaker diarization stands out as one of the most interesting and valuable features in WhisperX. It adds a lot of practical power to transcription workflows.

That said, I wanted to ask if it might be possible to add an option to persist (e.g. export/save) speaker embeddings generated during diarization, and allow them to be reused across multiple sessions or audio files. This could enable speaker consistency across different diarization runs, especially in workflows that involve batching or long-form audio broken into parts.

Possible Implementation Ideas

  • Option to export speaker embeddings after diarization
  • Parameter to load previously saved embeddings and match against them
  • Optional flag to use a cache for embeddings during batch processing
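
To make the first two ideas concrete, here is a minimal sketch of what export and matching could look like. It does not call WhisperX or pyannote at all: it assumes the diarization step can hand back one embedding vector per speaker label, as a plain dict. The function names (`save_embeddings`, `load_embeddings`, `match_speakers`), the `.npz` storage format, and the 0.7 cosine-similarity threshold are all hypothetical choices for illustration, not existing WhisperX options.

```python
import numpy as np


def save_embeddings(path, embeddings):
    """Persist a {speaker_label: vector} mapping to an .npz file.

    Note: np.savez appends ".npz" if the path does not already end with it.
    """
    arrays = {label: np.asarray(vec, dtype=np.float32)
              for label, vec in embeddings.items()}
    np.savez(path, **arrays)


def load_embeddings(path):
    """Load a mapping previously written by save_embeddings."""
    with np.load(path) as data:
        return {label: data[label] for label in data.files}


def cosine_similarity(a, b):
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def match_speakers(new_embeddings, known_embeddings, threshold=0.7):
    """Map each new speaker label to the most similar known speaker.

    A label is kept unchanged when no known speaker reaches the
    (hypothetical) similarity threshold. Matching is greedy per label,
    so two new labels can end up mapped to the same known speaker.
    """
    mapping = {}
    for label, vec in new_embeddings.items():
        best_name, best_score = None, threshold
        for name, ref in known_embeddings.items():
            score = cosine_similarity(vec, ref)
            if score >= best_score:
                best_name, best_score = name, score
        mapping[label] = best_name if best_name is not None else label
    return mapping


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real speaker embeddings are much larger.
    save_embeddings("speakers.npz", {"alice": [1.0, 0.0, 0.0],
                                     "bob": [0.0, 1.0, 0.0]})
    known = load_embeddings("speakers.npz")
    new_run = {"SPEAKER_00": [0.1, 0.9, 0.0], "SPEAKER_01": [0.0, 0.0, 1.0]}
    print(match_speakers(new_run, known))
```

A suitable threshold would depend on the embedding model, and a real implementation would probably want one-to-one assignment rather than this greedy lookup.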

Final note

I understand this could be technically complex and might impact the current pipeline, especially given how pyannote or WeSpeaker is integrated. Still, do you think something like this would be possible or useful within the scope of WhisperX?

Thanks again for the great tool!


Labels: enhancement (New feature or request)
