Description
Hi, thank you for the fantastic work on WhisperX!
Speaker diarization stands out to me as one of the most interesting and valuable features in WhisperX. It adds a lot of practical power to transcription workflows.
That said, would it be possible to add an option to persist (export/save) the speaker embeddings generated during diarization, so they can be reused across multiple sessions or audio files? This would keep speaker labels consistent across separate diarization runs, especially in workflows that batch files or split long-form audio into parts.
Possible Implementation Ideas
- Option to export speaker embeddings after diarization
- Parameter to load previously saved embeddings and match against them
- Optional flag to use a cache for embeddings during batch processing
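To make the ideas above more concrete, here is a rough sketch of what I have in mind. It is not based on WhisperX internals: `save_embeddings`, `load_embeddings`, `match_speakers`, the `.npz` file layout, and the `0.7` cosine threshold are all hypothetical names and values I made up for illustration, and it assumes diarization can hand back one embedding vector per speaker label.

```python
import numpy as np


def save_embeddings(path, embeddings):
    """Write a {label: vector} dict to an .npz file (hypothetical layout)."""
    labels = list(embeddings)
    vectors = np.stack(
        [np.asarray(embeddings[label], dtype=np.float32) for label in labels]
    )
    # Note: np.savez appends ".npz" if the path does not already end with it.
    np.savez(path, labels=np.array(labels), vectors=vectors)


def load_embeddings(path):
    """Read back the {label: vector} dict written by save_embeddings."""
    with np.load(path) as data:
        return {str(label): vec for label, vec in zip(data["labels"], data["vectors"])}


def cosine_similarity(a, b):
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def match_speakers(new, known, threshold=0.7):
    """Map each new speaker label to the most similar known speaker.

    A label is kept unchanged when no known speaker reaches the threshold
    (an assumed value; it would need tuning per embedding model).
    """
    mapping = {}
    for label, vec in new.items():
        best_name, best_score = None, threshold
        for name, ref in known.items():
            score = cosine_similarity(vec, ref)
            if score >= best_score:
                best_name, best_score = name, score
        mapping[label] = best_name if best_name is not None else label
    return mapping


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real speaker embeddings are much larger.
    save_embeddings("speakers.npz", {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]})
    known = load_embeddings("speakers.npz")
    new_run = {"SPEAKER_00": [0.1, 0.9, 0.0], "SPEAKER_01": [0.0, 0.0, 1.0]}
    print(match_speakers(new_run, known))
```

This matches each new speaker independently, so two new labels could end up mapped to the same known speaker. A real implementation would probably need a one-to-one assignment and some way to update stored embeddings over time.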
Final note
I understand this could be technically complex and might impact the current pipeline, especially given how pyannote or WeSpeaker is integrated. Still, do you think something like this would be possible or useful within the scope of WhisperX?
Thanks again for the great tool!