Ask HN: Which speaker diarization tools should I look into?

2 points by justforfunhere 29 days ago
Hi,

I am making a tool that needs to analyze a conversation (non-English) between two people. The conversation is provided to me in audio format. I am currently using OpenAI Whisper to transcribe it, and I feed the transcription to the ChatGPT-4o model through the API for analysis.

So far, it's doing a fair job. Sometimes, though, when reading the transcription, I find it hard to figure out which speaker is saying what, and I have to listen to the audio to work it out. I am wondering whether ChatGPT-4o also sometimes finds it hard to follow the conversation from the transcription alone. I think that adding a speaker diarization step might make the transcription easier to understand and analyze.

I am looking for speaker diarization tools that I can use. I have tried pyannote speaker-diarization-3.1, but I find it does not work very well. What other options should I look at?
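
For context, here is a rough sketch of how I imagine the diarization step fitting into the pipeline: take the timestamped transcript segments and the speaker turns from whichever diarization tool ends up being used, then label each segment with the speaker whose turn overlaps it most. Everything here is an assumption for illustration: `Segment`, `Turn`, `label_segments`, and `format_transcript` are hypothetical names, the timestamps are made up, and no Whisper or pyannote API is called.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript chunk with timestamps in seconds (hypothetical shape)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """One diarization turn: who spoke during [start, end) (hypothetical shape)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="UNKNOWN"):
    """Assign each segment the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, seg.text))
    return labeled


def format_transcript(labeled):
    """Merge consecutive segments from the same speaker into 'SPEAKER: text' lines."""
    merged = []
    for speaker, text in labeled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return [f"{speaker}: {text}" for speaker, text in merged]


# Made-up example data, not real tool output.
segments = [
    Segment(0.0, 2.0, "Hello"),
    Segment(2.1, 4.0, "Hi there"),
    Segment(4.2, 5.0, "How are you?"),
]
turns = [Turn(0.0, 2.05, "A"), Turn(2.05, 5.0, "B")]

transcript_lines = format_transcript(label_segments(segments, turns))
```

The speaker-labeled lines would then be what gets sent to the model instead of the raw transcript. Segments that overlap no turn at all are marked `UNKNOWN` rather than guessed.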