Hacker News (Chinese edition)



Hi,

I am making a tool that needs to analyze a conversation (non-English) between two people. The conversation is provided to me in audio format. I am currently using OpenAI Whisper to transcribe it, and I feed the transcription to the ChatGPT-4o model through the API for analysis.

So far, it's doing a fair job. Sometimes, though, when reading the transcription, I find it hard to figure out which speaker is saying what, and I have to listen to the audio to work it out. I am wondering whether ChatGPT-4o also sometimes finds it hard to follow the conversation from the transcription alone. I think that adding a speaker diarization step might make the transcription easier to understand and analyze.

I am looking for speaker diarization tools that I can use. I have tried pyannote speaker-diarization-3.1, but I find it does not work very well. What are some other options that I can look at?
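For context on what the diarization step would feed into, here is a minimal sketch of how speaker turns could be merged with the transcription before it goes to the model. It assumes the transcription comes as segments with `start`, `end`, and `text` fields, and that the diarization tool (whichever one is chosen) yields turns as start time, end time, and a speaker label. The `Turn` class and the helper names are hypothetical and not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarization turn (hypothetical shape: start/end in seconds, speaker label)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="UNKNOWN"):
    """Assign each transcript segment to the speaker whose turns overlap it the most."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg["text"].strip()))
    return labeled


def format_transcript(labeled):
    """Merge consecutive segments from the same speaker into 'SPEAKER: text' lines."""
    merged = []
    for speaker, text in labeled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return "\n".join(f"{speaker}: {text}" for speaker, text in merged)


if __name__ == "__main__":
    # Made-up example data, only to show the shapes involved.
    segments = [
        {"start": 0.0, "end": 2.0, "text": " Hello"},
        {"start": 2.0, "end": 4.0, "text": " Hi there"},
    ]
    turns = [Turn(0.0, 2.2, "A"), Turn(2.2, 5.0, "B")]
    print(format_transcript(label_segments(segments, turns)))
```

Assigning by largest overlap is only one possible rule. Segments that straddle a speaker change would still be attributed to a single speaker, so word-level timestamps may give cleaner results if they are available.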

Ask HN: Which speaker diarization tools should I look into?