Ask HN: Approaches to building an AI voice assistant

4 points | by warthog | 9 months ago | original post
What are the best stack and methods for building voice agents?

My struggles so far:

1. Voice-to-voice is promising, but the quality is not quite there yet. I am not sure which model is used underneath, but in my experience the responses are worse than 4o's.

2. I have not used LiveKit, but it seems very popular. I am not sure why it is needed, though.

3. Interruption handling: I have not come across a model or system that handles interruptions well. In my experience, even 4o gets highly confused by a single interruption after around two minutes of conversation.
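To make point 3 concrete, here is a rough sketch of what I mean by handling an interruption (barge-in): stop playback as soon as the user starts talking, and record in the conversation history only the part of the reply that was actually heard. Everything here is hypothetical and for illustration only: `BargeInSession`, `on_user_speech`, the `[interrupted]` marker, and the word-by-word "playback" that stands in for streaming TTS. It is not how 4o or any particular framework works internally.

```python
import asyncio


class BargeInSession:
    """Hypothetical turn manager that truncates a reply when the user barges in."""

    def __init__(self):
        self.history = []       # (role, text) pairs that would be sent back to the model
        self._playback = None   # asyncio.Task for the reply currently being played
        self._spoken = []       # words already played from the current reply

    async def _play(self, words, word_delay):
        # Stand-in for streaming TTS: "plays" one word per tick.
        for word in words:
            await asyncio.sleep(word_delay)
            self._spoken.append(word)

    def add_user_turn(self, text):
        self.history.append(("user", text))

    async def speak(self, text, word_delay=0.05):
        """Play a reply and record only what was actually heard."""
        self._spoken = []
        self._playback = asyncio.create_task(self._play(text.split(), word_delay))
        # asyncio.wait does not raise if the playback task gets cancelled.
        await asyncio.wait({self._playback})
        interrupted = self._playback.cancelled()
        self._playback = None
        heard = " ".join(self._spoken)
        if interrupted:
            heard = (heard + " [interrupted]").strip()
        self.history.append(("assistant", heard))
        return interrupted

    def on_user_speech(self):
        """Would be called by voice activity detection when the user starts talking."""
        if self._playback is not None and not self._playback.done():
            self._playback.cancel()


async def demo():
    session = BargeInSession()
    session.add_user_turn("Tell me about the weather this week.")
    reply = "Monday is sunny Tuesday is rainy Wednesday is cloudy Thursday is windy"
    speaking = asyncio.create_task(session.speak(reply, word_delay=0.05))
    await asyncio.sleep(0.125)      # the user barges in a few words into the reply
    session.on_user_speech()
    interrupted = await speaking
    session.add_user_turn("Just Friday, please.")
    return interrupted, session.history


if __name__ == "__main__":
    was_interrupted, turns = asyncio.run(demo())
    print(was_interrupted)
    for role, text in turns:
        print(f"{role}: {text}")
```

The idea behind the truncation is that the model should not be told it said things the user never heard, which seems like one plausible source of the confusion described above.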