Launch HN: Leaping (YC W25) – Self-Improving Voice AI
Hey HN, I'm Arkadiy from Leaping AI (<a href="https://leapingai.com">https://leapingai.com</a>). Leaping lets you build voice AI agents in a multi-stage, graph-like format that makes testing and improvement much easier. By evaluating each stage of a call, we can trace errors and regressions to a particular stage. Then we autonomously vary the prompt for that stage and A/B test it, allowing agents to self-improve over time.<p>You can talk to one of our bots directly at <a href="https://leapingai.com">https://leapingai.com</a>, and there’s a demo video at <a href="https://www.youtube.com/watch?v=xSajXYJmxW4" rel="nofollow">https://www.youtube.com/watch?v=xSajXYJmxW4</a>.<p>Large companies are understandably reluctant to have AI start picking up their phone calls: the technology kind of works, but often not very well. Those that do take the plunge often spend months tuning the prompts for just one use case, and sometimes never release the voice bot at all.<p>The problem is two-sided: it's non-trivial to specify in plain language exactly how a bot should behave, and it's tedious to ensure the LLM always follows your instructions the way you intended.<p>Existing voice AI solutions are a pain to set up for complex use cases. They require months of prompt-writing to cover all the edge cases before going live, and then months more of monitoring and prompt tuning afterwards. By running a continuous analysis-and-testing loop, we do that better than human prompters, and much faster.<p>Our tech is roughly divided into three subcomponents: the core library, the voice server, and the self-improvement logic. The core library models and executes the multi-stage (think n8n-style) voice agents. For the voice server we use the ol’ reliable STT->LLM->TTS cascade.
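To make the cascade concrete, here is a minimal sketch of an audio-in/audio-out turn; the function names and stub components are illustrative, not Leaping's actual API:

```python
# Sketch of the STT -> LLM -> TTS cascade. The stubs stand in for real
# speech-to-text, language, and text-to-speech models.
from typing import Callable


def make_cascade(stt: Callable[[bytes], str],
                 llm: Callable[[str], str],
                 tts: Callable[[str], bytes]) -> Callable[[bytes], bytes]:
    """Compose the three stages into a single audio-in/audio-out turn."""
    def turn(audio_in: bytes) -> bytes:
        transcript = stt(audio_in)   # speech -> text
        reply = llm(transcript)      # text  -> text
        return tts(reply)            # text  -> speech
    return turn


# Stub components in place of real models:
cascade = make_cascade(
    stt=lambda audio: "what are your opening hours?",
    llm=lambda text: "We are open 9am to 5pm, Monday to Friday.",
    tts=lambda text: text.encode("utf-8"),
)
audio_out = cascade(b"\x00\x01")  # bytes of the synthesized reply
```

Each stage is swappable, which is one reason the cascade remains attractive over end-to-end voice-to-voice models for now.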
We tried out voice-to-voice models, and although they felt great to talk to, their function-calling performance was, as expected, much worse, so we're waiting for them to get better.<p>Self-improvement works by first turning conversation metrics and evaluation results into ‘feedback’, i.e. specific ideas for how the voice agent's setup could be improved. Once enough feedback has been collected, we trigger a run of a specialized self-improvement agent: a Cursor-style AI with access to various tools for modifying the main voice agent. It can rewrite prompts, configure a stage to use a summarized conversation instead of the full one, and more. Each iteration produces a new snapshot of the agent, letting us route a small share of traffic to it and promote it to production if things look good. This loop can run without any human involvement, which is what lets the agents self-improve.<p>Leaping is use-case agnostic, but we currently focus on inbound customer support (travel, retail, real estate, etc.) and lead pre-qualification (Medicare, home services, performance marketing), since we have a lot of success stories there.<p>We started out in Germany, since that's where we were at university, but growth was initially challenging. We decided to target enterprise customers right away, and they were reluctant to adopt voice AI as the front-door ‘face’ of their company. Moreover, for an enterprise with thousands of calls daily, monitoring every call and tuning agents manually is infeasible. To address these very valid concerns, we put all our effort into reliability, and we still haven't gotten around to offering self-serve access, which is one reason we don't have fixed pricing yet. (Also, with some clients we use outcome-based pricing: you pay nothing for calls that don't convert a lead, only for the ones that do.)<p>Things have picked up momentum since we got into YC and moved to the US, but the cautious sentiment is present here too when you try to sell to big enterprises. We believe that doing evals, simulation, and A/B testing really, really well is our competitive edge and what will enable us to tackle large, sensitive use cases.<p>We’d love to hear your thoughts and feedback!
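P.S. For the curious, the feedback → improvement-agent → snapshot → canary loop described above could be sketched roughly like this; all names here (`AgentSnapshot`, `improve`, `route`) are hypothetical, not our real API:

```python
# Rough sketch of a self-improvement loop: accumulate per-stage feedback,
# produce a new agent snapshot, canary-test it, then promote if it looks good.
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSnapshot:
    version: int
    prompts: dict  # stage name -> prompt text


def improve(snapshot: AgentSnapshot, feedback: list[str]) -> AgentSnapshot:
    """Stand-in for the self-improvement agent: apply each piece of
    'stage: suggestion' feedback by amending that stage's prompt."""
    prompts = dict(snapshot.prompts)
    for note in feedback:
        stage, suggestion = note.split(":", 1)
        prompts[stage] = prompts[stage] + " " + suggestion.strip()
    return AgentSnapshot(version=snapshot.version + 1, prompts=prompts)


def route(live: AgentSnapshot, candidate: AgentSnapshot,
          canary_share: float = 0.05) -> AgentSnapshot:
    """Send a small slice of calls to the candidate snapshot for A/B testing."""
    return candidate if random.random() < canary_share else live


live = AgentSnapshot(version=1, prompts={"greeting": "Greet the caller."})
candidate = improve(live, ["greeting: Ask for the caller's name."])
# If the candidate's metrics look good on the canary slice, promote it:
live = candidate
```

Because every iteration is a new immutable snapshot, a bad change can be rolled back by simply routing traffic back to the previous version.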