Ask HN: For those building AI agents, how are you making them faster?
Because of the coordination across multiple systems and the chaining of LLM calls, a lot of agents today can feel really slow. I'd love to know how others are tackling this:
- How are you identifying performance bottlenecks in your agents?
- What kinds of changes have given you the biggest speedups?
For us, we vibe-coded a profiler to identify slow LLM calls. Sometimes we could then swap in a faster model for that step, or we'd realize we could shrink the input tokens by cutting unnecessary context. For steps requiring external access (browser use, API calls), we've moved to fast-start external containers and thread pools for parallelization. We've also experimented with UI changes to mask some of the latency.
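A minimal sketch of two of the ideas above: timing each step so the slow ones stand out, and running independent external calls through a thread pool instead of serially. The `fetch_page` function and the `TIMINGS` registry are hypothetical stand-ins, not anything from our actual stack.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Per-step timing registry: step name -> list of elapsed seconds.
TIMINGS = {}

def timed(name, fn, *args, **kwargs):
    """Run a step under a timer so slow LLM/tool calls show up in TIMINGS."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    TIMINGS.setdefault(name, []).append(time.perf_counter() - start)
    return result

# Hypothetical slow external step (stand-in for browser use / API calls).
def fetch_page(url):
    time.sleep(0.05)  # simulate network latency
    return f"<html>{url}</html>"

# Independent external calls can be parallelized with a thread pool.
urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
with ThreadPoolExecutor(max_workers=8) as pool:
    pages = list(pool.map(lambda u: timed("fetch_page", fetch_page, u), urls))

# Rank steps by mean latency to find the bottleneck to attack first.
slowest = max(TIMINGS, key=lambda k: sum(TIMINGS[k]) / len(TIMINGS[k]))
```

The same `timed` wrapper can go around each LLM call so you can compare step latencies before deciding whether to swap models or trim context for that step.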
What other performance-enhancing techniques are people using?