Ask HN: What's the worst experience you've had putting agentic applications into production?
For a bit of context: I'm currently building a team of AI agents at work that generates reports by fanning out into a large number of subagents to process a large volume of transcript data. When the analysis fails midway because an individual step breaks (an API call returns an error, say, or the machine runs out of memory), the failure cascades and breaks the entire generation, with almost no visibility. I've just spent the past month rewriting the individual jobs as durable execution jobs on DBOS, but I'm wondering whether there are better solutions out there and whether others have run into similar issues. Then there's the problem of reflecting progress back to users, which I've honestly just been coding ad hoc…

When an agent fails at step 9 of 12, how do you handle that?

Roughly how many engineer-weeks have you sunk into agent infrastructure (durability, monitoring, human-in-the-loop, live UI) vs. the actual agent logic? Curious whether my ratio is normal.

For those who built this stuff in-house: was it ever a build-vs-buy conversation? What would a tool have had to do for you to buy instead of build?

Do you currently pay for anything in your agent stack (LangSmith, Temporal, Braintrust, etc.)? What made that one worth a line item when others weren't, and should I look into it too?
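To make the step-9-of-12 question concrete, here is a minimal sketch of the checkpoint-and-resume behavior I mean. It is only an illustration using the Python standard library, not DBOS's API or my real code: `run_pipeline`, the JSON checkpoint file, and the `on_progress` callback are all hypothetical names, and it assumes each step's output is JSON-serializable and that steps are safe to retry.

```python
import json
import os
from typing import Callable, List, Optional, Tuple

# Hypothetical step type: a name plus a function that takes and returns a dict.
Step = Tuple[str, Callable[[dict], dict]]


def run_pipeline(
    steps: List[Step],
    checkpoint_path: str,
    on_progress: Optional[Callable[[int, int, str], None]] = None,
) -> dict:
    """Run steps in order, saving state after each one so a rerun can resume."""
    state = {"done": [], "data": {}}

    # Resume from a previous run if a checkpoint file exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    for index, (name, fn) in enumerate(steps, start=1):
        if name in state["done"]:
            continue  # completed in an earlier run, so skip it

        # If fn raises, the exception propagates and the checkpoint on disk
        # still reflects the last step that finished successfully.
        state["data"] = fn(state["data"])
        state["done"].append(name)

        # Write to a temp file and rename, so a crash mid-write cannot
        # leave a truncated checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

        # Progress hook, e.g. to push "step 9 of 12" to a UI.
        if on_progress is not None:
            on_progress(index, len(steps), name)

    return state["data"]
```

With this shape, a failure at step 9 raises out of `run_pipeline`, and calling it again with the same `checkpoint_path` skips steps 1 through 8 and retries from step 9. It does not cover fan-out across subagents, concurrent writers, or non-idempotent steps, which is where I ended up reaching for a durable execution framework.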