A Month Debugging AI Agents: How I Built 10 Agents and Why I Had to Delete Them

Imagine hiring 10 specialists, handing them 1,000-line instruction manuals, and getting chaos instead of coordinated work. That was my month of building an AI agent framework.

My goal was ambitious: a fully autonomous system in which a team of AI agents (a Researcher, an Architect, a TDD tester, and more) would take a task and handle everything from planning to deployment. I designed a complex, multi-phase workflow with protocols like `ESCALATION` and detailed "Mission Briefs". On paper, it was a perfect, self-managing machine.
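For readers who have not built something like this, here is a minimal, hypothetical sketch of what a role-based pipeline with an escalation path tends to look like. The names, classes, and success check are illustrative assumptions only; they are not taken from the framework described in the post.

```python
# Hypothetical sketch of a role-based agent pipeline with an escalation path.
# All names here (MissionBrief, Agent, orchestrate) are illustrative, not the
# author's actual framework.
from dataclasses import dataclass, field


@dataclass
class MissionBrief:
    """Task description handed from the orchestrator to each agent."""
    task: str
    notes: list[str] = field(default_factory=list)


class Agent:
    def __init__(self, role: str):
        self.role = role

    def run(self, brief: MissionBrief) -> str:
        # In a real system this would call an LLM with a role-specific prompt.
        return f"[{self.role}] handled: {brief.task}"


def orchestrate(task: str, roles: list[str], max_retries: int = 2) -> list[str]:
    """Pass the brief through each role in order; escalate after repeated failures."""
    brief = MissionBrief(task=task)
    log: list[str] = []
    for role in roles:
        agent = Agent(role)
        for _ in range(max_retries):
            result = agent.run(brief)
            if result:  # placeholder success check; a real one would validate output
                brief.notes.append(result)
                log.append(result)
                break
        else:
            log.append(f"ESCALATION: {role} failed after {max_retries} attempts")
            break
    return log


if __name__ == "__main__":
    for line in orchestrate("add login endpoint", ["Researcher", "Architect", "TDD-tester"]):
        print(line)
```

Even in this toy form, most of the moving parts (briefs, retries, escalation) are coordination overhead rather than actual work, which is roughly where my trouble started.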
The reality, however, was an expensive nightmare. The system was plagued by constant file editing errors, infinite loops that burned through tens of thousands of tokens, and "phantom executions" in which the orchestrator marked a task as complete without writing a single line of code. My job shifted from developer to full-time prompt debugger.

In desperation, I posted on Reddit. The fix wasn't a better prompt: a single comment pointed me to two "experimental" checkboxes in the tool's settings, and disabling them made 90% of the file editing problems vanish.

That led to a painful but crucial experiment: what if I stripped out all my carefully crafted, hyper-detailed prompts and went back to the defaults? The result was disheartening: the system performed almost exactly the same.

Read the full story, with detailed architecture diagrams and my final, simplified workflow: https://xor01.substack.com/p/my-war-with-ai-agents