Ask HN: Are coding agents optimizing the wrong review step?

1 point by wsxiaoys, about 2 months ago
This is a personal opinion, but I think current coding agents ask humans to review the AI's artifacts at the wrong moment. Most tools focus on creating and reviewing a plan before execution.

The idea behind this is to approve intent before letting the agent touch the codebase. That sounds reasonable, but in practice it's not where the real learning happens.

"Plan mode" takes place before the agent has paid the cost of reality: before it has navigated the repo, before it has run tests, before it has hit weird edge cases or dependency issues. The output is speculative by design, and it usually looks far more confident than it should.

What turns out to be more useful is reviewing the walkthrough: a summary of what the agent did after it tried to solve the problem.

Currently, most coding agents still treat the plan as the primary checkpoint by default, and the walkthrough comes later. That puts the center of gravity in the wrong place.

My experience with software engineering is that we don't review intent and then trust execution. We review outcomes: the diff, the test changes, what broke, what was fixed, and why. That's effectively a walkthrough.

So when we give feedback on a walkthrough, we're reacting to concrete decisions and consequences, not to hypotheticals. That feedback is clearer, more actionable, and closer to how we, as engineers, already review work today.

I'm curious whether others feel the same when using plan-first coding agents. I ask because I'm working on an open source coding agent. We have decided to put less emphasis on approving plans upfront and more emphasis on reviewing what the agent actually experienced while doing the work.

But this is something we're heavily debating within our team, and I'd love to hear thoughts that could help us implement it in the best way possible.