Giving agents “eyes” to see visual diffs
Coding agents are getting pretty good at writing code, but they still have no way to verify whether they have broken the UI.

As humans, we rely on visual diffs for that. We open them, scan quickly, and catch obvious regressions. Agents are completely out of that loop.

I’m a co-founder of Argos (visual testing), and I recently shipped a CLI that exposes visual diffs in a way an agent can actually use, instead of going through a UI.

Once I wired it into an agent workflow, a few interesting things started happening. The agent started catching obvious regressions. Sometimes it would refuse to approve its own PR. With a good prompt, it even fixed the issue after seeing the diff and iterating.

It’s still rough and not reliable enough to trust on its own. A lot depends on how well the agent understands the codebase. In local tests, it sometimes gets stuck in loops and burns through tokens.

Giving agents “eyes” on UI changes might be an interesting feedback loop for more autonomous dev agents in the future.
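The review-and-iterate loop described above can be sketched roughly as follows. This is an illustrative outline only, not the actual Argos CLI or its API: `run_visual_diff` and `apply_fix` are hypothetical callbacks standing in for the real tool invocation and the agent's edit step, and the iteration budget is there precisely because, as noted, the agent can get stuck looping and burning tokens.

```python
def review_loop(run_visual_diff, apply_fix, max_iterations=3):
    """Drive an agent-style visual review loop.

    run_visual_diff() -> list of diff descriptions; [] means no regressions.
    apply_fix(diff)   -> ask the agent to address one reported diff.
    Returns (approved, iterations_used).
    """
    for i in range(max_iterations):
        diffs = run_visual_diff()
        if not diffs:
            # No visual regressions left: the agent can approve its own PR.
            return True, i
        for diff in diffs:
            # Show the agent each diff and let it attempt a fix.
            apply_fix(diff)
    # Still failing after the budget: refuse approval instead of looping
    # forever, mirroring the "refuse to approve its own PR" behavior.
    return False, max_iterations
```

The cap on iterations is the key design choice: without it, an unreliable agent that never converges on a clean diff would spin indefinitely.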