Giving agents “eyes” to see visual diffs
Coding agents are getting pretty good at writing code, but they still have no way to verify whether they have broken the UI.

As humans, we rely on visual diffs for that. We open them, scan quickly, and catch obvious regressions. Agents are completely out of that loop.

I’m a co-founder of Argos (visual testing), and I recently shipped a CLI that exposes visual diffs in a way an agent can actually use, instead of going through a UI.

Once I wired it into an agent workflow, a few interesting things started happening. The agent started catching obvious regressions. Sometimes it would refuse to approve its own PR. With a good prompt, it even fixed the issue after seeing the diff and iterating.

It’s still rough and not reliable enough to trust on its own. A lot depends on how well the agent understands the codebase. In local tests, it sometimes gets stuck in loops and burns through tokens.

Giving agents “eyes” on UI changes might be an interesting feedback loop for more autonomous dev agents in the future.
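The review-and-iterate loop described above can be sketched roughly as follows. This is an illustrative outline only, not the actual Argos CLI or its API: `run_visual_diff` and `apply_fix` are hypothetical callbacks standing in for the real tool invocation and the agent's edit step, and the iteration budget is there precisely because, as noted, the agent can get stuck looping and burning tokens.

```python
def review_loop(run_visual_diff, apply_fix, max_iterations=3):
    """Drive an agent-style visual review loop.

    run_visual_diff() -> list of diff descriptions; [] means no regressions.
    apply_fix(diff)   -> ask the agent to address one reported diff.
    Returns (approved, iterations_used).
    """
    for i in range(max_iterations):
        diffs = run_visual_diff()
        if not diffs:
            # No visual regressions left: the agent can approve its own PR.
            return True, i
        for diff in diffs:
            # Show the agent each diff and let it attempt a fix.
            apply_fix(diff)
    # Still failing after the budget: refuse approval instead of looping
    # forever, mirroring the "refuse to approve its own PR" behavior.
    return False, max_iterations
```

The cap on iterations is the key design choice: without it, an unreliable agent that never converges on a clean diff would spin indefinitely.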