Vibe coding failure

Author: 10keane, 9 days ago
I am using Claude to maintain an agent loop that pauses to ask for the user's approval before important tool calls. While doing some bug fixes, I identified some clear patterns and reasons why vibe coding can fail for people who don't have technical knowledge and architecture expertise.

Let me describe my workflow first - this has been my workflow across hundreds of successful sessions:

1. Identify bugs through dogfooding.
2. Ask Claude Code to investigate the codebase for three potential root causes.
3. Paste the root causes and proposed fixes into a Claude Project, where I store all architecture docs and design decisions, for it to evaluate.
4. Discuss with Claude in the Project to write a detailed task spec - the task spec follows a specified format and includes all sorts of tests.
5. Hand it back to Claude Code to implement the fix.

In today's session, the root cause analysis was still great, but the proposed fixes were so bad that I really think this is how most vibe-coded projects lose maintainability in the long run.

Here are two of the root causes and proposed fixes.

Bug: the agent asks for user approval, but sometimes the approval popup doesn't show up. I tried sending a message to unstick it; the message got silently swallowed. The agent looked dead, and I needed to restart the entire thing.

Claude's evaluation:

Root cause 1: the approval popup is sent once over a live connection. If the user's UI isn't connected at that moment - page refresh, phone backgrounded, flaky connection - they never see it. No retry, no recovery.

This is actually true.

Proposed fix: "Let's save approval state to disk so it survives crashes." Sounds fine, but the key is that by design, if things crash, the agent cold-resumes from the session log, and it won't pick up the approval state anyway. The fix just adds schema complexity and is completely useless.

Root cause 2: when an approval gets interrupted (daemon crash, user restart), there's an orphan tool_call in the session history with no matching tool_result.

Proposed fix: "Write a synthetic tool_result to keep the session file structurally valid." Sounds clean. But I asked: who actually breaks on this? The LLM API? No, it handles missing results. The session replay? No, it reads what's there. The orphan tool_call accurately represents what happened: the tool was called but never completed. That's the truth. Writing a fake result to paper over it introduces a new write-coordination concern (when exactly do you write the fake result? What if the daemon crashes during the write?) to solve a problem that doesn't exist. The session file isn't "broken." It's accurate.

Claude had the full architecture docs, the codebase, and over a hundred sessions of project history in context. It still reached for the complex solution because it LOOKS like good engineering. It never asked, "does this even matter after a restart?"

I have personally encountered this preference for seemingly more robust over-engineering multiple times, and I genuinely believe this is where a human operator should actually step in, instead of giving a one-sentence requirement and watching the agent do all sorts of "robust" engineering.
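For root cause 1, the simpler fix the analysis actually implies ("no retry, no recovery") is re-delivering any unresolved approval request whenever the UI reconnects, rather than persisting approval state to disk. A minimal sketch of that pattern, with hypothetical names (`Hub`, `PendingApproval`) that are not from the project:

```python
# Sketch: re-deliver pending approvals on (re)connect instead of
# persisting them to disk. All names here are illustrative.
from dataclasses import dataclass, field

@dataclass
class PendingApproval:
    tool_call_id: str
    description: str
    resolved: bool = False

@dataclass
class Hub:
    pending: dict = field(default_factory=dict)  # tool_call_id -> PendingApproval
    outbox: list = field(default_factory=list)   # messages queued for the UI

    def request_approval(self, tool_call_id, description):
        """Called by the agent loop before an important tool call."""
        approval = PendingApproval(tool_call_id, description)
        self.pending[tool_call_id] = approval
        self._send(approval)

    def on_client_connect(self):
        """Called whenever a UI connects or reconnects: replay anything
        still unresolved, so a missed popup is re-shown, not lost."""
        for approval in self.pending.values():
            if not approval.resolved:
                self._send(approval)

    def on_approved(self, tool_call_id):
        approval = self.pending.pop(tool_call_id, None)
        if approval:
            approval.resolved = True

    def _send(self, approval):
        # In a real system this would write to the live connection.
        self.outbox.append(("approval_request", approval.tool_call_id))
```

The pending set lives only in memory; after a crash the agent cold-resumes from the session log anyway, so there is nothing worth persisting - which is exactly the author's objection to the disk-state fix.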
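For root cause 2, the argument that nothing breaks on an orphan tool_call can be made concrete: a session reader can pair calls with results at read time and simply report unmatched calls as interrupted, with no synthetic tool_result ever written. The event shapes below are illustrative, not the project's real schema:

```python
# Sketch: tolerate orphan tool_calls at read time instead of writing
# a fake tool_result. Event shapes here are hypothetical.

def replay(events):
    """Return (completed, orphaned) sets of tool_call ids from a session log."""
    calls, results = set(), set()
    for ev in events:
        if ev["type"] == "tool_call":
            calls.add(ev["id"])
        elif ev["type"] == "tool_result":
            results.add(ev["call_id"])
    completed = calls & results
    orphaned = calls - results  # accurate record: called but never finished
    return completed, orphaned
```

This keeps the write path untouched, so the write-coordination question ("when exactly do you write the fake result?") never arises.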