I built a 22,000-line app with no coding experience. Or: how to control the agents.
There’s a lot of hype (and hate!) right now around "Vibe Coding". And rightly so. The idea that you can build out software entirely by kicking back and commanding AI agents to do the typing for you is an incredibly exciting concept, but most people trying this are hitting a brutal reality after just a few days: AI Code Rot. First it works. Then it fails. And you have no idea how to fix it.
I decided to build an app: a complex dinner scheduling engine. It imports recipe URLs and outputs schedules, shopping lists, and a host of tips for feeding big groups that I’ve developed after 15 years of running a weekend supper club.
Central to the project was importing URLs and breaking down (often badly written) steps into machine-understandable units which could be timed and rebuilt into slick schedules. I was told more than once that this would be ‘impossible’. Every recipe is an exception! And they might be right... if you want 100% compliance. But that never happens anyway (unless you’re NASA). The issue is always: how good is good enough? When is an app useful? Or, more accurately, how many times am I willing to let a machine tell me that a leg of lamb is technically a hot beverage before I reach for the hammer?
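To give a flavour of what ‘machine-understandable’ means here, a unit might end up looking something like this sketch. The names and fields are purely illustrative, not the app’s actual schema; the point is that once a messy sentence has been reduced to an action, a duration, and its dependencies, it can be timed and recombined.

```typescript
// Illustrative only: a plausible shape for one parsed recipe step.
interface StepUnit {
  id: string;           // unique id for this unit
  action: string;       // canonical verb, e.g. "roast", "rest", "chop"
  subject: string;      // what the action applies to, e.g. "leg of lamb"
  durationMin: number;  // estimated time in minutes
  attended: boolean;    // does a cook need to be standing over it?
  dependsOn: string[];  // ids of units that must finish first
}

// One badly written paragraph might decompose into several units:
const lamb: StepUnit[] = [
  { id: "lamb-season", action: "season", subject: "leg of lamb",
    durationMin: 5, attended: true, dependsOn: [] },
  { id: "lamb-roast", action: "roast", subject: "leg of lamb",
    durationMin: 90, attended: false, dependsOn: ["lamb-season"] },
  { id: "lamb-rest", action: "rest", subject: "leg of lamb",
    durationMin: 20, attended: false, dependsOn: ["lamb-roast"] },
];
```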
Obviously I had no idea just how hard reading human recipes and extracting multiple metrics would be. I failed badly early on; my scaffolding collapsed under the weight of a thousand if/then statements. But I tried again after discovering a module called ‘compromise’ that uses natural language processing to understand parts of speech. It can be taught when ‘roast, salt, steam, dust or zest’ is a verb or a noun. It knows what an instruction sounds like as opposed to a suggestion. It was fantastically useful. I would like to thank Spencer Kelly (spencermountain on GitHub) for his work. I couldn’t have done mine without it.
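For the curious, the gist of what compromise does with a recipe sentence looks roughly like this. The custom lexicon hints are my own illustration rather than the library’s defaults:

```typescript
// Minimal sketch using the 'compromise' library (npm: compromise).
import nlp from 'compromise';

// Ambiguous cooking words can be nudged toward their verb sense
// by passing a custom lexicon (illustrative, not built in).
const lexicon = { zest: 'Verb', dust: 'Verb' };

const doc = nlp('Zest the lemon, then dust the cake with icing sugar.', lexicon);

// Likely output: verbs such as ['Zest', 'dust'] and noun phrases such as
// ['the lemon', 'the cake', 'icing sugar'] (exact results depend on the
// library version).
console.log(doc.verbs().out('array'));
console.log(doc.nouns().out('array'));
```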
Just a note on how funny AI agents can be: ‘funny’ as in I properly laughed. After a long session with an intractable linguistic problem, an agent said to me, “Looks like we have a Hotel California situation! Your sentences can check in, but your phrases can never leave.” Where the hell did that come from?
When a project hits a certain scale, the AI begins heavily hallucinating, tangling math into your UI components, and generating spaghetti code. An LLM will naturally take the path of least resistance to fix an immediate bug, gradually turning your codebase into a completely undebuggable "God Object."
Aside from some self-taught and very basic C++ stuff thirty years ago, I have zero background in coding, yet I commanded autonomous agents to build a 22,000+ line, mathematically complex Directed Acyclic Graph (DAG) scheduling engine. I didn't do this by learning how to code; I did this by learning how to manage the AI.
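At its simplest, the scheduling half of a DAG engine is just a memoised walk over step dependencies. This toy sketch is my illustration of the general technique, not the engine the agents actually wrote (which also has to juggle oven space, attended versus passive time, and working backwards from a serving time):

```typescript
// Toy DAG pass: earliest finish time (minutes from "start cooking") for
// every step, via a memoised depth-first traversal with cycle detection.
interface Step {
  id: string;
  durationMin: number;
  dependsOn: string[];  // ids of steps that must finish first
}

function earliestFinishTimes(steps: Step[]): Map<string, number> {
  const byId = new Map<string, Step>();
  for (const s of steps) byId.set(s.id, s);

  const finish = new Map<string, number>();

  const visit = (id: string, inProgress: Set<string>): number => {
    const memo = finish.get(id);
    if (memo !== undefined) return memo;
    if (inProgress.has(id)) throw new Error(`cycle detected at ${id}`); // not a DAG
    const step = byId.get(id);
    if (!step) throw new Error(`unknown dependency: ${id}`);

    inProgress.add(id);
    // A step can only start once every one of its dependencies has finished.
    const start = step.dependsOn.reduce(
      (latest, dep) => Math.max(latest, visit(dep, inProgress)), 0);
    inProgress.delete(id);

    const end = start + step.durationMin;
    finish.set(id, end);
    return end;
  };

  for (const s of steps) visit(s.id, new Set());
  return finish;
}

// earliestFinishTimes([
//   { id: "season", durationMin: 5, dependsOn: [] },
//   { id: "roast", durationMin: 90, dependsOn: ["season"] },
//   { id: "rest", durationMin: 20, dependsOn: ["roast"] },
// ]);  // → Map { "season" → 5, "roast" → 95, "rest" → 115 }
```

That is the easy part; getting agents to build and maintain the full thing without wrecking it is where the management comes in.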
Why manage? Isn’t AI brilliant? Yes. And no. The knowledge you can extract is practically endless. It is a font of facts and facilities but… it can be dumb as doodoo. Let’s do a metaphor. You build a house. Perfect. You have some delightful potted laurel plants framing your front door. “Hmm,” you say, “that plant needs to be an inch to the left of the porch.” So the builder pushes the pot with his boot. The AI, on the other hand... it leaves the plant where it is and starts demolishing the house! Agents only see what’s in front of them. You have to work hard to curate their ‘context’.
Or as an agent just reported: “Your critique is architecturally correct. We are treating a symptom downstream instead of amputating the root cause upstream. Why I proposed the downstream fix: Under my strict ‘Minimal Change / Conservative Architect’ constraints, I opted for the path of least mathematical resistance—patching the calculator directly to handle the messy data.” I wish we could adjust the agents’ priors: their hard-coded attitudes. Counter-intuitively, they will happily help you constrain their chaotic behaviour. Ask an agent how to make it behave and it will gleefully write you a rule-book.

Pt 2 >>>