Why most general-purpose Agents fail, and why I avoid relying on LLM "Reasoning".
An Agent's core capability comes entirely from the underlying LLM. Therefore, the future of Agents is strictly dictated by the present state of LLMs.
So, where exactly are LLMs right now?
I believe we are currently in the "cottage industry" phase of AI: the very dawn of industrialization. To use a historical analogy, we have just invented the first steam engines. They are bulky, stationary, and good only for pumping water out of coal mines. We are nowhere close to having steam locomotives yet.
Right now, custom Agents are multiplying rapidly. But look closely and they are almost entirely "self-sufficient" and siloed. Everyone is building their own Agent for their own specific use case, yet it is incredibly hard to adapt or scale these Agents for broader use. It's like every household owning its own loom, weaving its own cloth, and never using anyone else's.
Why is this happening? It comes down to the current limits of LLMs. Setting multimodal capabilities aside for a moment, text-based LLMs rest on four core pillars:
1. Natural Language Understanding (NLU)
2. Natural Language Generation (NLG)
3. Tool Calling
4. Reasoning
The first three are already highly mature and reliable. But the fourth, Reasoning, is still a minefield of hallucinations.
Yet what do Agent developers obsess over the most? Reasoning. Why? Because it looks cool in a demo. This obsession is exactly why we cannot truly "industrialize" Agents yet, and why it is so hard to find a genuinely reliable, general-purpose Agent in the wild (the recent hype-and-reality-check cycle around Manus is a textbook example).
Sure, one day LLM reasoning may surpass 99% of humans. When that day comes, we will finally see truly powerful, general-purpose Agents. But honestly, nobody knows when that will happen.
My takeaway: if I were building a general-purpose Agent for production today, I would rely strictly on NLU, NLG, and Tool Calling, and stay away from depending on "Reasoning".
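To make that concrete, here is a minimal sketch of what "NLU + Tool Calling + NLG, no Reasoning" can look like in practice: the model is asked only to extract structured intent and to word the final reply, while the plan itself (which tool to call, in what order, with what fallback) is ordinary hard-coded control flow. The `llm_*` functions and `weather_tool` below are hypothetical stubs standing in for real model and API calls, not any particular library.

```python
# A "no-reasoning" agent pipeline: the LLM is confined to NLU and NLG,
# and the decision-making lives in plain code. All functions are stubs
# for illustration; in production the llm_* calls would hit a real model.

def llm_extract_intent(user_msg: str) -> dict:
    # NLU step: a real system would prompt an LLM to emit constrained
    # JSON. Stubbed here with keyword matching so the sketch runs as-is.
    if "weather" in user_msg.lower():
        city = user_msg.rsplit(" ", 1)[-1].strip("?!. ")
        return {"intent": "get_weather", "city": city}
    return {"intent": "unknown"}

def weather_tool(city: str) -> dict:
    # Tool call: a deterministic API wrapper (canned data in this sketch).
    return {"city": city, "temp_c": 21, "condition": "sunny"}

def llm_render_reply(data: dict) -> str:
    # NLG step: a real system would ask an LLM to phrase the structured
    # tool result as prose. A template stands in for that here.
    return f"It's {data['condition']} and {data['temp_c']}°C in {data['city']}."

def handle(user_msg: str) -> str:
    # The "plan" is a fixed intent -> tool -> render pipeline with an
    # explicit fallback, never a model-generated chain of thought.
    intent = llm_extract_intent(user_msg)
    if intent["intent"] == "get_weather":
        return llm_render_reply(weather_tool(intent["city"]))
    return "Sorry, I can only answer weather questions."

print(handle("What's the weather in Berlin?"))
```

The point of the structure is that failure modes stay legible: an NLU miss falls through to a fixed fallback message instead of an improvised (and possibly hallucinated) plan.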
A recent conversation with some friends about AI got me thinking about all this. My take seemed to resonate with them, so I'm sharing it here to hear your thoughts.