Why most general-purpose Agents fail, and why I avoid relying on LLM "Reasoning"

Author: cid4352 · 8 days ago · original post
An Agent's core capability comes entirely from the underlying LLM. Therefore, the future of Agents is strictly dictated by the present state of LLMs.

So, where exactly are LLMs right now?

I believe we are currently in the "cottage industry" (or subsistence) phase of AI: the very dawn of industrialization. To use a historical analogy: we just invented the first steam engines. They are bulky, stationary, and only good for pumping water out of coal mines. We aren't anywhere close to having steam locomotives yet.

Right now, there's a massive explosion of custom Agents being built. But if you look closely, they are almost entirely "self-sufficient" and siloed. Everyone is building their own Agent for their own specific use case, but it's incredibly hard to adapt or scale them for broader use. It's like every household having its own loom, weaving its own cloth, and never using anyone else's.

Why is this happening? It comes down to the current limits of LLMs. If we put aside multimodal capabilities for a moment, text-based LLMs basically rest on four core pillars:

1. Natural Language Understanding (NLU)
2. Natural Language Generation (NLG)
3. Tool Calling
4. Reasoning

The first three are already highly mature and reliable. But the fourth, Reasoning, is still an absolute minefield of hallucinations.

Yet, what do Agent developers obsess over the most? Reasoning. Why? Because it looks cool on a demo. This obsession is exactly why we can't truly "industrialize" Agents yet. It's why it is so damn hard to find a genuinely reliable, general-purpose Agent in the wild (the recent hype and reality check around Manus is a textbook example of this).

Sure, one day LLM reasoning capabilities might surpass 99% of humanity. When that day comes, we will finally see truly powerful, general-purpose Agents. But honestly, nobody knows exactly when that timeline will hit.

My takeaway: if I am building a general Agent for production today, I am strictly utilizing NLU, NLG, and Tool Calling. I am staying the hell away from relying on "Reasoning."

A recent convo with some friends about AI got me thinking. My take seemed to resonate with them, so I'm sharing it here to hear your thoughts.
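The takeaway above can be sketched in a few lines. This is a hypothetical toy, not any real framework or the author's actual code: `classify` is a deterministic stub standing in for an LLM call constrained to a closed set of intents (the NLU step), the `TOOLS` table is the tool-calling step, and the final string render stands in for an NLG call. The point it illustrates is that all branching lives in plain code rather than in model "Reasoning".

```python
# Sketch of a "no-reasoning" agent: the LLM would only do NLU (map free
# text to a known intent) and NLG (render the result); control flow stays
# in ordinary code. All names here are illustrative stubs.

TOOLS = {
    "get_weather": lambda city: f"18°C and clear in {city}",
    "get_time":    lambda city: f"14:05 in {city}",
}

def classify(request: str) -> tuple[str, str]:
    """NLU step: map a request to (intent, argument).
    Deterministic stub for an LLM call forced to pick from a closed
    set of intent labels (no open-ended planning allowed)."""
    text = request.lower()
    city = request.split()[-1].strip("?")
    if "weather" in text:
        return "get_weather", city
    if "time" in text:
        return "get_time", city
    return "unknown", city

def run_agent(request: str) -> str:
    intent, arg = classify(request)           # NLU
    if intent not in TOOLS:                   # branching in code, not in the model
        return "Sorry, I can only report weather or time."
    result = TOOLS[intent](arg)               # tool call
    return f"Here is what I found: {result}"  # NLG (would be an LLM render step)

print(run_agent("What's the weather in Lisbon?"))
```

Because the intent set and the dispatch table are closed, the agent can fail only in bounded ways (wrong intent label), never by hallucinating a multi-step plan. That is the trade: less generality, far more reliability.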