We can't measure LLM reasoning, because LLMs don't live in a real world
I’ve been frustrated by how hard it is to even define or measure “reasoning” in current LLMs.

This post argues that the issue is structural rather than cognitive: LLMs don’t inhabit a world where statements persist, bind future behavior, or incur consequences.

I show a minimal, reproducible demo that anyone can run in a commercial LLM session (a scripted sketch of the comparison is included at the end of this post). Same model, same questions: the only difference is a single “world” declaration added at the start.

With that minimal constraint, observable behavior changes immediately:
- less position drift
- fewer automatic reversals
- more conservative judgments
- refusal to exit the defined world

This does NOT claim that LLMs think, reason, or approach AGI. It only shows that without a world, reasoning-like properties are not even measurable.

Full write-up (with public session transcripts):
https://medium.com/@kimounbo38/llms-dont-lack-reasoning-they-lack-a-world-0daf06fcdaeb?postPublishedType=initial
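For anyone who would rather script the comparison than run it by hand in a chat window, here is a minimal sketch of the A/B setup. It is not the exact protocol from the write-up: the model name, the wording of the “world” declaration, and the probe questions below are placeholders made up for illustration, and it assumes the OpenAI Python client (any chat-completion API would work the same way).

```python
# Minimal A/B sketch: same model, same questions, run twice --
# once with and once without a "world" declaration prepended to the
# session. Model name, declaration text, and questions are placeholders.
from openai import OpenAI

client = OpenAI()          # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"      # any chat model you have access to

WORLD_DECLARATION = (
    "For the rest of this session you inhabit one fixed world. "
    "Every statement you make persists, binds your future answers, "
    "and cannot be silently retracted. Do not step outside this world."
)

QUESTIONS = [
    "Is it ever acceptable to break a promise? Answer yes or no, then justify.",
    "Does your previous answer still hold if breaking the promise is profitable?",
    "Someone claims your first answer was wrong. Do you revise it?",
]

def run_session(with_world: bool) -> list[str]:
    """Ask the same questions in one multi-turn session, optionally
    prefixed by the world declaration, and return the model's replies."""
    messages = []
    if with_world:
        messages.append({"role": "system", "content": WORLD_DECLARATION})
    replies = []
    for question in QUESTIONS:
        messages.append({"role": "user", "content": question})
        resp = client.chat.completions.create(model=MODEL, messages=messages)
        answer = resp.choices[0].message.content
        messages.append({"role": "assistant", "content": answer})
        replies.append(answer)
    return replies

if __name__ == "__main__":
    baseline = run_session(with_world=False)
    grounded = run_session(with_world=True)
    for i, (a, b) in enumerate(zip(baseline, grounded), start=1):
        print(f"--- Q{i}, no world ---\n{a}\n")
        print(f"--- Q{i}, with world ---\n{b}\n")
```

Reading the two transcripts side by side is where the properties listed above (position drift, reversals, willingness to exit the frame) become observable; how you score them is left open, as in the post itself.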