Hacker News (Chinese edition)

The leading models all advertise tool use, including code execution. So why is it still common to get a simple Python script with a logic bug that would be caught immediately by running it in a Python interpreter for 0.1 seconds?
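To make the question concrete, here is a minimal sketch of the kind of pre-delivery check being asked about: run the generated script in a fresh interpreter and look at the exit code before handing it over. The helper name `smoke_test`, the sample script in `generated`, and the 5-second timeout are all assumptions made up for illustration; nothing here describes how any particular model or vendor actually handles code execution.

```python
import subprocess
import sys


def smoke_test(source: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Run `source` in a fresh Python interpreter.

    Returns (ok, stderr). `ok` is True only if the process exits with code 0.
    Hypothetical helper for illustration, not part of any real tool API.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return result.returncode == 0, result.stderr


# A made-up "generated" script with a logic bug: it divides by len(xs) - 1,
# so the self-check at the bottom fails as soon as the script runs.
generated = """
def mean(xs):
    return sum(xs) / (len(xs) - 1)

assert mean([2, 4, 6]) == 4, "mean is wrong"
"""

ok, err = smoke_test(generated)
print("passed" if ok else "failed:\n" + err)
```

A check like this only catches bugs that surface when the script actually runs, such as an exception or a failing assertion. A script that runs cleanly but prints a wrong answer would still pass unless it carries its own checks.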

Why do large language models (LLMs) still not execute code before giving it to you?