GPT-5 is worse than 4.1-mini for text processing and worse than Sonnet 4 for coding

3 points by hitradostava, about 1 month ago
It seems that OpenAI have got the PR machine working amazingly. The Cursor CEO said it's the best, as did Simon Willison (https://simonwillison.net/2025/Aug/7/gpt-5/).

But I've found it terrible. For coding (in Cursor), it's slow, often fails at tool calls (no MCP, just stock Cursor tools), and it stored some new application state in globalThis, something no model has ever attempted in over a year of very heavy Cursor / Claude Code use.

For a summarization/insights API that I work on, it was way worse than gpt-4.1-mini. I tried both mini and full gpt-5, with different reasoning settings. It didn't follow instructions, and output was worse across all my evals, even after heavy prompt adjustment. I did a lot of sampling and the results were objectively bad.

Am I the only one? Has anyone seen actual real-world benefits of GPT-5 vs other models?