HackerNews Chinese Edition

I thought I was going crazy trying to use Gemini 3.5 Flash to rate some answers: it kept giving 7 instead of 10 for correct answers.

Apparently, once you add a "Grading criteria" text to the prompt, the model collapses into a "compressed toward the center of the scale" hallucination (or training set overfitting).

Someone on X asked me to try to reproduce it, and I actually got it on the first try in their Gemini Chat:

https://x.com/XCSme/status/2057613611959279988

I am not sure what to make of this model (or of most SOTA models). They got a lot smarter at coding and tool usage, but a lot dumber in other ways...
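For anyone who wants to check this on their own setup, here is a minimal sketch of an A/B comparison: grade the same known-correct answers with and without a "Grading criteria" line and compare the mean scores. Everything in it is an assumption for illustration: the `CRITERIA` wording, the prompt layout, and the `ask` callable (a stand-in for whatever function sends a prompt to the model and returns its reply). It does not call any real Gemini API and is not the exact prompt from the post.

```python
import re
from statistics import mean
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical rubric text; the post does not quote the actual prompt.
CRITERIA = "Grading criteria: rate the answer from 1 to 10 for correctness."


def build_prompt(question: str, answer: str, with_criteria: bool) -> str:
    """Build a grading prompt, optionally including the rubric line."""
    parts = [f"Question: {question}", f"Answer: {answer}"]
    if with_criteria:
        parts.append(CRITERIA)
    parts.append("Reply with a single integer score from 1 to 10.")
    return "\n".join(parts)


def parse_score(reply: str) -> Optional[int]:
    """Return the first standalone integer in 1..10 found in the reply."""
    match = re.search(r"\b(10|[1-9])\b", reply)
    return int(match.group(1)) if match else None


def compare(
    ask: Callable[[str], str],
    cases: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Mean score per condition over (question, correct_answer) pairs."""
    scores: Dict[str, List[int]] = {"without_criteria": [], "with_criteria": []}
    for question, answer in cases:
        for label, flag in (("without_criteria", False), ("with_criteria", True)):
            score = parse_score(ask(build_prompt(question, answer, flag)))
            if score is not None:  # skip replies with no usable score
                scores[label].append(score)
    return {label: mean(vals) for label, vals in scores.items() if vals}
```

To use it, pass your own `ask` function that wraps the model client you already have. If the behavior described above is present, `with_criteria` should come out noticeably lower than `without_criteria` on answers that are known to be correct.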

Tell HN: Gemini 3.5 Flash is failing in a stupid way