Tell HN: Gemini 3.5 Flash fails in stupid ways

6 points by XCSme, about 1 month ago
I thought I was going crazy. I was trying to use Gemini 3.5 Flash to rate some answers, but it kept giving 7 instead of 10 for correct answers.

Apparently, once you add a "Grading criteria" text, the model collapses into a "compressed toward the center of the scale" hallucination (or training-set overfitting).

Someone on X asked me to try to reproduce it, and I actually got it on the first try in their Gemini Chat:

https://x.com/XCSme/status/2057613611959279988

I am not sure what to make of this model (or of most SOTA models). They got a lot smarter with coding and tool usage, but a lot dumber in other ways...