Hacker News (Chinese edition)



I’m interested in how AI progress is currently evaluated and trying to build a list of the major approaches people actually use.

I’m aware that all of these measures have limitations and that many are controversial or imperfect by design. I’m not assuming they’re “good” or that they cleanly map to real-world capability.

I’d love to hear:

- What measures, benchmarks, or methodologies you think belong on this list
- What you see as their key strengths and failure modes
- How (or whether) you personally use them to interpret AI progress

My goal here is discovery and understanding, not to defend or attack any particular framework.

Ask HN: AI progress: what are the main ways people measure it?