Ask HN: What are the main ways people measure AI progress?
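I'm interested in how AI progress is currently evaluated, and I'm trying to build a list of the major approaches people actually use. I'm aware that all of these measures have limitations and that many are controversial or imperfect by design. I'm not assuming they're "good" or that they map cleanly to real-world capability.
I'd love to hear:
- Which measures, benchmarks, or methodologies you think belong on this list
- What you see as their key strengths and failure modes
- How (or whether) you personally use them to interpret AI progress
My goal here is discovery and understanding, not to defend or attack any particular framework.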