Show HN: I trained a chess engine to play like a human

3 points by hazard, about 1 month ago
I built 1e4.ai, a chess web app where you play against neural networks trained to mimic human Lichess players at specific Elo ranges. There's a separate model for each 100-point rating bucket from ~800 to 2200+, and the bots not only choose human-like moves but also burn clock time, play worse under time pressure, and blunder in human-like ways.

Live demo: https://1e4.ai
Code: https://github.com/thomasj02/1e4_ai

A few things that might be interesting:

- Trained on almost a full year of Lichess blitz games, around 1B games in total.

- The architecture is a small (~9M parameters) transformer-based network that takes the board, recent move history, the player's rating, and remaining clock time as input. There are three separate models per rating bucket: move, clock usage, and win probability. The clock model is what makes the bots feel human-ish on the clock rather than replying instantly. Because the move model takes the clock as an input, it also learns to blunder under time pressure like a human might.

- Because the network is so tiny, no GPU is needed for inference; it runs easily on a local CPU.

- The downside of the tiny network is that it's a bit weak once you turn the rating up past around 1700. It can spot short tactics but not long multi-move combinations.

- Initial training was on a rented 8xH100 cluster, followed by fine-tuning on my local GPU for the different rating ranges.

- Inspired by Maia-2 and DeepMind's "Grandmaster-Level Chess Without Search". On a held-out Lichess blitz benchmark, it beats Maia-2 blitz on top-1 move prediction (56.7% vs 52.7%) and pretty substantially on win-probability calibration (Brier 0.176 vs 0.272; lower is better). Numbers and code are in https://github.com/thomasj02/1e4_ai/tree/master/experiments/maia2_benchmark

- The data pipeline is C++ via nanobind, with training in PyTorch. Getting this right was actually the thing I spent the most time on. Pre-shuffling the dataset and then reading the shuffled dataset sequentially at training time kept GPU utilization high. Without this, training spent a huge percentage of its time on I/O while the GPU sat idle.

Happy to answer questions about the rating conditioning, the clock model, or the data pipeline.
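
For readers unfamiliar with the two benchmark metrics quoted above, here is a minimal sketch of how top-1 move accuracy and a Brier score are commonly computed. The function names and toy inputs are made up for illustration, and this is not the benchmark code from the repo. In particular, scoring a draw as 0.5 is an assumption; the actual benchmark may handle draws differently (for example with a multi-class Brier score).

```python
def top1_accuracy(predicted_moves, played_moves):
    """Fraction of positions where the model's top move equals the move the human played."""
    if not played_moves or len(predicted_moves) != len(played_moves):
        raise ValueError("inputs must be non-empty and the same length")
    hits = sum(p == a for p, a in zip(predicted_moves, played_moves))
    return hits / len(played_moves)


def brier_score(win_probs, outcomes):
    """Mean squared error between predicted win probability and the game result.

    Assumption for this sketch: outcomes are 1.0 (win), 0.5 (draw), 0.0 (loss).
    Lower is better; 0.0 would be a perfect forecast.
    """
    if not outcomes or len(win_probs) != len(outcomes):
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - o) ** 2 for p, o in zip(win_probs, outcomes)) / len(outcomes)


if __name__ == "__main__":
    # Toy data in UCI move notation, purely illustrative.
    print(top1_accuracy(["e2e4", "g1f3", "d2d4"], ["e2e4", "g1f3", "c2c4"]))
    print(brier_score([0.8, 0.3, 0.5], [1.0, 0.0, 0.5]))
```

As a reference point, a model that always predicts 0.5 scores 0.25 on decisive games under this definition.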
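
The pre-shuffle idea from the data pipeline bullet can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not the author's C++/nanobind pipeline: the record layout, file name, and function names are hypothetical, and the shuffle here happens in memory, which would not scale to ~1B games (a real pipeline would likely need an external, multi-pass shuffle).

```python
import os
import random
import struct
import tempfile

# Hypothetical fixed-size record: position id (uint32), rating (uint16), clock seconds (uint16).
RECORD = struct.Struct("<IHH")


def write_preshuffled(records, path, seed=0):
    """Offline step: shuffle once, then write the records back to back."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    with open(path, "wb") as f:
        for rec in shuffled:
            f.write(RECORD.pack(*rec))


def read_batches(path, batch_size):
    """Training-time step: a single sequential pass with no random seeks."""
    chunk_bytes = RECORD.size * batch_size
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            # The file holds whole records, so every chunk is a multiple of RECORD.size.
            yield list(RECORD.iter_unpack(chunk))


if __name__ == "__main__":
    data = [(i, 800 + 100 * (i % 15), i % 181) for i in range(1000)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shuffled.bin")
        write_preshuffled(data, path)
        for batch in read_batches(path, batch_size=64):
            pass  # a training loop would convert the batch to tensors here
```

The point of the design is that randomness is paid for once, offline, so the training loop only issues large sequential reads, which are much cheaper than random access per sample.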