Ask HN: Why does my Node.js multiplayer game lag at 500 players despite low CPU usage?

3 points by jbryu, 8 months ago
I'm hosting a turn-based multiplayer browser game on a single Hetzner CCX23 x86 cloud server (4 vCPU, 16GB RAM, 80GB disk). The backend is built with Node.js and Socket.IO and runs via Docker Swarm. I also use Traefik for load balancing.

Matchmaking uses a round-robin sharding approach: each room is always handled by the same backend instance, letting me keep game state in memory and scale horizontally without Redis.

Here's the issue: At ~500 concurrent players across ~60 rooms (max 8 players/room), I see low CPU usage but high event loop lag. One feature in my game is typing during a player's turn - each throttled keystroke is broadcast to the other players in real-time. If I remove this logic, I can handle 1000+ players without issue.

Scaling out backend instances on my single server doesn't help. I expected less load per backend instance to help, but I still hit the same limit around 500 players. This suggests to me that the bottleneck isn't CPU or app logic, but something deeper in the stack. But I'm not sure what.

Some server metrics at 500 players:

- CPU: 25% per core (according to htop)
- PPS: ~3000 in / ~3000 out
- Bandwidth: ~100KBps in / ~800KBps out

Could 500 concurrent players just be a realistic upper bound for my single-server setup, or is something misconfigured? I know scaling out with new servers should fix the issue, but I wanted to check in with the internet first to see if I'm missing anything. I'm new to multiplayer architecture so any insight would be greatly appreciated.
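
To make the typing feature more concrete, here is a minimal sketch of one way keystroke updates could be coalesced per room before they are broadcast. It is not the game's actual code: `createTypingBatcher`, `emitToRoom`, and the room and player IDs are hypothetical names, and `emitToRoom` is only a stand-in for a Socket.IO call such as `io.to(roomId).emit(...)`.

```javascript
// Hypothetical sketch: keep only the latest typed text per player and
// send at most one message per room each time flush() is called.
// `emitToRoom(roomId, payload)` is an assumed callback, not a Socket.IO API.
function createTypingBatcher(emitToRoom) {
  // roomId -> Map(playerId -> latest text typed by that player)
  const pending = new Map();

  return {
    // Record the latest text for a player; older unsent text is overwritten.
    queue(roomId, playerId, text) {
      let room = pending.get(roomId);
      if (!room) {
        room = new Map();
        pending.set(roomId, room);
      }
      room.set(playerId, text);
    },

    // Emit one message per room that has pending updates, then clear the buffer.
    // Returns the number of messages emitted.
    flush() {
      let sent = 0;
      for (const [roomId, room] of pending) {
        emitToRoom(roomId, Object.fromEntries(room));
        sent += 1;
      }
      pending.clear();
      return sent;
    },
  };
}

// Example wiring with a stand-in emitter that just records what would be sent.
const sentMessages = [];
const batcher = createTypingBatcher((roomId, payload) => {
  sentMessages.push({ roomId, payload });
});

batcher.queue('room-1', 'alice', 'h');
batcher.queue('room-1', 'alice', 'he'); // overwrites alice's earlier update
batcher.queue('room-1', 'bob', 'y');
batcher.queue('room-2', 'carol', 'ok');

const emitted = batcher.flush();
console.log(emitted, JSON.stringify(sentMessages));
```

In a real server, `flush()` would presumably run on a timer, for example `setInterval(() => batcher.flush(), 100)`, where the 100 ms interval is an arbitrary assumption. Four queued keystrokes become two outgoing messages here, which is the kind of reduction that would show up in the outbound PPS figure rather than in CPU usage.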