Ask HN: How can ChatGPT serve 700M users when I can't run a single GPT-4 locally?
Sam said yesterday that ChatGPT handles ~700M weekly users. Meanwhile, I can't even run a single GPT-4-class model locally without insane VRAM or painfully slow speeds.

Sure, they have huge GPU clusters, but there must be more going on - model optimizations, sharding, custom hardware, clever load balancing, etc.

What engineering tricks make this possible at such massive scale while keeping latency low?

Curious to hear insights from people who've built large-scale ML systems.
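On the "insane VRAM" point, the arithmetic is easy to sketch. GPT-4's parameter count is not public, so the sizes below are purely illustrative stand-ins; the point is that weight memory scales linearly with parameter count and bytes per parameter, which is why quantization (int8/int4) is one of the standard tricks for fitting large models onto fewer GPUs:

```python
# Back-of-the-envelope VRAM needed just to hold model weights, ignoring
# KV cache and activations. Parameter counts here are illustrative
# assumptions, not GPT-4's actual (unpublished) size.
def weight_vram_gb(params: float, bytes_per_param: float) -> float:
    """Gigabytes of memory for the weights alone."""
    return params * bytes_per_param / 1e9

for params in (70e9, 175e9):
    for precision, nbytes in (("fp16", 2), ("int8", 1), ("int4", 0.5)):
        print(f"{params / 1e9:.0f}B @ {precision}: "
              f"{weight_vram_gb(params, nbytes):.0f} GB")
```

Even a hypothetical 70B model needs ~140 GB at fp16 — more than any single consumer GPU — which is why serving at scale also relies on sharding weights across many accelerators.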
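As for the "clever load balancing" guess: one widely used trick in LLM serving (in open-source servers like vLLM, not confirmed as OpenAI's internals) is batching many users' requests into each forward pass. A toy model of why that works — with made-up `PASS_MS` and batch-size numbers — is that a batched pass costs roughly the same wall-clock time as a single-request pass, so throughput scales with batch size:

```python
# Toy illustration (not OpenAI's actual system) of why batching raises
# GPU throughput: a forward pass takes roughly the same wall-clock time
# whether it decodes 1 sequence or 32 in parallel, so packing many users
# into each pass amortizes its cost. PASS_MS is an assumed constant.
PASS_MS = 50  # assumed milliseconds per forward pass, batched or not

def time_to_serve(n_requests: int, tokens_each: int, batch: int) -> int:
    """Total ms to decode all requests, `batch` sequences per pass."""
    waves = -(-n_requests // batch)   # ceil division: groups of requests
    return waves * tokens_each * PASS_MS  # one pass per generated token

print(time_to_serve(64, 100, 1))    # 64 requests served one at a time
print(time_to_serve(64, 100, 32))   # same work, 32 sequences per pass
```

Under these assumptions, batching 32-wide serves the same 64 requests 32x faster — which is the basic reason a datacenter GPU serving thousands of users looks nothing like a single GPU serving one.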