Ask HN: How do you measure GPU power usage at scale?
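How do people measure GPU power usage in large (32x) self-hosted setups or small multi-rack setups? I've seen some PDUs (power distribution units) that collect and transmit data, but I'm unsure what the process looks like, and whether and how people do this on small builds.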
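Currently, I collect power readings via NVML's nvmlDeviceGetPowerUsage, polled every 100 ms during inference, record the peak and mean power of each request, and get data like this: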
model                      mean-power range (W)  spread (W)  stdev (W)
qwen3-8b                   114.3-121.9           7.6         1.17
llama-3.1-8b-instruct      104.7-122.1           17.4        4.29
qwen2.5-1.5b-instruct      53.7-73.0             19.3        5.23
mistral-7b-instruct-v0.3   96.2-120.0            23.8        6.01
qwen2.5-7b-instruct        88.7-124.5            35.8        7.73
gemma-3-1b-it              49.4-56.7             7.3         2.13
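A minimal sketch of the collection described above, assuming the power reading is wrapped in a plain callable that returns milliwatts (the unit nvmlDeviceGetPowerUsage reports). `sample_power_w`, `measure_request` and `summarize` are hypothetical helpers, not part of NVML. A background thread polls every 100 ms while one request runs, each request yields a (peak, mean) pair, and `summarize` reduces one model's per-request means to the table's columns. The spread column is the difference between the two ends of the range (121.9 - 114.3 = 7.6 W for qwen3-8b); the post doesn't say whether stdev is a sample or population statistic, so the sketch assumes sample stdev.

```python
import statistics
import threading
from typing import Callable, Dict, List, Tuple

# Assumption: the reader returns whole-board power in milliwatts,
# e.g. a thin wrapper around NVML's nvmlDeviceGetPowerUsage.
PowerReader = Callable[[], int]


def sample_power_w(read_power_mw: PowerReader,
                   stop: threading.Event,
                   interval_s: float) -> List[float]:
    """Poll the reader until `stop` is set; return samples in watts."""
    samples: List[float] = []
    while True:
        samples.append(read_power_mw() / 1000.0)  # mW -> W
        # Event.wait is an interruptible sleep; it returns True once stop is set.
        if stop.wait(interval_s):
            return samples


def measure_request(run_request: Callable[[], None],
                    read_power_mw: PowerReader,
                    interval_s: float = 0.1) -> Tuple[float, float]:
    """Run one request while sampling power; return (peak_w, mean_w).

    The reading covers the whole board, so this only attributes power
    cleanly when this request is the only work on the card.
    """
    stop = threading.Event()
    out: List[List[float]] = []
    sampler = threading.Thread(
        target=lambda: out.append(sample_power_w(read_power_mw, stop, interval_s)))
    sampler.start()
    try:
        run_request()
    finally:
        stop.set()
        sampler.join()
    samples = out[0]  # one sample is always taken before the first wait
    return max(samples), statistics.fmean(samples)


def summarize(per_request_mean_w: List[float]) -> Dict[str, float]:
    """Reduce one model's per-request mean power to the table's columns."""
    lo, hi = min(per_request_mean_w), max(per_request_mean_w)
    return {
        "range_lo_w": lo,
        "range_hi_w": hi,
        "spread_w": hi - lo,
        # Assumption: sample (n-1) standard deviation; needs >= 2 requests.
        "stdev_w": statistics.stdev(per_request_mean_w),
    }
```

With the nvidia-ml-py bindings (module `pynvml`), the reader could be `lambda: pynvml.nvmlDeviceGetPowerUsage(handle)` after `pynvml.nvmlInit()` and `handle = pynvml.nvmlDeviceGetHandleByIndex(0)`. Passing a plain callable keeps the sampling logic testable without a GPU.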
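This is per-GPU, single-card data. I don't know whether anything like per-request attribution survives at rack scale, or whether monitoring there happens entirely at the PDU/BMC level instead.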
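A PDU outlet or BMC meters the whole server (CPUs, fans, PSU losses, idle GPUs), not individual requests. One hedged way to compare the two views is to turn each request's timestamped samples into energy and subtract their sum from the meter's energy over the same window; the remainder is what per-request GPU accounting doesn't explain. `energy_j` and `unattributed_j` are hypothetical helpers, and the sketch assumes sample timestamps (e.g. from `time.monotonic()`) plus a meter reading over an aligned window, neither of which the post establishes.

```python
from typing import Sequence


def energy_j(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Trapezoidal estimate of energy (J) from timestamped power samples (W)."""
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Area of the trapezoid between consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def unattributed_j(meter_delta_j: float, per_request_gpu_j: Sequence[float]) -> float:
    """Hypothetical reconciliation: meter energy not covered by per-request GPU energy.

    The meter also sees non-GPU load and idle time, so this is expected to be positive.
    """
    return meter_delta_j - sum(per_request_gpu_j)
```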