
How do people measure power usage of GPUs at large (32x) self-hosted setups or small multi-rack setups? I've seen some PDUs which collect and transmit data, but I'm unsure of the processes and if/how people do this on small builds.

Currently, I collect NVML nvmlDeviceGetPowerUsage, polled at 100ms during inference, peak and mean per request, and get this type of data:

  model                      mean-power range (W)   spread   stdev
  qwen3-8b                   114.3-121.9            7.6W     1.17
  llama-3.1-8b-instruct      104.7-122.1            17.4W    4.29
  qwen2.5-1.5b-instruct      53.7-73.0              19.3W    5.23
  mistral-7b-instruct-v0.3   96.2-120.0             23.8W    6.01
  qwen2.5-7b-instruct        88.7-124.5             35.8W    7.73
  gemma-3-1b-it              49.4-56.7              7.3W     2.13

This is per-GPU, single-card data. I don't know whether anything like per-request attribution survives at rack scale, or whether monitoring there happens entirely at the PDU/BMC level instead.
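For anyone wanting to reproduce the single-card numbers, the polling loop described above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the `PowerSampler` class and `nvml_reader` helper are hypothetical names, and it assumes the `pynvml` bindings are installed. The only NVML calls used are `nvmlInit`, `nvmlDeviceGetHandleByIndex`, and `nvmlDeviceGetPowerUsage`, which reports milliwatts.

```python
import threading
from statistics import mean


def nvml_reader(gpu_index=0):
    """Return a zero-argument callable giving the current draw in milliwatts."""
    import pynvml  # imported lazily so the sampler itself works without a GPU

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    return lambda: pynvml.nvmlDeviceGetPowerUsage(handle)  # NVML reports mW


class PowerSampler:
    """Hypothetical helper: poll read_mw() every interval_s seconds
    between start() and stop(), then summarize the samples in watts."""

    def __init__(self, read_mw, interval_s=0.1):
        self.read_mw = read_mw
        self.interval_s = interval_s
        self.samples_w = []
        self._done = threading.Event()
        self._thread = None

    def start(self):
        self.samples_w = []
        self._done.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            # Sample first, so even a very short request gets one reading.
            self.samples_w.append(self.read_mw() / 1000.0)
            if self._done.wait(self.interval_s):
                break

    def stop(self):
        self._done.set()
        self._thread.join()
        return {
            "samples": len(self.samples_w),
            "mean_w": mean(self.samples_w),
            "peak_w": max(self.samples_w),
        }


# Usage, assuming a GPU at index 0 and some run_inference() of your own:
#   sampler = PowerSampler(nvml_reader(0))
#   sampler.start()
#   run_inference(request)
#   stats = sampler.stop()   # {"samples": ..., "mean_w": ..., "peak_w": ...}
```

One caveat worth checking against the NVML documentation for your card: on some GPU generations the driver already averages this reading over a window, so polling at 100ms may smooth out short peaks rather than capture them.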

Ask HN: How do you measure GPU power usage at scale?