Designing a DSP architecture for 1M QPS CPM ads that avoids overspend
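I'm working on the system architecture for a high-throughput AdTech DSP (demand-side platform) and would love feedback from people who have built large-scale bidding or serving systems.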
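**Constraints / Goals**
- DSP only (no exchange)
- Target: 1M ad requests/sec
- End-to-end DSP latency budget: ~100 ms
- Pricing model: CPM (cost per thousand impressions)
- Hard requirement: no advertiser or campaign overspend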
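**Targeting / Campaign Fetch**
I modeled targeting (geo, interests, etc.) using Redis + Roaring Bitmaps. Fetching candidate campaigns alone performs as follows:
- Redis: ~1,000 RPS at ~8 ms latency (local machine, not cloud)
- Aerospike: ~200–400 RPS at ~10 ms latency
These numbers cover campaign fetching only, not bidding or scoring.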
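To make the bitmap-style candidate fetch concrete, here is a minimal in-process sketch. It is not the Redis or Aerospike setup measured above: plain Python integers stand in for Roaring bitmaps, and the `TargetingIndex` class, the `DIMENSIONS` tuple, and the `ANY` wildcard are hypothetical names chosen for illustration. It assumes campaign IDs are small non-negative integers and that a campaign with no values for a dimension accepts any value.

```python
from collections import defaultdict

DIMENSIONS = ("geo", "interest")  # hypothetical targeting dimensions
ANY = "*"                         # campaign does not restrict this dimension


class TargetingIndex:
    """Inverted index: (dimension, value) -> bitset of campaign IDs.

    Python ints are used as bitsets here; a Roaring bitmap would play
    the same role with better memory behaviour on sparse ID spaces.
    """

    def __init__(self):
        self._bits = defaultdict(int)

    def add_campaign(self, campaign_id, targeting):
        # Set the campaign's bit under every value it targets, or under
        # the wildcard when the dimension is left unrestricted.
        for dim in DIMENSIONS:
            for value in targeting.get(dim, [ANY]):
                self._bits[(dim, value)] |= 1 << campaign_id

    def candidates(self, request):
        result = -1  # all bits set; each dimension narrows it
        for dim in DIMENSIONS:
            # OR within a dimension (any requested value may match) ...
            matched = self._bits.get((dim, ANY), 0)
            for value in request.get(dim, []):
                matched |= self._bits.get((dim, value), 0)
            # ... AND across dimensions (every dimension must match).
            result &= matched
            if result == 0:
                return []
        # Decode the set bits back into campaign IDs.
        ids = []
        campaign_id = 0
        while result:
            if result & 1:
                ids.append(campaign_id)
            result >>= 1
            campaign_id += 1
        return ids


index = TargetingIndex()
index.add_campaign(0, {"geo": ["US"], "interest": ["sports"]})
index.add_campaign(1, {"geo": ["US"]})  # any interest
index.add_campaign(2, {"geo": ["DE"], "interest": ["sports"]})
print(index.candidates({"geo": ["US"], "interest": ["sports"]}))
```

The point of the sketch is only the shape of the operation: a few ORs and ANDs per request, with no network round trip.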
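**Budget / Wallet Model**
Each advertiser has a wallet. Each campaign has:
- Total budget
- Daily budget
- Daily spend tracking
Overspend is not acceptable (even a small percentage matters at scale).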
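**Budget Control Approaches Considered**
- Splitting daily budgets into hourly buckets
- Rate limiting via:
  - Token bucket
  - PID controllers
These reduce overspend but don't guarantee correctness under bursty traffic. I've recently been considering micros (integer currency units) to reduce rounding errors.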
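As an illustration of the token-bucket and micros ideas combined, here is a minimal single-process sketch. The names `MicrosTokenBucket` and `impression_cost_micros` are hypothetical, 1 currency unit is assumed to equal 1,000,000 micros, and rounding the per-impression cost up is an assumption made so spend is never under-counted. It only paces spend on one node. It is not a cross-node guarantee.

```python
def impression_cost_micros(cpm_micros):
    """Cost of one impression in micros, given a CPM price in micros.

    Rounded up (ceiling division) so accounting never under-counts spend.
    """
    return -(-cpm_micros // 1000)


class MicrosTokenBucket:
    """Token bucket whose tokens are integer micros of spend."""

    def __init__(self, rate_micros_per_sec, capacity_micros, now_ms=0):
        self.rate = rate_micros_per_sec
        self.capacity = capacity_micros
        self.tokens = capacity_micros
        self.last_ms = now_ms
        self._carry = 0  # remainder of the integer division, kept for next refill

    def _refill(self, now_ms):
        elapsed_ms = max(0, now_ms - self.last_ms)
        self.last_ms = max(self.last_ms, now_ms)
        total = elapsed_ms * self.rate + self._carry
        self.tokens += total // 1000
        self._carry = total % 1000
        if self.tokens >= self.capacity:
            # A full bucket discards any excess, including the remainder.
            self.tokens = self.capacity
            self._carry = 0

    def try_spend(self, cost_micros, now_ms):
        """Deduct cost_micros if available; never lets tokens go negative."""
        self._refill(now_ms)
        if cost_micros > self.tokens:
            return False
        self.tokens -= cost_micros
        return True


# A 2.50 CPM is 2_500_000 micros per thousand impressions.
cost = impression_cost_micros(2_500_000)
bucket = MicrosTokenBucket(rate_micros_per_sec=10_000, capacity_micros=5_000)
results = [bucket.try_spend(cost, now_ms=t) for t in (0, 0, 0, 250, 300)]
print(cost, results)
```

Because every quantity is an integer, the bucket never accumulates floating-point drift. The burst it can admit is capped by `capacity_micros`, which is the knob that trades pacing smoothness against burst exposure.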
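**Open Questions**
At 1M QPS, how do people actually enforce budget guarantees?
- Soft overspend with reconciliation?
- Hard atomic checks in the hot path?
- Is Redis bitmap-based targeting viable at this scale, or does everyone eventually:
  - Pre-materialize campaign sets?
  - Push logic into memory / C++?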
How do you balance:
- Strict budget enforcement
- Low latency
- High throughput
without introducing global locks or cross-region contention?
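To frame the trade-off in the question above, here is a minimal sketch of one possible pattern: each bidder node leases a chunk of budget from a central ledger and then spends against its local allowance, so the hot path touches only node-local state. This is an illustration under stated assumptions, not a claim about what production systems do. `BudgetLedger` and `LocalBudget` are hypothetical names, an in-process lock stands in for whatever atomic store would hold the ledger, and amounts are integer micros.

```python
import threading


class BudgetLedger:
    """Source of truth for one campaign's remaining budget, in micros."""

    def __init__(self, budget_micros):
        self._remaining = budget_micros
        self._lock = threading.Lock()

    def lease(self, want_micros):
        # Atomically hand out at most what is left; never goes negative.
        with self._lock:
            granted = min(want_micros, self._remaining)
            self._remaining -= granted
            return granted

    @property
    def remaining(self):
        with self._lock:
            return self._remaining


class LocalBudget:
    """Per-node allowance; the ledger is contacted only when it runs low."""

    def __init__(self, ledger, chunk_micros):
        self._ledger = ledger
        self._chunk = chunk_micros
        self._available = 0
        self._lock = threading.Lock()

    def try_reserve(self, cost_micros):
        """Reserve budget before bidding; False means do not bid."""
        with self._lock:
            if self._available < cost_micros:
                want = max(self._chunk, cost_micros - self._available)
                self._available += self._ledger.lease(want)
            if self._available < cost_micros:
                return False
            self._available -= cost_micros
            return True

    def release(self, cost_micros):
        """Return a reservation, for example when the bid did not win."""
        with self._lock:
            self._available += cost_micros


ledger = BudgetLedger(10_000)
node_a = LocalBudget(ledger, chunk_micros=4_000)
node_b = LocalBudget(ledger, chunk_micros=4_000)
outcomes = [
    node_a.try_reserve(2_500),
    node_b.try_reserve(2_500),
    node_a.try_reserve(2_500),
    node_b.try_reserve(2_500),
]
print(outcomes, ledger.remaining)
```

In this sketch the sum of reservations can never exceed the ledger's budget, but unspent leases stranded on a node show up as under-delivery. The chunk size controls that trade-off: larger chunks mean fewer ledger calls and more stranded budget.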
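Is "no overspend ever" a realistic requirement, or is bounded error the industry norm?
I'm less interested in textbook answers and more in what has actually worked (or failed) in production.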