Show HN: Usage Circuit Breaker for Cloudflare Workers
I run 3mins.news (https://3mins.news), an AI news aggregator built entirely on Cloudflare Workers. The backend has 10+ cron triggers running every few minutes: RSS fetching, article clustering, LLM calls, email delivery.
The problem: the Workers Paid plan has hard monthly limits (10M requests, 1M KV writes, 1M queue ops, etc.). There's no built-in "pause when you hit the limit"; Cloudflare just starts billing overages. KV writes cost $5 per million over the cap, so a retry-loop bug can get expensive fast.
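To put a number on that, here is a back-of-the-envelope cost sketch (the write rates are hypothetical, not taken from my logs):

```typescript
// Rough overage cost of a runaway retry loop doing KV writes,
// assuming the 1M-write monthly cap is already exhausted, so every
// extra write bills at the $5-per-million overage rate.
const OVERAGE_USD_PER_MILLION = 5;

function overageCostPerDay(writesPerSecond: number): number {
  const writesPerDay = writesPerSecond * 86_400; // seconds per day
  return (writesPerDay / 1_000_000) * OVERAGE_USD_PER_MILLION;
}

// A bug retrying 10 times/sec costs ~$4.32/day; 100/sec is ~$43/day.
console.log(overageCostPerDay(10).toFixed(2));  // "4.32"
console.log(overageCostPerDay(100).toFixed(2)); // "43.20"
```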
AWS has Budget Alerts, but those are passive notifications; by the time you read the email, the damage is done. I wanted active, application-level self-protection.
So I built a circuit breaker that faces inward: instead of protecting against downstream failures (the Hystrix pattern), it monitors my own resource consumption and degrades gracefully before hitting the ceiling.
Key design decisions:
- Per-resource thresholds: Workers requests ($0.30/M overage) only warn at 80%. KV writes ($5/M overage) can trip the breaker at 90%. Not all resources are equally dangerous, so some are configured as warn-only (trip = null).
- Hysteresis: trips at 90%, recovers at 85%. The 5% gap prevents oscillation; without it, the system would flap between tripped and recovered every check cycle.
- Fail-safe on monitoring failure: if Cloudflare's usage API is down, keep the last known state rather than assuming everything is fine. A monitoring outage shouldn't mask a usage spike.
- Alert dedup: per resource, per month. Without it, you'd get ~8,600 identical emails for the rest of the month once a resource hits 80%.
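The decisions above can be sketched as a small state function. This is an illustrative sketch, not my actual code; the resource names, thresholds, and key format are examples:

```typescript
// Decision rules: per-resource thresholds, warn-only resources
// (tripAt = null), hysteresis, and a per-resource-per-month dedup key.

type ResourceConfig = {
  warnAt: number;          // fraction of monthly quota, e.g. 0.8
  tripAt: number | null;   // null = warn-only, never trips
  recoverAt?: number;      // hysteresis: must drop below this to recover
};

type BreakerState = "ok" | "warned" | "tripped";

const CONFIG: Record<string, ResourceConfig> = {
  // Cheap overage ($0.30/M): warn-only.
  workersRequests: { warnAt: 0.8, tripAt: null },
  // Expensive overage ($5/M): trips at 90%, recovers below 85%.
  kvWrites: { warnAt: 0.8, tripAt: 0.9, recoverAt: 0.85 },
};

function nextState(cfg: ResourceConfig, prev: BreakerState, usage: number): BreakerState {
  if (cfg.tripAt !== null) {
    if (prev === "tripped") {
      // Hysteresis: stay tripped until usage falls below recoverAt.
      if (usage >= (cfg.recoverAt ?? cfg.tripAt)) return "tripped";
    } else if (usage >= cfg.tripAt) {
      return "tripped";
    }
  }
  return usage >= cfg.warnAt ? "warned" : "ok";
}

// Dedup: one alert per resource per calendar month.
function alertKey(resource: string, now: Date): string {
  return `alert:${resource}:${now.getUTCFullYear()}-${now.getUTCMonth() + 1}`;
}
```

Note the asymmetry hysteresis buys you: a breaker sitting at 87% stays tripped if it was already tripped, but won't trip fresh from a healthy state.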
Implementation: every 5 minutes, it queries Cloudflare's GraphQL API (requests, CPU, KV, queues) and the Observability Telemetry API (logs/traces) in parallel, evaluates 8 resource dimensions, and caches the state to KV. Between checks it's a single KV read, which is essentially free.
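The evaluation step reduces to a pure function over the fetched usage counts, which is what makes it easy to test. A minimal sketch, assuming three of the dimensions; the fetch itself (Cloudflare's GraphQL API) and the KV write are out of scope here:

```typescript
// Sketch: fold raw monthly usage counts into the snapshot that gets
// cached to KV. Quotas mirror the Workers Paid plan limits mentioned
// above; the dimension list is illustrative, not the full set of 8.

type Snapshot = {
  checkedAt: string;
  tripped: boolean;   // true if any trip-enabled resource is over its limit
  resources: Record<string, { usage: number; state: string }>;
};

const QUOTAS: Record<string, number> = {
  requests: 10_000_000,
  kvWrites: 1_000_000,
  queueOps: 1_000_000,
};

const TRIP_AT: Record<string, number | null> = {
  requests: null, // warn-only
  kvWrites: 0.9,
  queueOps: 0.9,
};

function evaluate(counts: Record<string, number>, now: Date): Snapshot {
  const resources: Snapshot["resources"] = {};
  let tripped = false;
  for (const [name, quota] of Object.entries(QUOTAS)) {
    const usage = (counts[name] ?? 0) / quota;
    const tripAt = TRIP_AT[name];
    const state =
      tripAt !== null && usage >= tripAt ? "tripped"
      : usage >= 0.8 ? "warned"
      : "ok";
    if (state === "tripped") tripped = true;
    resources[name] = { usage, state };
  }
  return { checkedAt: now.toISOString(), tripped, resources };
}

// The monitoring cron would then persist it, roughly:
//   await env.BREAKER_KV.put("breaker", JSON.stringify(snapshot));
```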
When tripped, all scheduled tasks are skipped. The cron triggers still fire (you can't stop that), but the first thing each one does is check the breaker and bail out if it's tripped.
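That guard is a few lines in each scheduled handler. A sketch, assuming the snapshot is cached under a hypothetical `breaker` key; I'm also assuming a missing snapshot means "run", so a monitoring bug can't silently stop every job forever:

```typescript
// The guard every cron handler runs first. The KV read is shown only
// in the comment below; this pure function is what gets tested.

type CachedState = { tripped: boolean } | null;

function shouldRunCron(cached: CachedState): boolean {
  // No snapshot yet (or KV read failed upstream): default to running.
  return !(cached?.tripped ?? false);
}

// In the worker itself (not runnable outside the Workers runtime):
// export default {
//   async scheduled(event, env, ctx) {
//     const state = await env.BREAKER_KV.get("breaker", "json");
//     if (!shouldRunCron(state)) return; // tripped: skip this cycle
//     await runScheduledTasks(env);      // hypothetical task dispatcher
//   },
// };
```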
It's been running in production for two weeks. It caught a KV-reads spike at 82% early in the month: I got one warning email, investigated, fixed the root cause, and never hit the trip threshold.
The pattern should apply to any metered serverless platform (Lambda, Vercel, Supabase) or any API with budget ceilings (OpenAI, Twilio). The core idea: treat your own resource budget as a health signal, just like you'd treat a downstream service's error rate.
Happy to share code details if there's interest.
Full writeup with implementation code and tests: https://yingjiezhao.com/en/articles/Usage-Circuit-Breaker-for-Cloudflare-Workers