How do you handle missed webhooks in production?
I've worked at several companies where we only discovered hours later that critical webhooks from Stripe/Shopify had never arrived (because of a deployment, a timeout, a bug, etc.).

Every team ended up building the same solution: retry logic, a dead letter queue, and monitoring.

Curious how others handle this:
- Do you rely on the provider's retry policy?
- Have you built your own reliability layer?
- Do you use a service?
- Do you just reconcile manually when it happens?

(Context: I'm building https://relaehook.com to solve this, but I'm genuinely curious what the norm is.)
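
For concreteness, the "retry logic plus dead letter queue" pattern mentioned above could be sketched roughly like this in Python. This is only an illustrative sketch under stated assumptions: `WebhookEvent`, `ReliabilityLayer`, and `receive` are hypothetical names (not from Stripe, Shopify, or any library), and in-memory collections stand in for what would need to be a durable store in production.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WebhookEvent:
    event_id: str      # provider-assigned ID, used here for deduplication
    payload: dict
    attempts: int = 0  # how many times we have tried to process it


@dataclass
class ReliabilityLayer:
    """Hypothetical in-memory reliability layer: dedupe, retry, dead-letter."""
    handler: Callable[[dict], None]
    max_attempts: int = 3
    base_delay: float = 0.0  # seconds; 0 keeps the example instant
    processed: set = field(default_factory=set)
    dead_letters: List[WebhookEvent] = field(default_factory=list)

    def receive(self, event: WebhookEvent) -> str:
        # Providers may redeliver the same event, so skip IDs already handled.
        if event.event_id in self.processed:
            return "duplicate"

        while event.attempts < self.max_attempts:
            event.attempts += 1
            try:
                self.handler(event.payload)
            except Exception:
                # Exponential backoff between attempts (none after the last).
                if event.attempts < self.max_attempts:
                    time.sleep(self.base_delay * 2 ** (event.attempts - 1))
                continue
            self.processed.add(event.event_id)
            return "processed"

        # Retries exhausted: park the event for inspection or manual replay.
        self.dead_letters.append(event)
        return "dead_lettered"
```

Monitoring would then amount to alerting on the size of `dead_letters`. Note that this only helps with events that reach your endpoint at all; webhooks lost before that point still need the provider's retries or a reconciliation job.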