
AWS recently published a postmortem on the October 2025 us-east-1 outage [1]. A DNS race condition in DynamoDB cascaded across EC2, Lambda, Redshift, and NLB, leading to ~14 hours of degraded operations for new instance launches and knock-on effects on multiple services.

Has anyone quantitatively modelled AWS’s effective availability once you account for inter-service dependencies inside their control plane and data plane?

In other words: if EC2 depends on DynamoDB, and Lambda depends on EC2 + NLB, what’s the composite availability in practice?

[1] https://aws.amazon.com/message/101925/
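One rough way to frame the composite-availability question is the textbook serial-dependency model: if a service needs all of its dependencies to be up, and failures are assumed independent, its effective availability is the product of the individual availabilities. The sketch below applies that to the dependency chain named in the question. The per-service figures (0.9999 each) are made-up placeholders, not AWS numbers, and the independence assumption is a simplification, since this incident was by nature a correlated failure. The last helper converts downtime into "nines", treating the ~14 hours as a full outage over one year, which overstates the loss because the postmortem describes degraded operations rather than total unavailability.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8760


def serial_availability(*parts):
    """Availability of something that needs every part up.

    Assumes independent failures, which is a simplification.
    """
    return math.prod(parts)


def nines(availability):
    """Number of nines, e.g. 0.999 -> 3.0."""
    return -math.log10(1.0 - availability)


def availability_from_downtime(hours_down, period_hours=HOURS_PER_YEAR):
    """Availability over a period given total downtime in hours."""
    return 1.0 - hours_down / period_hours


# Hypothetical standalone availabilities (placeholders, not published figures).
standalone = {"dynamodb": 0.9999, "ec2": 0.9999, "nlb": 0.9999, "lambda": 0.9999}

# EC2 depends on DynamoDB.
ec2_effective = serial_availability(standalone["ec2"], standalone["dynamodb"])

# Lambda depends on EC2 (which already includes DynamoDB) and on NLB.
lambda_effective = serial_availability(
    standalone["lambda"], ec2_effective, standalone["nlb"]
)

# Upper-bound estimate: count the ~14 hours as fully down within one year.
outage_availability = availability_from_downtime(14)

if __name__ == "__main__":
    print(f"EC2 effective:    {ec2_effective:.6f} ({nines(ec2_effective):.2f} nines)")
    print(f"Lambda effective: {lambda_effective:.6f} ({nines(lambda_effective):.2f} nines)")
    print(f"14h/year outage:  {outage_availability:.6f} ({nines(outage_availability):.2f} nines)")
```

Even with four nines per component, a chain of four serial dependencies drops below four nines in this model, and a single 14-hour event in a year caps the yearly figure at under three nines. A more realistic model would need real dependency graphs and correlated failure data, which this sketch does not attempt.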

Ask HN: How many 9s did AWS lose because of the recent outage (the us-east-1 incident)?