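Flaky tests aren't a testing problem. They're a feedback loop you broke.
Every retry rule in your CI pipeline is a painkiller. It suppresses the symptom while the stock of broken code keeps growing underneath, and nobody feels the pain until the whole system is addicted.
I came across this Hacker News post that perfectly illustrates the pattern: https://news.ycombinator.com/item?id=46967724
Retries, quarantining, adding waits: none of these are fixes. They just let CI skip errors temporarily. The real problem is a growing stock of broken code feeding your pipeline, with no mechanism to make the person who introduced it feel the pain.
Two reinforcing loops start running: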
```
RED PIPELINE --> <RETRY> --> GREEN BUILD
      ^                           |
      | (Long-Term)               |
      |                           v
MORE FLAKINESS <---- HIDDEN BUGS ACCUMULATE
```
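R1 "The Addiction": Every retry turns the light green. But hidden bugs accumulate underneath, making the system flakier and forcing more retries tomorrow. This is textbook "Shifting the Burden" from Donella Meadows.
R2 "The Erosion": Because you don't trust the CI signal, you lower standards. Because you lower standards, worse code gets merged. The signal becomes even less trustworthy. Repeat until your pipeline is a decoration.
The original post asked how QA and engineering should split responsibility. Wrong question. The right question: how do you make the pain of instability felt by the person who introduced it?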
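
One way to picture "making the pain felt" is a retry rule that still retries but refuses to forget. The sketch below is only an illustration under assumptions, not something taken from the linked repo: `FlakeLedger`, the `owner` label, and the budget idea are all hypothetical names invented here.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FlakeLedger:
    """Hypothetical ledger: counts failures that a retry masked, per owner."""

    flakes: Counter = field(default_factory=Counter)

    def run(self, test: Callable[[], None], owner: str, max_attempts: int = 3) -> bool:
        """Run `test`, retrying on AssertionError, and record every masked failure."""
        for attempt in range(1, max_attempts + 1):
            try:
                test()
            except AssertionError:
                continue  # retry, exactly as an ordinary retry rule would
            if attempt > 1:
                # Green only after retrying: charge the hidden failures to the owner.
                self.flakes[owner] += attempt - 1
            return True
        return False  # failed on every attempt: a genuinely red build

    def over_budget(self, budget: int) -> list[str]:
        """Owners whose masked failures exceed the allowed budget."""
        return sorted(o for o, n in self.flakes.items() if n > budget)
```

The build can stay green today, but `over_budget` gives the pipeline something to fail on, or at least to report, so the retries stop being free.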
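I HIT THE SAME WALL
I built CI to port 973 ROS 2 packages onto two Linux distros that are not officially supported (openEuler + openKylin, RISC-V). Zero upstream support.
v1 - Brute-force probe. Pull all 973 packages and let them break. 597 built; 214 dependency gaps and 151 failures were mapped. The pipeline wasn't meant to pass. It was meant to make every hidden stock visible.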
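
As a rough sketch of what such a probe might tally, assuming each build attempt yields an exit code and a log: the `Outcome` names, the `classify` heuristic, and the log phrases it matches are assumptions made for illustration, not the repo's actual rules.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    BUILT = "built"
    DEPENDENCY_GAP = "dependency_gap"
    FAILED = "failed"


def classify(exit_code: int, log: str) -> Outcome:
    """Classify one build attempt (hypothetical heuristic on the log text)."""
    if exit_code == 0:
        return Outcome.BUILT
    lowered = log.lower()
    # Assumption: a missing dependency leaves a recognizable phrase in the log.
    if "could not find" in lowered or "no package" in lowered:
        return Outcome.DEPENDENCY_GAP
    return Outcome.FAILED


def probe(results: dict[str, tuple[int, str]]) -> Counter:
    """Tally outcomes for every package; nothing is retried and nothing is hidden."""
    return Counter(classify(code, log) for code, log in results.values())
```

The point of the tally is visibility: every package lands in exactly one bucket, so the stock of breakage has a number attached to it.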
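v2 - Verification engine. Probe the environment first and identify gaps before building. Stop feeding garbage into the pipeline. Build attempts dropped and the success rate went up.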
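
The "probe first, build second" idea can be sketched as a set difference. This is a simplified illustration, not the engine itself: `preflight`, the package names, and the flat dependency model (no transitive resolution) are assumptions.

```python
def preflight(
    packages: dict[str, set[str]], available: set[str]
) -> tuple[list[str], dict[str, set[str]]]:
    """Split packages into buildable ones and ones blocked by missing dependencies.

    `packages` maps a package name to the dependencies it declares;
    `available` is what the target environment actually provides.
    """
    buildable: list[str] = []
    blocked: dict[str, set[str]] = {}
    for name, deps in sorted(packages.items()):
        missing = deps - available
        if missing:
            blocked[name] = missing  # report the gap instead of attempting a build
        else:
            buildable.append(name)
    return buildable, blocked
```

Only the `buildable` list reaches the pipeline, which is why the number of build attempts falls while the success rate rises.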
v3 - Incremental stock management. Isolate small batches of problems and resolve them one group at a time. Subtraction, not addition.
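
One possible way to form those batches is to group blocked packages by the dependency they are missing. The grouping key and the "largest group first" ordering below are assumptions chosen for illustration; the post does not say how the batches are picked.

```python
from collections import defaultdict


def batches_by_gap(blocked: dict[str, set[str]]) -> list[tuple[str, list[str]]]:
    """Group blocked packages by missing dependency, largest group first."""
    groups: dict[str, list[str]] = defaultdict(list)
    for package, missing in blocked.items():
        for dep in missing:
            groups[dep].append(package)
    # Assumed ordering: fix the gap that blocks the most packages first,
    # breaking ties alphabetically so the result is deterministic.
    return sorted(
        ((dep, sorted(pkgs)) for dep, pkgs in groups.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )
```

Each tuple is one batch: a single gap and the packages it unblocks, small enough to resolve and verify before moving on to the next.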
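MY SYSTEM IS ADDICTED TOO
Here's where I punch myself in the face. My own CI has the same pattern. Virtual environments to bypass dependency conflicts. Masquerade rules that spoof package identities. Band-aids.
But I know they're band-aids. Most teams don't. They think retries are solutions. Being aware of the addiction and being consumed by it are two very different things.
I can identify the stock poisoning your pipeline. I can design the feedback loop that makes the right person feel the pain. What I can't do is force an organization to care. That's usually the real bottleneck: not the flaky tests, but the system's refusal to let anyone feel the consequences.
Repo: https://github.com/Sebastianhayashi/the_adaptive_verification_engine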