Flaky tests aren't a testing problem. They're a feedback loop you broke.

1 point | by microseyuyu | about 1 month ago
Every retry rule in your CI pipeline is a painkiller. It suppresses the symptom, the stock of broken code keeps growing underneath, and nobody feels the pain until the whole system is addicted.

I came across this HN post that perfectly illustrates the pattern: https://news.ycombinator.com/item?id=46967724

Retries, quarantining, adding waits - these aren't fixes. They're letting CI skip errors temporarily. The real problem: there's a growing stock of broken code feeding your pipeline, and no mechanism exists to make the person who introduced it feel the pain.

Two reinforcing loops start running:

```
RED PIPELINE --> <RETRY> --> GREEN BUILD
      ^                           |
      |                      (Long-Term)
      |                           |
      |                           v
MORE FLAKINESS <---- HIDDEN BUGS ACCUMULATE
```

R1 "The Addiction": Every retry makes the light green. But hidden bugs accumulate underneath, making the system flakier, forcing more retries tomorrow. This is textbook "Shifting the Burden" from Donella Meadows.

R2 "The Erosion": Because you don't trust CI signal, you lower standards. Because you lower standards, worse code gets merged. Signal becomes even less trustworthy. Repeat until your pipeline is a decoration.

The original post asked how QA and engineering should split responsibility. Wrong question. The right question: how do you make the pain of instability felt by the person who introduced it?

I HIT THE SAME WALL

I built CI to port 973 ROS 2 packages onto two non-officially-supported Linux distros (openEuler + openKylin, RISC-V). Zero upstream support.

v1 - Brute-force probe. Pull all 973 packages, let them break. 597 built, 214 dependency gaps and 151 failures mapped. The pipeline wasn't meant to pass. It was meant to make every hidden stock visible.

v2 - Verification engine. Probe the environment first, identify gaps before building. Stop feeding garbage into the pipeline. Build attempts dropped, success rate went up.

v3 - Incremental stock management. Isolate small batches of problems, resolve them one group at a time. Subtraction, not addition.

MY SYSTEM IS ADDICTED TOO

Here's where I punch myself in the face. My own CI has the same pattern. Virtual environments to bypass dependency conflicts. Masquerade rules that spoof package identities. Band-aids.

But I know they're band-aids. Most teams don't. They think retries are solutions. Being aware of the addiction and being consumed by it are two very different things.

I can identify the stock poisoning your pipeline. I can design the feedback loop that makes the right person feel the pain. What I can't do is force an organization to care. That's usually the real bottleneck - not the flaky tests, but the system's refusal to let anyone feel the consequences.

Repo: https://github.com/Sebastianhayashi/the_adaptive_verification_engine
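
For anyone who wants the v2 idea above (probe first, build later) in concrete form, here is a minimal sketch. It is not the code from the repo. The function name `preflight`, the `is_available` probe, and the sample package data are all hypothetical, made up for illustration. It assumes each package declares a flat list of dependencies and that something can answer "is this dependency present on the target?" before any build is attempted.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def preflight(
    packages: Dict[str, Iterable[str]],
    is_available: Callable[[str], bool],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split packages into buildable ones and ones blocked by missing deps.

    `packages` maps a package name to the dependencies it declares.
    `is_available` is a hypothetical probe of the target environment.
    """
    cache: Dict[str, bool] = {}  # probe each dependency only once

    def available(dep: str) -> bool:
        if dep not in cache:
            cache[dep] = is_available(dep)
        return cache[dep]

    buildable: List[str] = []
    blocked: Dict[str, List[str]] = {}
    for name, deps in packages.items():
        # Record every missing dependency so the gap is visible, not hidden.
        missing = sorted(d for d in set(deps) if not available(d))
        if missing:
            blocked[name] = missing
        else:
            buildable.append(name)
    return buildable, blocked


if __name__ == "__main__":
    # Made-up data for illustration; not results from the real pipeline.
    installed = {"cmake", "python3"}
    demo = {
        "pkg_a": ["cmake"],
        "pkg_b": ["cmake", "libfoo"],
        "pkg_c": [],
    }
    ok, gaps = preflight(demo, installed.__contains__)
    print("build:", ok)
    print("gaps:", gaps)
```

The point of the design is that blocked packages never enter the build queue, and the reason they are blocked is written down per package instead of being discovered as a red build later. A real version would also have to handle dependencies between packages in the set, which this sketch ignores.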