Ask HN: Are diffs still useful for AI-assisted code changes?

I'm wondering whether traditional diffs are becoming less suitable for AI-assisted development.

Lately I've been feeling frustrated during reviews when an AI generates a large number of changes. Even if the diff is "small", it can be very hard to understand what actually changed in behavior or structure.

I started experimenting with a different approach: comparing two snapshots of the code (baseline and current) instead of raw line diffs. Each snapshot captures a rough API shape and a behavior signal derived from the AST. The goal isn't deep semantic analysis, but something fast that can signal whether anything meaningful actually changed. (There's a rough sketch at the end of this post.)

It's intentionally shallow and non-judgmental: just signals, not verdicts.

At the same time, I see more and more LLM-based tools helping with PR reviews. Reviewing probabilistic changes with probabilistic tools feels a bit dangerous to me.

Curious how others here think about this:

– Do diffs still work well for AI-generated changes?
– How do you review large AI-assisted refactors today?
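For concreteness, here is roughly what the snapshot comparison looks like. This is a minimal sketch in Python using the stdlib `ast` module; the names (`api_shape`, `behavior_signal`, `compare_snapshots`) are illustrative, not from any existing tool, and the hashing is deliberately coarse:

```python
# Hypothetical sketch of the snapshot idea: extract a rough API shape and a
# per-function structural fingerprint from each snapshot, then diff those.
import ast
import hashlib

def api_shape(source: str) -> dict[str, str]:
    """Map each top-level function/class to a rough signature string."""
    shape = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = [a.arg for a in node.args.args]
            shape[node.name] = f"def {node.name}({', '.join(args)})"
        elif isinstance(node, ast.ClassDef):
            methods = sorted(
                n.name for n in node.body if isinstance(n, ast.FunctionDef)
            )
            shape[node.name] = f"class {node.name}: {methods}"
    return shape

def behavior_signal(source: str) -> dict[str, str]:
    """Hash each function's AST dump. The dump ignores comments, whitespace,
    and line numbers, so the hash moves only when structure changes.
    (Coarse on purpose: nested functions with the same name will collide.)"""
    signals = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            dump = ast.dump(node, annotate_fields=False)
            signals[node.name] = hashlib.sha256(dump.encode()).hexdigest()[:12]
    return signals

def compare_snapshots(baseline: str, current: str) -> list[str]:
    """Emit coarse signals, not verdicts."""
    report = []
    old_api, new_api = api_shape(baseline), api_shape(current)
    for name in sorted(old_api.keys() | new_api.keys()):
        if name not in new_api:
            report.append(f"API removed: {name}")
        elif name not in old_api:
            report.append(f"API added: {name}")
        elif old_api[name] != new_api[name]:
            report.append(f"API changed: {old_api[name]} -> {new_api[name]}")
    old_sig, new_sig = behavior_signal(baseline), behavior_signal(current)
    for name in sorted(old_sig.keys() & new_sig.keys()):
        if old_sig[name] != new_sig[name]:
            report.append(f"behavior may have changed: {name}")
    return report
```

Running `compare_snapshots` on the old and new source of a file yields lines like "API changed: def f(a) -> def f(a, b)" or "behavior may have changed: f". That's the level of signal I'm after: enough to know where to look, without pretending to verify semantics.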