Ask HN: Large-scale refactoring with LLMs. Any experience to share?
I am working on a critical (the company depends on it), large (100k+ files), busy (hundreds of developers committing daily) and old (10+ years) codebase.
Given those conditions, I believe we are doing a stellar job of keeping the whole codebase somewhat manageable. Linting and conventions are in place and respected.
But of course it is not "clean code".
Developer velocity is low, and testing is difficult and cumbersome.
Dependencies between our components are very tight, and everything depends on everything else.
My team and I have a clear mandate to make the situation better.
A lot of tooling has been built to manage the overall complexity, but I believe we have reached a plateau where extra tooling will not make the situation any better. If anything, it will increase the cognitive load on developers.
I am starting to think that tackling the overall complexity of the codebase itself is the way forward.
Dependencies are needed, but we are not doing a good job of isolating components and keeping dependencies to a minimum.
This shows up as huge files containing multiple critical and busy classes, which creates dependencies that exist for syntactic reasons rather than semantic ones.
I don't think it is feasible to address those problems manually. Also, my team doesn't have the right business context.
Moreover, none of the changes we should make are justifiable from a business perspective.
We see two possible solutions:
1. Somehow force or convince the other teams to handle their own complexity. We already tried this and it failed.
2. Figure out a way to do it ourselves.
Only 2 is acceptable, given that 1 already failed and given the social capital we can deploy.
Approaching this manually is infeasible, so naturally I am leaning toward using LLMs for this kind of refactoring.
The idea is to avoid changing the architecture for now and simply put us in a better position to eventually make architectural improvements.
I would like some sort of pipeline where we feed in the codebase and the problem on one side (e.g. "this file is too big, move this class") and get a PR on the other side.
I did try one quite challenging refactoring, and the AI failed. Not terribly, but not well enough that I can sell it just yet.
I am here asking the community whether anyone has tried something similar.
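To make the input side of such a pipeline concrete, here is a minimal sketch of how "this file is too big, move this class" tasks could be generated automatically. It assumes a Python codebase, which the post does not state, and the names `MoveClassTask` and `find_move_tasks` are hypothetical. It only produces task descriptions; calling an LLM and opening a PR are out of scope.

```python
import ast
from dataclasses import dataclass


@dataclass
class MoveClassTask:
    """One unit of work for the (hypothetical) refactoring pipeline."""
    source_file: str
    class_name: str
    class_lines: int

    def prompt(self) -> str:
        # Plain-text problem statement to hand to whatever does the refactoring.
        return (
            f"{self.source_file} is too big. Move class {self.class_name} "
            f"({self.class_lines} lines) into its own module and update its imports."
        )


def find_move_tasks(path: str, source: str, max_classes: int = 1) -> list[MoveClassTask]:
    """Propose moving classes out of a file that defines more than max_classes.

    The largest max_classes classes stay in place; every other top-level
    class becomes one task, in source order.
    """
    tree = ast.parse(source)
    classes = [node for node in tree.body if isinstance(node, ast.ClassDef)]
    if len(classes) <= max_classes:
        return []

    def size(node: ast.ClassDef) -> int:
        # Line count of the class body including its header (Python 3.8+).
        return node.end_lineno - node.lineno + 1

    keep = sorted(classes, key=size, reverse=True)[:max_classes]
    return [MoveClassTask(path, c.name, size(c)) for c in classes if c not in keep]


# Usage: an illustrative file with two top-level classes.
demo_source = "class A:\n    x = 1\n    y = 2\n\nclass B:\n    pass\n"
for task in find_move_tasks("billing.py", demo_source):
    print(task.prompt())
```

Keeping each task down to a single class move is a deliberate choice here: small, mechanical changes are easier to review and to verify with the existing test suite than one large refactoring.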