Hacker News (Chinese edition)



I am working on a codebase that is critical (the company depends on it), large (100k+ files), busy (hundreds of developers committing daily), and old (10+ years).

Given those conditions, I believe we are doing a stellar job of keeping the whole codebase somewhat manageable. Linting and conventions are in place and respected. But of course it is not "clean code". Developer velocity is low, and testing is difficult and cumbersome. Dependencies between our components are very tight, and everything depends on everything else.

My team and I have a clear mandate to improve the situation. A lot of tooling has been built to manage the overall complexity, but I believe we have reached a plateau where extra tooling will not make the situation any better. If anything, it will increase the cognitive load on developers.

I am starting to think that tackling the complexity of the codebase itself is the way forward. Dependencies are needed, but we are not doing a stellar job at isolation or at keeping dependencies to a minimum. This shows up as huge files containing multiple critical and busy classes, which create dependencies that exist for syntactic reasons rather than semantic ones.

I don't think it is feasible to address those problems manually. Also, my team doesn't have the right business context. Moreover, none of the changes we should make are justifiable from a business perspective.

We see two somewhat feasible solutions:

1. Somehow force or convince the other teams to handle their own complexity. We already tried this, and it failed.
2. Figure out a way to do it ourselves.

Only 2 is acceptable, given that 1 already failed and given the social capital we can deploy.

Doing this manually is infeasible, so naturally I am leaning toward using LLMs for this kind of refactoring. The idea is to avoid changing the architecture and simply put us in a better position to eventually make architectural improvements.

I would like some sort of pipeline where we feed in the codebase and the problem on one side (this file is too big, move this class) and get a PR out the other side.

I did try one quite challenging refactoring, and the AI failed. Not terribly, but the result is not something I can sell just yet.

I am here to ask the community whether you have tried something similar.
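
To make the "problem" side of that pipeline a bit more concrete, here is a minimal sketch of how the "this file is too big, move this class" inputs could be generated mechanically. It assumes a Python codebase purely for illustration (the post does not say which language is involved), and `RefactorTask`, `find_split_candidates`, and the 500-line threshold are all hypothetical names and values, not part of any existing tool. It only finds candidates; it does not call an LLM or open a PR.

```python
import ast
from dataclasses import dataclass


@dataclass
class RefactorTask:
    """One 'problem' to hand to the (hypothetical) refactoring pipeline."""
    path: str
    line_count: int
    classes: list[str]


def find_split_candidates(sources: dict[str, str], max_lines: int = 500) -> list[RefactorTask]:
    """Return files longer than max_lines that define more than one top-level class.

    `sources` maps a file path to that file's source text, so the function
    can be exercised without touching the filesystem.
    """
    tasks = []
    for path, text in sources.items():
        line_count = len(text.splitlines())
        if line_count <= max_lines:
            continue  # small files are not worth splitting
        tree = ast.parse(text)
        # Only top-level classes: nested classes cannot be moved independently.
        classes = [node.name for node in tree.body if isinstance(node, ast.ClassDef)]
        if len(classes) > 1:
            tasks.append(RefactorTask(path, line_count, classes))
    # Largest files first, on the assumption that they hurt the most.
    tasks.sort(key=lambda task: task.line_count, reverse=True)
    return tasks
```

Each `RefactorTask` could then be turned into one narrowly scoped request ("move class X out of file Y"), which keeps every resulting PR small enough for the owning team to review.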

Ask HN: Large-scale refactoring with LLMs. Any experience to share?