Ask HN: How do you give AI agents real codebase context without burning too many tokens?
Working on a large Rust codebase. The token problem is real — Claude Code will happily spend $5 of context just trying to understand how two modules relate before writing a single line. And once context compaction kicks in, it's even worse — the agent loses the thread completely and starts grepping the same files again from scratch.

Approaches I've tried:

- Feeding CLAUDE.md / architecture docs manually — helps, but gets stale fast.
- Cursor's built-in indexing — breaks on monorepos, and I don't love proprietary code going to their servers.
- Basic MCP server with grep — works for exact matches, useless for semantic queries.

Eventually built something more serious: a local Tree-sitter indexer that builds a knowledge graph of file relationships and exposes it via MCP so agents query semantically instead of grepping blind. One tool call instead of 15 grep iterations. Published it here: https://github.com/Muvon/octocode

But genuinely curious what others are doing before I go deeper on it.

Three specific questions:

1. How do you handle the "ripple effect" problem — knowing that changing one file semantically affects others that aren't obviously linked?

2. Do you trust closed-source indexing with proprietary code, or have you gone local-first?

3. Has anyone gotten GraphRAG-style relationship mapping to work in practice at scale, or is it still mostly hype?
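To make question 1 concrete: the "ripple effect" query is essentially a reverse-dependency traversal over whatever import graph the indexer builds. This is not octocode's actual implementation; it's a minimal self-contained sketch, assuming a naive line-scan for `use crate::...` statements where a real indexer would walk a Tree-sitter AST. The module names and sources are made up for illustration.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Toy index: module name -> first-segment modules it imports.
/// A real indexer would parse the syntax tree; here we just scan
/// `use crate::<mod>...` lines as a stand-in.
fn index_imports(sources: &[(&str, &str)]) -> HashMap<String, HashSet<String>> {
    let mut graph = HashMap::new();
    for (name, src) in sources {
        let deps: HashSet<String> = src
            .lines()
            .filter_map(|l| {
                l.trim().strip_prefix("use crate::").map(|rest| {
                    // keep only the first path segment:
                    // `use crate::db::Pool;` -> "db"
                    rest.split(|c| c == ':' || c == ';')
                        .next()
                        .unwrap_or("")
                        .to_string()
                })
            })
            .collect();
        graph.insert(name.to_string(), deps);
    }
    graph
}

/// Everything that transitively depends on `changed`, found by
/// BFS over the reversed import graph.
fn impacted(graph: &HashMap<String, HashSet<String>>, changed: &str) -> HashSet<String> {
    // reverse edges: dependency -> dependents
    let mut rev: HashMap<&str, Vec<&str>> = HashMap::new();
    for (m, deps) in graph {
        for d in deps {
            rev.entry(d.as_str()).or_default().push(m.as_str());
        }
    }
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([changed]);
    while let Some(m) = queue.pop_front() {
        for &dep in rev.get(m).map(|v| v.as_slice()).unwrap_or(&[]) {
            if seen.insert(dep.to_string()) {
                queue.push_back(dep);
            }
        }
    }
    seen
}

fn main() {
    let sources = [
        ("db", "pub struct Pool;"),
        ("auth", "use crate::db::Pool;"),
        ("api", "use crate::auth::check;"),
        ("cli", "use crate::api::run;"),
    ];
    let graph = index_imports(&sources);
    let hit = impacted(&graph, "db");
    // changing `db` ripples through auth -> api -> cli
    assert!(hit.contains("auth") && hit.contains("api") && hit.contains("cli"));
    println!("{:?}", hit);
}
```

The payoff over grep is exactly the "one tool call" point above: an agent asks "what breaks if I touch `db`?" once, instead of iteratively grepping for every symbol `db` exports. Re-exports and trait impls make the real graph messier, which is presumably where semantic (AST-level) edges earn their keep.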