Hey everyone,

I've been working on a project called L88, a local RAG system that I initially focused on UI/UX for, so the retrieval and model architecture still need proper refinement.

Repo: https://github.com/Hundred-Trillion/L88-Full

I'm running this on 8GB VRAM and a strong CPU (128GB RAM). Embeddings and preprocessing run on CPU, and the main model runs on GPU. One limitation I ran into is that my evaluator and generator LLM ended up being the same model due to compute constraints, which defeats the purpose of evaluation.

I'd really appreciate feedback on:

- Better architecture ideas for small-VRAM RAG
- Splitting evaluator/generator roles effectively
- Improving the LangGraph pipeline
- Any bugs or design smells you notice
- Ways to optimize the system for local hardware

I'm 18 and still learning a lot about proper LLM architecture, so any technical critique or suggestions would help me grow as a developer. If you check out the repo or leave feedback, it would mean a lot. I'm trying to build a solid foundation and reputation through real projects.

Thanks!

Show HN: L88 – A local RAG system on 8GB VRAM (need architecture feedback)