I Built an Open-Source Offline ChatGPT Alternative in 40MB
— by Raj Guru Yadav
Like many developers, I’ve been fascinated by LLMs. But the moment I asked, “Can I run a ChatGPT-like assistant offline, fast, and without needing 16GB+ of RAM?”, the challenge became too tempting to ignore.

The Goal
Build a fully offline, lightweight AI assistant with:

- Under 50MB download size
- No internet requirement
- Fast responses (under one second)
- Zero telemetry
- Fully local embeddings and inference

Result: a 40MB offline ChatGPT clone you can run in the browser or from a USB stick.

What’s Inside the 40MB?
Here’s how I squeezed intelligent conversation into such a tiny package:

- Model: Mistral 7B, Q4_K_M-quantized via llama.cpp
- Inference engine: llama.cpp (compiled to WebAssembly or native C++)
- UI: lightweight React/Tailwind interface
- Storage: IndexedDB for local chat history
- Embeddings: local MiniLM for smart PDF and note search
- Extras: Whisper.cpp for local voice input; Coqui TTS for speech output
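Most of the size savings come from that quantization step. llama.cpp’s real Q4_K_M format packs weights into super-blocks with extra per-block scales and minimums; the following is only a simplified absmax 4-bit sketch of the core idea:

```python
def quantize_q4(block):
    """Quantize a block of float weights to 4-bit ints in [-7, 7] plus one scale.

    Simplified absmax scheme: llama.cpp's Q4_K_M stores additional
    per-super-block scale/min values, but the size arithmetic is similar.
    """
    scale = max(abs(w) for w in block) / 7.0 or 1.0
    quants = [max(-7, min(7, round(w / scale))) for w in block]
    return scale, quants

def dequantize_q4(scale, quants):
    """Recover approximate float weights from the 4-bit codes."""
    return [scale * q for q in quants]

weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.44, -0.02, 0.18]
scale, quants = quantize_q4(weights)
restored = dequantize_q4(scale, quants)

# Each weight now needs 4 bits instead of 32, so a block shrinks roughly
# 8x versus fp32 (plus a small per-block scale) at the cost of a bounded
# rounding error of about scale / 2 per weight.
print(max(abs(w - r) for w, r in zip(weights, restored)))
```

Four bits per weight trades a little precision for an approximately 8x reduction against fp32, which is the main lever behind small quantized model files.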
Why I Built It

I (Raj Guru Yadav), a 16-year-old dev and student, wanted to:

- Learn deeply how LLMs actually work under the hood
- Build something privacy-respecting and local
- Prove that AI doesn’t need the cloud to be powerful
- Give offline users (like many students in India) real AI support

Challenges
- Memory bottlenecks on low-RAM devices
- Prompt tuning for smarter replies from tiny models
- WebAssembly optimizations for browser performance
- Offline voice and text integration with small TTS/ASR models
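One way to attack the memory bottleneck is to cap the prompt: keep only as much recent chat history as fits a token budget. This is a hypothetical sketch with a crude whitespace token estimate; a real build would count tokens with the model’s own tokenizer:

```python
def trim_history(messages, budget=512):
    """Keep the most recent messages whose rough token count fits the budget.

    Walks the history newest-first, accumulating an approximate token
    cost, and drops everything older once the budget is exceeded.
    """
    kept, used = [], 0
    for msg in reversed(messages):
        cost = len(msg.split())  # crude stand-in for real tokenization
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = ["user: hi", "bot: hello there", "user: explain quantization briefly"]
print(trim_history(history, budget=6))
```

Trimming oldest-first keeps the prompt (and the KV cache it produces) bounded, which matters far more on a 4GB machine than on a server.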
Performance (on a 4GB laptop)

- Answers factual, coding, and math questions decently
- Reads and summarizes PDFs offline
- Remembers conversations locally
- (Optional) Speaks answers aloud
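The offline PDF and note features hinge on local embedding search: chunks are embedded once, and a query is answered by cosine similarity over the stored vectors. A toy sketch, with hand-made 3-dimensional vectors standing in for real MiniLM outputs (which are 384-dimensional):

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical local index: chunk id -> embedding vector.
notes = {
    "intro.pdf#p1":   [0.9, 0.1, 0.0],
    "physics.pdf#p4": [0.1, 0.8, 0.3],
    "todo.txt":       [0.0, 0.2, 0.9],
}

def search(query_vec, k=2):
    """Return the k chunk ids most similar to the query vector."""
    ranked = sorted(notes, key=lambda n: cosine(query_vec, notes[n]), reverse=True)
    return ranked[:k]

print(search([1.0, 0.0, 0.1]))
```

At this scale a brute-force scan is plenty; no vector database is needed, which is part of how the whole thing stays small and offline.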
Final Thought

AI shouldn’t be locked behind paywalls or clouds. My goal is to bring smart assistants into everyone’s hands: fully offline, fully free, fully yours.

Made with ❤️ by Raj Guru Yadav
Dev | Builder of 700+ projects | Passionate about open AI for all