Show HN: Off Grid: On-device AI with web search, tools, vision, image, voice – 3x faster

Author: ali_chherawalla · about 3 hours ago
Nine days ago I posted Off Grid here and you showed up - 124 points, 66 comments, bug reports I fixed same-day, and the kind of feedback that makes open source worth it.

You told me what you wanted. Here's what I shipped:

Your AI can now use tools — entirely offline.

Web search, calculator, date/time, device info — with automatic tool loops.

Your 3B-parameter model doesn't just generate text anymore. It reasons, calls tools, and synthesizes results.

On your phone. No API key. No server. No cloud function.

So what? It means the gap between "local toy" and "useful assistant" just got dramatically smaller.

You don't need GPT-4 to look something up and give you an answer. A quantized Qwen 3 / SmolLM3 running on your Snapdragon can do it in no time.

3x faster with a configurable KV cache.

You can now choose between f16, q8_0, and q4_0 KV cache types. On q4_0, models that were doing 10 tok/s are hitting 30. The app even nudges you after your first generation: "Hey, you could be running faster." One tap.

So what? The #1 complaint about on-device AI is "it's too slow to be useful." That argument just lost a lot of weight. 30 tokens/second on a phone is faster than most people read.

Live on both stores. No sideloading. No Xcode.

Off Grid is now on the App Store and Google Play. Install it like any other app. Your parents could use this.

So what? On-device AI just went from "cool weekend project for developers" to "thing normal people can actually try." That matters because privacy shouldn't require a CS degree.

What hasn't changed:

- MIT licensed. Fully open source. Every line.
- Zero data leaves your device. No analytics. No telemetry. No "anonymous usage data."
- Text gen (15-30 tok/s), image gen (5-10s on NPU), vision AI, voice transcription, document analysis — all offline.
- Bring any GGUF model. Run Qwen 3, Llama 3.2, Gemma 3, Phi-4, whatever you want.

I'm building this because I believe the phone in your pocket should be the most private computer you own — not the most surveilled. Every week the models get smaller and faster. The hardware is already there. The software just needs to catch up.

If this resonates, a star on GitHub genuinely helps: https://github.com/alichherawalla/off-grid-mobile

I'm in the comments. Tell me what to build next.
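For anyone curious what the "automatic tool loop" mentioned above means in principle: the model either answers directly or emits a tool call, the runtime executes the tool, and the result is fed back into the context until the model produces a final answer. Here is a minimal sketch in Python — every name in it (`generate`, `TOOLS`, the message format) is illustrative, not Off Grid's actual code, and the model call is stubbed out:

```python
# Minimal tool-loop sketch. A real on-device runtime would call a local
# GGUF model via llama.cpp and parse its tool-call tokens; here `generate`
# is a hard-coded stand-in so the control flow is runnable on its own.

TOOLS = {
    # Demo-only calculator; never eval untrusted input in real code.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def generate(messages):
    """Stand-in for a local model call: request a tool on the first turn,
    then answer once a tool result is present in the context."""
    last = messages[-1]["content"]
    if last.startswith("tool_result:"):
        return {"type": "answer", "text": f"The result is {last.split(':', 1)[1]}."}
    return {"type": "tool_call", "name": "calculator", "args": "2 + 3 * 4"}

def tool_loop(user_prompt, max_steps=5):
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        out = generate(messages)
        if out["type"] == "answer":
            return out["text"]
        result = TOOLS[out["name"]](out["args"])  # run the requested tool
        messages.append({"role": "tool", "content": f"tool_result:{result}"})
    return "Step limit reached without a final answer."

print(tool_loop("What is 2 + 3 * 4?"))  # -> The result is 14.
```

The `max_steps` cap is the important design detail: a small model can get stuck requesting tools forever, so the loop needs a bound before it forces a final answer.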
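To see why the q4_0 KV cache buys so much headroom, a back-of-envelope memory calculation helps. The byte-per-element figures below follow the ggml block formats (q8_0 stores 32 values in 34 bytes, q4_0 in 18); the layer and head counts are rough assumptions for a ~3B model, not the specs of any particular one:

```python
# KV cache size per token: K and V tensors, one pair per layer.
# bytes_per_elem: f16 = 2.0; q8_0 = 34/32 = 1.0625; q4_0 = 18/32 = 0.5625.
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem):
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

# Illustrative shape for a ~3B model with grouped-query attention.
LAYERS, KV_HEADS, HEAD_DIM = 36, 8, 128

for name, nbytes in [("f16", 2.0), ("q8_0", 1.0625), ("q4_0", 0.5625)]:
    per_tok = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, nbytes)
    print(f"{name:5s} {per_tok / 1024:6.1f} KiB/token  "
          f"{per_tok * 8192 / 2**20:6.1f} MiB at 8K context")
```

Under these assumptions an f16 cache needs roughly 3.5x the memory of q4_0 at the same context length, which is why a phone that thrashes at 10 tok/s on f16 can have room to breathe on q4_0.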