Show HN: OctoFlow v1.0.0 – a GPU virtual machine where the GPU runs autonomously and the CPU acts as the BIOS

Author: mr_octopus, about 1 month ago
Three days ago I posted OctoFlow 0.83 here (GPU-native programming language, 2.2 MB binary). The feedback was great. Since then I've pushed v1.0.0 with the thing I've actually been building toward: a GPU Virtual Machine.

The idea: the GPU is the computer, the CPU is the BIOS.

You boot a VM, program a dispatch chain of kernel instances, submit once with vkQueueSubmit, and everything — layer execution, inter-layer communication, self-regulation, compression, database queries — happens on the GPU without CPU round-trips. The CPU just provides I/O.

```
let vm = vm_boot()
let prog = vm_program(vm, kernels, 4)
vm_write_register(vm, 0, 0, input)
vm_execute(prog)
let result = vm_read_register(vm, 3, 30)
```

4 VM instances, one submit, no CPU involvement between stages.

The memory model is 5 SSBOs:

- Registers (per-VM working memory)
- Metrics (regulator signals)
- Globals (shared mutable — KV cache, DB tables)
- Control (indirect dispatch params)
- Heap (immutable bulk data — quantized weights)

What makes it interesting:

- Homeostasis regulator: each VM instance has a kernel that monitors activation norms, memory pressure, and throughput. The GPU self-regulates without asking the CPU.

- GPU self-programming: a kernel writes workgroup counts to the Control buffer, and the next vkCmdDispatchIndirect reads them. The GPU decides its own workload.

- Compression as computation: Q4_K dequantization, delta encoding, dictionary lookup — these are just kernels in the dispatch chain, not a special subsystem. Adding a new codec = writing an emitter. No Rust changes.

- CPU polling: Metrics and Control are HOST_VISIBLE. The CPU can poll GPU state and activate dormant VMs without rebuilding the command buffer. The GPU broadcasts needs, the CPU fulfills them.

The VM is workload-agnostic. The same architecture handles LLM inference, database queries, physics sims, graph neural networks, DSP pipelines, and game AI. We've validated all six.
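To make the self-programming point concrete, here is a CPU-side sketch of the indirect-dispatch pattern in plain Rust. The struct layout matches Vulkan's VkDispatchIndirectCommand, but the function names and sizing policy are my own illustration, not OctoFlow's API:

```rust
// CPU-side simulation of the GPU self-programming pattern: a "planner"
// kernel writes workgroup counts into the Control buffer, and the next
// vkCmdDispatchIndirect consumes them. Names here are illustrative.

/// Mirrors VkDispatchIndirectCommand: three u32 workgroup counts that
/// vkCmdDispatchIndirect reads straight out of the Control buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct DispatchIndirectCommand {
    x: u32,
    y: u32,
    z: u32,
}

/// Looks at how many items the previous stage produced and writes the
/// workgroup count the *next* dispatch will consume, so the GPU sizes
/// its own workload without a CPU round-trip.
fn planner_kernel(items_produced: u32, workgroup_size: u32) -> DispatchIndirectCommand {
    DispatchIndirectCommand {
        x: (items_produced + workgroup_size - 1) / workgroup_size, // round up
        y: 1,
        z: 1,
    }
}

fn main() {
    // Previous stage emitted 1000 items; kernels run 64 threads per group.
    let control = planner_kernel(1000, 64);
    assert_eq!(control.x, 16); // ceil(1000 / 64)
    println!("next vkCmdDispatchIndirect launches {} workgroups", control.x);
}
```

On a real device the write happens inside a shader and the Control SSBO is the dispatch source; the arithmetic is the same.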
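The CPU-polling handshake can be simulated the same way, with shared atomics standing in for a HOST_VISIBLE mapping of the Metrics buffer. The status codes and loop shape below are invented for the sketch:

```rust
// Simulation of the CPU polling loop over a HOST_VISIBLE Metrics buffer.
// Shared atomics stand in for mapped GPU memory; status codes are made up.
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

const VM_DORMANT: u32 = 0;
const VM_NEEDS_INPUT: u32 = 1;

fn main() {
    // One status word per VM instance, as the regulator kernels would publish.
    let metrics: Arc<Vec<AtomicU32>> =
        Arc::new((0..4).map(|_| AtomicU32::new(VM_DORMANT)).collect());

    let gpu_side = Arc::clone(&metrics);
    let gpu = thread::spawn(move || {
        // The GPU broadcasts a need: VM 2 wants fresh input.
        gpu_side[2].store(VM_NEEDS_INPUT, Ordering::Release);
    });

    // The CPU polls and fulfills, never rebuilding the command buffer.
    loop {
        if let Some(i) = metrics
            .iter()
            .position(|m| m.load(Ordering::Acquire) == VM_NEEDS_INPUT)
        {
            println!("vm {i} requested input: write its register, re-arm it");
            metrics[i].store(VM_DORMANT, Ordering::Release);
            break;
        }
        thread::yield_now();
    }
    gpu.join().unwrap();
}
```

The real version would read the mapped pointer returned by vkMapMemory instead of an `Arc`, but the one-way broadcast/fulfill shape is the same.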
The dispatch chain is the universal primitive.

What's new in v1.0.0 beyond the GPU VM:

- 247 stdlib modules (up from 51)
- Native media codecs (PNG, JPEG, GIF, MP4/H.264 — no ffmpeg)
- GUI toolkit with 15+ widgets
- Terminal graphics (Kitty/Sixel)
- 1,169 tests passing
- Still 2.3 MB, still zero external dependencies

The zero-dep thing is real — zero Rust crates. The binary links against vulkan-1 and system libs, nothing else. cargo audit has nothing to audit.

Landing page: https://octoflow-lang.github.io/octoflow/
GPU VM details: https://octoflow-lang.github.io/octoflow/gpu-vm.html
GitHub: https://github.com/octoflow-lang/octoflow
Download: https://github.com/octoflow-lang/octoflow/releases/latest

I'm one developer. This is early. The GPU VM works and tests pass bit-exact, but there's a lot of road ahead — real LLM inference at scale, multi-agent orchestration, the full database engine. I'd love feedback from anyone who works with GPU compute, Vulkan, or language design.