Show HN: OctoFlow v1.0.0 – a GPU virtual machine where the GPU runs autonomously and the CPU acts as the BIOS
Three days ago I posted OctoFlow 0.83 here (GPU-native programming language, 2.2 MB binary). The feedback was great. Since then I've pushed v1.0.0 with the thing I've actually been building toward: a GPU Virtual Machine.
The idea: the GPU is the computer, the CPU is the BIOS.
You boot a VM, program a dispatch chain of kernel instances, submit once with vkQueueSubmit, and everything (layer execution, inter-layer communication, self-regulation, compression, database queries) happens on the GPU without CPU round-trips. The CPU just provides I/O.
```
let vm = vm_boot()
let prog = vm_program(vm, kernels, 4)
vm_write_register(vm, 0, 0, input)
vm_execute(prog)
let result = vm_read_register(vm, 3, 30)
```
4 VM instances, one submit, no CPU involvement between stages.
The memory model is 5 SSBOs: Registers (per-VM working memory), Metrics (regulator signals), Globals (shared mutable: KV cache, DB tables), Control (indirect dispatch params), Heap (immutable bulk data: quantized weights).
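As a rough mental model of those five regions (names, sizes, and types here are illustrative assumptions, not OctoFlow's actual binary layout):

```python
# Illustrative CPU-side model of the five SSBO regions. NUM_VMS and
# REG_SLOTS are assumed values, not OctoFlow constants.
NUM_VMS = 4
REG_SLOTS = 32

memory = {
    "registers": [[0.0] * REG_SLOTS for _ in range(NUM_VMS)],  # per-VM working memory
    "metrics":   [0.0] * NUM_VMS,        # regulator signals (HOST_VISIBLE)
    "globals":   {},                     # shared mutable: KV cache, DB tables
    "control":   [(1, 1, 1)] * NUM_VMS,  # indirect dispatch params (x, y, z)
    "heap":      b"",                    # immutable bulk data: quantized weights
}

# vm_write_register(vm, 0, 0, input) from the snippet above would land here:
memory["registers"][0][0] = 3.14
```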
What makes it interesting:
- Homeostasis regulator: each VM instance has a kernel that monitors activation norms, memory pressure, and throughput. The GPU self-regulates without asking the CPU.
- GPU self-programming: a kernel writes workgroup counts to the Control buffer; the next vkCmdDispatchIndirect reads them. The GPU decides its own workload.
- Compression as computation: Q4_K dequantization, delta encoding, dictionary lookup are just kernels in the dispatch chain, not a special subsystem. Adding a new codec means writing an emitter. No Rust changes.
- CPU polling: Metrics and Control are HOST_VISIBLE. The CPU can poll GPU state and activate dormant VMs without rebuilding the command buffer. The GPU broadcasts needs; the CPU fulfills them.
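The self-programming bullet can be sketched as plain data flow (a CPU-side simulation with hypothetical names; the real mechanism is a compute kernel writing into the Control SSBO that a later vkCmdDispatchIndirect consumes):

```python
# Simulation of GPU self-programming via indirect dispatch: a "planner"
# kernel writes workgroup counts into the control buffer, and the next
# dispatch reads its launch size from there instead of from the CPU.

control = [(1, 1, 1)]  # one VkDispatchIndirectCommand-style (x, y, z) entry

def planner_kernel(pending_items, workgroup_size=64):
    """Stage N: decide how much work stage N+1 should do."""
    groups = (pending_items + workgroup_size - 1) // workgroup_size  # ceil div
    control[0] = (groups, 1, 1)

def dispatch_indirect(kernel):
    """Stage N+1: launch with counts read from the control buffer."""
    x, y, z = control[0]
    for gid in range(x * y * z):
        kernel(gid)

launched = []
planner_kernel(pending_items=200)  # 200 items at 64 per group -> 4 groups
dispatch_indirect(lambda gid: launched.append(gid))
```

The point of the pattern is that `planner_kernel` and `dispatch_indirect` both run on the GPU inside one recorded command buffer; the CPU never sees the intermediate counts.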
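"Compression as computation" boils down to kernels like a block dequantizer. A simplified 4-bit scheme with one scale and minimum per block (illustrative only; real Q4_K uses 256-element superblocks with packed 6-bit scales, which this sketch omits) looks like:

```python
# Simplified 4-bit block quantization/dequantization. The dequantize side
# is the kind of kernel that sits in the dispatch chain.

def quantize_block(values):
    """Map a block of floats to 4-bit codes (0..15) plus (scale, vmin)."""
    vmin, vmax = min(values), max(values)
    scale = (vmax - vmin) / 15 or 1.0  # avoid zero scale for constant blocks
    codes = [round((v - vmin) / scale) for v in values]
    return codes, scale, vmin

def dequantize_block(codes, scale, vmin):
    """The GPU-kernel side: w = code * scale + vmin."""
    return [c * scale + vmin for c in codes]

block = [0.0, 0.5, 1.0, 1.5]
codes, scale, vmin = quantize_block(block)
restored = dequantize_block(codes, scale, vmin)
```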
The VM is workload-agnostic. The same architecture handles LLM inference, database queries, physics sims, graph neural networks, DSP pipelines, and game AI. We've validated all six. The dispatch chain is the universal primitive.
What's new in v1.0.0 beyond the GPU VM:
- 247 stdlib modules (up from 51)
- Native media codecs (PNG, JPEG, GIF, MP4/H.264; no ffmpeg)
- GUI toolkit with 15+ widgets
- Terminal graphics (Kitty/Sixel)
- 1,169 tests passing
- Still 2.3 MB, still zero external dependencies
The zero-dependency claim is real: zero Rust crates. The binary links against vulkan-1 and system libs, nothing else. cargo audit has nothing to audit.
Landing page: [https://octoflow-lang.github.io/octoflow/](https://octoflow-lang.github.io/octoflow/)
GPU VM details: [https://octoflow-lang.github.io/octoflow/gpu-vm.html](https://octoflow-lang.github.io/octoflow/gpu-vm.html)
GitHub: [https://github.com/octoflow-lang/octoflow](https://github.com/octoflow-lang/octoflow)
Download: [https://github.com/octoflow-lang/octoflow/releases/latest](https://github.com/octoflow-lang/octoflow/releases/latest)
I'm one developer. This is early. The GPU VM works and its tests pass bit-exact, but there's a lot of road ahead: real LLM inference at scale, multi-agent orchestration, the full database engine. I'd love feedback from anyone who works with GPU compute, Vulkan, or language design.