Show HN: OctoFlow v1.0.0 – a GPU virtual machine where the GPU runs autonomously and the CPU acts as the BIOS

Author: mr_octopus, about 1 month ago
Three days ago I posted OctoFlow 0.83 here (GPU-native programming language, 2.2 MB binary). The feedback was great. Since then I've pushed v1.0.0 with the thing I've actually been building toward: a GPU Virtual Machine.

The idea: the GPU is the computer, the CPU is the BIOS.

You boot a VM, program a dispatch chain of kernel instances, submit once with vkQueueSubmit, and everything — layer execution, inter-layer communication, self-regulation, compression, database queries — happens on the GPU without CPU round-trips. The CPU just provides I/O.

```
let vm = vm_boot()
let prog = vm_program(vm, kernels, 4)
vm_write_register(vm, 0, 0, input)
vm_execute(prog)
let result = vm_read_register(vm, 3, 30)
```

4 VM instances, one submit, no CPU involvement between stages.

The memory model is 5 SSBOs:

- Registers (per-VM working memory)
- Metrics (regulator signals)
- Globals (shared mutable — KV cache, DB tables)
- Control (indirect dispatch params)
- Heap (immutable bulk data — quantized weights)

What makes it interesting:

- Homeostasis regulator: each VM instance has a kernel that monitors activation norms, memory pressure, and throughput. The GPU self-regulates without asking the CPU.

- GPU self-programming: a kernel writes workgroup counts to the Control buffer, and the next vkCmdDispatchIndirect reads them. The GPU decides its own workload.

- Compression as computation: Q4_K dequantization, delta encoding, dictionary lookup — these are just kernels in the dispatch chain, not a special subsystem. Adding a new codec = writing an emitter. No Rust changes.

- CPU polling: Metrics and Control are HOST_VISIBLE. The CPU can poll GPU state and activate dormant VMs without rebuilding the command buffer. The GPU broadcasts needs, the CPU fulfills them.

The VM is workload-agnostic. The same architecture handles LLM inference, database queries, physics sims, graph neural networks, DSP pipelines, and game AI. We've validated all six.
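To make the self-programming point concrete, here is a CPU-side sketch of the indirect-dispatch pattern in plain Rust. The struct layout matches Vulkan's VkDispatchIndirectCommand, but the function names and sizing policy are my own illustration, not OctoFlow's API:

```rust
// CPU-side simulation of the GPU self-programming pattern: a "planner"
// kernel writes workgroup counts into the Control buffer, and the next
// vkCmdDispatchIndirect consumes them. Names here are illustrative.

/// Mirrors VkDispatchIndirectCommand: three u32 workgroup counts that
/// vkCmdDispatchIndirect reads straight out of the Control buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct DispatchIndirectCommand {
    x: u32,
    y: u32,
    z: u32,
}

/// Looks at how many items the previous stage produced and writes the
/// workgroup count the *next* dispatch will consume, so the GPU sizes
/// its own workload without a CPU round-trip.
fn planner_kernel(items_produced: u32, workgroup_size: u32) -> DispatchIndirectCommand {
    DispatchIndirectCommand {
        x: (items_produced + workgroup_size - 1) / workgroup_size, // round up
        y: 1,
        z: 1,
    }
}

fn main() {
    // Previous stage emitted 1000 items; kernels run 64 threads per group.
    let control = planner_kernel(1000, 64);
    assert_eq!(control.x, 16); // ceil(1000 / 64)
    println!("next vkCmdDispatchIndirect launches {} workgroups", control.x);
}
```

On a real device the write happens inside a shader and the Control SSBO is the dispatch source; the arithmetic is the same.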
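The CPU-polling handshake can be simulated the same way, with shared atomics standing in for a HOST_VISIBLE mapping of the Metrics buffer. The status codes and loop shape below are invented for the sketch:

```rust
// Simulation of the CPU polling loop over a HOST_VISIBLE Metrics buffer.
// Shared atomics stand in for mapped GPU memory; status codes are made up.
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

const VM_DORMANT: u32 = 0;
const VM_NEEDS_INPUT: u32 = 1;

fn main() {
    // One status word per VM instance, as the regulator kernels would publish.
    let metrics: Arc<Vec<AtomicU32>> =
        Arc::new((0..4).map(|_| AtomicU32::new(VM_DORMANT)).collect());

    let gpu_side = Arc::clone(&metrics);
    let gpu = thread::spawn(move || {
        // The GPU broadcasts a need: VM 2 wants fresh input.
        gpu_side[2].store(VM_NEEDS_INPUT, Ordering::Release);
    });

    // The CPU polls and fulfills, never rebuilding the command buffer.
    loop {
        if let Some(i) = metrics
            .iter()
            .position(|m| m.load(Ordering::Acquire) == VM_NEEDS_INPUT)
        {
            println!("vm {i} requested input: write its register, re-arm it");
            metrics[i].store(VM_DORMANT, Ordering::Release);
            break;
        }
        thread::yield_now();
    }
    gpu.join().unwrap();
}
```

The real version would read the mapped pointer returned by vkMapMemory instead of an `Arc`, but the one-way broadcast/fulfill shape is the same.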
The dispatch chain is the universal primitive.

What's new in v1.0.0 beyond the GPU VM:

- 247 stdlib modules (up from 51)
- Native media codecs (PNG, JPEG, GIF, MP4/H.264 — no ffmpeg)
- GUI toolkit with 15+ widgets
- Terminal graphics (Kitty/Sixel)
- 1,169 tests passing
- Still 2.3 MB, still zero external dependencies

The zero-dep thing is real — zero Rust crates. The binary links against vulkan-1 and system libs, nothing else. cargo audit has nothing to audit.

Landing page: https://octoflow-lang.github.io/octoflow/
GPU VM details: https://octoflow-lang.github.io/octoflow/gpu-vm.html
GitHub: https://github.com/octoflow-lang/octoflow
Download: https://github.com/octoflow-lang/octoflow/releases/latest

I'm one developer. This is early. The GPU VM works and tests pass bit-exact, but there's a lot of road ahead — real LLM inference at scale, multi-agent orchestration, the full database engine. I'd love feedback from anyone who works with GPU compute, Vulkan, or language design.