Waypoint 1.1, a local-first interactive simulated world model

Author: lcastricato · about 4 hours ago
Over the last few weeks, world models have started to feel real for the first time. You can see coherent environments, long rollouts, and increasingly convincing visuals. At the same time, most of these systems are hard to run, hard to integrate, and trade interactivity for scale.

We started Overworld because we cared less about producing impressive videos and more about building worlds you can actually inhabit. That means low latency, continuous control, and systems that respond every time you act, not once per prompt.

Last week, we released Waypoint 1, a research preview of a real-time diffusion world model that runs locally. Next week, we're releasing Waypoint 1.1 Small, which is designed to run on modern consumer GPUs and be easy to build on and modify.

Waypoint is built from scratch rather than fine-tuned from a large video model. We optimized heavily for control frequency, sparse attention, and fast inference so the system can maintain a persistent world state and respond to input at game-level frame rates. The goal was to make something developers can integrate today, not just watch as a demo.

We think this space will move fastest once world models follow a path similar to LLMs: local execution, open tooling, and fast community-driven iteration. Genie and similar systems show what's possible at massive scale. Our focus has been on making that future local and accessible.

We wrote more about the "immersion gap," why interactivity matters more than visuals alone, and how we optimized the model in a recent blog post.

Code, demos, and release details are here: https://over.world/blog/the-immersion-gap
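To make the "persistent world state, responds every time you act" idea concrete, here is a minimal sketch of an action-conditioned world-model loop. This is not the Waypoint API — the `WorldModel` class, its `step` method, and the rolling-context design are all illustrative assumptions; a real diffusion world model would denoise the next frame conditioned on the recent history and the current action.

```python
# Hypothetical sketch of an interactive world-model loop -- not the
# actual Waypoint interface. Illustrates the two properties the post
# describes: a persistent (but bounded) world state, and a response to
# every action at a fixed control frequency.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Toy stand-in that keeps a rolling context of recent frames."""
    # Bounded context: old frames fall off the back, so memory and
    # per-step cost stay constant -- a prerequisite for long rollouts.
    history: deque = field(default_factory=lambda: deque(maxlen=16))

    def step(self, action: int) -> dict:
        # A real model would run a (sparse-attention) denoising pass
        # conditioned on (self.history, action) and return pixels.
        frame = {"t": len(self.history), "action": action}
        self.history.append(frame)
        return frame


model = WorldModel()
for t in range(60):  # one simulated second at a 60 Hz control frequency
    frame = model.step(action=t % 4)  # the model reacts to *every* action

print(len(model.history))  # context stays bounded at 16 frames
```

The key design point the sketch mirrors is that control runs per-frame, not per-prompt: the loop feeds an action into every step, and the bounded history is what lets the system sustain game-level frame rates over long sessions.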