Hacker News (Chinese edition)

On May 7, Hyunwoo Kim (V4bel) disclosed Dirty Frag: two Linux kernel vulnerabilities (CVE-2026-43284 and CVE-2026-43500) that give unprivileged users deterministic root on most Linux distributions shipped since 2017. Microsoft confirmed active exploitation the next day.

We build declaw.ai, sandboxing infrastructure for AI agents, on Firecracker microVMs. We run untrusted code we don't write and can't predict, so when Dirty Frag dropped our first question was: does our isolation boundary hold? We tested it on a deliberately unpatched kernel. It held. Here's why.

The exploit is a page-cache write primitive: it tricks the kernel into overwriting the in-memory contents of any file (/usr/bin/su, /etc/passwd) and gives root. Fully deterministic, no race.

Why it matters for multi-tenant platforms: the page cache is shared across the whole machine. Containers share the host kernel, and namespace isolation, seccomp, and dropped capabilities are all enforced by that kernel. A kernel exploit doesn't need to escape the container; it operates below the layer where container isolation exists. Same structural issue as Dirty COW (2016) and Dirty Pipe (2022). On the day a zero-day drops, before any patch exists, every container-based sandbox sharing that kernel is exposed. Patching closes the window after the fact; it can't close it in advance.

We ran the public PoC (ESP path, CVE-2026-43284) in two environments.

Test 1: container sandbox (Docker, seccomp on, unprivileged uid=1001, host kernel 6.8.0). Unprivileged user to root in under 2 seconds. Seccomp was active but didn't help, because the required syscalls were permitted by the profile. With root we read /etc/shadow, host kernel boot params, and Docker overlay2 paths.

Test 2: Firecracker microVM (unpatched guest kernel, no seccomp, started as root with full capabilities; intentionally MORE permissive than test 1). The exploit worked inside the guest, but every attempt to reach the host failed: host kernel not visible, host processes invisible (the guest has its own kthreadd/kswapd), all host ports closed, only virtual block devices, no host hardware identity. The page cache it corrupted belongs to the guest's own kernel, mapped to a bounded region of host memory via EPT.

The asymmetry is the point: the microVM started with more privilege than the container and still couldn't reach the host. What matters isn't what permissions the software grants; it's whether the kernel is shared. To escape Firecracker you'd need a bug in the VMM (~50K lines of Rust) or KVM; Google's kvmCTF pays $250K for a guest-to-host escape and only one has ever been publicly demonstrated.

If you run untrusted code multi-tenant, the question for any isolation provider: if code inside the sandbox becomes root, can it reach the host or other tenants? If the answer is "as long as we're patched", that's the gap.

PoC: https://github.com/V4bel/dirtyfrag
Full writeup (commands + output): https://declaw.ai/blog/dirty-frag-microvm-isolation
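
The "is the kernel shared?" question can be probed directly. What follows is a minimal sketch, assuming a Linux host; it is not taken from the writeup or the PoC, and the names `KernelFingerprint`, `local_fingerprint`, and `shares_kernel` are made up for illustration. It records the kernel release and the per-boot ID from `/proc/sys/kernel/random/boot_id`, once on the host and once inside the sandbox. A Docker container normally reports the host's boot ID because it runs on the host kernel, while a microVM guest boots its own kernel and reports its own. Some runtimes mask or replace this file, so treat a mismatch as a hint rather than proof of isolation.

```python
import os
from dataclasses import dataclass
from pathlib import Path

# Linux-specific: a random UUID generated once per kernel boot.
BOOT_ID_PATH = Path("/proc/sys/kernel/random/boot_id")


@dataclass(frozen=True)
class KernelFingerprint:
    """Illustrative record of which kernel a process is running on."""
    release: str  # same value as `uname -r`
    boot_id: str  # empty string when the boot ID could not be read


def local_fingerprint() -> KernelFingerprint:
    """Collect the fingerprint of the kernel this process runs on."""
    boot_id = BOOT_ID_PATH.read_text().strip() if BOOT_ID_PATH.exists() else ""
    return KernelFingerprint(release=os.uname().release, boot_id=boot_id)


def shares_kernel(host: KernelFingerprint, sandbox: KernelFingerprint) -> bool:
    """Return True when both fingerprints carry the same boot ID.

    Raises ValueError if either boot ID is missing, so an unreadable
    value is never silently reported as "isolated".
    """
    if not host.boot_id or not sandbox.boot_id:
        raise ValueError("boot ID unavailable; cannot compare kernels")
    return host.boot_id == sandbox.boot_id
```

To use it, run `local_fingerprint()` on the host and again inside the sandbox, then pass both results to `shares_kernel`. The check only says whether the two environments sit on one running kernel; it says nothing about VMM or KVM bugs, which are the remaining escape path the post describes.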

Dirty Frag: Kernel zero-day against container and microVM sandboxes