Dirty Frag: A Kernel Zero-Day Against Container and MicroVM Sandboxes
On May 7, Hyunwoo Kim (V4bel) disclosed Dirty Frag: two Linux kernel
vulnerabilities (CVE-2026-43284 and CVE-2026-43500) that give unprivileged
users deterministic root on most Linux distributions shipped since 2017.
Microsoft confirmed active exploitation the next day.

We build declaw.ai, sandboxing infrastructure for AI agents, on Firecracker
microVMs. We run untrusted code we don't write and can't predict, so when
Dirty Frag dropped, our first question was: does our isolation boundary hold?
We tested it on a deliberately unpatched kernel. It held. Here's why.

The exploit is a page-cache write primitive: it tricks the kernel into
overwriting the in-memory contents of any file (for example /usr/bin/su or
/etc/passwd), which is enough to get root. It is fully deterministic, with no
race to win.

Why it matters for multi-tenant platforms: the page cache is shared across
the whole machine. Containers share the host kernel, and namespace isolation,
seccomp, and dropped capabilities are all enforced by that kernel. A kernel
exploit doesn't need to escape the container; it operates below the layer
where container isolation exists. Same structural issue as Dirty COW (2016)
and Dirty Pipe (2022). On the day a zero-day drops, before any patch exists,
every container-based sandbox sharing that kernel is exposed. Patching closes
the window after the fact; it can't close it in advance.

We ran the public PoC (ESP path, CVE-2026-43284) in two environments.

Test 1: container sandbox (Docker, seccomp on, unprivileged uid=1001, host
kernel 6.8.0): unprivileged user to root in under 2 seconds. Seccomp was
active but didn't help, because the profile permitted the syscalls the exploit
needs. With root we read /etc/shadow, host kernel boot params, and Docker
overlay2 paths.

Test 2: Firecracker microVM (unpatched guest kernel, no seccomp, started as
root with full capabilities, intentionally MORE permissive than test 1). The
exploit worked inside the guest, but every attempt to reach the host
failed: host kernel not visible, host processes invisible (the guest has its
own kthreadd/kswapd), all host ports closed, only virtual block devices, no
host hardware identity. The page cache it corrupted belongs to the guest's
own kernel, mapped to a bounded region of host memory via EPT.

The asymmetry is the point: the microVM started with more privilege than the
container and still couldn't reach the host. What matters isn't what
permissions the software grants; it's whether the kernel is shared. To
escape Firecracker you'd need a bug in the VMM (~50K lines of Rust) or KVM;
Google's kvmCTF pays $250K for a guest-to-host escape and only one has ever
been publicly demonstrated.

If you run untrusted code multi-tenant, the question for any isolation
provider: if code inside the sandbox becomes root, can it reach the host or
other tenants? If the answer is "as long as we're patched", that's the gap.

PoC: https://github.com/V4bel/dirtyfrag
Full writeup (commands + output): https://declaw.ai/blog/dirty-frag-microvm-isolation
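
For anyone who wants to repeat the comparison, here is a minimal Python sketch of the kind of in-sandbox checks described in Test 2 (kernel identity, PID 2, block devices, hardware identity). It is an assumption about how one might collect those signals, not necessarily the commands used in the tests; those are in the full writeup. The names collect_signals and VIRTUAL_PREFIXES and the choice of /proc and /sys paths are illustrative.

```python
from pathlib import Path
from typing import Optional

# Name prefixes Linux commonly gives to virtual block devices (virtio, loop,
# ramdisk, device-mapper, zram). Illustrative assumption, not exhaustive.
VIRTUAL_PREFIXES = ("vd", "loop", "ram", "dm-", "zram")


def read_text(path: Path) -> Optional[str]:
    """Return stripped file contents, or None if missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def collect_signals(root: str = "/") -> dict:
    """Gather what a root process inside the sandbox can see of its kernel.

    `root` exists only so the function can be pointed at a fake tree in tests.
    """
    base = Path(root)
    block_dir = base / "sys/block"
    devices = sorted(p.name for p in block_dir.iterdir()) if block_dir.is_dir() else []
    return {
        # Which kernel is answering: the host's (container) or the guest's (VM).
        "kernel_version": read_text(base / "proc/version"),
        "boot_params": read_text(base / "proc/cmdline"),
        # PID 2 is kthreadd when the sandbox has a kernel and PID space of its own.
        "pid2_name": read_text(base / "proc/2/comm"),
        "block_devices": devices,
        "non_virtual_block_devices": [
            d for d in devices if not d.startswith(VIRTUAL_PREFIXES)
        ],
        # Hardware identity, if the kernel exposes DMI at all.
        "dmi_product": read_text(base / "sys/class/dmi/id/product_name"),
    }
```

Calling collect_signals() as root in each sandbox and diffing the two results gives a rough picture: in a container, /proc/version and /proc/cmdline describe the host's kernel, while in a microVM they describe the guest's own. It only reports what is visible and proves nothing about escape resistance.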