Dirty Frag: Kernel zero-day against container and microVM sandboxes

1 point by ShivamNayak11, about 1 month ago | original post
On May 7, Hyunwoo Kim (V4bel) disclosed Dirty Frag: two Linux kernel vulnerabilities (CVE-2026-43284 and CVE-2026-43500) that give unprivileged users deterministic root on most Linux distributions shipped since 2017. Microsoft confirmed active exploitation the next day.

We build declaw.ai, sandboxing infrastructure for AI agents that runs on Firecracker microVMs. We run untrusted code we don't write and can't predict, so when Dirty Frag dropped our first question was: does our isolation boundary hold? We tested it on a deliberately *unpatched* kernel. It held. Here's why.

The exploit is a page-cache write primitive: it tricks the kernel into overwriting the in-memory contents of any file (/usr/bin/su, /etc/passwd), which yields root. Fully deterministic, no race.

Why it matters for multi-tenant platforms: the page cache is shared across the whole machine. Containers share the host kernel, and namespace isolation, seccomp, and dropped capabilities are all enforced *by* that kernel. A kernel exploit doesn't need to escape the container; it operates below the layer where container isolation exists. Same structural issue as Dirty COW (2016) and Dirty Pipe (2022). On the day a zero-day drops, before any patch exists, every container-based sandbox sharing that kernel is exposed. Patching closes the window after the fact; it can't close it in advance.

We ran the public PoC (ESP path, CVE-2026-43284) in two environments.

Test 1: container sandbox (Docker, seccomp on, unprivileged uid=1001, host kernel 6.8.0). The unprivileged user got root in under 2 seconds. Seccomp was active but didn't help, because the profile permitted the syscalls the exploit needs. With root we read /etc/shadow, host kernel boot params, and Docker overlay2 paths.

Test 2: Firecracker microVM (unpatched guest kernel, no seccomp, started as root with full capabilities; intentionally MORE permissive than Test 1). The exploit worked *inside* the guest, but every attempt to reach the host failed: host kernel not visible, host processes invisible (the guest has its own kthreadd/kswapd), all host ports closed, only virtual block devices, no host hardware identity. The page cache it corrupted belongs to the guest's own kernel, mapped to a bounded region of host memory via EPT.

The asymmetry is the point: the microVM started with more privilege than the container and still couldn't reach the host. What matters isn't what permissions the software grants; it's whether the kernel is shared. To escape Firecracker you'd need a bug in the VMM (~50K lines of Rust) or in KVM; Google's kvmCTF pays $250K for a guest-to-host escape and only one has ever been publicly demonstrated.

If you run untrusted code in a multi-tenant setup, the question to put to any isolation provider is: if code inside the sandbox becomes root, can it reach the host or other tenants? If the answer is "as long as we're patched", that's the gap.

PoC: https://github.com/V4bel/dirtyfrag
Full writeup (commands + output): https://declaw.ai/blog/dirty-frag-microvm-isolation
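
A rough way to probe the "is the kernel shared?" question from inside a sandbox, offered as a sketch and not something taken from the writeup: Linux exposes a random UUID generated once per kernel boot at /proc/sys/kernel/random/boot_id. A container normally reads the same value as its host, since one kernel serves both, while a microVM guest boots its own kernel and gets its own value. The helper names below (`read_boot_id`, `shares_kernel`) are made up for illustration, and a sandbox that masks or fakes this file would defeat the check, so treat a mismatch as a hint and not as proof of isolation.

```python
from pathlib import Path

# Real Linux path: a random UUID generated once per kernel boot.
BOOT_ID_PATH = "/proc/sys/kernel/random/boot_id"


def read_boot_id(path: str = BOOT_ID_PATH) -> str:
    """Return the boot ID stored at `path`, normalized for comparison."""
    return Path(path).read_text().strip().lower()


def shares_kernel(host_boot_id: str, sandbox_boot_id: str) -> bool:
    """Return True when both IDs are non-empty and equal.

    Hypothetical helper: equal IDs suggest one kernel served both reads
    (container-style); different IDs suggest a separate guest kernel
    (microVM-style). Empty input is treated as "unknown", i.e. False.
    """
    host = host_boot_id.strip().lower()
    sandbox = sandbox_boot_id.strip().lower()
    return bool(host) and host == sandbox
```

To use it, call `read_boot_id()` once on the host and once inside the sandbox, then pass both strings to `shares_kernel`. By the post's reasoning, the Docker setup in Test 1 would be expected to report a match and the Firecracker guest in Test 2 would not; that is an inference from the argument above, not a result reported in the writeup.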