KernelEvolve: Agentic Kernel Coding for Heterogeneous AI Accelerators (Meta)

Author: gangliao · about 1 month ago (original post)
We're sharing KernelEvolve, an agentic system we built at Meta to automatically generate and evolve high-performance kernels across heterogeneous AI accelerators.

The core motivation is that modern AI stacks increasingly depend on hand-optimized kernels (GEMM, attention, reductions, fused ops), but writing and tuning them for each hardware target (NVIDIA GPUs, AMD GPUs, custom accelerators like MTIA) does not scale.

KernelEvolve treats kernel programming as a search + evolution problem:

- An LLM generates candidate kernels (e.g., Triton-like code)
- Kernels are compiled, benchmarked, and validated on real hardware
- Performance feedback is used to evolve better variants over many iterations
- The system scales evaluation across large fleets and multiple accelerator types

Unlike one-shot code generation, KernelEvolve continuously improves kernels using closed-loop, hardware-in-the-loop feedback, and can discover non-obvious optimizations that rival or exceed expert-written code.

In the paper we describe:

- The agent architecture and search space design
- How we scale kernel evaluation efficiently across heterogeneous accelerators
- Case studies showing performance gains over hand-tuned baselines
- Practical lessons from deploying this system in production ML workloads

Paper (arXiv): https://arxiv.org/abs/2512.23236 (66 pages)

LinkedIn: https://www.linkedin.com/posts/gangliao_excited-to-share-our-recent-work-on-kernelevolve-activity-7411781675740897280-AQth?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAzsrfsBRed-BvPAGqq9FgvVZ-v6F-sG4SM

We'd love feedback from folks working on compilers, kernels, ML systems, or agentic approaches to code generation.
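The closed-loop structure described above (propose → compile/validate → benchmark → evolve) can be sketched roughly as follows. This is a minimal conceptual sketch, not the paper's implementation: `propose_variant`, `benchmark`, and `validate` are hypothetical stand-ins (the real system uses an LLM to generate kernel code and real accelerators for hardware-in-the-loop measurement), and the tuning knobs shown are assumed examples.

```python
import random

def propose_variant(parent):
    """Stand-in for the LLM proposer: mutate one tuning knob of a
    kernel configuration (the real system rewrites kernel code)."""
    child = dict(parent)
    knob = random.choice(list(child))
    child[knob] = max(1, child[knob] * random.choice([1, 2]) // random.choice([1, 2]))
    return child

def benchmark(config):
    """Stand-in for hardware-in-the-loop evaluation: returns a latency-like
    cost (lower is better). Toy model preferring block_size=128, num_warps=4."""
    return abs(config["block_size"] - 128) + 10 * abs(config["num_warps"] - 4)

def validate(config):
    """Stand-in for numerical-correctness checks against a reference
    implementation; here we just reject degenerate tilings."""
    return config["block_size"] >= 16

def evolve(seed, iterations=200, rng_seed=0):
    """Closed-loop search: propose, validate, benchmark, keep improvements."""
    random.seed(rng_seed)
    best, best_cost = seed, benchmark(seed)
    for _ in range(iterations):
        cand = propose_variant(best)
        if not validate(cand):       # discard incorrect kernels
            continue
        cost = benchmark(cand)       # performance feedback from "hardware"
        if cost < best_cost:         # evolve: keep only improving variants
            best, best_cost = cand, cost
    return best, best_cost

best, cost = evolve({"block_size": 32, "num_warps": 1})
```

The key property the sketch preserves is that candidates are only kept after passing validation and measuring faster than the incumbent, so the loop can never regress below its starting point; everything else (population size, mutation operators, multi-fleet scheduling) is simplified away.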