HackerNews中文版

我开发了一个零抖动排序的动态链接库（DLL），使用Rust编写，能够在处理100万个u64键值对（u32+u32）时始终保持20毫秒的性能，无论数据分布如何。与SkaSort或其他自适应算法不同，我的实现完全独立于数据，并且对通常会导致性能下降到40-50毫秒的“毒化”数据集免疫。该方法受到FPGA的启发，针对x86-64-v3架构（AVX2/BMI2）进行实现。在我的i9-13980HX处理器上，它达到了物理L3带宽的极限（约50GB/s）。虽然纯FPGA硬件可能达到15毫秒，但这很可能是单线程CPU实现的理论极限。该实现可以在任何现代桌面计算机上运行。

查看原文

I’ve developed a zero-jitter sorting DLL (written in Rust) that consistently hits 20ms for 1M u64 KV-pairs (u32+u32), regardless of data distribution. Unlike SkaSort or other adaptive algorithms, my implementation is entirely data-independent and immune to 'poisoned' datasets that typically tank performance to 40-50ms. The approach is FPGA-inspired, implemented for x86-64-v3 (AVX2/BMI2). It hits the physical L3 bandwidth limit on my i9-13980HX (~50GB/s). While pure FPGA hardware might hit 15ms, this is likely the theoretical limit for a single-threaded CPU implementation. Works on any modern desktop.

在i9-13980HX上使用无分支的Rust实现，在20毫秒内对100万个u64键值对进行排序。