Why do we accept silent data corruption in vector search? (x86 vs. ARM)
I spent the last week chasing a "ghost" in a RAG pipeline, and I think I've found something the industry is collectively ignoring.
We assume that once we generate an embedding and store it, the "memory" is stable. But I found that f32 distance calculations (the backbone of FAISS, Chroma, etc.) act as a "forking path."
If you run the exact same insertion sequence on an x86 server (AVX-512) and an ARM MacBook (NEON), the memory states diverge at the bit level. This isn't just "floating-point noise"; it's deterministic drift caused by FMA (fused multiply-add) instruction differences.
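You can reproduce the effect on a single machine. This is my own minimal illustration (with hand-picked inputs, not embedding data): Rust never contracts `a * b + c` into an FMA on its own, while `f32::mul_add` is guaranteed to fuse, so the two lines below model the difference between codegen with and without FMA.

```rust
fn main() {
    let (a, b, c) = (0.1f32, 10.0f32, -1.0f32);

    let split = a * b + c;       // two roundings: round(a*b), then round(+c)
    let fused = a.mul_add(b, c); // one rounding: round(a*b + c) via FMA

    println!("split = {:e}  bits = {:#010x}", split, split.to_bits());
    println!("fused = {:e}  bits = {:#010x}", fused, fused.to_bits());
    // split is exactly 0.0 (the intermediate rounds to 1.0);
    // fused keeps the residue 2^-26 ≈ 1.5e-8. Same math, different bits.
}
```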
I wrote a script to inspect the raw bits of a sentence-transformers vector across my M3 Max and a Xeon instance. Semantic similarity was 0.9999, but the raw storage differed.
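The same "values agree, bits disagree" pattern shows up without two machines. Here is a sketch (mine, not the actual cross-machine script) that simulates it via accumulation order: a strict left-to-right fold, as scalar code does, versus a balanced tree, as a SIMD reduction combines lanes.

```rust
fn main() {
    let v = [0.1f32; 8];

    // Scalar-style accumulation: ((...((0 + v0) + v1)...) + v7).
    let sequential: f32 = v.iter().fold(0.0, |acc, x| acc + x);

    // SIMD-reduction-style accumulation: a balanced binary tree.
    let tree = ((v[0] + v[1]) + (v[2] + v[3])) + ((v[4] + v[5]) + (v[6] + v[7]));

    println!("sequential = {:.9}  bits = {:#010x}", sequential, sequential.to_bits());
    println!("tree       = {:.9}  bits = {:#010x}", tree, tree.to_bits());
    // The two sums agree to ~6e-8 but differ in the last mantissa bit.
}
```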
For a regulated AI agent (finance/healthcare), this is a nightmare. It means your audit trail is technically a hallucination, depending on which server processed the query. You cannot have "write once, run anywhere" index portability.
The Fix (Going no_std): I got so frustrated that I bypassed the standard libraries and wrote a custom kernel (Valori) in Rust using Q16.16 fixed-point arithmetic. By strictly enforcing integer associativity, I got 100% bit-identical snapshots across x86, ARM, and WASM.
Recall loss: negligible (99.8% Recall@10 vs. standard f32).
Performance: < 500µs latency (comparable to unoptimized f32).
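To make the fixed-point idea concrete, here is a minimal sketch of a Q16.16 dot product. This is my own illustration of the technique, not the Valori kernel itself: a value x is stored as round(x · 2^16) in an i32, products are taken in i64, and because integer addition is associative, every accumulation order (scalar, AVX-512, NEON, WASM) yields the same bits.

```rust
// Q16.16: value = raw / 2^16, stored as i32.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Q16(i32);

impl Q16 {
    fn from_f32(x: f32) -> Q16 {
        // NB: a real kernel must handle saturation/overflow;
        // here inputs are assumed to fit in [-32768.0, 32768.0).
        Q16((x * 65536.0).round() as i32)
    }
    fn to_f32(self) -> f32 {
        self.0 as f32 / 65536.0
    }
}

// Dot product entirely in integer arithmetic. Each i32*i32 product
// fits in an i64, and i64 addition is associative, so the result is
// bit-identical regardless of lane order or vector width.
fn dot_q16(a: &[Q16], b: &[Q16]) -> Q16 {
    let acc: i64 = a.iter().zip(b).map(|(x, y)| x.0 as i64 * y.0 as i64).sum();
    Q16((acc >> 16) as i32) // rescale Q32.32 -> Q16.16 (assumes result fits)
}

fn main() {
    let a: Vec<Q16> = [0.5, -0.25].iter().map(|&x| Q16::from_f32(x)).collect();
    let b: Vec<Q16> = [0.5, 0.5].iter().map(|&x| Q16::from_f32(x)).collect();
    println!("{}", dot_q16(&a, &b).to_f32()); // 0.5*0.5 + (-0.25)*0.5 = 0.125
}
```

The trade-off is exactly the one the numbers above describe: you give up f32's dynamic range (hence the small recall loss) in exchange for associativity, and therefore determinism.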
The Ask / Paper: I've written a formal preprint analyzing this "forking path" problem and the Q16.16 proofs. I'm currently trying to submit it to arXiv (Distributed Computing / cs.DC), but I'm stuck in the endorsement queue.
If you want to tear apart my Rust code: https://github.com/varshith-Git/Valori-Kernel
If you are an arXiv endorser for cs.DC (or cs.DB) and want to see the draft, I'd love to send it to you.
Am I the only one worried about building "reliable" agents on such shaky numerical foundations?