Why do we accept silent data corruption in vector search? (x86 vs. ARM)
I spent the last week chasing a "ghost" in a RAG pipeline, and I think I've found something the industry is collectively ignoring.
We assume that once we generate an embedding and store it, the "memory" is stable. But I found that f32 distance calculations (the backbone of FAISS, Chroma, etc.) act as a "forking path."
If you run the exact same insertion sequence on an x86 server (AVX-512) and an ARM MacBook (NEON), the memory states diverge at the bit level. This isn't just "floating-point noise"; it's deterministic drift caused by FMA (fused multiply-add) instruction differences.
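You can reproduce the effect on a single machine. This is my own minimal illustration (with hand-picked inputs, not embedding data): Rust never contracts `a * b + c` into an FMA on its own, while `f32::mul_add` is guaranteed to fuse, so the two lines below model the difference between codegen with and without FMA.

```rust
fn main() {
    let (a, b, c) = (0.1f32, 10.0f32, -1.0f32);

    let split = a * b + c;       // two roundings: round(a*b), then round(+c)
    let fused = a.mul_add(b, c); // one rounding: round(a*b + c) via FMA

    println!("split = {:e}  bits = {:#010x}", split, split.to_bits());
    println!("fused = {:e}  bits = {:#010x}", fused, fused.to_bits());
    // split is exactly 0.0 (the intermediate rounds to 1.0);
    // fused keeps the residue 2^-26 ≈ 1.5e-8. Same math, different bits.
}
```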
I wrote a script to inspect the raw bits of a sentence-transformers vector across my M3 Max and a Xeon instance. Semantic similarity was 0.9999, but the raw storage differed.
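The same "values agree, bits disagree" pattern shows up without two machines. Here is a sketch (mine, not the actual cross-machine script) that simulates it via accumulation order: a strict left-to-right fold, as scalar code does, versus a balanced tree, as a SIMD reduction combines lanes.

```rust
fn main() {
    let v = [0.1f32; 8];

    // Scalar-style accumulation: ((...((0 + v0) + v1)...) + v7).
    let sequential: f32 = v.iter().fold(0.0, |acc, x| acc + x);

    // SIMD-reduction-style accumulation: a balanced binary tree.
    let tree = ((v[0] + v[1]) + (v[2] + v[3])) + ((v[4] + v[5]) + (v[6] + v[7]));

    println!("sequential = {:.9}  bits = {:#010x}", sequential, sequential.to_bits());
    println!("tree       = {:.9}  bits = {:#010x}", tree, tree.to_bits());
    // The two sums agree to ~6e-8 but differ in the last mantissa bit.
}
```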
For a regulated AI agent (finance/healthcare), this is a nightmare. It means your audit trail is technically a hallucination, depending on which server processed the query. You cannot have "write once, run anywhere" index portability.
The Fix (Going no_std): I got so frustrated that I bypassed the standard libraries and wrote a custom kernel (Valori) in Rust using Q16.16 fixed-point arithmetic. By strictly enforcing integer associativity, I got 100% bit-identical snapshots across x86, ARM, and WASM.
Recall loss: negligible (99.8% Recall@10 vs. standard f32).
Performance: < 500µs latency (comparable to unoptimized f32).
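To make the fixed-point idea concrete, here is a minimal sketch of a Q16.16 dot product. This is my own illustration of the technique, not the Valori kernel itself: a value x is stored as round(x · 2^16) in an i32, products are taken in i64, and because integer addition is associative, every accumulation order (scalar, AVX-512, NEON, WASM) yields the same bits.

```rust
// Q16.16: value = raw / 2^16, stored as i32.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Q16(i32);

impl Q16 {
    fn from_f32(x: f32) -> Q16 {
        // NB: a real kernel must handle saturation/overflow;
        // here inputs are assumed to fit in [-32768.0, 32768.0).
        Q16((x * 65536.0).round() as i32)
    }
    fn to_f32(self) -> f32 {
        self.0 as f32 / 65536.0
    }
}

// Dot product entirely in integer arithmetic. Each i32*i32 product
// fits in an i64, and i64 addition is associative, so the result is
// bit-identical regardless of lane order or vector width.
fn dot_q16(a: &[Q16], b: &[Q16]) -> Q16 {
    let acc: i64 = a.iter().zip(b).map(|(x, y)| x.0 as i64 * y.0 as i64).sum();
    Q16((acc >> 16) as i32) // rescale Q32.32 -> Q16.16 (assumes result fits)
}

fn main() {
    let a: Vec<Q16> = [0.5, -0.25].iter().map(|&x| Q16::from_f32(x)).collect();
    let b: Vec<Q16> = [0.5, 0.5].iter().map(|&x| Q16::from_f32(x)).collect();
    println!("{}", dot_q16(&a, &b).to_f32()); // 0.5*0.5 + (-0.25)*0.5 = 0.125
}
```

The trade-off is exactly the one the numbers above describe: you give up f32's dynamic range (hence the small recall loss) in exchange for associativity, and therefore determinism.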
The Ask / Paper: I've written a formal preprint analyzing this "forking path" problem and the Q16.16 proofs. I'm currently trying to submit it to arXiv (Distributed Computing / cs.DC), but I'm stuck in the endorsement queue.
If you want to tear apart my Rust code: https://github.com/varshith-Git/Valori-Kernel
If you are an arXiv endorser for cs.DC (or cs.DB) and want to see the draft, I'd love to send it to you.
Am I the only one worried about building "reliable" agents on such shaky numerical foundations?