Show HN: A production-grade recommendation system using vector retrieval and re-ranking
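I've been exploring how recommendation systems are actually implemented in production, beyond just training models.
<p>A common pattern I kept seeing is to split the problem into two stages:
<p>1. Retrieve a small set of relevant candidates
<p>2. Re-rank them using a model
<p>Instead of running brute-force inference across all items, I built a small prototype around this idea.
<p>The flow looks like this:
<p>- Store embeddings in a vector database (ChromaDB)
<p>- Retrieve the Top-K most similar items/users based on vector similarity
<p>- Run a TensorFlow.js model to re-rank the candidates
<p>The goal is to shrink the search space before running model inference, which seems necessary once latency and scale matter.
<p>What I found interesting is that once you adopt this approach, a lot of the complexity shifts from the model itself to the retrieval layer:
<p>- choosing K
<p>- filtering candidates
<p>- embedding quality
<p>- latency vs. recall trade-offs
<p>I'm curious how others approach this in real systems:
<p>- How do you decide on K?
<p>- Do you rely purely on vector similarity, or do you add heuristics?
<p>- How do you handle re-ranking at scale?
<p>Project: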
<a href="https://github.com/ftonato/recommendation-system-chromadb-tfjs" rel="nofollow">https://github.com/ftonato/recommendation-system-chromadb-tf...</a>
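<p>The retrieve-then-re-rank flow described in the post can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's actual ChromaDB + TensorFlow.js code: a brute-force cosine search stands in for the vector database, and a hand-written scoring function stands in for the model. Every name here (`retrieve_top_k`, `rerank`, `recommend`, `toy_score`, the toy embeddings and popularity values) is hypothetical and chosen only for the example.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Scored = List[Tuple[str, float]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: Vector, index: Dict[str, Vector], k: int) -> Scored:
    """Stage 1: return the k items most similar to the query.

    Assumption: a linear scan stands in for the vector database lookup.
    """
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def rerank(candidates: Scored, score_fn: Callable[[str, float], float]) -> Scored:
    """Stage 2: re-score only the retrieved candidates and sort by the new score."""
    rescored = [(item_id, score_fn(item_id, sim)) for item_id, sim in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


def recommend(query: Vector, index: Dict[str, Vector], k: int,
              score_fn: Callable[[str, float], float], n: int) -> Scored:
    """Retrieve k candidates, re-rank them, and keep the best n."""
    return rerank(retrieve_top_k(query, index, k), score_fn)[:n]


# Toy data, invented for illustration only.
ITEM_EMBEDDINGS: Dict[str, Vector] = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.7, 0.7],
}
POPULARITY: Dict[str, float] = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.3}


def toy_score(item_id: str, similarity: float) -> float:
    """Hypothetical stand-in for a learned re-ranker: blends similarity and popularity."""
    return 0.5 * similarity + 0.5 * POPULARITY[item_id]


if __name__ == "__main__":
    user_vector = [1.0, 0.0]
    print(retrieve_top_k(user_vector, ITEM_EMBEDDINGS, k=3))
    print(recommend(user_vector, ITEM_EMBEDDINGS, k=3, score_fn=toy_score, n=2))
```

<p>With this toy data, retrieval at K=3 returns "a", "b", "d" in that order, and the re-ranker then promotes "b" above "a". At K=1 only "a" reaches the re-ranker, so "b" can never be recommended however highly the model would score it. That is the latency vs. recall trade-off in miniature: the re-ranker can only reorder what retrieval lets through.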