


I’ve been exploring how recommendation systems are actually implemented in production, beyond just training models.

A common pattern I kept seeing is to split the problem into two stages:

1. Retrieve a small set of relevant candidates
2. Re-rank them using a model

Instead of doing brute-force inference across all items, I built a small prototype around this idea.

The flow looks like this:

- Store embeddings in a vector database (ChromaDB)
- Retrieve the Top-K most similar items/users based on vector similarity
- Run a TensorFlow.js model to re-rank the candidates

The goal is to reduce the search space before applying inference, which seems necessary when latency and scale matter.

What I found interesting is that once you move to this approach, a lot of the complexity shifts from the model itself to the retrieval layer:

- choosing K
- filtering candidates
- embedding quality
- latency vs recall trade-offs

Curious how others approach this in real systems:

- How do you decide on K?
- Do you rely purely on vector similarity or add heuristics?
- How do you handle re-ranking at scale?

Project: https://github.com/ftonato/recommendation-system-chromadb-tfjs
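
For illustration, here is a minimal sketch of the retrieve-then-re-rank flow described above. It is not taken from the linked project. To stay self-contained it assumes an in-memory cosine-similarity search as a stand-in for ChromaDB and a hand-written linear scorer as a stand-in for the TensorFlow.js model. The `Item` shape, the `popularity` feature, the weights, and all function names are hypothetical.

```typescript
// Two-stage recommendation sketch: retrieve Top-K by similarity, then re-rank.
// ASSUMPTION: the in-memory search stands in for a vector database (ChromaDB in the post).
// ASSUMPTION: the linear scorer stands in for a learned model (TensorFlow.js in the post).

interface Item {
  id: string;
  embedding: number[];
  popularity: number; // hypothetical extra feature, visible only to the re-ranker
}

interface Candidate {
  item: Item;
  similarity: number;
}

interface Scored {
  id: string;
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Stage 1: cheap retrieval. Only the k most similar items survive.
function retrieveTopK(query: number[], items: Item[], k: number): Candidate[] {
  return items
    .map((item) => ({ item, similarity: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}

// Stage 2: costlier scoring, applied to the k candidates instead of the whole catalog.
function rerank(candidates: Candidate[], wSimilarity: number, wPopularity: number): Scored[] {
  return candidates
    .map(({ item, similarity }) => ({
      id: item.id,
      score: wSimilarity * similarity + wPopularity * item.popularity,
    }))
    .sort((x, y) => y.score - x.score);
}

function recommend(query: number[], items: Item[], k: number): Scored[] {
  return rerank(retrieveTopK(query, items, k), 0.5, 0.5);
}

// Made-up toy catalog, for demonstration only.
const catalog: Item[] = [
  { id: "a", embedding: [1, 0], popularity: 0.1 },
  { id: "b", embedding: [0.9, 0.1], popularity: 0.9 },
  { id: "c", embedding: [0, 1], popularity: 1.0 },
  { id: "d", embedding: [0.7, 0.7], popularity: 0.5 },
];

console.log(recommend([1, 0], catalog, 2).map((r) => r.id));
```

With K = 2 on this toy catalog, retrieval keeps `a` and `b`, and the re-ranker then puts `b` first because of its higher popularity. Raising K to 4 lets `d` and `c` reach the re-ranker as well, which changes the final ordering. That is the latency vs recall trade-off in miniature: a larger K gives the model more chances to surface a good item, at the cost of more scoring work per request.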

Show HN: A production-style recommendation system using vector retrieval and re-ranking