Show HN: I scraped Reddit data to find the most controversial chef knives

1 point by p-s-v, 27 days ago
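I wanted to quantify the endless "which knife should I buy" debates on r/chefknives, so I built a data analysis pipeline to get some real answers.

The project is a 5-phase system built with Node.js. It first uses Fuse.js for fast, typo-tolerant fuzzy matching against ~450 known brands and ~8,700 models. The remaining text is then passed to an LLM (via OpenRouter) to discover new, unknown entities and to run sentiment analysis on every mention. I ran it on over 1,000 threads, totaling more than 25,000 comments.

A few interesting findings:

- The Underdog: Budget-friendly Tojiro has a massive 27-to-1 positive-to-negative mention ratio.
- The Controversy King: Shun is by far the most polarizing brand, sparking strong love/hate discussions (59 positive vs. 24 negative mentions).
- The Unloved: Dalstrong was one of the few brands to receive more negative mentions than positive.

The system isn't perfect; I'm open about a critical entity aggregation bug in the write-up. The full technical architecture, results, and raw data are available.

I'm here to answer any questions!

Blog Post (full story & visualizations): [https://new.knife.day/blog/we-analyzed-25000-reddit-comments-to-find-most-loved-and-hated-chef-knives](https://new.knife.day/blog/we-analyzed-25000-reddit-comments-to-find-most-loved-and-hated-chef-knives)

GitHub (technical breakdown & raw data): [https://github.com/pvijeh/reddit-named-entity-recognition/blob/main/chefknives-brands.md](https://github.com/pvijeh/reddit-named-entity-recognition/blob/main/chefknives-brands.md)

Original Reddit Discussion: [https://www.reddit.com/r/chefknives/comments/1o2p363/i_analyzed_over_1000_posts_on_rchefknives_heres/](https://www.reddit.com/r/chefknives/comments/1o2p363/i_analyzed_over_1000_posts_on_rchefknives_heres/)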
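
For readers who want a feel for the flow described in the post (typo-tolerant matching against known brands, then per-brand sentiment tallies), here is a minimal, dependency-free sketch. It is not the author's code. Plain Levenshtein distance stands in for Fuse.js so the snippet runs without installing anything. The helper names `matchBrand` and `summarize`, the `maxEdits` cutoff, and the `minorityShare` polarization proxy are all assumptions made for illustration. Only the three brand names and the Shun counts (59 positive, 24 negative) come from the post.

```javascript
// Stage 1 stand-in: typo-tolerant lookup against a list of known brands.
// The post uses Fuse.js here; Levenshtein distance is swapped in (assumption)
// so the sketch has no dependencies.
const KNOWN_BRANDS = ["Tojiro", "Shun", "Dalstrong"]; // brands named in the post

// Classic dynamic-programming Levenshtein distance, computed one row at a time.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns the closest known brand, or null when nothing is within maxEdits.
// maxEdits = 2 is an arbitrary illustrative cutoff, not a value from the post.
function matchBrand(token, maxEdits = 2) {
  let best = null;
  let bestDist = maxEdits + 1;
  for (const brand of KNOWN_BRANDS) {
    const d = editDistance(token.toLowerCase(), brand.toLowerCase());
    if (d < bestDist) {
      best = brand;
      bestDist = d;
    }
  }
  return best;
}

// Stage 2 stand-in: turn per-brand sentiment counts into summary numbers.
function summarize(brand, positive, negative) {
  const opinionated = positive + negative;
  return {
    brand,
    // Positive-to-negative ratio; Infinity when there are no negative mentions.
    ratio: negative === 0 ? Infinity : positive / negative,
    // Assumed polarization proxy: share of the minority opinion,
    // from 0 (unanimous) to 0.5 (evenly split).
    minorityShare:
      opinionated === 0 ? 0 : Math.min(positive, negative) / opinionated,
    // True for brands with more negative than positive mentions.
    netNegative: negative > positive,
  };
}

console.log(matchBrand("Tojrio")); // prints: Tojiro

const shun = summarize("Shun", 59, 24); // counts quoted in the post
console.log(shun.ratio.toFixed(2), shun.minorityShare.toFixed(2)); // prints: 2.46 0.29
```

The minority-share number is only one possible way to express "polarizing"; the write-up may rank brands differently, for example by weighting total mention volume.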