HackerNews Chinese Edition



Deep research sucks for people research. I'm trying to build a product that does deep research specifically on people.

The goal: do deep research -> turn it into bite-size swipeables, like Insta reels but just text blurbs about something interesting. This could be super useful for coffee chats and dates (I imagine it would be the API behind the dating people search in the Cluely launch vid: https://www.youtube.com/watch?v=Rz3LD7u2KX8).

I want to discuss novel, out-of-the-box approaches to doing this. Here are all the ideas I've come up with.

What makes it complex:

1) People can share the same name, and articles about the specific person you want may not exist on the internet.
2) Classifying "interesting" things about people.
   a) The obvious approach is to use LLMs, but even if the cost of LLMs goes down, the rate limits aren't going to loosen, and the volume of links you have to crawl for truly interesting insights runs straight into them.
   b) (Not a promo) Using hierarchical text classification with https://www.trytaylor.ai/. I guess the logic is: the more buckets of info a specific sentence fills, the more "interesting/dimensional" it is.

Current flow I'm thinking about:

1) Get the first 20 search results for the person (you'd be surprised how much info this gives on a person if you know how to use it well).
2) Scrape them (if some websites like LinkedIn are hard, use Bright Data; if it's YouTube, get the transcript).
3) Have a person fact store -> simple array. Assume I know that John is the person I want to find and he has a LinkedIn. Since I have his LinkedIn, my person fact store will have the following details: [ 1) Profile pic 2) JP Morgan internship 3) UPenn Bachelors ] etc.
4) Data validation: solving the problem of matching names. Let's say I now find an article about John, but I don't know if this is the John I'm looking for. Let's say the article is: "John the JP Morgan intern likes sushi." There are two approaches here:
   a) I know his profile pic, so check whether the picture in the article matches his LinkedIn pfp. If it matches, we can append the fact that he likes sushi, which enriches our array.
   OR
   b) I know from our ground truth that he is a JP Morgan intern, so we can assume the sushi preference is also true.
5) Run it through https://www.trytaylor.ai. I'm trying to get access to the product, but their auth is broken and I literally can't use it right now. Essentially it would classify all the information I collected into buckets and sub-buckets. Then narrow it down to the top 5% of results that fit certain criteria, since I now have the tags for this.
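To make steps 3 to 5 of the flow above a bit more concrete, here is a rough Python sketch under some loud assumptions. `PersonFactStore`, `BUCKETS`, `bucket_tags`, `interest_score` and `top_fraction` are all made-up names for illustration. The keyword lookup is only a stand-in for a real hierarchical classifier (it does not call Taylor or any other service), validation covers approach (b) only (text overlap with known facts, no profile-pic matching), and substring matching is crude (for example, "intern" would also match "internet").

```python
import math
from dataclasses import dataclass, field


@dataclass
class PersonFactStore:
    """Step 3: a plain list of facts already trusted for one person."""
    name: str
    facts: list[str] = field(default_factory=list)

    def matches(self, text: str, min_hits: int = 1) -> bool:
        """Approach (b): the text names the person and repeats known facts."""
        lowered = text.lower()
        if self.name.lower() not in lowered:
            return False
        hits = sum(1 for fact in self.facts if fact.lower() in lowered)
        return hits >= min_hits

    def add_if_verified(self, text: str, new_fact: str) -> bool:
        """Step 4: append new_fact only if the source text passes validation."""
        if not self.matches(text):
            return False
        if new_fact not in self.facts:
            self.facts.append(new_fact)
        return True


# Hypothetical buckets. A real hierarchical classifier would replace this
# keyword lookup and would also return sub-buckets.
BUCKETS = {
    "career": ("intern", "engineer", "founder"),
    "education": ("upenn", "bachelors", "degree"),
    "food": ("sushi", "coffee", "vegan"),
}


def bucket_tags(sentence: str) -> set[str]:
    """Return every bucket whose keywords appear in the sentence."""
    lowered = sentence.lower()
    return {
        bucket
        for bucket, keywords in BUCKETS.items()
        if any(keyword in lowered for keyword in keywords)
    }


def interest_score(sentence: str) -> int:
    """Step 5 heuristic: more buckets filled means more 'dimensional'."""
    return len(bucket_tags(sentence))


def top_fraction(sentences: list[str], fraction: float = 0.05) -> list[str]:
    """Keep the highest-scoring fraction of sentences (at least one)."""
    ranked = sorted(sentences, key=interest_score, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]
```

With a store seeded from LinkedIn, e.g. `PersonFactStore("John", ["JP Morgan intern", "UPenn Bachelors"])`, calling `add_if_verified("John the JP Morgan intern likes sushi.", "likes sushi")` appends the new fact, while an article about some other John that repeats none of the known facts is rejected. Raising `min_hits` makes validation stricter at the cost of dropping more true matches.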

Deep research for people