Ask HN: How would you visualize all human thought?
One of you AI researchers with almost all the data and cheap compute has already thought of this. What would it look like to visualize all human knowledge? Rough code sketches for the steps follow the list.

1. Tag all content with a date (probably almost done). For long passages, estimate per-sentence dates from the publication date and an assumed writing rate.
2. Once every sentence or phrase has a date, create a few blockchains. One for short thoughts (128 tokens), and more chains for longer thoughts (3072+ tokens).
3. For each tokenized idea, run cosine similarity over LLM embeddings (or some better metric) against what is already stored, with a threshold of, say, 75%. If no match is found, fall back to tokenizing by natural punctuation.
4. Only NEWish content gets stored in a block, and chained if novel. Again, both short and long thoughts are recorded, with coordinates taken from the embedding vectors.
5a. I remember reading once that working through a PhD was like pricking the inside of a balloon; I love that visualization. Every novel idea pricks the inside of a growing sphere of knowledge over time.
5b. Has anyone ever tried to map human knowledge in three dimensions?
5c. Alternatively, a tree or root system growing over time. Branching from the vectors.
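For step 1, a minimal sketch of the date-interpolation idea, assuming the publication date and sentence boundaries are already known. WORDS_PER_DAY is a made-up drafting rate, not something measured:

    from datetime import date, timedelta

    WORDS_PER_DAY = 1000  # assumed average drafting rate (pure guess)

    def estimate_sentence_dates(sentences, published):
        # Walk backward from the publication date: the final sentence gets the
        # publication date, earlier sentences get proportionally earlier dates.
        dates = []
        words_after = 0
        for s in reversed(sentences):
            dates.append(published - timedelta(days=words_after / WORDS_PER_DAY))
            words_after += len(s.split())
        return list(reversed(dates))

    estimate_sentence_dates(["An early thought.", "A later one."], date(2024, 6, 1))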
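For step 2, one hedged reading of the "few blockchains": two independent hash chains, one for short thoughts and one for long ones. The 128/3072-token cutoffs are from the post; the SHA-256 chaining and the whitespace word count standing in for a real tokenizer are my assumptions:

    import hashlib
    import json
    from datetime import date

    def make_block(prev_hash, text, when):
        # Each block commits to the previous block's hash, the estimated date,
        # and the text, so each chain is append-only and tamper-evident.
        payload = {"prev": prev_hash, "date": when.isoformat(), "text": text}
        digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
        return {**payload, "hash": digest}

    class ThoughtChains:
        def __init__(self):
            self.short = []  # thoughts up to ~128 tokens
            self.long = []   # thoughts of 3072+ tokens

        def append(self, text, when):
            chain = self.short if len(text.split()) <= 128 else self.long
            prev = chain[-1]["hash"] if chain else "0" * 64
            chain.append(make_block(prev, text, when))

    chains = ThoughtChains()
    chains.append("A short, possibly novel thought.", date(2024, 6, 1))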
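For the novelty gate in steps 3 and 4, a sketch of the cosine-similarity check. embed() is a placeholder for whatever embedding model you have; an idea is kept only if nothing already stored scores at or above the 75% threshold from the post:

    import numpy as np

    THRESHOLD = 0.75  # the "say 75%" cutoff from step 3

    def embed(text):
        # Placeholder: swap in a real LLM embedding call here.
        raise NotImplementedError

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    stored_texts, stored_vecs = [], []

    def maybe_store(text):
        # Keep the idea only if it is NEWish, i.e. no stored idea is >= 75% similar.
        v = embed(text)
        if any(cosine(v, w) >= THRESHOLD for w in stored_vecs):
            return False  # near-duplicate of an existing idea
        stored_texts.append(text)
        stored_vecs.append(v)
        return True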
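For "coordinates from the vectors" in step 4 (and the 3-D map in 5b), the simplest thing that could work is projecting the embeddings onto three axes with plain PCA via SVD; UMAP or t-SNE would probably give a nicer picture, but the idea is the same:

    import numpy as np

    def to_3d(vectors):
        # Center the embeddings, then keep the top three principal components,
        # yielding one (x, y, z) point per stored idea.
        X = np.vstack(vectors)
        X = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        return X @ Vt[:3].T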
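And for 5c, one way to get a tree "branching from the vectors" is to hierarchically cluster the idea embeddings and read the dendrogram as the root system. The average linkage and cosine metric here are my choices, not anything the post prescribes:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram

    def idea_tree(vectors, labels):
        # Agglomerative clustering: nearby ideas merge into branches first,
        # so the resulting dendrogram reads like a knowledge tree.
        Z = linkage(np.vstack(vectors), method="average", metric="cosine")
        return dendrogram(Z, labels=labels, no_plot=True)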