Ask HN: How would you visualize all human thought?
One of you AI researchers with almost all the data and cheap compute has already thought of this. What would it look like to visualize all human knowledge? Rough code sketches for the steps follow the list.

1. Tag all content with a date (probably almost done). For long passages, estimate per-sentence dates from the publication date and an assumed writing rate.
2. Once every sentence or phrase has a date, create a few blockchains. One for short thoughts (128 tokens), and more chains for longer thoughts (3072+ tokens).
3. For each tokenized idea, run cosine similarity over LLM embeddings (or some better metric) against what is already stored, with a threshold of, say, 75%. If no match is found, fall back to tokenizing by natural punctuation.
4. Only NEWish content gets stored in a block, and chained if novel. Again, both short and long thoughts are recorded, with coordinates taken from the embedding vectors.
5a. I remember reading once that working through a PhD was like pricking the inside of a balloon; I love that visualization. Every novel idea pricks the inside of a growing sphere of knowledge over time.
5b. Has anyone ever tried to map human knowledge in three dimensions?
5c. Alternatively, a tree or root system growing over time. Branching from the vectors.
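For step 1, a minimal sketch of the date-interpolation idea, assuming the publication date and sentence boundaries are already known. WORDS_PER_DAY is a made-up drafting rate, not something measured:

    from datetime import date, timedelta

    WORDS_PER_DAY = 1000  # assumed average drafting rate (pure guess)

    def estimate_sentence_dates(sentences, published):
        # Walk backward from the publication date: the final sentence gets the
        # publication date, earlier sentences get proportionally earlier dates.
        dates = []
        words_after = 0
        for s in reversed(sentences):
            dates.append(published - timedelta(days=words_after / WORDS_PER_DAY))
            words_after += len(s.split())
        return list(reversed(dates))

    estimate_sentence_dates(["An early thought.", "A later one."], date(2024, 6, 1))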
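For step 2, one hedged reading of the "few blockchains": two independent hash chains, one for short thoughts and one for long ones. The 128/3072-token cutoffs are from the post; the SHA-256 chaining and the whitespace word count standing in for a real tokenizer are my assumptions:

    import hashlib
    import json
    from datetime import date

    def make_block(prev_hash, text, when):
        # Each block commits to the previous block's hash, the estimated date,
        # and the text, so each chain is append-only and tamper-evident.
        payload = {"prev": prev_hash, "date": when.isoformat(), "text": text}
        digest = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
        return {**payload, "hash": digest}

    class ThoughtChains:
        def __init__(self):
            self.short = []  # thoughts up to ~128 tokens
            self.long = []   # thoughts of 3072+ tokens

        def append(self, text, when):
            chain = self.short if len(text.split()) <= 128 else self.long
            prev = chain[-1]["hash"] if chain else "0" * 64
            chain.append(make_block(prev, text, when))

    chains = ThoughtChains()
    chains.append("A short, possibly novel thought.", date(2024, 6, 1))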
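For the novelty gate in steps 3 and 4, a sketch of the cosine-similarity check. embed() is a placeholder for whatever embedding model you have; an idea is kept only if nothing already stored scores at or above the 75% threshold from the post:

    import numpy as np

    THRESHOLD = 0.75  # the "say 75%" cutoff from step 3

    def embed(text):
        # Placeholder: swap in a real LLM embedding call here.
        raise NotImplementedError

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    stored_texts, stored_vecs = [], []

    def maybe_store(text):
        # Keep the idea only if it is NEWish, i.e. no stored idea is >= 75% similar.
        v = embed(text)
        if any(cosine(v, w) >= THRESHOLD for w in stored_vecs):
            return False  # near-duplicate of an existing idea
        stored_texts.append(text)
        stored_vecs.append(v)
        return True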
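For "coordinates from the vectors" in step 4 (and the 3-D map in 5b), the simplest thing that could work is projecting the embeddings onto three axes with plain PCA via SVD; UMAP or t-SNE would probably give a nicer picture, but the idea is the same:

    import numpy as np

    def to_3d(vectors):
        # Center the embeddings, then keep the top three principal components,
        # yielding one (x, y, z) point per stored idea.
        X = np.vstack(vectors)
        X = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        return X @ Vt[:3].T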
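And for 5c, one way to get a tree "branching from the vectors" is to hierarchically cluster the idea embeddings and read the dendrogram as the root system. The average linkage and cosine metric here are my choices, not anything the post prescribes:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram

    def idea_tree(vectors, labels):
        # Agglomerative clustering: nearby ideas merge into branches first,
        # so the resulting dendrogram reads like a knowledge tree.
        Z = linkage(np.vstack(vectors), method="average", metric="cosine")
        return dendrogram(Z, labels=labels, no_plot=True)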