The Data Pipeline for Superintelligence Starts With Your Screen

Author: Nadav--Shanun
There's a question that has haunted AI research for decades: how do you build a system that truly understands the world?

Not one that merely predicts the next token. Not one that passes benchmarks. One that actually models reality, the way a human does, the way an ecosystem does, the way a city does.

I think I know where the answer starts, and it's not where most people are looking.

The Densest Data Structure on Earth

The human brain stores an estimated 2.5 petabytes of information. That's 2.5 million gigabytes packed into 1.4 kilograms of tissue. A Salk Institute study found that each of the brain's roughly 125 trillion synapses can hold about 4.7 bits of information across 26 distinct strength levels, about ten times more than scientists previously believed.

To put that in perspective: Yahoo's entire data warehouse, processing 24 billion events per day, holds less data than a single human brain. The IRS database tracking 300 million Americans? About 150 terabytes. Your brain holds roughly 17 times that.
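A quick back-of-the-envelope check of two of the figures above (a sketch in Python; the 2.5 PB and 150 TB estimates are taken from the post as given, not independently verified):

```python
import math

# 26 distinguishable synaptic strength levels imply log2(26) bits per synapse,
# which is where the "about 4.7 bits" figure comes from.
bits_per_synapse = math.log2(26)               # ~4.70

# How many 150 TB IRS-sized databases fit into an estimated 2.5 PB brain.
brain_capacity_tb = 2.5 * 1000                 # 2.5 petabytes, in terabytes
irs_database_tb = 150
ratio = brain_capacity_tb / irs_database_tb    # ~16.7, i.e. "roughly 17 times"

print(f"bits per synapse ~ {bits_per_synapse:.2f}")
print(f"brain capacity vs. IRS database ~ {ratio:.1f}x")
```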
You almost don't see this kind of information density anywhere else in nature. A rainforest is staggeringly complex, but the data per cubic centimeter doesn't come close. A coral reef, a city, a desert: all contain extraordinary information, but none of them approach the compression ratio of the three-pound organ sitting between your ears.

This matters enormously for the future of AI.

If You Want to Model Everything, Start With the Brain

Here's the thesis: the best data pipeline for superintelligence is, first, a data pipeline for the human brain.

Think about it. If we could create a true digital twin of human cognition, not a language model trained on text but a living model of how a specific human thinks, decides, and acts, we would have cracked the hardest data problem on the planet: the densest, most complex information structure in the known universe, digitized.

And once you've built the infrastructure to model that, the same technology extends outward. Digital twins of teams. Of organizations. Of supply chains and cities. Eventually, of ecosystems: jungles, oceans, deserts, entire countries. Each one is a system of interacting agents making decisions under uncertainty. The brain is just the one with the highest data density per unit of space.

So if you're serious about building toward superintelligence, or even just toward AI systems that truly understand the world, you don't start with more text data. You start with the human brain.

The Screen Is the Window Into the Brain

Now here's the practical question: how do you actually observe a human brain in action?

You could try neuroimaging: fMRI, EEG, brain-computer interfaces. They're promising but limited: expensive, invasive, and too low-resolution for everyday use.

Or you could look at what's already sitting in front of the brain for most of its waking hours.
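To make the screen idea concrete, here is a minimal sketch of what screen-level observation might produce as data, assuming a simple append-only log of interaction events; the ScreenEvent and SessionLog names and fields are illustrative assumptions on my part, not a schema the post specifies:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class ScreenEvent:
    """One observable unit of on-screen behavior (hypothetical schema)."""
    timestamp: datetime
    app: str      # foreground application, e.g. "browser"
    kind: str     # "click", "keystroke", "scroll", "window_switch", ...
    detail: str   # free-form payload, e.g. the URL or UI element involved

@dataclass
class SessionLog:
    """An append-only trace of one person's screen activity."""
    events: List[ScreenEvent] = field(default_factory=list)

    def record(self, app: str, kind: str, detail: str) -> None:
        self.events.append(ScreenEvent(datetime.now(timezone.utc), app, kind, detail))

# A few seconds of synthetic activity, just to show the shape of the trace.
log = SessionLog()
log.record("browser", "window_switch", "opened a news aggregator")
log.record("browser", "click", "story: data pipelines for superintelligence")
log.record("editor", "keystroke", "typed 42 characters in draft.md")
print(len(log.events), "events captured")
```

Even a trace this crude is a time-ordered record of attention and choices, which is roughly the kind of raw material a digital twin of cognition, as described above, would presumably consume.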