我们提供数据来训练人工智能模型,却没有得到任何回报。
我对被人工智能取代的担忧较少,更让我感到沮丧的是,企业正在窃取我们的数据来训练他们从中获利的人工智能模型,这可能会随着时间的推移使我们变得不再有价值。
无论你是:
- 编写干净、可重用函数或内部工具的程序员,
- 制作教程或产品演示的用户生成内容创作者,
- 进行精确标注的数据标注员……
……所有这些劳动都创造了知识产权,最终用于训练人工智能模型。
但问题在于:我们并不拥有这些知识产权,尽管没有我们,它们根本不会存在。
他们以各种手段获取我们的数据,训练模型,并从中提取巨大的价值,而我们却得不到任何报酬,充其量只是一小笔一次性费用。
是的,企业确实发挥着重要作用。但他们正在利用我们的工作来取代我们或贬低我们的价值。因此,我们有充分的理由要求更多。
仔细想想,数据挖掘就像矿产开采——正如企业从地球中提取黄金或钻石等有价值的资源,往往剥削劳动力和治理不善的地区,数据挖掘则是从一个管理不善的人群及其数据中提取价值,常常是在他们未完全知情或未同意的情况下。
我认为,现在是建立更公平的数据系统的合适时机——版税?数据工会?公司内部贡献的开放所有权?
这种商业模式并不新鲜——一些数据来源和收集公司不仅收取一次性费用,还会在每次使用数据时收取基于使用量的费用。
这样做不仅是为了使数据供应链公平,也是为了改善人工智能。我们都知道,人工智能的性能与计算能力成正比,而利用不断增加的计算能力的最佳方式就是将其应用于新数据。因此,如果我们希望人工智能继续改进,就需要一个合适的数据供应链。如果我们希望在更复杂的任务中获得高质量的数据,就必须确保每个人都得到公平的报酬。
期待听到你的想法。
查看原文
I’m less worried about being replaced by AI and more frustrated that companies are stealing our data to train AI models they profit from with potential to make us less valuable over time.<p>Whether you’re:<p>- A coder writing clean, reusable functions or internal tooling,<p>- A UGC creator making tutorials or product demos,<p>- A data labeller doing precise annotations...<p>…all of that labor creates intellectual property that ends up training AI models.<p>But here’s the problem: we don’t own any of it, even though it wouldn’t exist without us.<p>They take our data—by hook or by crook—train a model, and extract massive value from it, while paying us nothing or, at best, a small one-time fee.<p>Yes, companies do play a valuable role. But they are using our work to replace us or devalue our work. So we have every right to ask for more.<p>If you really think about it, data mining is much like mineral mining — just as companies extract valuable resources like gold or diamonds from the earth, often exploiting labor and poorly governed regions, data mining extracts value from a poorly managed pool of people and their data, frequently without their full knowledge or consent regarding how it will be used.<p>I think now is the right time to build fairer systems around data for everyone—royalties? data unions? open ownership of internal contributions within companies?<p>This business model isn't new—some data sourcing and collection companies charge not only a one-time fee but also a usage-based fee each time the data is used.<p>Doing this is not only necessary to make the data supply chain fair, but also to improve AI. We all know that AI performance scales with compute, and the best way to leverage increasing compute is by applying it to new data. So, if we want AI to continue improving, we need a proper data supply chain. And if we want high-quality data for more complex tasks, we must ensure that everyone is paid fairly.<p>Would love to hear your thoughts on this.