Ask HN: Could AI models start being trained on personal files?
I've been sorting through my content on Google recently. Backing up and moving off Gmail and Google Drive was relatively simple, but Google Photos is more daunting. Google Takeout delivered almost 500 zip files of 2 GB each, with the metadata scrambled into supplemental data files, so it's going to take a while to sort through. It's my own fault for sticking with one platform for so long; I got hooked during the "unlimited storage" days of the early Google Pixel phones.

I've begun downloading and removing stored files because I'm concerned (perhaps justifiably, perhaps not) about my personal photos being used to train AI models. It worries me that some diffusion model might end up recreating a heavily biased image of my wife, family, friends, or myself, or referencing any of my files or documents. I also don't know what all of that might be used for, commercially or otherwise.

Google is the only place I've ever put my personal photos. I've never bothered with anything public-facing, and I trusted that a private cloud storage service would always stay private. So in my case, Google is the only place I'd need to leave to ensure data sovereignty.

Does anybody believe Google (and other companies) might soon start scanning the personal files we keep in their storage? Would that even be legal for them?

It seems to me that this is a huge pool of fresh training data they would inevitably want to get their hands on. Given how much they have already trained on, it looks like the next logical step from a business standpoint.

Clearly they would need to change their privacy policies and terms of service and inform users of the changes. Is it possible they could slip this sort of change in without much notice?

I was also wondering if anybody has pointers on the best strategy for securely backing up offline. I don't want to just shift my family photos from one company to another where business execs are training their own model. Has anybody else handled this recently?