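Ask HN: How is training an AI on another AI actually done?
How is Deepseek actually doing this? Are they just feeding Claude's answers into their own models as training data to improve reasoning?
How exactly does one train a model on the output of another? What engineering is involved here?
I'd love a breakdown of how this is executed at scale.
Backstory:
Anthropic recently accused Deepseek, Minimax, and Moonshot of using lots of fake accounts to generate exchanges with Claude and using the outputs to train their models, calling it a "distillation attack".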