Ask HN: How does my LLM usage train its underlying models?
I understand that using tools like Chat, Cursor, and Claude Code for software development is likely providing training data to help these LLMs get better at coding (the irony isn't lost on me that I might be contributing to making myself obsolete...).

But I'm curious about the actual mechanics: how exactly does this feedback loop work? When I accept, reject, or modify the code that these models spit out, is that signal fed directly back into training?

Not necessarily against this, just genuinely curious about how the sausage is made.