Show HN: Thought Forgery, a new jailbreak technique for LLMs

2 points by UltraZartrex, about 9 hours ago
Hi HN, I'm an independent security researcher and wanted to share a new vulnerability I've discovered.

My account is too new to submit the direct link, so I'm making a text post instead.

The technique is called "Thought Forgery" (CoT Injection, where CoT stands for chain of thought). It works by forging the AI's internal monologue, and the forged monologue acts as a universal amplifier for other jailbreaks. I've confirmed it works on the latest models from Google, Anthropic, OpenAI, etc.

I'd be happy to share the link to the full technical write-up on GitHub in the comments if anyone is interested.