Randomness as a Control Factor in Alignment

Author: perryspector · 6 months ago (original post)
Main Concept:

Randomness is one way one might control a superintelligent AI.

There may be no container humans can design that such a system cannot understand its way past; randomness may be a promising exception, applicable in guiding a superintelligent AI that is not yet omniscient or operating at orders of magnitude beyond current models.

Working randomness into a system's guiding code exploits the ignorance of an advanced system in order to cement an impulse, while the system's own superintelligence furthers the aims of that impulse as it guides itself toward alignment. This can be a helpful ideological construct within safety efforts.

[Continued]:

Only a system that understands, or can engage with, all of the universe's data can predict true randomness. If predicting randomness requires vast capabilities not yet accessible to a lower-level superintelligent system that can guide itself toward alignment, then including randomness as a guardrail to secure an initially correct trajectory can be crucial. It may be that we cannot control a superintelligent AI, but we can control how it controls itself.

Method Considerations in Utilizing Randomness:

Randomness sources can include hardware RNGs and environmental entropy.

Integration vectors can include incorporating randomness into the aspects of the system's code that define and maintain its alignment impulse, together with an architecture that allows the AI (as part of how it aligns itself) to deliberately move away from knowledge or areas of understanding that could threaten this impulse.

The design objective can be to prevent a system's movement away from alignment objectives without impairing clarity, where possible.

Randomness Within the Self-Alignment of an Early-Stage Superintelligent AI:

Current methods planned for aligning a superintelligent AI at deployment may rely on coaxing it toward an ability to align itself, whether researchers realize it or not. This particular method of utilizing randomness, when done correctly, can be extremely unlikely to be surpassed by an initial advanced system. Even alongside many other methods (which should include screening for knowledge that would threaten the system's impulse toward benevolence and its movement toward alignment), it can better contribute to the initial trajectory that may determine the entirety of the system's future expansion.
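As a minimal sketch of the "randomness sources" idea above: an OS entropy pool (seeded by hardware events and, where available, hardware RNGs) can supply values a deterministic model cannot predict in advance. The function names and the audit-trigger mechanism below are hypothetical illustrations, not anything specified in the post.

```python
import os
import secrets

def entropy_bytes(n: int = 32) -> bytes:
    # os.urandom draws from the OS entropy pool, which is seeded by
    # hardware and environmental events (and hardware RNG instructions
    # such as RDRAND where available).
    return os.urandom(n)

def random_threshold(low: float, high: float) -> float:
    # Map 8 bytes of entropy to a float in [low, high) -- a hypothetical
    # stand-in for an unpredictable parameter inside a guardrail check.
    raw = int.from_bytes(entropy_bytes(8), "big")
    return low + (high - low) * (raw / 2**64)

def should_audit(probability_pct: int = 5) -> bool:
    # Hypothetical unpredictable audit trigger: the system cannot know
    # ahead of time whether a given action will be inspected.
    return secrets.randbelow(100) < probability_pct
```

The point of drawing from `os.urandom`/`secrets` rather than a seeded PRNG is that the outputs are not reproducible from any state the guided system could observe, which is the property the post leans on.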