Verdic – An intent governance layer for AI systems

Author: kundan_s__r · about 1 month ago
We built Verdic after repeatedly running into the same issue while deploying LLMs in production: most AI failures aren't about content safety, they're about intent drift.

As models become more agentic, outputs often shift quietly from descriptive to prescriptive behavior, without any explicit signal that the system is now effectively taking action. Keyword filters and rule-based guardrails break down quickly in these cases.

Verdic is an intent governance layer that sits between the model and the application. Instead of checking topics or keywords, it evaluates:

- whether an output collapses future choices into a specific course of action
- whether the response exerts normative pressure (directing behavior vs. explaining)

The goal isn't moderation, but behavioral control: detecting when an AI system is operating outside the intent it was deployed for, especially in regulated or decision-critical workflows.

Verdic currently runs as an API with configurable allow / warn / block outcomes. We're testing it on agentic workflows and long-running chains, where intent drift is hardest to detect.
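To make the "sits between the model and the application" placement concrete, here is a minimal sketch of what gating a model output on an allow / warn / block verdict could look like. The endpoint URL, field names, and response shape below are my own assumptions for illustration, not Verdic's documented API; the point is where the check sits, not the specific scoring logic.

```python
# Hypothetical integration sketch: gate an LLM output on an intent-governance
# verdict before it reaches the application. The endpoint, request fields, and
# response shape are assumptions for illustration, not Verdic's actual API.
import requests

VERDIC_URL = "https://api.verdic.example/v1/evaluate"  # placeholder endpoint

def governed_respond(model_output: str, deployed_intent: str) -> str:
    # Ask the governance layer whether the output stays within the deployed intent.
    resp = requests.post(
        VERDIC_URL,
        json={"output": model_output, "intent": deployed_intent},
        timeout=5,
    )
    resp.raise_for_status()
    verdict = resp.json().get("outcome", "warn")  # expected: "allow" | "warn" | "block"

    if verdict == "allow":
        return model_output  # pass the output through unchanged
    if verdict == "warn":
        # Surface the output but flag it for review or logging downstream.
        return f"[flagged for possible intent drift] {model_output}"
    # "block": withhold a prescriptive output the deployment did not intend.
    return "This response was withheld by the intent governance layer."
```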
This is an early release. I'm mainly looking for feedback from people deploying LLMs in production, especially around:

- agentic systems
- AI governance
- risk & compliance
- failure modes we might be missing

Happy to answer questions or share more details about the approach.