展示HN:可证明性框架 – 从保护措施到AI代理的保证
展示 HN:可证明性框架——从安全边界到 AI 代理的保证
<a href="https://mateopetel.substack.com/p/provability-fabric-the-safety-contract" rel="nofollow">https://mateopetel.substack.com/p/provability-fabric-the-saf...</a>
<p>目前大多数 AI 的“安全性”仅仅是感觉。
提示被固化,启发式方法堆叠,仪表盘闪烁绿色。但当代理可以调用工具、流式输出或变更状态时,单靠感觉是无法扩展的。你需要将数学嵌入运行时。</p>
<p>可证明性框架(PF)是一个用于证明行为的框架:</p>
- 规范到证明层 → ActionDSL 策略编译为 Lean 4 的义务 + 运行时监控
- 可证明代理包(PABs) → 绑定规范、证明、SBOM、来源的签名包
- Rust 边车 → 完全中介所有事件(调用、返回、日志、解密、发出块、发出结束)
- 输出证书 → 每个输出都获得签名裁决(通过/失败/不适用)
- TRUST-FIRE 测试套件 → 针对混乱、回滚、隐私消耗、对抗适配器的压力测试
<p>核心赌注:</p>
“安全性不是一个设置。它是你的语义与运行时之间的对应证明。”
<p>希望听听 HN 的看法:</p>
- 这是超越感觉走向可证伪保证的正确方式吗?
- 证明携带包(如 PAB-1.0)应该成为部署代理的标准吗?
- 你需要什么才能信任生产环境中的代理——证明、监控,还是其他什么?
查看原文
Show HN: Provability Fabric — From Guardrails to Guarantees for AI Agents
<a href="https://mateopetel.substack.com/p/provability-fabric-the-safety-contract" rel="nofollow">https://mateopetel.substack.com/p/provability-fabric-the-saf...</a><p>Most AI “safety” today is just vibes.
Prompts get hardened, heuristics stacked, dashboards blink green. But when agents can call tools, stream outputs, or mutate state, vibes don’t scale. You need math wired into the runtime.<p>Provability Fabric (PF) is a framework for proof-carrying behavior:<p>Spec-to-Proof layer → ActionDSL policies compile to Lean 4 obligations + runtime monitors<p>Provable Agent Bundles (PABs) → signed packages binding spec, proofs, SBOM, provenance<p>Rust sidecar → complete mediation of all events (call, ret, log, declassify, emit_chunk, emit_end)<p>Egress certificates → every output gets a signed verdict (pass/fail/inapplicable)<p>TRUST-FIRE test suite → stress tests for chaos, rollback, privacy burn-down, adversarial adapters<p>The core bet:<p>“Safety isn’t a setting. It’s a correspondence proof between your semantics and your runtime.”<p>Would love HN’s take:<p>Is this the right way to move beyond vibes toward falsifiable guarantees?<p>Should proof-carrying bundles (like PAB-1.0) become the standard for deploying agents?<p>What would it take for you to trust an agent in prod — proofs, monitors, or something else?