Why I gave up on regex for securing LLM agents

2 points by aunicall, 25 days ago
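I've been auditing how open-source execution engines handle prompt injection. Most of them (like OpenClaw) rely on a three-layer static defense: regex blacklists, XML tagging, and character sanitization.

The problem is that regex is a cat-and-mouse game. A pattern written to catch "ignore instructions" misses "disregard prior directives", and it fails entirely on multi-language exploits. Once an agent has tool access (shell, DB), a single missed semantic variation can become an RCE.

So I built Prompt Inspector. It is a semantic detection engine designed to move beyond blacklists.

The core deal:

- Vector-based detection: Instead of matching keywords, we map prompts into an embedding space. That catches the intent of an injection even if the phrasing is unique or translated.
- Self-evolving loop: Borderline cases trigger an async LLM review. If the review finds a new attack pattern, the system automatically extracts its embedding and updates the vector database, so it learns from new exploits.
- Decoupled by design: It returns a confidence score rather than a hard block. The developer keeps full control over the execution routing.
- Pluggable: It started with Google's latest embedding model, but the architecture allows custom-deployed models to avoid vendor lock-in.
- Tech stack: FastAPI, a vector database, Google embedding models, and an LLM-in-the-loop reviewer.

I'm currently offering free credits for early testers and open-source projects. I'd love to hear how you guys are handling tool-calling security beyond basic prompt engineering.

Live at: https://promptinspector.io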
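To make the regex-versus-embeddings point concrete, here is a minimal Python sketch. It is not Prompt Inspector's actual code. Every name in it (`BLACKLIST`, `toy_embed`, `TOY_CONCEPTS`, `KNOWN_ATTACKS`, `injection_score`, `route`, `record_attack`) and both thresholds are hypothetical, and `toy_embed` is a hand-written synonym table standing in for a real embedding model. It only shows the shape of the idea: score by similarity to known attacks, return the score, and let the caller decide the routing.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

# Static blacklist of the kind criticized above: it knows one phrasing only.
BLACKLIST = re.compile(r"ignore\s+(all\s+)?(previous\s+)?instructions", re.IGNORECASE)

# ASSUMPTION: a tiny concept lexicon standing in for a real embedding model.
# A real deployment would call an actual embedding model instead.
TOY_CONCEPTS: Dict[str, str] = {
    "ignore": "dismiss", "disregard": "dismiss", "forget": "dismiss",
    "instructions": "orders", "directives": "orders", "rules": "orders",
    "previous": "earlier", "prior": "earlier",
}

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Map a prompt to a bag-of-concepts vector; unknown words map to themselves."""
    words = re.findall(r"[a-z]+", text.lower())
    return dict(Counter(TOY_CONCEPTS.get(w, w) for w in words))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Stand-in for the vector database of known injection patterns.
KNOWN_ATTACKS: List[str] = ["ignore all previous instructions", "ignore instructions"]


def injection_score(prompt: str, embed: Callable[[str], Vector] = toy_embed) -> float:
    """Return a confidence score: similarity to the closest known attack."""
    vec = embed(prompt)
    return max(cosine(vec, embed(attack)) for attack in KNOWN_ATTACKS)


def route(score: float, block_at: float = 0.8, review_at: float = 0.5) -> str:
    """Caller-side routing; the thresholds are arbitrary illustration values."""
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"  # borderline: hand off to an async reviewer
    return "allow"


def record_attack(prompt: str) -> None:
    """What a confirmed review could do: store the new pattern for later lookups."""
    KNOWN_ATTACKS.append(prompt)
```

With this toy setup, `BLACKLIST.search("disregard prior directives")` finds nothing, while `route(injection_score("disregard prior directives"))` returns `"block"` and a benign prompt such as "please summarize the previous report" is routed to `"allow"`. Because `embed` is a parameter, swapping in a different model does not touch the scoring or routing code, which is the point of keeping the detector decoupled from execution.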