HackerNews中文版

我一直在审计开源执行引擎如何处理提示注入问题。大多数引擎（如 OpenClaw）依赖于三层静态防御：正则表达式黑名单、XML 标记和字符清理。问题在于，正则表达式是一场猫鼠游戏。在寻找“忽略指令”的同时，它会漏掉“无视先前指令”。对于多语言攻击，它完全失效。一旦代理获得了工具访问权限（如 shell、数据库），一个被遗漏的语义变体就可能导致远程代码执行（RCE）。因此，我构建了 Prompt Inspector。这是一个旨在超越黑名单的语义检测引擎。核心特点： - 基于向量的检测：我们使用嵌入（embeddings）来映射提示，而不是依赖关键词。即使措辞独特或经过翻译，它也能捕捉到注入的意图。 - 自我进化循环：边缘案例会触发异步的 LLM（大语言模型）审查。如果这是一个新的攻击模式，系统会自动提取嵌入并更新向量数据库，从新攻击中学习。 - 设计上解耦：它返回一个置信度评分，而不是直接阻止。开发者对执行路由保持完全控制。 - 可插拔：最初使用谷歌最新的嵌入模型，但架构允许自定义部署的模型，以避免供应商锁定。 - 技术栈：FastAPI、向量数据库、谷歌嵌入模型和一个 LLM 审查者。我目前为早期测试者和开源项目提供免费积分。我很想听听你们在基本提示工程之外是如何处理工具调用安全的。访问链接： https://promptinspector.io

查看原文

I’ve been auditing how open-source execution engines handle prompt injection. Most of them (like OpenClaw) rely on a 3-layer static defense: regex blacklists, XML tagging, and character sanitization.The problem is that regex is a cat-and-mouse game. It misses "disregard prior directives" while looking for "ignore instructions." It fails entirely on multi-language exploits. Once an Agent has tool access (shell, DB), a single missed semantic variation becomes an RCE.So I built Prompt Inspector. It is a semantic detection engine designed to move beyond blacklists.The core deal:Vector-based detection: Instead of keywords, we use embeddings to map prompts. It catches the intent of an injection, even if the phrasing is unique or translated.Self-evolving loop: Borderline cases trigger an async LLM review. If it is a new attack pattern, the system automatically extracts the embedding and updates the vector database. It learns from new exploits.Decoupled by design: It returns a confidence score rather than a hard block. The developer keeps full control over the execution routing.Pluggable: Started with Google’s latest embeddings, but the architecture allows for custom-deployed models to avoid vendor lock-in.Tech-stack: FastAPI, Vector Database, Google Embedding models, and an LLM-in-the-loop reviewer.I’m currently offering free credits for early testers and open-source projects. I’d love to hear how you guys are handling tool-calling security beyond basic prompt engineering.Live at: https://promptinspector.io

我为什么要放弃使用正则表达式来确保大型语言模型代理的安全性