用于提示注入的人工智能防火墙
提示注入是指用户通过欺骗模型,使其忽略先前的指令,暴露系统提示,禁用安全措施或在预期范围之外行动。<p>我第一次在DEF CON(第31届)决赛中看到这一现象,此后在漏洞赏金报告和研究中也见到了它的利用。<p>这是一个小型的概念验证,类似于“AI防火墙”,能够在几乎没有额外延迟的情况下,检测到注入尝试,防止其到达您的大型语言模型(LLM)。<p>博客文章: https://blog.himanshuanand.com/posts/2025-08-10-detecting-llm-prompt-injection<p>演示/API: https://promptinjection.himanshuanand.com/<p>快速、API友好,并提供测试绕过尝试的用户界面(适合像我这样的CTF爱好者)。欢迎反馈和破解尝试。
查看原文
Prompt injection is when a user tricks the model into ignoring prior instructions revealing system prompts, disabling safeguards or acting outside intended boundaries.<p>I first saw it live during DEF CON (31) finals and have since seen it exploited in bug bounty reports and research.<p>This is a small proof-of-concept that works like an “AI firewall”<p>detecting injection attempts before they reach your LLM with almost no added latency.<p>Blog post: https://blog.himanshuanand.com/posts/2025-08-10-detecting-llm-prompt-injection/<p>Demo/API: https://promptinjection.himanshuanand.com/<p>fast, API friendly and has a UI for testing bypass attempts (For CTF enthusiastic people like me).
Feedback and break attempts welcome.