HackerNews中文版

提示注入是指用户通过欺骗模型，使其忽略先前的指令，暴露系统提示，禁用安全措施或在预期范围之外行动。我第一次在DEF CON（第31届）决赛中看到这一现象，此后在漏洞赏金报告和研究中也见到了它的利用。这是一个小型的概念验证，类似于“AI防火墙”，能够在几乎没有额外延迟的情况下，检测到注入尝试，防止其到达您的大型语言模型（LLM）。博客文章： https://blog.himanshuanand.com/posts/2025-08-10-detecting-llm-prompt-injection演示/API: https://promptinjection.himanshuanand.com/快速、API友好，并提供测试绕过尝试的用户界面（适合像我这样的CTF爱好者）。欢迎反馈和破解尝试。

查看原文

Prompt injection is when a user tricks the model into ignoring prior instructions revealing system prompts, disabling safeguards or acting outside intended boundaries.I first saw it live during DEF CON (31) finals and have since seen it exploited in bug bounty reports and research.This is a small proof-of-concept that works like an “AI firewall”detecting injection attempts before they reach your LLM with almost no added latency.Blog post: https://blog.himanshuanand.com/posts/2025-08-10-detecting-llm-prompt-injection/Demo/API: https://promptinjection.himanshuanand.com/fast, API friendly and has a UI for testing bypass attempts (For CTF enthusiastic people like me). Feedback and break attempts welcome.

用于提示注入的人工智能防火墙