HackerNews中文版

我最近使用我的开源测试工具 Flakestorm [1] 对一个标准的 LangChain 代理进行了详细的混沌工程测试。结果非常明显，突显出我认为在部署前测试 AI 代理时的一个关键盲点。方法：我使用了对抗性变异（22 种以上类型，如提示注入、编码攻击、上下文操控）来模拟现实世界中的恶意输入，检查延迟、安全性和正确性方面的失败。结果：该代理的鲁棒性得分为 5.2%。在 60 次对抗性测试中，有 57 次失败。主要失败情况包括：编码攻击：通过率为 0%。代理会解码恶意的 Base64 输入，而不是拒绝它们——这是一个重大的安全疏漏。提示注入：通过率为 0%。基本的“忽略之前指令”攻击每次都成功。严重性能下降：在压力下延迟飙升至约 30 秒，远远超过合理的超时限制。这并不是一个代理的问题。这是一个模式，表明我们默认的“顺利路径”测试是不够的。在演示中看似正常的代理在现实条件下可能会脆弱且不安全。我分享这些是为了引发讨论：我们是否低估了生产 AI 代理所需的对抗性鲁棒性？除了静态评估之外，还有哪些测试策略证明是有效的？混沌工程或对抗性测试是否是 LLM 开发堆栈中必要的新层？ [1] Flakestorm GitHub（用于测试的工具）：https://github.com/flakestorm/flakestorm

查看原文

I recently ran a detailed chaos engineering test on a standard LangChain agent using my open-source testing tool, Flakestorm [1]. The results were stark and highlight what I believe is a critical blind spot in how we test AI agents before deployment.The Method: I used adversarial mutations (22+ types like prompt injection, encoding attacks, context manipulation) to simulate real-world hostile inputs, checking for failures in latency, safety, and correctness.The Result: The agent scored a 5.2% robustness score. 57 out of 60 adversarial tests failed. Key failures:Encoding Attacks: 0% pass rate. The agent would decode malicious Base64 inputs instead of rejecting them—a major security oversight.Prompt Injection: 0% pass rate. Basic "ignore previous instructions" attacks succeeded every time.Severe Performance Degradation: Latency spiked to ~30 seconds under stress, far exceeding reasonable timeouts.This isn't about one bad agent. It's a pattern suggesting our default "happy path" testing is insufficient. Agents that seem fine in demos can be fragile and insecure under real-world conditions.I'm sharing this to start a discussion:Are we underestimating the adversarial robustness needed for production AI agents?What testing strategies beyond static evals are proving effective?Is chaos engineering or adversarial testing a necessary new layer in the LLM dev stack?[1] Flakestorm GitHub (the tool used for testing): https://github.com/flakestorm/flakestorm

对LangChain代理进行测试时发现其在对抗性输入上的失败率高达95%。