Ask HN: How are you monitoring AI agents in production?
With the recent incidents (DataTalks database wipe by Claude Code, Replit agent deleting data during a code freeze), it's clear that running AI agents in production without observability is risky.
Common failure modes I've seen: no visibility into what the agent did step by step, surprise LLM bills from untracked token usage, risky outputs going undetected, and no audit trail for post-mortems.
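For anyone rolling their own, the first two failure modes can be covered with surprisingly little code. Here is a minimal sketch (all names are hypothetical, not AgentShield's API): a decorator that records each agent step's arguments, token usage, estimated cost, and latency into an in-memory trace, giving you both an audit trail and a running cost total.

```python
import functools
import time

TRACE = []  # in-memory audit trail; swap for a real sink in production

def traced_step(step_name, cost_per_1k_tokens=0.002):
    """Record each agent step: args, token usage, estimated cost, latency."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            # Assumption: the wrapped step returns (output, tokens_used);
            # adapt to however your framework reports token counts.
            output, tokens = fn(*args, **kwargs)
            TRACE.append({
                "step": step_name,
                "args": args,
                "tokens": tokens,
                "est_cost_usd": tokens / 1000 * cost_per_1k_tokens,
                "latency_s": round(time.monotonic() - start, 4),
            })
            return output
        return wrapper
    return decorator

@traced_step("summarize")
def summarize(text):
    # Stand-in for a real LLM call; returns (output, tokens_used).
    return text[:20], len(text.split())

summary = summarize("The quick brown fox jumps over the lazy dog")
total_cost = sum(entry["est_cost_usd"] for entry in TRACE)
```

Dumping `TRACE` after an incident gives you a step-by-step post-mortem record, and summing `est_cost_usd` catches runaway token spend before the bill does.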
I've been building AgentShield (https://useagentshield.com), an observability SDK for AI agents. It does execution tracing, risk detection on outputs, per-agent and per-model cost tracking, and human-in-the-loop approval for high-risk actions. It plugs into LangChain, CrewAI, and the OpenAI Agents SDK with a two-line integration.
Curious what others are using. Rolling your own monitoring? LangSmith? Langfuse? Or just hoping for the best?