Ask HN: How do you monitor AI agents in production?

4 points | by jairooh | 28 days ago
With the recent incidents (DataTalks database wipe by Claude Code, Replit agent deleting data during code freeze), it's clear that running AI agents in production without observability is risky.

Common failure modes I've seen: no visibility into what the agent did step-by-step, surprise LLM bills from untracked token usage, risky outputs going undetected, and no audit trail for post-mortems.

I've been building AgentShield (https://useagentshield.com) — an observability SDK for AI agents. It does execution tracing, risk detection on outputs, cost tracking per agent/model, and human-in-the-loop approval for high-risk actions. Plugs into LangChain, CrewAI, and OpenAI Agents SDK with a 2-line integration.

Curious what others are using. Rolling your own monitoring? LangSmith? Langfuse? Or just hoping for the best?
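For context on what "rolling your own" looks like at its simplest, here is a minimal sketch of a homegrown trace: a decorator that records each agent step with timing, token usage, and an output preview, so you at least get a per-step audit trail and a running cost number. This is a generic illustration, not AgentShield's API; the `AgentTrace` class and the convention that each wrapped step returns `(output, tokens_used)` are assumptions you would adapt to your own SDK.

```python
import functools
import time


class AgentTrace:
    """Roll-your-own agent observability: one record per step, plus token totals."""

    def __init__(self):
        self.steps = []

    def step(self, name):
        """Decorator for an agent step that returns (output, tokens_used)."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                start = time.time()
                output, tokens = fn(*args, **kwargs)
                # Append an audit record: step name, latency, tokens, output preview.
                self.steps.append({
                    "step": name,
                    "seconds": round(time.time() - start, 3),
                    "tokens": tokens,
                    "output_preview": str(output)[:80],
                })
                return output
            return wrapper
        return decorator

    def total_tokens(self):
        """Sum token usage across all recorded steps (for bill tracking)."""
        return sum(s["tokens"] for s in self.steps)
```

Usage: instantiate one `AgentTrace` per run, decorate each tool or LLM call with `@trace.step("name")`, and dump `trace.steps` to your log store for post-mortems. The hosted tools (LangSmith, Langfuse) are essentially this pattern plus storage, dashboards, and alerting.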