Show HN: WatchLLM – Step-through debugging and cost attribution for AI agents
Hi HN! I built WatchLLM to solve two problems I kept hitting while building AI agents:

1. Debugging agents is painful. When your agent makes 20 tool calls and fails, good luck figuring out which decision was wrong. WatchLLM gives you a step-by-step timeline showing every decision, tool call, and model response, with explanations for why the agent did what it did.

2. Agent costs spiral fast. Agents love getting stuck in loops or calling expensive tools repeatedly. WatchLLM tracks cost per step and flags anomalies like "loop detected - same action repeated 3x, wasted $0.012" or "high-cost step - $0.08 exceeds threshold".

The core features:

- Timeline view of every agent decision with cost breakdown
- Anomaly detection (loops, repeated tools, high-cost steps)
- Semantic caching that cuts 40-70% off your LLM bill as a bonus
- Works with OpenAI, Anthropic, and Groq - just change your baseURL

It's built on ClickHouse for real-time telemetry and uses vector similarity for the caching layer. The agent debugger explains decisions using LLM-generated summaries of why each step happened.
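To make the anomaly flags concrete, here is a minimal sketch of the kind of loop and cost detection described above. The step shape, function name, and thresholds are my illustrative assumptions, not WatchLLM's actual implementation or defaults:

```python
from collections import Counter

def flag_anomalies(steps, repeat_threshold=3, cost_threshold=0.08):
    """Flag looping actions and high-cost steps in an agent trace.

    Each step is a dict like {"action": "search_web", "cost": 0.004}.
    The shape and thresholds are illustrative assumptions.
    """
    flags = []
    # Loop detection: the same action repeated too many times in one run.
    counts = Counter(s["action"] for s in steps)
    for action, n in counts.items():
        if n >= repeat_threshold:
            wasted = sum(s["cost"] for s in steps if s["action"] == action)
            flags.append(
                f"loop detected - {action} repeated {n}x, wasted ${wasted:.3f}"
            )
    # High-cost detection: any single step over the cost threshold.
    for s in steps:
        if s["cost"] >= cost_threshold:
            flags.append(f"high cost step - ${s['cost']:.2f} exceeds threshold")
    return flags
```

A trace with three identical $0.004 calls and one $0.09 call would produce both flags from the examples above.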
Right now it's free for up to 50K requests/month. I'm looking for early users who are building agents and want better observability into what's actually happening (and what it's costing).
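The vector-similarity caching layer mentioned above could work along these lines. This is a toy in-memory sketch with a made-up class name and threshold, not the production implementation (which sits on ClickHouse):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy semantic cache: reuse a stored LLM response when a new
    prompt's embedding is close enough to a previously cached one."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        best = max(
            self.entries,
            key=lambda e: cosine(embedding, e[0]),
            default=None,
        )
        if best and cosine(embedding, best[0]) >= self.threshold:
            return best[1]  # cache hit: the LLM call is skipped entirely
        return None  # cache miss: call the model, then put() the result

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```

Every hit avoids a model call outright, which is where the claimed 40-70% savings would come from on workloads with repetitive, near-duplicate prompts.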
Try it: https://watchllm.dev
Would love feedback on what other debugging features would be useful. What do you wish you had when your agents misbehave?