Show HN: WatchLLM – Step-debug AI agents with cost attribution

Author: Kaadz2 · 2 days ago
Hi HN! I built WatchLLM to solve two problems I kept hitting while building AI agents:

1. Debugging agents is painful. When your agent makes 20 tool calls and fails, good luck figuring out which decision was wrong. WatchLLM gives you a step-by-step timeline showing every decision, tool call, and model response, with explanations for why the agent did what it did.

2. Agent costs spiral fast. Agents love getting stuck in loops or calling expensive tools repeatedly. WatchLLM tracks cost per step and flags anomalies like "loop detected - same action repeated 3x, wasted $0.012" or "high-cost step - $0.08 exceeds threshold".

The core features:

- Timeline view of every agent decision with cost breakdown
- Anomaly detection (loops, repeated tools, high-cost steps) - a loop-detection sketch follows this list
- Semantic caching that cuts 40-70% off your LLM bill as a bonus
- Works with OpenAI, Anthropic, Groq - just change your baseURL (integration sketch below)
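To make the "loop detected" flag concrete, here is a minimal sketch of how such a check could work. The step record shape (tool name, arguments, per-step cost) and the repeat threshold are illustrative assumptions, not WatchLLM's actual schema:

```python
# Sketch: flag an agent that repeats the same tool call (same name and
# arguments) several times in a row, and total up the wasted spend.
# Field names and the max_repeats default are illustrative assumptions.
import json

def detect_loops(steps, max_repeats=3):
    """steps: list of dicts like {"tool": str, "args": dict, "cost": float}."""
    alerts = []
    run_start = 0
    for i in range(1, len(steps) + 1):
        same = (
            i < len(steps)
            and steps[i]["tool"] == steps[run_start]["tool"]
            and json.dumps(steps[i]["args"], sort_keys=True)
                == json.dumps(steps[run_start]["args"], sort_keys=True)
        )
        if not same:
            run_len = i - run_start
            if run_len >= max_repeats:
                # Everything after the first occurrence is wasted spend.
                wasted = sum(s["cost"] for s in steps[run_start + 1 : i])
                alerts.append(
                    f"loop detected - same action repeated {run_len}x, "
                    f"wasted ${wasted:.3f}"
                )
            run_start = i
    return alerts
```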
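And a sketch of the baseURL-swap integration, using the OpenAI Python SDK. The proxy endpoint and key below are placeholder assumptions for illustration; the real values are whatever WatchLLM's dashboard gives you:

```python
# Sketch: point an existing OpenAI SDK client at a proxy base URL so every
# request is recorded. The endpoint and key here are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.watchllm.dev/v1",  # hypothetical WatchLLM proxy URL
    api_key="YOUR_WATCHLLM_KEY",             # placeholder credential
)

# Application code is unchanged; the proxy sees each call for the timeline.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize today's top HN post."}],
)
print(response.choices[0].message.content)
```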
It's built on ClickHouse for real-time telemetry and uses vector similarity for the caching layer. The agent debugger explains decisions using LLM-generated summaries of why each step happened.
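The post doesn't detail the cache internals beyond "vector similarity", but a semantic cache generally follows this pattern. The embedding function, 0.95 cosine threshold, and linear scan are all illustrative assumptions; a production version would use an ANN index instead of scanning:

```python
# Sketch of a semantic cache: embed the prompt and reuse a stored response
# if a previous prompt is close enough in embedding space.
import numpy as np

class SemanticCache:
    def __init__(self, embed, threshold: float = 0.95):
        self.embed = embed          # callable: str -> np.ndarray (assumed)
        self.threshold = threshold  # illustrative similarity cutoff
        self.entries = []           # list of (embedding, response)

    def get(self, prompt: str):
        q = self.embed(prompt)
        for vec, response in self.entries:
            # Cosine similarity between the query and a cached prompt.
            sim = np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec))
            if sim >= self.threshold:
                return response     # cache hit: skip the LLM call entirely
        return None                 # cache miss: caller pays for the LLM call

    def put(self, prompt: str, response: str):
        self.entries.append((self.embed(prompt), response))
```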
Right now it's free for up to 50K requests/month. I'm looking for early users who are building agents and want better observability into what's actually happening (and what it's costing).

Try it: https://watchllm.dev

Would love feedback on what other debugging features would be useful. What do you wish you had when your agents misbehave?