How do teams prevent duplicate LLM API calls and token waste?
I'm curious how teams running LLM-heavy applications handle duplicate or redundant API calls in production.

While experimenting with LLM APIs, I noticed that the same prompt can sometimes be sent repeatedly across different parts of an application, which leads to unnecessary token usage and higher API costs.

For teams using OpenAI, Anthropic, or similar APIs in production:
How do you currently detect or prevent duplicate prompts or redundant calls?
Do you rely on logging and dashboards, caching layers, internal proxy services, or something else?
Or is this generally considered a minor issue that most teams just accept as part of normal usage?
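For concreteness, here is a minimal sketch of the kind of caching layer I have in mind: hash the normalized request (model, parameters, prompt) and reuse a stored response on a hit. The `call_llm` parameter is a hypothetical stand-in for a real provider call, and the in-memory dict would be Redis or similar in practice.

```python
import hashlib
import json

# In-memory cache keyed by a hash of the full request.
# In production this would be a shared store (e.g. Redis) with a TTL.
_cache: dict[str, str] = {}


def _cache_key(model: str, prompt: str, temperature: float) -> str:
    # Serialize deterministically so identical requests hash identically.
    payload = json.dumps(
        {"model": model, "prompt": prompt, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_completion(model, prompt, temperature, call_llm):
    """Return a cached response for an identical prior request,
    otherwise call the API and store the result.

    `call_llm` is a hypothetical callable wrapping the real API.
    """
    key = _cache_key(model, prompt, temperature)
    if key in _cache:
        return _cache[key]  # duplicate call avoided, zero tokens spent
    result = call_llm(model, prompt, temperature)
    _cache[key] = result
    return result
```

Obviously this only makes sense for deterministic or near-deterministic usage (temperature 0, no per-request context), which is part of why I'm asking how teams handle it in practice.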