Cursor vs. Antigravity: a comparison after a week of real use

Author: okaris, 26 days ago
in the first week of 2026 i ended up using cursor and google antigravity back to back, not by plan but because i burned through two cursor ultra subscriptions faster than expected and decided to try antigravity on the free tier.

my normal usage is ~$60–100/month. within a few days it jumped to $500+, with the dashboard projecting ~$1.6k/month. max mode was off, and the ui consistently showed a 200k context window.

what i eventually pieced together is that cursor maintains a large hidden prompt state. beyond visible conversation history, this includes tool traces, agent state, reasoning scaffolding, and large chunks of repo context. that state is prompt-cached using claude’s cache feature, and on every request the full cached prefix is replayed.

anthropic bills cache reads every time this happens, even if that content is later summarized or truncated before actual inference.

one concrete example from my logs:

- actual user input: ~4k tokens
- cache read tokens: ~21 million
- total billed tokens: ~22 million
- cost for the single call: ~$12

this wasn’t limited to opus. i saw the same pattern with sonnet.

support explained that this matched how the underlying api is billed. from my perspective, the issue wasn’t correctness but visibility. cost had become decoupled from anything i could see or reason about in the product. i had no way to inspect cache size, understand replay behavior, or set meaningful guardrails.

i canceled and treated it as an excuse to try google antigravity.

the free tier was more usable than i expected. it gives access to opus 4.5, which is still my preferred model for nontrivial coding work. for easy-to-moderate-complexity tasks, it usually finishes cleanly. limits are opaque (free → pro → ultra is described in very abstract terms), but when you hit a limit you at least get a clear cooldown message telling you when opus will be available again.

when opus is exhausted, antigravity falls back to gemini models. that was useful for comparison. for real coding work on a messy, evolving codebase, gemini flash, pro, and thinking consistently lost architectural decisions and produced one-off changes that didn’t respect existing constraints.

some of that is model quality, but not all of it. cursor’s agent does a better job gathering relevant repo state, forming a plan, and executing it coherently. antigravity’s agent feels thinner, and i found myself spending more time reviewing and correcting diffs to preserve invariants.

there were also smaller papercuts. tab completion is advertised as unlimited, but i couldn’t get it working reliably. going back to basic autocomplete was a reminder of how dependent my workflow is on good tab completion. ux-wise, antigravity feels slower. maybe not raw latency, but the way responses stream and animate doesn’t keep me in the same loop cursor does.

net result: antigravity free is a solid option for starting out and experimenting, especially if budget matters. i wouldn’t pay for it yet. cursor is still a strong product, but opaque caching and billing behavior makes it hard to reason about cost at scale.

for context, i’m building an agent runtime myself at inference.sh, focused on explicit state, durable execution, and reliable deep agents with complex tool use.
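as a rough illustration of what i mean by explicit state, here’s a toy sketch (not code from inference.sh, and not how cursor or antigravity work internally, which i can’t see): every agent step records its own token usage and cost, so total spend is always derivable from state the user can inspect. the class names and fields are made up for the example.

```python
# toy sketch of explicit agent state: each step carries its own usage and cost,
# so total spend is a pure function of state the user can inspect.
from dataclasses import dataclass, field


@dataclass
class AgentStep:
    action: str                 # e.g. "llm_call", "read_file", "apply_edit"
    input_tokens: int = 0
    cache_read_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0


@dataclass
class AgentState:
    steps: list[AgentStep] = field(default_factory=list)

    def record(self, step: AgentStep) -> None:
        self.steps.append(step)

    def total_cost(self) -> float:
        # nothing hidden: cost comes only from steps you can see
        return sum(s.cost_usd for s in self.steps)


# the ~$12 call from my logs, expressed as a single visible step
state = AgentState()
state.record(AgentStep("llm_call", input_tokens=4_000,
                       cache_read_tokens=21_000_000, cost_usd=12.0))
print(f"spend so far: ${state.total_cost():.2f}")
```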
because of that work, i’m probably more sensitive than most to differences in agent orchestration, hidden state, and how cost emerges from those design choices.

this whole experience reinforced something i already believed: hidden state is dangerous in agent systems. hidden state combined with opaque billing is worse. if users can’t see state, they can’t reason about cost. and if they can’t reason about cost, they won’t trust the system.

right now, none of these coding agents are “set and forget” if you’re doing work that hasn’t been done a thousand times before. you still have to stay in charge.
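one concrete way to stay in charge, whatever tool you’re using: reconstruct per-request cost from the usage numbers the api reports, and flag any single call that blows past a budget. a minimal sketch below, assuming the usage field names from anthropic’s messages api (input_tokens, cache_read_input_tokens, output_tokens) and placeholder prices i made up for illustration; cache reads are billed at a fraction of the base input rate, so check current pricing before trusting the math.

```python
# back-of-envelope cost check for one request, from its token usage breakdown.
# prices are illustrative placeholders, not official rates.
INPUT_USD_PER_MTOK = 5.00        # assumed base input rate ($/million tokens)
CACHE_READ_USD_PER_MTOK = 0.50   # assumed cache-read rate (roughly 10% of base)
OUTPUT_USD_PER_MTOK = 25.00      # assumed output rate
MAX_USD_PER_CALL = 1.00          # arbitrary per-call budget


def estimate_cost(usage: dict) -> float:
    """rough dollar cost of one call from its token usage breakdown."""
    return (
        usage.get("input_tokens", 0) * INPUT_USD_PER_MTOK
        + usage.get("cache_read_input_tokens", 0) * CACHE_READ_USD_PER_MTOK
        + usage.get("output_tokens", 0) * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# the call from my logs: ~4k fresh input tokens, ~21m cache-read tokens
logged = {"input_tokens": 4_000, "cache_read_input_tokens": 21_000_000}
cost = estimate_cost(logged)
print(f"estimated cost: ${cost:.2f}")  # ~$10.5 before output tokens, same ballpark as the ~$12 i was billed

if cost > MAX_USD_PER_CALL:
    print("warning: single call exceeded budget, go inspect the cached prefix")
```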