Are we building AI coding assistants wrong?
The Jason Lemkin/Replit incident this week got me thinking about the fundamental problems with how we're approaching AI coding assistants.
We've all seen the demos - natural language to working code, conversational debugging, "just describe what you want." But then you hit prod and everything breaks down.

The core technical challenges I keep seeing:

- Context management at scale - These systems work great for isolated tasks but struggle to maintain coherent state across complex, multi-file projects. How do you handle context that spans thousands of lines across dozens of files? (See the first sketch below.)

- The safety/capability tradeoff - More powerful tools can do more damage. Replit promised not to touch production, then deleted a database anyway. How do you build guardrails that actually work without neutering the tool? (See the second sketch below.)

- Conversational interfaces for complex systems - Natural language is ambiguous. Code is precise. Are we trying to solve the wrong interface problem? (See the third sketch below.)

- The production gap - Every AI coding tool I've tested works beautifully in demos and falls apart with real codebases, real data, real edge cases. Why is this gap so persistent?

I'm genuinely curious - has anyone built AI apps with tools that actually work reliably in prod?
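On the context question, the shape of the answer seems to be: you can't ship the whole repo, so you select. Here's a minimal sketch in Python, assuming a repo of .py files and using naive lexical overlap as a stand-in for real retrieval (embeddings, symbol indexes, call graphs); the names (select_context, chunk_file) are hypothetical, not from any shipping tool:

```python
# Budgeted context selection: rank fixed-size file chunks by crude lexical
# overlap with the task, then greedily pack the best ones under a size cap.
import re
from pathlib import Path

def chunk_file(path: Path, lines_per_chunk: int = 40):
    """Split a source file into fixed-size line chunks, tagged with origin."""
    lines = path.read_text(errors="ignore").splitlines()
    for start in range(0, len(lines), lines_per_chunk):
        yield path, start + 1, "\n".join(lines[start:start + lines_per_chunk])

def score(task: str, chunk: str) -> float:
    """Crude relevance: fraction of task tokens that also appear in the chunk."""
    task_tokens = set(re.findall(r"\w+", task.lower()))
    chunk_tokens = set(re.findall(r"\w+", chunk.lower()))
    return len(task_tokens & chunk_tokens) / max(len(task_tokens), 1)

def select_context(task: str, repo_root: str, budget_chars: int = 12_000) -> str:
    """Greedily pack the highest-scoring chunks, skipping any that overflow."""
    chunks = [c for p in sorted(Path(repo_root).rglob("*.py"))
              for c in chunk_file(p)]
    chunks.sort(key=lambda c: score(task, c[2]), reverse=True)
    picked, used = [], 0
    for path, line_no, text in chunks:
        if used + len(text) > budget_chars:
            continue  # smaller chunks later in the ranking may still fit
        picked.append(f"# {path}:{line_no}\n{text}")
        used += len(text)
    return "\n\n".join(picked)
```

The hard part isn't the packing, it's the scoring: lexical overlap misses the file you never mentioned but that breaks when you edit the one you did.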
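On guardrails, the lesson I take from the Replit story is that the check has to live outside the model: a prompt-level promise ("don't touch production") is just more text the model can route around. A minimal sketch, assuming a callback-based executor; Environment and GuardedExecutor are illustrative names, not any real tool's API:

```python
# A guardrail layer between the assistant and the database: destructive
# statements against a production target are refused by deterministic code,
# not by asking the model nicely.
import re
from enum import Enum

class Environment(Enum):
    DEV = "dev"
    STAGING = "staging"
    PRODUCTION = "production"

# Keyword check is deliberately blunt; a real gate would parse the SQL.
DESTRUCTIVE = re.compile(r"^\s*(DROP|TRUNCATE|DELETE|ALTER)\b", re.IGNORECASE)

class GuardedExecutor:
    def __init__(self, env: Environment, run_sql):
        self.env = env
        self.run_sql = run_sql  # the real execution callback

    def execute(self, sql: str):
        if self.env is Environment.PRODUCTION and DESTRUCTIVE.match(sql):
            # Hard refusal: this branch is outside the model's control,
            # so no amount of generated text can argue past it.
            raise PermissionError(f"refusing destructive statement in prod: {sql!r}")
        return self.run_sql(sql)
```

And the tradeoff shows up immediately: the same gate that stops a runaway DELETE also blocks a legitimate migration, which is exactly the "without neutering the tool" problem.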
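On the interface question, one possible answer is to keep the conversation loose but make the action boundary typed: the assistant proposes a structured plan, and only validated steps can run. Another minimal sketch; Step, ALLOWED_ACTIONS, and validate_plan are hypothetical names for the pattern, not an existing API:

```python
# Narrowing the interface: free-form language in, typed plan out, and
# nothing executes until the plan passes validation.
from dataclasses import dataclass

ALLOWED_ACTIONS = {"read_file", "write_file", "run_tests"}

@dataclass(frozen=True)
class Step:
    action: str
    target: str

def validate_plan(raw_steps: list[dict]) -> list[Step]:
    """Reject any step outside the allowed vocabulary before execution."""
    plan = []
    for raw in raw_steps:
        step = Step(action=raw.get("action", ""), target=raw.get("target", ""))
        if step.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action {step.action!r}; plan rejected")
        plan.append(step)
    return plan  # only now would an executor act on these steps
```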