Show HN: Empromptu.ai – Solving the AI production reliability crisis

Posted by anaempromptu, 4 months ago
After burning through thousands of credits on AI builders, we kept hitting the same wall: applications that worked in demos but crashed in production. The core issue isn't the building process - it's that most AI applications plateau at 60-70% accuracy, which makes them unusable for real users.

We realized these aren't actually "AI app builders" - they're website builders with ChatGPT wrappers. The fundamental architecture problems:

- Context Amnesia: Most builders suffer from conversation state loss, forcing users to repeat information and burning credits on iteration cycles.
- Static Prompt Bloat: App builders try to handle edge cases by cramming everything into massive 5-page prompts, which actually confuses LLMs and degrades performance.
- Black Box Optimization: No granular control over individual components, and no transparent performance metrics.

Our technical approach centers on a dynamic AI response optimization architecture:

1. Context Engineering: Persistent conversation memory with intelligent context discovery eliminates the repeat-and-iterate problem.

2. Real-time Prompt Selection: Instead of one massive prompt, we maintain specialized prompt families and dynamically select the optimal one based on input characteristics (a travel chatbot automatically switches between LAX context for LA and Pearson context for Toronto).

3. Individual Task Optimization: Granular control over each workflow component with transparent scoring metrics (you can optimize payroll queries separately from HR policies).

This consistently achieves 98% accuracy versus the industry's 60-70% - and we can demonstrate it live with side-by-side comparisons.

But solving accuracy alone wasn't enough.
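To make the prompt-selection idea concrete: here's a minimal sketch of routing an input to a specialized prompt family instead of one giant prompt. This is an illustration under our own assumptions (the keyword matching, family names, and `select_prompt` function are hypothetical, not Empromptu's actual code - a real router would likely use embeddings or a classifier rather than substring matching):

```python
# Hypothetical prompt-family router: pick the specialized prompt whose
# keywords best match the input, falling back to a general default.
PROMPT_FAMILIES = {
    "travel_la": {
        "keywords": {"lax", "los angeles"},
        "prompt": "You are a travel assistant specializing in Los Angeles (LAX).",
    },
    "travel_toronto": {
        "keywords": {"pearson", "yyz", "toronto"},
        "prompt": "You are a travel assistant specializing in Toronto (Pearson/YYZ).",
    },
    "default": {
        "keywords": set(),
        "prompt": "You are a general travel assistant.",
    },
}

def select_prompt(user_input: str) -> str:
    """Return the prompt from the family whose keywords best match the input."""
    text = user_input.lower()
    best_name, best_score = "default", 0
    for name, family in PROMPT_FAMILIES.items():
        score = sum(1 for kw in family["keywords"] if kw in text)
        if score > best_score:
            best_name, best_score = name, score
    return PROMPT_FAMILIES[best_name]["prompt"]
```

So `select_prompt("Flights from LAX next week")` would return the LA-specific prompt, while an input with no matching keywords falls through to the default family.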
We also needed complete production infrastructure:

- Full AI Stack: RAG, LLM operations, real backends with dynamic optimization (not just hosted demos).
- Production Deployment: Docker containers, GitHub integration, on-premise options.
- Performance Transparency: Visible quality scores, edge case identification, systematic optimization.

The result: technical teams can build production-ready AI applications without dedicated ML expertise, while maintaining the control and visibility needed for business-critical deployments.

Technical founders and developers: try it at https://builder.empromptu.ai

We'd love feedback from the HN community, especially if you've hit similar production reliability problems or have thoughts on the architectural approach.
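As a footnote on the "per-task scoring" and "performance transparency" points: the basic shape of that idea can be sketched in a few lines. Everything here (`score_tasks`, the demo data, exact-match grading) is a hypothetical illustration of the concept, not Empromptu's implementation - a real system would use richer graders than string equality:

```python
# Minimal sketch of per-task quality scoring: grade each workflow
# component (e.g. payroll vs. HR policy) separately, so a weak task is
# visible instead of being averaged into one aggregate number.
def score_tasks(examples, predict):
    """Return an exact-match accuracy score per task name.

    examples: iterable of (task, question, expected_answer) tuples
    predict:  callable (task, question) -> answer
    """
    totals, correct = {}, {}
    for task, question, expected in examples:
        totals[task] = totals.get(task, 0) + 1
        if predict(task, question) == expected:
            correct[task] = correct.get(task, 0) + 1
    return {task: correct.get(task, 0) / totals[task] for task in totals}

# Demo with a canned stand-in for the model: payroll answers are half
# right, HR-policy answers are all right.
examples = [
    ("payroll", "When is payday?", "the 15th"),
    ("payroll", "How is overtime paid?", "1.5x base rate"),
    ("hr_policy", "How many vacation days?", "20 days"),
]
canned = {
    "When is payday?": "the 15th",
    "How is overtime paid?": "monthly",      # wrong on purpose
    "How many vacation days?": "20 days",
}
scores = score_tasks(examples, lambda task, q: canned[q])
# scores == {"payroll": 0.5, "hr_policy": 1.0}
```

A per-task breakdown like `{"payroll": 0.5, "hr_policy": 1.0}` tells you exactly which component to optimize, which is the point of scoring payroll queries separately from HR policies.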