Ask HN: What's stopping your AI agents from moving beyond proof of concept?
We've been working on decision automation tech that's mostly been used in enterprise for building systems that behave like domain experts. Think models based on structured logic and knowledge, which can be queried to provide decisions that are auditable and explainable.
Recently, we've started wondering whether this could help with a different kind of problem: getting LLM-based agents into production.
From what we've seen (and experienced ourselves), it's relatively easy to get an agent prototype working with tools like LangChain, AutoGen, or CrewAI, but much harder to turn that into something reliable and trustworthy enough for real use.
Some of the issues we've felt:
- Agents making different decisions from the same input
- Opaque reasoning that's hard to debug or trust
- Tool use that works in demos but fails on edge cases
- Hallucinated or incomplete decisions that don't stand up in production
- Limited ability to gather missing info before acting
It's got us thinking: if an agent could collate data, then call a tool (our system) with a bespoke symbolic model (that you created) that could reason, ask follow-up questions (for an AI agent or human to answer), and provide results that are deterministic, explainable, and repeatable, would that help bridge the gap to production? Would this be more trustworthy?
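To make the shape of that loop concrete, here's a deliberately toy sketch in Python. Everything in it (Rule, DecisionResult, evaluate, the refund policy) is made up for illustration; a real symbolic model would be far richer than a flat rule list. The point is just the contract: same facts in, same decision out, and missing facts come back as follow-up questions instead of guesses.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical rule format, standing in for a much richer symbolic model:
# each rule names the facts it needs, a condition over them, and the
# decision plus human-readable reason it yields when the condition holds.
@dataclass
class Rule:
    requires: list[str]
    condition: Callable[[dict], bool]
    decision: str
    reason: str

@dataclass
class DecisionResult:
    status: str                      # "decision" or "needs_info"
    decision: Optional[str] = None
    reason: Optional[str] = None
    missing: list[str] = field(default_factory=list)

def evaluate(rules: list[Rule], facts: dict) -> DecisionResult:
    """Deterministic evaluation: the same facts always yield the same result.

    If a rule needs facts the agent hasn't supplied, return a follow-up
    request instead of guessing, so the agent (or a human) can answer it
    and call again.
    """
    for rule in rules:
        missing = [f for f in rule.requires if f not in facts]
        if missing:
            return DecisionResult(status="needs_info", missing=missing)
        if rule.condition(facts):
            return DecisionResult(status="decision",
                                  decision=rule.decision,
                                  reason=rule.reason)
    return DecisionResult(status="decision", decision="refer",
                          reason="No rule matched; refer to a human.")

# Toy policy: approve a refund only for recent, undamaged returns.
rules = [
    Rule(requires=["days_since_purchase"],
         condition=lambda f: f["days_since_purchase"] > 30,
         decision="deny",
         reason="Outside the 30-day return window."),
    Rule(requires=["days_since_purchase", "item_damaged"],
         condition=lambda f: not f["item_damaged"],
         decision="approve",
         reason="Within the window and the item is undamaged."),
]

# The agent only knows the purchase date, so the model asks a follow-up.
print(evaluate(rules, {"days_since_purchase": 10}))
# status='needs_info', missing=['item_damaged']

# With the gap filled, the decision is repeatable and comes with a reason.
print(evaluate(rules, {"days_since_purchase": 10, "item_damaged": False}))
# status='decision', decision='approve', reason='Within the window...'
```

In this framing the agent's job shrinks to collating facts and relaying follow-up questions, while the tool owns the decision itself and can always say why it decided what it did.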
We're trying to understand whether this kind of approach would actually be useful in real-world agent implementations, and if so, for what kinds of decisions or workflows.
Would really appreciate hearing from anyone who's been working on agent-based systems:
- What have you built?
- Have you shipped anything to production?
- What's been hardest about that process?
- Where do you think determinism, consistency, or explainability would matter most?
Not selling anything; we'd have lots of work to do to make the product more developer-friendly anyway. We just want to know whether the idea has legs and to learn from people building agents.
Thanks in advance to anyone willing to share.