Ask HN: How do you prevent LLM hallucinations in production systems?

1 point | by kundan_s__r | 24 days ago
Hi HN,

For those running LLMs in real production environments (especially agentic or tool-using systems): what's actually worked for you to prevent confident but incorrect outputs?

Prompt engineering and basic filters help, but we've still seen cases where responses look fluent, structured, and reasonable, yet violate business rules, domain boundaries, or downstream assumptions.

I'm curious:

- Do you rely on strict schemas or typed outputs?
- Secondary validation models or rule engines?
- Human-in-the-loop for certain classes of actions?
- Hard constraints before execution (e.g., allow/deny lists)?

What approaches failed for you, and what held up under scale and real user behavior?

Interested in practical lessons and post-mortems rather than theory.
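For readers skimming the thread, here is a minimal sketch of the kind of guard the question is asking about: a typed output schema combined with a deny-by-default allow-list checked before any tool call executes. The `ToolCall` fields, the tool names, and the 0.7 confidence cutoff are illustrative assumptions, not anyone's production setup or a recommendation from the original poster.

```python
# Minimal sketch (assumptions, not the poster's system): validate the model's
# output against a typed schema, then gate execution behind an allow-list.
import json
from pydantic import BaseModel, ValidationError

class ToolCall(BaseModel):
    tool: str           # tool the model wants to invoke (hypothetical names below)
    argument: str       # single string argument, kept simple for the sketch
    confidence: float   # model-reported confidence in [0.0, 1.0]

# Hard constraint applied before execution: only these tools may ever run.
ALLOWED_TOOLS = {"search_docs", "lookup_order"}

def guard(raw_model_output: str) -> ToolCall | None:
    """Parse, validate, and gate a model response; None means escalate to a human."""
    try:
        call = ToolCall.model_validate_json(raw_model_output)
    except ValidationError:
        return None  # malformed or missing fields: reject rather than guess
    if call.tool not in ALLOWED_TOOLS:
        return None  # deny-by-default: unknown tool names never execute
    if call.confidence < 0.7:
        return None  # low confidence: route to human-in-the-loop instead
    return call

# A fluent-looking but out-of-bounds response is rejected; a valid one passes.
print(guard(json.dumps({"tool": "delete_account", "argument": "42", "confidence": 0.95})))  # None
print(guard(json.dumps({"tool": "search_docs", "argument": "refund policy", "confidence": 0.9})))
```

The point of the sketch is the ordering: structural validation first, then hard constraints, and only then execution, with every rejection falling through to a human or a retry rather than to a best-effort guess.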