Ask HN: How do you prevent LLM hallucinations in production systems?
Hi HN,
For those running LLMs in real production environments (especially agentic or tool-using systems): what has actually worked for you to prevent confident but incorrect outputs?
Prompt engineering and basic filters help, but we've still seen cases where responses look fluent, well-structured, and reasonable, yet violate business rules, domain boundaries, or downstream assumptions.
I'm curious:
Do you rely on strict schemas or typed outputs? (roughly the pattern sketched after this list)
Secondary validation models or rule engines?
Human-in-the-loop review for certain classes of actions?
Hard constraints before execution (e.g., allow/deny lists)?
What approaches failed for you, and what held up under scale and real user behavior?
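To make the schema and allow-list questions concrete, here is a minimal sketch of the kind of pattern I mean, assuming Python with Pydantic v2; every name and business rule in it (RefundAction, ALLOWED_TOOLS, the refund cap) is a hypothetical placeholder, not a description of a real stack:

```python
# Sketch: typed LLM output + a hard pre-execution gate. All names are hypothetical.
import json
from pydantic import BaseModel, Field, ValidationError  # Pydantic v2

# 1) Strict schema: the model must return JSON that parses into this type.
class RefundAction(BaseModel):
    order_id: str = Field(pattern=r"^ORD-\d{6}$")   # reject made-up ID formats
    amount_cents: int = Field(ge=0, le=50_000)      # example business rule: refunds capped
    reason: str

# 2) Hard constraint before execution: only pre-approved tools may run.
ALLOWED_TOOLS = {"issue_refund", "send_email"}

def parse_or_reject(raw: str) -> RefundAction | None:
    """Validate the LLM's raw JSON output; return None (escalate) on any violation."""
    try:
        return RefundAction.model_validate_json(raw)
    except ValidationError as e:
        # A confident-but-wrong response gets caught here instead of downstream.
        print(f"rejected LLM output: {e}")
        return None

def execute(tool_name: str, action: RefundAction) -> None:
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_name!r} is not on the allow list")
    # ... hand off to the real tool, or queue for human review above some threshold ...
    print(f"executing {tool_name} for {action.order_id}")

if __name__ == "__main__":
    # Fluent-looking but invalid output is rejected; valid output passes the gate.
    bad = json.dumps({"order_id": "12345", "amount_cents": 999_999, "reason": "dup charge"})
    good = json.dumps({"order_id": "ORD-004217", "amount_cents": 1_250, "reason": "dup charge"})
    for raw in (bad, good):
        action = parse_or_reject(raw)
        if action is not None:
            execute("issue_refund", action)
```

The idea is that nothing reaches a tool without passing both the schema and the allow list; what I don't have a good feel for is how well this kind of gating holds up at scale, which is really the heart of my question.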
I'm more interested in practical lessons and post-mortems than in theory.