Ask HN: How do you prevent LLM hallucinations in production systems?
Hi HN,
For those running LLMs in real production environments (especially agentic or tool-using systems): what has actually worked for you to prevent confident but incorrect outputs?
Prompt engineering and basic filters help, but we've still seen cases where responses look fluent, well-structured, and reasonable, yet violate business rules, domain boundaries, or downstream assumptions.
I'm curious:
Do you rely on strict schemas or typed outputs? (roughly the pattern sketched after this list)
Secondary validation models or rule engines?
Human-in-the-loop review for certain classes of actions?
Hard constraints before execution (e.g., allow/deny lists)?
What approaches failed for you, and what held up under scale and real user behavior?
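To make the schema and allow-list questions concrete, here is a minimal sketch of the kind of pattern I mean, assuming Python with Pydantic v2; every name and business rule in it (RefundAction, ALLOWED_TOOLS, the refund cap) is a hypothetical placeholder, not a description of a real stack:

```python
# Sketch: typed LLM output + a hard pre-execution gate. All names are hypothetical.
import json
from pydantic import BaseModel, Field, ValidationError  # Pydantic v2

# 1) Strict schema: the model must return JSON that parses into this type.
class RefundAction(BaseModel):
    order_id: str = Field(pattern=r"^ORD-\d{6}$")   # reject made-up ID formats
    amount_cents: int = Field(ge=0, le=50_000)      # example business rule: refunds capped
    reason: str

# 2) Hard constraint before execution: only pre-approved tools may run.
ALLOWED_TOOLS = {"issue_refund", "send_email"}

def parse_or_reject(raw: str) -> RefundAction | None:
    """Validate the LLM's raw JSON output; return None (escalate) on any violation."""
    try:
        return RefundAction.model_validate_json(raw)
    except ValidationError as e:
        # A confident-but-wrong response gets caught here instead of downstream.
        print(f"rejected LLM output: {e}")
        return None

def execute(tool_name: str, action: RefundAction) -> None:
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_name!r} is not on the allow list")
    # ... hand off to the real tool, or queue for human review above some threshold ...
    print(f"executing {tool_name} for {action.order_id}")

if __name__ == "__main__":
    # Fluent-looking but invalid output is rejected; valid output passes the gate.
    bad = json.dumps({"order_id": "12345", "amount_cents": 999_999, "reason": "dup charge"})
    good = json.dumps({"order_id": "ORD-004217", "amount_cents": 1_250, "reason": "dup charge"})
    for raw in (bad, good):
        action = parse_or_reject(raw)
        if action is not None:
            execute("issue_refund", action)
```

The idea is that nothing reaches a tool without passing both the schema and the allow list; what I don't have a good feel for is how well this kind of gating holds up at scale, which is really the heart of my question.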
I'm more interested in practical lessons and post-mortems than in theory.