Show HN: CompareGPT – Trustworthy AI answers with confidence and sources
Hi HN, I’m Tina.
I’ve been exploring how to make large language models more reliable. One persistent issue is hallucinations — models can produce confident answers that are factually wrong or based on non-existent sources. This is especially risky for fields like finance, law, or research where accuracy matters.
To address this, I’ve been building CompareGPT, which focuses on making AI outputs more trustworthy.
Key updates we’ve been working on:
- Confidence scoring: every answer shows how reliable it is.
- Source validation: highlights whether data can be backed by references.
- Multi-model comparison: ask one question, see how different models respond side by side.
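To give a feel for the multi-model comparison idea, here is a rough sketch of fanning one question out to several models and using cross-model agreement as a naive confidence signal. Everything below — the function names, the stubbed model calls, the agreement-based score — is my illustration under assumptions, not CompareGPT's actual implementation:

```python
# Sketch: ask several models the same question, then score confidence
# by how many of them agree on the most common answer.
from collections import Counter

def ask(model: str, question: str) -> str:
    # Stub standing in for real model API calls (hypothetical answers).
    canned = {
        "model-a": "Paris",
        "model-b": "Paris",
        "model-c": "Lyon",
    }
    return canned[model]

def compare(question: str, models: list[str]) -> dict:
    # Collect one answer per model.
    answers = {m: ask(m, question) for m in models}
    # Naive confidence: fraction of models agreeing on the top answer.
    counts = Counter(answers.values())
    top_answer, top_count = counts.most_common(1)[0]
    return {
        "answers": answers,
        "top_answer": top_answer,
        "confidence": top_count / len(models),
    }

result = compare("Capital of France?", ["model-a", "model-b", "model-c"])
print(result["top_answer"], round(result["confidence"], 2))
```

A real system would also need source validation on top of agreement, since models can agree on the same wrong answer.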
Try it here: <a href="https://comparegpt.io/home" rel="nofollow">https://comparegpt.io/home</a>
It currently works best with knowledge-based queries (finance, law, science). We’re still ironing out limitations — for example, image input isn’t supported yet.
I’d love to hear what you think, especially where it fails or where it could be most useful. Brutal feedback welcome.
Thanks!