

I've gotten very used to using the "big" LLMs for analysing PDFs.

Now that llama.cpp has vision support, I tried out PDFs with it locally (via LM Studio), but the results weren't as good as I'd hoped. One time it insisted it couldn't do "OCR", but gave me an example of what the data _could_ look like, which was in fact the data itself.

The other major problem is that some PDFs are actually made up of images, and it got super confused by those as well.

Given this is so new, I'm struggling to find any tools that make this easier.

Ask HN: What's the best local LLM tooling for working with PDFs?