I built an open-source, LLM-based receipt generator, and here's why

2 points | by maxime_wellapp | 8 months ago
If you’ve ever worked with AI models to parse real-world documents like invoices or receipts, you know one truth: good test data is painfully hard to find.

Real receipts are noisy, diverse, and often private. PDF templates are brittle and too clean. OCR outputs are inconsistent. And once you move beyond English or simple formats, it gets even messier.

That’s why I built this:

GitHub: WellApp-ai/ai-receipt-generator
Example output: imgur.com/a/YtFSodj

What’s Next?

Right now it supports:
- OpenAI models (via API)
- Local generation via Faker
- YAML-configured generation flows

Coming soon:
- Support for Claude, Gemini, Mistral, etc.
- More built-in schema presets
- Predefined prompt templates (by region, industry, language)

We’re also planning to dogfood this internally for auto-evaluations of our own parsing engine.
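As a rough illustration of the local, config-driven generation idea (not the project's actual code), the sketch below builds one synthetic receipt from a small preset. Everything in it is an assumption made for the example: the `PRESET` keys, the `generate_receipt` function, and the field names are hypothetical, and Python's standard `random` module stands in for Faker so the snippet has no dependencies. In a real setup the preset would more likely be loaded from a YAML file.

```python
import random
from datetime import date, timedelta

# Hypothetical preset; the real project's schema and YAML keys may differ.
# Prices are integer cents to avoid floating-point rounding in totals.
PRESET = {
    "currency": "EUR",
    "tax_rate": 0.20,
    "merchants": ["Café Lumière", "Bricolage Martin", "Pharmacie du Centre"],
    "items": {"Espresso": 210, "Croissant": 140, "Batteries AA": 590, "Notebook": 325},
    "max_lines": 4,
}


def generate_receipt(preset, seed=None):
    """Return one synthetic receipt as a plain dict (all amounts in cents)."""
    rng = random.Random(seed)  # seeded so a given receipt can be reproduced
    n_lines = rng.randint(1, preset["max_lines"])
    names = rng.sample(sorted(preset["items"]), k=n_lines)

    lines = []
    for name in names:
        quantity = rng.randint(1, 3)
        unit_price = preset["items"][name]
        lines.append({
            "description": name,
            "quantity": quantity,
            "unit_price": unit_price,
            "total": quantity * unit_price,
        })

    subtotal = sum(line["total"] for line in lines)
    tax = round(subtotal * preset["tax_rate"])
    purchase_date = date(2024, 1, 1) + timedelta(days=rng.randrange(365))

    return {
        "merchant": rng.choice(preset["merchants"]),
        "date": purchase_date.isoformat(),
        "currency": preset["currency"],
        "lines": lines,
        "subtotal": subtotal,
        "tax": tax,
        "total": subtotal + tax,
    }
```

Because the generator is seeded, calling `generate_receipt(PRESET, seed=7)` twice returns the same dict, so the generated fields can serve as ground truth when checking what a parser extracts from the rendered receipt.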