Ask HN: How do we benchmark data extraction, analysis, and subsequent enrichment?
I have been working on a system that has to extract data from a bunch of structured and unstructured sources, analyse the relevance and correctness of that data (sorry, this part is vague), and use the data obtained to fill forms and create write-ups. I do not want to mindlessly use any LLM. I want to improve each of those (broadly, three) steps with proof. How do we do that? If there are benchmarks for these steps, which systems are leading them?