What invariants would you enforce in a broker import pipeline?

2 points, by julien_devv, 27 days ago
I'm working on a broker import pipeline for retail portfolios, and I underestimated how messy exports are in practice.

Problems I've seen:
- CSV, JSON and PDF exports for the same broker
- EU/US number formats
- day-first vs month-first date ambiguity
- ISIN/ticker/name mismatches
- duplicate rows and partial positions
- bad parses silently corrupting cost basis

My current approach is deterministic-first:
1. parse structured exports locally
2. only use an LLM fallback when parsing fails
3. normalize symbols and reject invalid rows
4. require human review before persistence
5. apply imports conservatively to avoid cost-basis drift

I'm trying to think clearly about the invariants this system should enforce.

For those who've worked on financial imports, accounting systems, or safety-critical data pipelines:
- what invariants would you absolutely enforce?
- where would you draw the boundary between deterministic logic and LLM extraction?
- what would you log for replay/debug/auditability?

Happy to share implementation details if useful.
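
To make the "reject rather than guess" idea concrete for two of the problems listed above (EU/US number formats and day-first vs month-first dates), here is a minimal Python sketch. It is an illustration only, not the pipeline described in this post: the names `parse_amount`, `parse_date`, and `AmbiguousValue` are hypothetical, and it assumes the decimal separator and date order are declared once per broker export format rather than inferred per row.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Optional


class AmbiguousValue(ValueError):
    """Hypothetical error: a field has more than one plausible reading."""


def parse_amount(text: str, decimal_sep: str) -> Decimal:
    """Parse a number using an explicitly declared decimal separator.

    decimal_sep is "," (EU style) or "." (US style). Assumption: it comes
    from per-broker configuration and is never guessed from the row itself.
    """
    if decimal_sep not in (",", "."):
        raise ValueError(f"unsupported decimal separator: {decimal_sep!r}")
    group_sep = "." if decimal_sep == "," else ","
    cleaned = text.strip().replace(" ", "")
    g, d = re.escape(group_sep), re.escape(decimal_sep)
    grouped = rf"-?\d{{1,3}}(?:{g}\d{{3}})*(?:{d}\d+)?"  # e.g. 1.234,56
    plain = rf"-?\d+(?:{d}\d+)?"                          # e.g. 1234,56
    if not (re.fullmatch(grouped, cleaned) or re.fullmatch(plain, cleaned)):
        raise ValueError(f"malformed amount: {text!r}")
    # Decimal (not float) so cost-basis arithmetic stays exact.
    return Decimal(cleaned.replace(group_sep, "").replace(decimal_sep, "."))


def parse_date(text: str, order: Optional[str] = None) -> date:
    """Parse NN/NN/YYYY. order is "DMY", "MDY", or None (undeclared).

    With no declared order, the value is accepted only when exactly one
    reading is a valid calendar date; otherwise the row is rejected.
    """
    m = re.fullmatch(r"(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})", text.strip())
    if not m:
        raise ValueError(f"unrecognized date: {text!r}")
    a, b, year = (int(part) for part in m.groups())
    if order == "DMY":
        return date(year, b, a)
    if order == "MDY":
        return date(year, a, b)
    candidates = set()
    for day, month in ((a, b), (b, a)):
        try:
            candidates.add(date(year, month, day))
        except ValueError:
            pass  # this reading is not a real calendar date
    if not candidates:
        raise ValueError(f"invalid date: {text!r}")
    if len(candidates) > 1:
        raise AmbiguousValue(f"day/month order unclear: {text!r}")
    return candidates.pop()
```

In this sketch, `parse_date("13/02/2024")` resolves because only the day-first reading is valid, while `parse_date("03/04/2024")` raises `AmbiguousValue` unless an order is supplied. A rejected row would then go to human review (step 4 above) instead of being silently coerced.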