Form16x – Simplifying Tax Season: JSON Output from Form 16 and Tax Regime Comparison

1 point, by taxedo, 5 months ago
I got tired of manually copying numbers from Form 16 PDFs into India’s tax filing portal every year. So I built *Form16x*, a Python CLI + library that parses these PDFs into structured JSON.

Beyond extraction, it can:
- Consolidate multiple Form 16s if you switched jobs
- Calculate taxes under both regimes → recommends the better one
- Show salary/deduction breakdowns directly in the terminal (tree view, colored output)
- Suggest tax optimizations (80C, 80D, NPS, etc.)
- Provide a Python API (`TaxCalculationAPI`) with multi-year tax rules (AY 2020–2025)

*Repo:* https://github.com/ri-sh/Form16x

Form 16 is similar to a W-2 in the US or a T4 in Canada: a semi-structured PDF with inconsistent layouts. Filing usually means manual data entry. Form16x tries to make that structured and automatable.

Would love feedback from HN, both on the technical approach (PDF parsing + structured extraction) and on whether this approach could extend to other countries’ tax forms.
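For readers wondering what "consolidate, then compare regimes" might look like in code, here is a minimal sketch. It is not Form16x's code and does not use `TaxCalculationAPI`. The field names (`gross_salary`, `deductions`), the helper functions, and the slab tables are all hypothetical, and the slab figures are illustrative placeholders rather than real rates for any assessment year. It also assumes, as a simplification, that deductions apply only under the old regime and ignores cess, rebates, and the standard deduction.

```python
# Hypothetical slab tables: (upper_limit, rate_in_percent).
# Illustrative numbers only, NOT real rates and NOT taken from Form16x.
OLD_REGIME_SLABS = [(250_000, 0), (500_000, 5), (1_000_000, 20), (float("inf"), 30)]
NEW_REGIME_SLABS = [(300_000, 0), (600_000, 5), (900_000, 10), (float("inf"), 20)]


def slab_tax(taxable_income, slabs):
    """Progressive tax: each slab's rate applies only to the income inside it."""
    tax, lower = 0.0, 0
    for upper, rate in slabs:
        if taxable_income <= lower:
            break
        tax += (min(taxable_income, upper) - lower) * rate / 100
        lower = upper
    return tax


def consolidate(forms):
    """Sum numeric fields across several parsed forms (hypothetical schema)."""
    total = {}
    for form in forms:
        for key, value in form.items():
            total[key] = total.get(key, 0) + value
    return total


def compare_regimes(gross_salary, deductions):
    """Return (better_regime, old_tax, new_tax) under the stated assumptions."""
    # Assumption: deductions reduce taxable income only under the old regime.
    old_tax = slab_tax(max(gross_salary - deductions, 0), OLD_REGIME_SLABS)
    new_tax = slab_tax(gross_salary, NEW_REGIME_SLABS)
    better = "old" if old_tax < new_tax else "new"
    return better, old_tax, new_tax


# Two employers in one year, as if parsed from two Form 16 PDFs.
forms = [
    {"gross_salary": 400_000, "deductions": 50_000},
    {"gross_salary": 600_000, "deductions": 100_000},
]
combined = consolidate(forms)
result = compare_regimes(combined["gross_salary"], combined["deductions"])
print(result)
```

With these placeholder slabs, the combined income of 1,000,000 and deductions of 150,000 give 82,500 under the "old" table and 65,000 under the "new" one, so the sketch picks "new". Keeping the slab tables as plain data is one way a tool could support several years of rules by swapping tables rather than code.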