Form16x – Simplify tax season: JSON output from Form 16 and tax regime comparison
I got tired of manually copying numbers from Form 16 PDFs into India’s tax filing portal every year.
So I built *Form16x*, a Python CLI + library that parses these PDFs into structured JSON.

Beyond extraction, it can:
- Consolidate multiple Form 16s if you switched jobs
- Calculate taxes under both regimes (old and new) and recommend the better one
- Show salary/deduction breakdowns directly in the terminal (tree view, colored output)
- Suggest tax optimizations (80C, 80D, NPS, etc.)
- Provide a Python API (`TaxCalculationAPI`) with multi-year tax rules (AY 2020–2025)

*Repo:* https://github.com/ri-sh/Form16x

Form 16 is similar to a W-2 in the US or a T4 in Canada: semi-structured PDFs with inconsistent layouts. Filing usually means manual data entry.
Form16x tries to make that structured and automatable.

Would love feedback from HN, both on the technical approach (PDF parsing + structured extraction) and whether this approach could extend to other countries’ tax forms.
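
To make the consolidate-then-compare idea concrete, here is a minimal standalone sketch. It is not the Form16x implementation and does not use `TaxCalculationAPI`: the `Form16Record` fields, the function names, and the slab tables are all assumptions made for illustration. It also ignores the standard deduction, rebates, surcharge, and cess, so the numbers are not filing-accurate.

```python
from dataclasses import dataclass

# Illustrative slab tables as (upper limit in rupees, rate) pairs.
# Assumption for this sketch only: verify against the official rules
# for the year being filed before relying on any of these figures.
OLD_REGIME_SLABS = [
    (250_000, 0.00), (500_000, 0.05), (1_000_000, 0.20), (float("inf"), 0.30),
]
NEW_REGIME_SLABS = [
    (300_000, 0.00), (600_000, 0.05), (900_000, 0.10),
    (1_200_000, 0.15), (1_500_000, 0.20), (float("inf"), 0.30),
]


@dataclass
class Form16Record:
    """Hypothetical subset of the fields a parsed Form 16 might yield."""
    employer: str
    gross_salary: float
    deductions: float  # deductions claimed (80C, 80D, ...)
    tds: float         # tax already deducted at source


def consolidate(records):
    """Naively sum salary, deductions, and TDS across employers.

    A real tool would also need to apply per-section caps and avoid
    double-counting deductions declared to more than one employer.
    """
    return Form16Record(
        employer=" + ".join(r.employer for r in records),
        gross_salary=sum(r.gross_salary for r in records),
        deductions=sum(r.deductions for r in records),
        tds=sum(r.tds for r in records),
    )


def slab_tax(taxable_income, slabs):
    """Progressive tax: each rate applies only to the income inside its slab."""
    tax, lower = 0.0, 0.0
    for upper, rate in slabs:
        if taxable_income <= lower:
            break
        tax += (min(taxable_income, upper) - lower) * rate
        lower = upper
    return tax


def compare_regimes(record):
    """Compute tax under both slab tables and pick the cheaper one.

    Simplification: deductions reduce income only under the old regime.
    """
    old_taxable = max(record.gross_salary - record.deductions, 0.0)
    new_taxable = max(record.gross_salary, 0.0)
    old_tax = slab_tax(old_taxable, OLD_REGIME_SLABS)
    new_tax = slab_tax(new_taxable, NEW_REGIME_SLABS)
    return {
        "old": old_tax,
        "new": new_tax,
        "recommended": "old" if old_tax < new_tax else "new",
    }
```

Calling `compare_regimes(consolidate([first_job, second_job]))` on two `Form16Record` values returns the tax under each slab table plus the cheaper regime. Keeping the slabs as plain data is one way to support multi-year rules: a different year is just a different pair of tables.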