Show HN: SJT - A lightweight structured JSON table format for APIs

Author: yukiakai, 5 months ago
Hi HN, I built a small experimental format called SJT (Structured JSON Table) to optimize data transport in APIs. The idea is simple: instead of repeating object keys for every row, SJT separates the structure (headers) from the values. This makes it both more compact and easier to stream.

For example, with Discord's /messages endpoint:

Raw JSON payload: ~50,110 bytes
Same data encoded with SJT: ~26,494 bytes

So you get about a 50% reduction in size, while still being able to decode incrementally (record by record). Surprisingly, decoding can even be faster than plain JSON, because there's less string parsing overhead.
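To make the header/value separation concrete, here is a minimal JavaScript sketch of the idea for flat records. It is not the actual SJT wire format or the sjt.js API (see the spec linked below for those); `encodeTable` and `decodeRow` are hypothetical helper names, and the real format also handles nested objects and arrays.

```js
// Plain JSON: the keys are repeated in every record.
const records = [
  { id: 1, author: "alice", content: "hello" },
  { id: 2, author: "bob", content: "hi" },
];

// SJT-style idea: emit the headers once, then only the values per row.
// (Illustrative only -- the real spec also covers nested structures.)
function encodeTable(rows) {
  const headers = Object.keys(rows[0]);
  const values = rows.map((row) => headers.map((key) => row[key]));
  return { headers, values };
}

// Each row can be rebuilt independently, which is what makes
// record-by-record (streaming) decoding straightforward.
function decodeRow(headers, row) {
  return Object.fromEntries(headers.map((key, i) => [key, row[i]]));
}

const table = encodeTable(records);
// table.headers -> ["id", "author", "content"]
// table.values  -> [[1, "alice", "hello"], [2, "bob", "hi"]]

for (const row of table.values) {
  console.log(decodeRow(table.headers, row));
}
```

With only two rows the savings are tiny, but the cost of repeating keys grows with the row count, which is where the size reduction on large payloads comes from.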
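For anyone who wants to sanity-check numbers like the ones in the table below, a rough Node.js timing harness might look like this sketch. It is not the script behind these results: the `encodeSJT`/`decodeSJT` calls are placeholders (sjt.js's actual API may differ), and only the JSON and gzip parts use the Node standard library.

```js
// Rough timing harness (sketch): payload size plus encode/decode time
// for plain JSON and JSON + gzip; an SJT encoder can be swapped in below.
import { gzipSync, gunzipSync } from "node:zlib";
import { performance } from "node:perf_hooks";

// Synthetic dataset, loosely following the test conditions listed below.
const data = Array.from({ length: 50_000 }, (_, i) => ({
  id: i,
  name: `user_${i}`,
  tags: ["a", "b", "c"],
  meta: { active: i % 2 === 0, score: i * 0.5 },
}));

function bench(label, encode, decode) {
  const t0 = performance.now();
  const payload = encode(data);
  const t1 = performance.now();
  decode(payload);
  const t2 = performance.now();
  const bytes = typeof payload === "string" ? Buffer.byteLength(payload) : payload.length;
  console.log(
    `${label}: ${(bytes / 1024).toFixed(2)} KB, ` +
    `encode ${(t1 - t0).toFixed(2)} ms, decode ${(t2 - t1).toFixed(2)} ms`
  );
}

bench("JSON", (d) => JSON.stringify(d), (p) => JSON.parse(p));
bench("JSON + Gzip",
  (d) => gzipSync(JSON.stringify(d)),
  (p) => JSON.parse(gunzipSync(p).toString()));

// Placeholder -- wire up the real sjt.js API here:
// bench("SJT", (d) => encodeSJT(d), (p) => decodeSJT(p));
```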
Quick benchmark:

| Format      | Size (KB) | Encode Time | Decode Time |
| ----------- | --------- | ----------- | ----------- |
| JSON        | 3849.34   | 41.81 ms    | 51.86 ms    |
| JSON + Gzip | 379.67    | 55.66 ms    | 39.61 ms    |
| MessagePack | 2858.83   | 51.66 ms    | 74.53 ms    |
| SJT (json)  | 2433.38   | 36.76 ms    | 42.13 ms    |
| SJT + Gzip  | 359.00    | 69.59 ms    | 46.82 ms    |

Test conditions:

Dataset: synthetic tabular dataset containing 50,000 records with mixed primitive fields, nested arrays, and nested objects (representative of large REST API payloads).

Runtime: Node.js 20 (V8 engine).

Implementation: JavaScript (via sjt.js).

Size (KB): uncompressed size in kilobytes (estimated for binary formats).

Encode / Decode (ms): average time in milliseconds to serialize/deserialize the entire dataset.

Spec: https://github.com/SJTF/SJT

JS implementation: https://github.com/yukiakai212/SJT.js

Curious to hear feedback from people who have worked with JSON-heavy APIs, streaming, or compact data formats (CSV, Parquet, etc.).