Show HN: CCCP – A programmable, context-aware compression protocol (early stage)
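I have been tinkering with an idea I call CCCP: the Context-Aware Composable Compression Protocol.
Most compression formats treat the process as a black box: bytes go in, bytes come out. I wanted something programmable and composable, where the format itself can be adapted to different domains and even customized by different vendors.
So far, CCCP has a few interesting properties: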
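- Composable: multiple LUTs (look-up tables) and encoding phases can be combined.
- Context-aware: decoding is guided by explicit metadata, not just a raw byte stream.
- Round-trippable IR: the intermediate representation can reconstruct the original logic before final binary compression.
- Programmable: vendors can plug in their own LUTs, encoders, and decoders.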
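To make these four properties concrete, here is a minimal Python sketch. It is not taken from the repos: the registry, the `register_lut` / `encode` / `decode` names, the LUT ids, and the JSON-style IR layout are all assumptions made up for illustration, and `zlib` merely stands in for whatever the final binary stage would be.

```python
import json
import zlib

# Hypothetical sketch: names and IR layout are illustrative, not CCCP's actual format.
REGISTRY = {}  # LUT id -> list of entries; vendors add their own tables here


def register_lut(lut_id, entries):
    """Programmable: anyone can plug in a table under their own id."""
    REGISTRY[lut_id] = list(entries)


def encode(tokens, lut_ids):
    """Composable: try several LUTs in order; unknown tokens stay literal."""
    index = {}
    for lut_pos, lut_id in enumerate(lut_ids):
        for entry_pos, entry in enumerate(REGISTRY[lut_id]):
            # The first LUT that contains an entry wins.
            index.setdefault(entry, (lut_pos, entry_pos))
    body = [list(index[tok]) if tok in index else tok for tok in tokens]
    # Context-aware: the IR names the LUTs the decoder must use.
    return {"luts": list(lut_ids), "body": body}


def decode(ir):
    """Round-trippable: rebuild the tokens from the IR and its metadata."""
    tables = [REGISTRY[lut_id] for lut_id in ir["luts"]]
    out = []
    for item in ir["body"]:
        if isinstance(item, str):
            out.append(item)  # literal token
        else:
            lut_pos, entry_pos = item
            out.append(tables[lut_pos][entry_pos])
    return out


register_lut("http/methods", ["GET", "POST"])
register_lut("vendor/acme", ["/api/v1/users"])  # made-up vendor table

tokens = "GET /api/v1/users HTTP/1.1".split()
ir = encode(tokens, ["http/methods", "vendor/acme"])

# Final binary stage (an assumption: plain zlib over the serialized IR).
packed = zlib.compress(json.dumps(ir).encode("utf-8"))
restored = decode(json.loads(zlib.decompress(packed)))
```

In this sketch the IR stays human-readable and fully reversible until the last step, so the table lookups can be inspected or tested independently of the byte-level compressor.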
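It is still very early and experimental. I would love to hear if anyone has seen similar approaches, or where this might break down in real-world usage.
Repos: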
- [https://github.com/brucekaushik/cccp](https://github.com/brucekaushik/cccp)
- [https://github.com/brucekaushik/cccp-python-poc](https://github.com/brucekaushik/cccp-python-poc)