Show HN: The Unix compress algorithm with approximate matching
LZW is the algorithm used by the Unix compress utility and by the GIF format. It is a beautifully elegant and simple algorithm: it learns a dictionary of strings from the source and encodes the source as a sequence of indices into that dictionary. In the limit, its compression rate converges on the Shannon entropy of the source.

In 2013, while studying bioinformatics, I had the idea of applying something like sequence alignment and edit scripts to compression, rather than only growing a dictionary string by appending to its end, as LZW does. So the idea for LZW-X was born long ago, but it wasn't until recently, with the help of AI, that I could implement and test it properly.

This is that proper implementation, and it confirms what I intuited: there are gains to be had from a method like this. I consider it a first rung, a starting point for further exploration.

Check it out: https://github.com/BrowserBox/LZW-X
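
For readers who haven't seen it, here is a minimal sketch of classic LZW as described in the first paragraph: learn a dictionary, emit indices. This is plain textbook LZW, not LZW-X, and it is not taken from the repository. The function names `lzw_compress` and `lzw_decompress` are made up for illustration, and the codes are left as an unbounded list of Python integers rather than packed into variable-width bit fields as compress and GIF do.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit dictionary indices, learning one new entry per emitted code."""
    # The dictionary starts with every single-byte string.
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the match while it is still a known string.
            current = candidate
        else:
            # Emit the longest known match, then learn match + next byte.
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly from the code stream alone."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry the encoder had only just created.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        # Mirror the encoder: previous match + first byte of the current one.
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    text = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(text)
    print(len(text), "bytes ->", len(codes), "codes")
    assert lzw_decompress(codes) == text
```

Note that every new dictionary entry here is an existing entry extended by one byte at the end. That append-only growth is the restriction the post proposes to relax with alignment and edit scripts.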