Modern LZ Compression (2019)
dang 2021-08-18 00:18:19 +0000 UTC [ - ]
Modern LZ Compression - https://news.ycombinator.com/item?id=19064791 - Feb 2019 (4 comments)
nathell 2021-08-18 07:45:51 +0000 UTC [ - ]
nayuki 2021-08-18 17:19:19 +0000 UTC [ - ]
This can in fact be solved directly and optimally: https://en.wikipedia.org/wiki/Package-merge_algorithm ; https://en.wikipedia.org/wiki/Huffman_coding#Length-limited_...
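For anyone curious what package-merge looks like in practice, here is a minimal Python sketch (the function name and the tiny example frequencies are mine, not from the linked pages) that computes optimal length-limited Huffman code lengths:

    from collections import Counter

    def length_limited_huffman(freqs, max_len):
        """Code lengths for a length-limited Huffman code via package-merge.
        freqs: dict symbol -> frequency; max_len: maximum code length in bits."""
        n = len(freqs)
        if n == 1:
            return {next(iter(freqs)): 1}
        if n > (1 << max_len):
            raise ValueError("max_len too small to give every symbol a code")
        by_weight = lambda item: item[0]
        # Leaf items: (frequency, multiset holding just that symbol).
        leaves = sorted(((f, Counter([s])) for s, f in freqs.items()), key=by_weight)
        current = list(leaves)
        for _ in range(max_len - 1):
            # Package: pair up the cheapest items; a leftover odd item is dropped.
            packages = [(a[0] + b[0], a[1] + b[1])
                        for a, b in zip(current[0::2], current[1::2])]
            # Merge: combine the packages with a fresh copy of the leaves.
            current = sorted(packages + leaves, key=by_weight)
        # The 2n-2 cheapest items give the optimal code: every appearance of a
        # symbol in a chosen item adds one bit to that symbol's code length.
        lengths = Counter()
        for _, symbols in current[:2 * n - 2]:
            lengths.update(symbols)
        return dict(lengths)

    # e.g. length_limited_huffman({'a': 1, 'b': 1, 'c': 2, 'd': 4}, max_len=3)
    # -> {'a': 3, 'b': 3, 'c': 2, 'd': 1}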
nickdothutton 2021-08-18 10:45:42 +0000 UTC [ - ]
akkartik 2021-08-18 01:44:35 +0000 UTC [ - ]
adzm 2021-08-18 07:39:33 +0000 UTC [ - ]
pwrrr 2021-08-18 20:13:54 +0000 UTC [ - ]
meiji163 2021-08-18 02:15:37 +0000 UTC [ - ]
hakmad 2021-08-18 12:01:37 +0000 UTC [ - ]
[0] - https://en.wikipedia.org/wiki/Compress
[1] - https://en.wikipedia.org/wiki/GIF#Unisys_and_LZW_patent_enfo...
jltsiren 2021-08-18 03:42:36 +0000 UTC [ - ]
zinekeller 2021-08-18 06:19:14 +0000 UTC [ - ]
I think that's misidentifying where credit is due: LZ might not be the most efficient compression format when it comes to size, but it's very efficient computation-wise when compressing and decompressing, and this still holds true today even when you consider newer algorithms like Zstandard (which uses a derivative of LZ). Sure, if you need to archive data in its most compact representation at any cost, LZ and derivatives won't deliver it, but unless there's a significant change to how computing works the trade-off is still in LZ's favour. At the time DEFLATE was designed, LZ simply beat every competitor, since realistically nothing else could be decompressed in real time with minimal memory.
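To make the computational point concrete, here is a toy sketch of my own (not from the article) of the inner loop of an LZ77-style decoder; decompression is just literal appends and back-reference copies, which is why LZ decoders are so cheap:

    def lz77_decompress(tokens):
        # Toy decoder: a token is either ('lit', byte) or ('copy', distance, length),
        # the copy referring back into the bytes already produced.  The whole job is
        # appends and back-references -- no tables or trees to rebuild.
        out = bytearray()
        for tok in tokens:
            if tok[0] == 'lit':
                out.append(tok[1])
            else:
                _, dist, length = tok
                for _ in range(length):      # byte-by-byte handles overlapping copies
                    out.append(out[-dist])
        return bytes(out)

    # "abcabcabcd": three literals, then an overlapping back-reference.
    tokens = [('lit', c) for c in b'abc'] + [('copy', 3, 6), ('lit', ord('d'))]
    assert lz77_decompress(tokens) == b'abcabcabcd'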
hexxagone 2021-08-18 03:47:21 +0000 UTC [ - ]
axiolite 2021-08-18 03:10:28 +0000 UTC [ - ]
Did you perhaps mean LZW (gif & compress), which was based on LZ78?
zlib isn't really dominant, today. lzma seems to have overtaken it for anything destined for public distribution.
wolf550e 2021-08-18 03:37:06 +0000 UTC [ - ]
axiolite 2021-08-18 04:21:48 +0000 UTC [ - ]
I never understood zstd. It's basically lzma+gzip+lz4 packed together. Show me any zstd level with a speed/size combination that one of those three can't match at some level.
lifthrasiir 2021-08-18 07:27:53 +0000 UTC [ - ]
In case you are not just trolling: I had a very large JSON file with a large amount of text lying around on my SSD, and the compression times were as follows [1]:
8,826,654,133 original
4,763,212,322 0:29 lz4 -1
3,815,508,500 0:52 brotli -1
3,715,002,172 2:09 lz4 -3
3,668,204,232 2:27 gzip -1
3,159,659,113 1:59 brotli -2
3,118,316,529 10:55 lzma -0 -T4
3,025,746,073 1:21 zstd -3 -T1 (default)
Zstandard is not just lzma+gzip+lz4. It is better than everything you've mentioned, at least for my test case. In fact I compressed with zstd first and tried to match its compression time with the others, because zstd is very fast and using anything else as a reference point would have taken much more time. It has enough reason to call itself "standard".
[1] Tested with an i7-7700 at 3.60GHz and 48 GiB of RAM (but no algorithm tested uses more than several MBs of memory anyway). I'm using a pretty fast SSD here, so I/O speed is not much of a concern. Also note that every algorithm except for lzma is single-threaded.
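If anyone wants to rerun something like this, a rough harness along these lines works; the file name "big.json", the tool selection, and the flags are just placeholders to adjust to whatever is installed locally:

    import os
    import subprocess
    import time

    # Compress one large file with a few tools at roughly matched levels
    # and report output size and wall time.
    INPUT = "big.json"
    runs = [
        ("lz4 -1",  ["lz4",  "-1", "-f", INPUT],              INPUT + ".lz4"),
        ("gzip -1", ["gzip", "-1", "-k", "-f", INPUT],        INPUT + ".gz"),
        ("zstd -3", ["zstd", "-3", "-T1", "-f", INPUT],       INPUT + ".zst"),
        ("xz -0",   ["xz",   "-0", "-T4", "-k", "-f", INPUT], INPUT + ".xz"),
    ]

    for name, cmd, out in runs:
        start = time.monotonic()
        subprocess.run(cmd, check=True)
        elapsed = time.monotonic() - start
        print(f"{name:10s} {os.path.getsize(out):>15,d} bytes  {elapsed:7.1f} s")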
axiolite 2021-08-18 07:53:10 +0000 UTC [ - ]
I duped /usr/share/dict/words into an 8GB file and did a couple tests on the old system I'm on:
8589934592 original
3075315128 pigz -1 1m0.44s
2926825272 zstdmt -3 2m37.52s
2877549999 pigz -3 1m28.94s
lifthrasiir 2021-08-18 07:56:55 +0000 UTC [ - ]
8,826,654,133 original
3,033,695,892 0:22 zstd 1.3.3 -3 -T4
3,025,746,073 1:21 zstd 1.3.3 -3 -T1 (default)
3,017,972,162 1:05 zstd 1.5.0 -3 -T1 (default)
3,013,860,663 1:23 zstd 1.5.0 -3 --single-thread
It is just no match.
EDIT: axiolite wanted to see --single-thread, and while my Linux box only has an older zstd (1.3.3) without that option, I realized I do have a Windows executable for a recent zstd (1.5.0). Both executables ran on the same machine, but I can't guarantee anything about the 1.5.0 binary.
axiolite 2021-08-18 07:59:18 +0000 UTC [ - ]
I think you need to try pigz...
lifthrasiir 2021-08-18 08:10:54 +0000 UTC [ - ]
3,664,854,169 0:42 pigz -1 -p4
3,664,854,169 0:28 pigz -1 -p8 (default, also probably the max possible with my box)
Now I'm very curious where you got your copy of zstdmt. (I'm using stock Ubuntu packages for all of them.)
[EDIT] This originally said "it doesn't make compression itself multi-threaded", which could falsely imply that --single-thread somehow enables multi-threaded compression; it doesn't ;)
axiolite 2021-08-18 08:21:17 +0000 UTC [ - ]
The difference in our results certainly is curious.
lifthrasiir 2021-08-18 08:31:05 +0000 UTC [ - ]
dnr 2021-08-18 05:54:07 +0000 UTC [ - ]
(Yes, I know that one parameter controls multiple different settings internally, so there are multiple dimensions if you're willing to dig that deep.)
Anyway, my recollection from looking at benchmarks a while ago: zstd used at ratios similar to lzma compresses in similar time but decompresses much faster, and it's also faster than gzip when set to comparable ratios. lz4 is still faster than the fastest zstd modes, and lzma at its most extreme settings still gets better ratios than the best zstd can do. But there's a huge swath in the middle where zstd beats them all, and that's quite valuable.
axiolite 2021-08-18 08:08:38 +0000 UTC [ - ]
I honestly can't see how remembering several ranges of zstd levels and their time/speed trade-offs is any easier than remembering lz4, pigz, xz, which all happen to have good default settings.
ahofmann 2021-08-18 08:33:36 +0000 UTC [ - ]
I can't see why I should ever in my life use any compression program other than zstd.
Jasper_ 2021-08-18 06:06:08 +0000 UTC [ - ]
wolf550e 2021-08-18 11:45:39 +0000 UTC [ - ]
rurban 2021-08-18 03:31:17 +0000 UTC [ - ]
user-the-name 2021-08-18 00:23:55 +0000 UTC [ - ]
hexxagone 2021-08-18 01:39:02 +0000 UTC [ - ]
retrac 2021-08-18 02:11:46 +0000 UTC [ - ]
Edit: It uses a variety of entropy encodings for different data structures; Huffman is one of them. My apologies for the confusing initial comment.
chriswarbo 2021-08-18 00:51:13 +0000 UTC [ - ]
user-the-name 2021-08-18 00:57:48 +0000 UTC [ - ]
hexxagone 2021-08-18 01:43:17 +0000 UTC [ - ]
vardump 2021-08-18 09:24:58 +0000 UTC [ - ]
For low-entropy data (lots of high-probability symbols, like zeroes), Huffman is about 2-3x faster. But on the flip side, FSE achieves a markedly higher compression ratio.
I think FSE is worth the speed tradeoff vs Huffman in most cases.
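A back-of-the-envelope illustration of that ratio gap (the skewed distribution below is made up, purely for illustration): Huffman has to spend at least one whole bit per symbol, while FSE/ANS can approach the Shannon limit with fractional bits:

    import math

    # Hypothetical skewed byte distribution: 95% zeroes.
    probs = {0x00: 0.95, 0x01: 0.03, 0x02: 0.02}

    # Shannon limit, which FSE/ANS can approach with fractional bits per symbol.
    entropy = -sum(p * math.log2(p) for p in probs.values())   # ~0.34 bits/symbol

    # A Huffman code must spend whole bits: 1 bit for the likely symbol, 2 for the rest.
    huffman = 0.95 * 1 + 0.03 * 2 + 0.02 * 2                   # 1.05 bits/symbol

    print(f"Shannon entropy: {entropy:.2f} bits/symbol, Huffman: {huffman:.2f} bits/symbol")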
hexxagone 2021-08-18 15:49:39 +0000 UTC [ - ]
user-the-name 2021-08-18 10:15:25 +0000 UTC [ - ]
hexxagone 2021-08-18 15:56:41 +0000 UTC [ - ]
user-the-name 2021-08-18 18:07:12 +0000 UTC [ - ]
hexxagone 2021-08-19 00:27:13 +0000 UTC [ - ]
retrac 2021-08-18 06:48:38 +0000 UTC [ - ]
https://www.brutaldeluxe.fr/products/crossdevtools/lz4/index... LZ4 Data Compression - a rather long and in-depth article looking at LZ4 on the 65816 for the Apple IIgs with a decompressor that exploits that processor's block copy.
https://github.com/emmanuel-marty/lzsa - LZSA - an LZ4-like modern LZ that's more efficient than LZ4 in both speed and compression (at least on the 8-bitters it targets) - includes a neat chart of speed/compression trade-offs on a ZX Spectrum with a variety of algorithms