
Modern LZ Compression (2019)

retrac 2021-08-18 06:48:38 +0000 UTC [ - ]

There's quite a lot of retro modern LZ activity too! LZ turns out to be amazing on old machines, often only a couple times slower than a block copy. Optimal compressors and control over the algorithm have led to some very tight demos.

https://www.brutaldeluxe.fr/products/crossdevtools/lz4/index... LZ4 Data Compression - a rather long and in-depth article looking at LZ4 on the 65816 for the Apple IIgs with a decompressor that exploits that processor's block copy.

https://github.com/emmanuel-marty/lzsa - LZSA - an LZ4-like modern LZ that's more efficient than LZ4 in both speed and compression (at least on the 8-bitters it targets) - includes a neat chart of speed/compression trade-offs on a ZX Spectrum with a variety of algorithms

dang 2021-08-18 00:18:19 +0000 UTC [ - ]

Discussed at the time:

Modern LZ Compression - https://news.ycombinator.com/item?id=19064791 - Feb 2019 (4 comments)

nathell 2021-08-18 07:45:51 +0000 UTC [ - ]

"Managing Gigabytes" by Witten et al is _the_ book that got me into the field of compression back in the day. Granted, it’s a bit dated now as there’s been a lot of progress in the field, but I’d still heartily recommend it to newcomers.

nayuki 2021-08-18 17:19:19 +0000 UTC [ - ]

> First, we just shorten any symbols that are longer than our maximum code length — 11 — to that value. This means that our tree will no longer be a Huffman tree, so we need to do a fixup pass to redistribute the error we've introduced.

This can in fact be solved directly and optimally: https://en.wikipedia.org/wiki/Package-merge_algorithm ; https://en.wikipedia.org/wiki/Huffman_coding#Length-limited_...
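
For anyone curious what the optimal approach looks like, here is a naive Python sketch of package-merge (function and variable names are mine; real implementations typically use the far more memory-frugal boundary package-merge variant):

    def package_merge(freqs, max_len):
        """Optimal length-limited Huffman code lengths via package-merge.
        freqs maps symbol -> frequency; returns symbol -> code length."""
        assert len(freqs) <= (1 << max_len), "too many symbols for this limit"
        leaves = sorted(((f, [s]) for s, f in freqs.items()), key=lambda x: x[0])
        packages = list(leaves)
        for _ in range(max_len - 1):
            # package adjacent pairs (dropping a lone last item) ...
            paired = [(packages[i][0] + packages[i + 1][0],
                       packages[i][1] + packages[i + 1][1])
                      for i in range(0, len(packages) - 1, 2)]
            # ... then merge the packages back in with the original leaves
            packages = sorted(paired + leaves, key=lambda x: x[0])
        # a symbol's code length = how many of the 2n-2 cheapest items contain it
        lengths = {s: 0 for s in freqs}
        for _, symbols in packages[:2 * len(freqs) - 2]:
            for s in symbols:
                lengths[s] += 1
        return lengths
For instance, package_merge({'a': 1, 'b': 1, 'c': 2, 'd': 4}, 2) returns length 2 for every symbol, where unconstrained Huffman would have produced lengths 3, 3, 2, 1.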

nickdothutton 2021-08-18 10:45:42 +0000 UTC [ - ]

Only partly related, since it deals with a specific type of data (English language Wikipedia), some of you might enjoy reading about the Hutter Prize: https://en.wikipedia.org/wiki/Hutter_Prize

akkartik 2021-08-18 01:44:35 +0000 UTC [ - ]

Does anyone have recommendations for learning the xz format? Particularly as used by the ZIM archival format: https://en.wikipedia.org/wiki/ZIM_(file_format)

adzm 2021-08-18 07:39:33 +0000 UTC [ - ]

LZMA Utils / XZ Utils is probably your best bet. They have some documentation for sure, but it's a complicated format, so the source is invaluable.
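
If you just want to poke at .xz streams alongside the format docs, Python's standard-library lzma module (a liblzma binding) is convenient for experiments; a small sketch, with a made-up file name:

    import lzma

    # Decompress a whole .xz file and look at the payload.
    with lzma.open("sample.xz", "rb") as f:
        payload = f.read()
    print(len(payload), "bytes after decompression")

    # Incremental decoding, handy when a container such as ZIM embeds xz-compressed clusters.
    dec = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    with open("sample.xz", "rb") as raw:
        chunk = dec.decompress(raw.read())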

pwrrr 2021-08-18 20:13:54 +0000 UTC [ - ]

I just made a Huffman encoder on the C64, for fun. I need to understand how you go from the variable-length codes Huffman gives you to codes capped at a fixed maximum length, because you don't want the codes to be above a certain length. Hmm...
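
In case it helps, here is a rough Python sketch of the approach the article describes: clamp overlong codes to the limit, then lengthen other codes until the Kraft inequality holds again. The function name and the choice of which code to lengthen are illustrative, not taken from any particular implementation:

    def limit_lengths(lengths, max_len):
        """Clamp Huffman code lengths to max_len, then repair the Kraft
        inequality (sum of 2**-length <= 1) so the lengths still describe
        a valid prefix code. Heuristic sketch only."""
        lengths = {sym: min(l, max_len) for sym, l in lengths.items()}
        # scale by 2**max_len to stay in integers: valid <=> kraft <= 2**max_len
        kraft = sum(1 << (max_len - l) for l in lengths.values())
        while kraft > (1 << max_len):
            # one simple heuristic: lengthen the deepest code still below the limit
            sym = max((s for s, l in lengths.items() if l < max_len),
                      key=lambda s: lengths[s])
            kraft -= 1 << (max_len - lengths[sym] - 1)
            lengths[sym] += 1
        return lengths
Once the lengths fit, you can regenerate canonical codes from the lengths alone, which is also the cheapest representation to store in the compressed stream.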

meiji163 2021-08-18 02:15:37 +0000 UTC [ - ]

I wonder if LZ would still be standard, if not for the inertia of gzip/zip? There are surely better and comparably fast algorithms (paq, ppm, etc.)

hakmad 2021-08-18 12:01:37 +0000 UTC [ - ]

This might not apply to all LZ based programs/algorithms but compress (an old Unix utility based on the Lempel-Ziv-Welch algorithm) fell out of favour after it ran afoul of patent issues (you can read more about it on the compress Wikipedia page [0] and a section of the GIF Wikipedia page [1] which covers more about the enforcement of the patent). From what I can gather though, it enjoyed considerable success and was pretty much the standard utility for compressing data on Unix. I think eventually though it would have been replaced because other algorithms (bzip2, gzip, etc.) have slightly better compression ratios (even if they are more computationally expensive).

[0] - https://en.wikipedia.org/wiki/Compress [1] - https://en.wikipedia.org/wiki/GIF#Unisys_and_LZW_patent_enfo...

jltsiren 2021-08-18 03:42:36 +0000 UTC [ - ]

LZ is popular because it's computationally efficient. When random memory access is slow but sequential access is fast, you want to encode substrings instead of individual symbols. Explicit higher-order probabilistic models require a lot of memory and generate too many cache misses. Implicit models based on the Burrows-Wheeler transform also have poor memory access patterns.

zinekeller 2021-08-18 06:19:14 +0000 UTC [ - ]

> if not for the inertia of gzip/zip

I think that's misidentifying where the credit is due: LZ might not be the most efficient compression format when it comes to size, but it's very efficient computation-wise when compressing and decompressing, and this still holds true today even when you consider newer algorithms like Zstandard (which uses a derivative of LZ). Sure, if you need to archive data into its most compact representation at any cost, LZ and derivatives won't deliver it, but unless there's a significant change to how computing works it's still in LZ's favour. At the time DEFLATE was designed, LZ simply beat all the competition, since realistically nothing else could decompress in real time using minimal memory.

wolf550e 2021-08-18 03:52:47 +0000 UTC [ - ]

ppm and paq are a LOT slower than LZ77.

hexxagone 2021-08-18 03:47:21 +0000 UTC [ - ]

PAQ and PPM are not fast, certainly not in the same category as LZ codecs.

axiolite 2021-08-18 03:10:28 +0000 UTC [ - ]

zip/gzip are LZ algorithms, as mentioned near the top of the article. Based on LZ77 to be exact.

Did you perhaps mean LZW (gif & compress), which was based on LZ78?

zlib isn't really dominant today; lzma seems to have overtaken it for anything destined for public distribution.
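
To make the LZ77 vs. LZ78/LZW split concrete, here is a toy LZW encoder sketch in Python (names are mine, no real codec's framing): LZW emits indices into a dictionary that grows as it learns phrases, whereas the LZ77 family behind zip/gzip emits (offset, length) back-references into a sliding window.

    def lzw_encode(data: bytes):
        """Toy LZW (LZ78 family): output is a list of dictionary indices."""
        dictionary = {bytes([i]): i for i in range(256)}  # seed with all single bytes
        w, out = b"", []
        for byte in data:
            wc = w + bytes([byte])
            if wc in dictionary:
                w = wc                            # keep extending the current phrase
            else:
                out.append(dictionary[w])         # emit the longest known phrase
                dictionary[wc] = len(dictionary)  # learn the new phrase
                w = bytes([byte])
        if w:
            out.append(dictionary[w])
        return out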

wolf550e 2021-08-18 03:37:06 +0000 UTC [ - ]

lzma is slow to decompress. Unless the bandwidth / storage costs are the primary concern, zstd has good enough compression ratio and the very fast decompression makes everything much nicer. Built-in threaded compression, long range compression, binary diff, dictionary compression, etc. are a bonus.

axiolite 2021-08-18 04:21:48 +0000 UTC [ - ]

> zstd has good enough compression ratio

I never understood zstd. It's basically lzma+gzip+lz4 packed together. Show me any zstd level whose speed and output size can't be matched by some level of one of those three.

lifthrasiir 2021-08-18 07:27:53 +0000 UTC [ - ]

> Show me any zstd level whose speed and output size can't be matched by some level of one of those three.

In case you are not just trolling: I had a very large JSON file with a large amount of text sitting on my SSD, and the compression times were as follows [1]:

    8,826,654,133       original
    4,763,212,322  0:29 lz4 -1
    3,815,508,500  0:52 brotli -1
    3,715,002,172  2:09 lz4 -3
    3,668,204,232  2:27 gzip -1
    3,159,659,113  1:59 brotli -2
    3,118,316,529 10:55 lzma -0 -T4
    3,025,746,073  1:21 zstd -3 -T1 (default)
Zstandard is not just lzma+gzip+lz4. It is better than everything you've mentioned at least for my test case. In fact I first compressed with zstd and tried to match the compression time for others, because zstd is very fast and using anything else as a reference point would have taken me much more time. It does have enough reason to claim itself to be "standard".

[1] Tested with an i7-7700 at 3.60GHz and 48 GiB of RAM (but no algorithm tested uses more than several MBs of memory anyway). I'm using a pretty fast SSD here, so I/O speed is not much of a concern. Also note that every algorithm except for lzma is single-threaded.

axiolite 2021-08-18 07:53:10 +0000 UTC [ - ]

Sounds like you're comparing multi-threaded zstd to other utilities that are single-threaded (e.g. gzip instead of pigz).

I duped /usr/share/dict/words into an 8GB file and did a couple tests on the old system I'm on:

  8589934592  original
  3075315128  pigz -1       1m0.44s
  2926825272  zstdmt -3     2m37.52s
  2877549999  pigz -3       1m28.94s

lifthrasiir 2021-08-18 07:56:55 +0000 UTC [ - ]

Zstd by default is single-threaded. Just in case though...

    8,826,654,133       original
    3,033,695,892  0:22 zstd 1.3.3 -3 -T4
    3,025,746,073  1:21 zstd 1.3.3 -3 -T1 (default)
    3,017,972,162  1:05 zstd 1.5.0 -3 -T1 (default)
    3,013,860,663  1:23 zstd 1.5.0 -3 --single-thread
It is just no match.

EDIT: axiolite wanted to see --single-thread, and while my Linux box only has an older zstd (1.3.3) without that option, I realized I do have a Windows executable of a recent zstd (1.5.0). Both executables were run on the same machine, but I can't guarantee anything about the 1.5.0 binary.

axiolite 2021-08-18 07:59:18 +0000 UTC [ - ]

zstd is NOT single-threaded by default. It's dual threaded. You have to pass the --single-thread option to make it single threaded.

I think you need to try pigz...

lifthrasiir 2021-08-18 08:10:54 +0000 UTC [ - ]

--single-thread probably doesn't do what you think: it forces zstd to serialize I/O with the compression job; it doesn't mean compression itself is multi-threaded with -T1 [EDIT]. I can't try it anyway because my zstd version is slightly older (1.3.3), but I can try pigz:

    3,664,854,169  0:42 pigz -1 -p4
    3,664,854,169  0:28 pigz -1 -p8 (default, also probably the max possible with my box)
Now I'm very curious where you got your copy of zstdmt. (I'm using stock Ubuntu packages for all of them.)

[EDIT] Was "it doesn't make compression itself multi-threaded", which can falsely imply that --single-thread seemingly enables multi-threaded compression but it doesn't ;)

axiolite 2021-08-18 08:21:17 +0000 UTC [ - ]

Using stock RHEL7/EPEL packages: zstd-1.5.0-1.el7.x86_64, on an old Athlon II X4 615e right now.

The difference in our results certainly is curious.

lifthrasiir 2021-08-18 08:31:05 +0000 UTC [ - ]

Very interesting. It might be the case that zstd optimizes more for recent machines; zstd famously uses four different compression streams to maximize instruction-level parallelism, and that might not work as well on older machines. I haven't seen any machine where zstd is significantly slower than it should be, but the machines I could test came from 2013 or later. Or the RHEL package might have been optimized for recent machines. It would be interesting to test a binary optimized for the current machine (-march=native -mtune=native).

hexxagone 2021-08-18 15:58:50 +0000 UTC [ - ]

pigz is no match for zstd...

dnr 2021-08-18 05:54:07 +0000 UTC [ - ]

Isn't that exactly the point, and why it's so great? It's a single algorithm that's tunable across a wide range of speed/ratio trade-offs using a single parameter. So you can just use one thing for almost every application instead of picking between three different things.

(Yes, I know that one parameter controls multiple different settings internally, so there are multiple dimensions if you're willing to dig that deep.)

Anyway, my recollection from looking at benchmarks a while ago: zstd at ratios similar to lzma compresses in similar time but decompresses much faster, and it's also faster than gzip at comparable ratios. lz4 is still faster than the fastest zstd modes, and lzma at the most extreme settings still gets better ratios than the best zstd can do. But there's a huge wide swath in the middle where zstd beats them all, and that's quite valuable.
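
As a concrete illustration of the one-knob idea, a quick sketch using the third-party python-zstandard bindings (the package, the level values, and the input path are assumptions of this sketch, not anything from the thread):

    import zstandard  # third-party "zstandard" package

    with open("corpus.bin", "rb") as f:  # hypothetical input file
        data = f.read()

    # One algorithm, one knob: the level sweeps the whole fast-to-tight range.
    for level in (1, 3, 9, 19):
        compressed = zstandard.ZstdCompressor(level=level).compress(data)
        print(level, len(compressed))

    # Decompression is the same call no matter which level was used.
    assert zstandard.ZstdDecompressor().decompress(compressed) == data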

axiolite 2021-08-18 08:08:38 +0000 UTC [ - ]

> So you can just use one thing for almost every application instead of picking between three different things.

I honestly can't see how remembering several ranges of zstd levels and their speed/size trade-offs is any easier than remembering lz4, pigz, xz, which all happen to have good default settings.

ahofmann 2021-08-18 08:33:36 +0000 UTC [ - ]

I use zstd once a year, and it is very easy: 1. the default (3 on my machine) is fast and compresses well. 2. if you need to get the best compression level for your use case, use the benchmark "zstd -b1 -e15 yourfile" and after a few minutes you have your answer.

I can't see why I should ever in my life use another program for compressing things than zstd.

Jasper_ 2021-08-18 06:06:08 +0000 UTC [ - ]

It's really not lzma+gzip+lz4, except if you mean "it's an LZ-alike". It's a modern LZ with repcodes (kinda LZMA-ish, but more like LZX honestly), and it uses a novel entropy-coding setup (Huffman/ANS) instead of a range/arithmetic coder (LZMA) or none (LZ4).

wolf550e 2021-08-18 11:45:39 +0000 UTC [ - ]

Here are some DB dump compression tests done years ago, look at the pareto frontier and see that zstd wins between LZ4 and LZMA (xz): https://zeevt.github.io/compress_pareto.html

rurban 2021-08-18 03:31:17 +0000 UTC [ - ]

I recently had to implement an old-style LZ77 encoder/decoder to handle an old file format, and it was surprisingly simple. Even the encoder.
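
For reference, a naive version really is short; here is a Python sketch (the token layout and minimum-match cutoff are made up for illustration, and real formats pack these fields into bits):

    def lz77_decode(tokens):
        """Decode (offset, length, literal) tokens; offset 0 means literal only."""
        out = bytearray()
        for offset, length, literal in tokens:
            if offset:
                start = len(out) - offset
                for i in range(length):          # byte by byte so overlapping copies work
                    out.append(out[start + i])
            if literal is not None:
                out.append(literal)
        return bytes(out)

    def lz77_encode(data, window=4096, max_len=18):
        """Brute-force encoder: scan the window for the longest match."""
        tokens, pos = [], 0
        while pos < len(data):
            best_off, best_len = 0, 0
            for cand in range(max(0, pos - window), pos):
                length = 0
                while (length < max_len and pos + length < len(data)
                       and data[cand + length] == data[pos + length]):
                    length += 1
                if length > best_len:
                    best_off, best_len = pos - cand, length
            if best_len >= 3:                    # short matches aren't worth a token
                nxt = data[pos + best_len] if pos + best_len < len(data) else None
                tokens.append((best_off, best_len, nxt))
                pos += best_len + (1 if nxt is not None else 0)
            else:
                tokens.append((0, 0, data[pos]))
                pos += 1
        return tokens

    sample = b"abracadabra abracadabra"
    assert lz77_decode(lz77_encode(sample)) == sample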

user-the-name 2021-08-18 00:23:55 +0000 UTC [ - ]

"Modern", but uses Huffman coding? No LZ implementation aiming for high compression in the last decade has used Huffman.

hexxagone 2021-08-18 01:39:02 +0000 UTC [ - ]

You mean like zstd and brotli? Is there any new LZ codec not using Huffman?

retrac 2021-08-18 02:11:46 +0000 UTC [ - ]

It was my understanding that zstd used neither Huffman nor arithmetic coding but something else: https://en.wikipedia.org/wiki/Asymmetric_numeral_systems

Edit: It uses a variety of entropy codings for different data structures; Huffman is one of them. My apologies for the confusing initial comment.

chriswarbo 2021-08-18 00:51:13 +0000 UTC [ - ]

> For our goal of a high compression LZ variant, we will want to Huffman code our symbols (zstd actually uses FSE, a variant of arithmetic coding to do this, but we will cover that in a future post).

user-the-name 2021-08-18 00:57:48 +0000 UTC [ - ]

But why? Why spend that much effort introducing a method that is basically entirely outdated at this point, and is also fairly complicated?

hexxagone 2021-08-18 01:43:17 +0000 UTC [ - ]

Huffman is rather simple and hard to beat in decompression speed.

vardump 2021-08-18 09:24:58 +0000 UTC [ - ]

For high entropy data (=somewhat random data), FSE is quite comparable to Huffman in compression speed.

For low entropy (lots of high probability symbols, like zeroes), Huffman is about 2-3x faster. But on the flipside, FSE achieves markedly higher compression ratio.

I think FSE is worth the speed tradeoff vs Huffman in most cases.

hexxagone 2021-08-18 15:49:39 +0000 UTC [ - ]

"FSE achieves markedly higher compression ratio". I do not thin it is true, FSE/ANS achieves slightly better ratios in general. Zstd uses both Huffman (for large alphabets) and FSE (for small alphabets).

user-the-name 2021-08-18 10:15:25 +0000 UTC [ - ]

Arithmetic coding is much, much simpler. And decompression speed is not a limiting factor in most applications of data compression like this.

hexxagone 2021-08-18 15:56:41 +0000 UTC [ - ]

"Arithmetic coding is much, much simpler." Let us agree to disagree. "And decompression speed is not a limiting factor in most applications of data compression like this". It depends on the application. Zstd and Brotli are certainly aiming at the fastest decompression speed possible.

user-the-name 2021-08-18 18:07:12 +0000 UTC [ - ]

Arithmetic coding can be implemented in as little as maybe ten lines of code. It is far simpler than Huffman coding.

hexxagone 2021-08-19 00:27:13 +0000 UTC [ - ]

The Huffman encoding loop is 2 lines and the decoding loop is 4 lines of branchless code. Do you have an example of a branchless arithmetic encoder or decoder?
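
For what it's worth, here is a toy Python sketch of the kind of table-driven canonical Huffman decode being described: build a flat table indexed by the next max_len bits of input, so the per-symbol work is a lookup plus a shift rather than a bit-by-bit tree walk. (Bit I/O is done with strings purely for readability; names are mine.)

    def build_decode_table(code_lengths, max_len):
        """Flat lookup table for canonical Huffman codes: index with the next
        max_len bits of input, get back (symbol, code length)."""
        table = [None] * (1 << max_len)
        code = 0
        for length in range(1, max_len + 1):
            for sym in sorted(s for s, l in code_lengths.items() if l == length):
                lo = code << (max_len - length)
                table[lo:lo + (1 << (max_len - length))] = \
                    [(sym, length)] * (1 << (max_len - length))
                code += 1
            code <<= 1  # canonical numbering: shift when moving to longer codes
        return table

    def decode(bits, table, max_len, n_symbols):
        out, pos = [], 0
        bits += "0" * max_len                 # padding so the final peek stays in range
        for _ in range(n_symbols):            # per symbol: peek, look up, consume
            sym, length = table[int(bits[pos:pos + max_len], 2)]
            out.append(sym)
            pos += length
        return out

    table = build_decode_table({'a': 1, 'b': 2, 'c': 2}, 2)
    assert decode("01011", table, 2, 3) == ['a', 'b', 'c']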