This blog post is an update to our last post discussing database compression methods, and how they stack up against each other.
When Vadim and I wrote about Evaluating Database Compression Methods last month, we claimed that evaluating database compression algorithms was easy these days because there are ready-to-use benchmark suites such as lzbench.
As easy as it was to do an evaluation with this tool, it turned out it was also easy to make a mistake. Due to a bug in the benchmark we got incorrect results for the LZ4 compression algorithm, and as such made some incorrect claims and observations in the original article. A big thank you to Yann Collet for reporting the issue!
In this post, we will restate and correct the important observations and recommendations that were incorrect in the last post. You can view the fully updated results in this document.
As you can see above, there was little change in compression performance. LZ4 is still the fastest, though not as fast after correcting the issue.
The compression ratio is where our results changed substantially. We reported LZ4 achieving a compression ratio of only 1.89 — by far lowest among compression engines we compared. In fact, after our correction, the ratio is 3.89 — better than Snappy and on par with QuickLZ (while also having much better performance).
LZ4 is a superior engine in terms of the compression ratio achieved versus the CPU spent.
The compression versus decompression graph now shows LZ4 has the highest ratio between compression and decompression performance of the compression engines we looked at.
The compression speed was not significantly affected by the LZ4 block size, which makes it great for compressing both large and small objects. The highest compression speed achieved was with a block size of 64KB — not the highest size, but not the smallest either among the sizes tested.
We saw some positive impact on the compression ratio by increasing the block size, However, increasing the block size over 64K did not substantially improve the compression ratio, making 64K an excellent block for LZ4, where it had the best compression speed and about as-good-as-it-gets compression. A 64K block size works great for other data as well, though we can’t say how universal it is.
Scatterplot with compression speed vs compression ratio
Most of our recommendations still stand after reviewing the updated results, with one important change. If you’re looking for a fast compression algorithm that has decent compression, consider LZ4. It offers better performance as well as a better compression ratio, at least on the data sets we tested.