Evaluating Database Compression Methods: Update

This blog post is an update to our last post discussing database compression methods and how they stack up against each other.

When Vadim and I wrote about Evaluating Database Compression Methods last month, we claimed that evaluating database compression algorithms was easy these days because there are ready-to-use benchmark suites such as lzbench.

As easy as it was to do an evaluation with this tool, it turned out it was also easy to make a mistake. Due to a bug in the benchmark, we got incorrect results for the LZ4 compression algorithm and, as a result, made some incorrect claims and observations in the original article. A big thank you to Yann Collet for reporting the issue!

In this post, we will restate and correct the important observations and recommendations that were incorrect in the last post. You can view the fully updated results in this document.

Compression Speed

As you can see above, there was little change in compression speed. LZ4 is still the fastest, though its corrected speed is lower than we originally reported.

Compression Ratio

The compression ratio is where our results changed substantially. We originally reported LZ4 achieving a compression ratio of only 1.89, by far the lowest among the compression engines we compared. After the correction, the ratio is 3.89: better than Snappy and on par with QuickLZ, while also delivering much better performance.
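To put the correction in perspective, it helps to translate each ratio into the share of space saved. This sketch assumes, as the figures above suggest, that ratio means uncompressed size divided by compressed size; `space_savings` is a hypothetical helper for illustration, not part of lzbench.

```python
def space_savings(ratio: float) -> float:
    """Fraction of storage saved, assuming ratio = uncompressed size / compressed size."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


# LZ4 ratios from the original (buggy) run and from the corrected run.
for label, ratio in (("original run", 1.89), ("corrected run", 3.89)):
    print(f"{label}: ratio {ratio:.2f} saves {space_savings(ratio):.1%} of space")
```

This prints about 47.1% saved for the original run versus about 74.3% for the corrected one: the data shrinks to roughly a quarter of its size rather than just over half.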

Among the engines we tested, LZ4 delivers the best compression ratio for the CPU time spent.
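One way to reproduce a "ratio versus CPU spent" comparison on your own data is to record both numbers for each setting. Python's standard library has no LZ4 binding, so this sketch uses zlib compression levels purely as a stand-in for faster and stronger engines. `sample_data` and `ratio_vs_cpu` are hypothetical helpers, and the synthetic rows are not the data set we benchmarked.

```python
import time
import zlib


def sample_data(n_rows: int = 20000) -> bytes:
    # Hypothetical row-like test data; real results depend heavily on the data set.
    rows = (f"{i},user_{i % 500},status_{i % 7},{i * 37 % 10007}\n" for i in range(n_rows))
    return "".join(rows).encode()


def ratio_vs_cpu(data: bytes, level: int, repeats: int = 5) -> tuple[float, float]:
    """Return (compression ratio, MB compressed per CPU-second) for zlib at `level`."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.process_time()  # CPU time, not wall-clock time
    for _ in range(repeats):
        compressed = zlib.compress(data, level)
    cpu_s = max(time.process_time() - start, 1e-9)  # coarse CPU clocks can read zero
    return len(data) / len(compressed), repeats * len(data) / cpu_s / 1e6


if __name__ == "__main__":
    data = sample_data()
    for level in (1, 6, 9):
        ratio, speed = ratio_vs_cpu(data, level)
        print(f"zlib level {level}: ratio {ratio:.2f}, {speed:.1f} MB per CPU-second")
```

Measuring CPU time rather than wall-clock time keeps the comparison about the work each setting does, which is the trade-off the sentence above describes.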

Compression vs Decompression

The compression versus decompression graph now shows LZ4 has the highest ratio between compression and decompression performance of the compression engines we looked at.
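The compression-to-decompression comparison can be sketched the same way: time both directions and divide. Again zlib stands in for LZ4, and the sketch credits the uncompressed size as the bytes processed in both directions, which is an assumption about how the speeds are normalized. The helper names are hypothetical.

```python
import time
import zlib


def throughput_mb_s(fn, payload: bytes, nbytes: int, repeats: int = 5) -> float:
    """Wall-clock MB/s for calling fn(payload) `repeats` times, crediting `nbytes` per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(payload)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero on tiny inputs
    return repeats * nbytes / elapsed / 1e6


def compress_vs_decompress(data: bytes, level: int = 1) -> tuple[float, float, float]:
    """Return (compression MB/s, decompression MB/s, decompression/compression ratio)."""
    compressed = zlib.compress(data, level)
    if zlib.decompress(compressed) != data:  # sanity check: the round trip must be lossless
        raise RuntimeError("round trip failed")
    c_speed = throughput_mb_s(lambda d: zlib.compress(d, level), data, len(data))
    d_speed = throughput_mb_s(zlib.decompress, compressed, len(data))
    return c_speed, d_speed, d_speed / c_speed


if __name__ == "__main__":
    rows = "".join(f"{i},item_{i % 300},{i * 7 % 1000}\n" for i in range(50000))
    c, d, r = compress_vs_decompress(rows.encode())
    print(f"compress {c:.0f} MB/s, decompress {d:.0f} MB/s, ratio {r:.1f}x")
```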

Compression Speed vs Block Size

The compression speed was not significantly affected by the LZ4 block size, which makes LZ4 a good fit for compressing both large and small objects. The highest compression speed came with a 64KB block size, which was neither the largest nor the smallest of the sizes tested.

Compression Ratio vs Block Size

We saw some positive impact on the compression ratio from increasing the block size. However, increasing the block size beyond 64KB did not substantially improve the ratio, which makes 64KB an excellent block size for LZ4: it gave the best compression speed and about as good a compression ratio as we measured. A 64KB block size works well for other data too, though we can’t say how universally that holds.
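A block-size sweep of the kind behind these two charts can be approximated by cutting the input into independent blocks, compressing each block separately, and recording the overall ratio and speed for each size. This is a sketch with zlib as a stand-in for LZ4; `split_blocks` and `sweep_block_sizes` are hypothetical helpers, and the block sizes listed are illustrative.

```python
import time
import zlib


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def sweep_block_sizes(data: bytes, sizes_kb=(4, 16, 64, 256, 1024), level: int = 1):
    """Return a list of (block size KB, overall ratio, compression MB/s) tuples."""
    if not data:
        raise ValueError("data must be non-empty")
    results = []
    for kb in sizes_kb:
        blocks = split_blocks(data, kb * 1024)
        start = time.perf_counter()
        # Each block gets its own zlib stream, so no context is shared across blocks.
        compressed_total = sum(len(zlib.compress(block, level)) for block in blocks)
        elapsed = max(time.perf_counter() - start, 1e-9)
        results.append((kb, len(data) / compressed_total, len(data) / elapsed / 1e6))
    return results


if __name__ == "__main__":
    rows = "".join(f"{i},user_{i % 500},status_{i % 7}\n" for i in range(100000))
    for kb, ratio, speed in sweep_block_sizes(rows.encode()):
        print(f"{kb:>5} KB blocks: ratio {ratio:.2f}, {speed:.0f} MB/s")
```

Because each block is compressed on its own, small blocks lose shared context and pay per-block overhead, so the ratio tends to rise with block size. The gains tend to flatten once blocks exceed the compressor's match window: 32KB for zlib's deflate, and 64KB for LZ4, whose match offsets are 16-bit. That window may be part of why we saw little improvement beyond 64KB, though we did not test this directly.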

Scatterplot with compression speed vs compression ratio



Updated Recommendations

Most of our recommendations still stand after reviewing the updated results, with one important change: if you’re looking for a fast compression algorithm with decent compression, consider LZ4. It offers both better performance and a better compression ratio, at least on the data sets we tested.



Comments (3)

  • Nikolay Sholevski

    Will LZ4 be enabled in Percona TokuDB in the near future? I really want to have this option for compression.

    September 17, 2016 at 5:30 am
  • Peter Zaitsev

    Hi Nikolay,

    Yes, LZ4 and ZSTD are on the roadmap to be supported as compression methods for TokuDB.

    September 18, 2016 at 1:47 pm
  • Vladimir

    Could you please clarify which block size you are talking about in the research: filesystem block size, key_block_size or tokudb_block_size?

    December 5, 2018 at 4:39 pm
