A key feature of our new TokuDB v6.0 release, which I have been blogging about this week, is compression. Compression is always on in TokuDB, and the compression we’ve achieved in the past has been quite good; see a previous post on the 18x compression TokuDB v5.0 achieved on one benchmark. In this release, we’ve changed the way compression works and improved compression by 50%.
I’m presenting numbers on the same data set as that post, so see it for the experimental details.
But first, what has changed? TokuDB compresses large blocks of data, on the order of megabytes rather than the 16KB pages that InnoDB uses, and that is a big part of why we get better compression. InnoDB attempts compression on each 16KB page separately, which is inefficient whether a page compresses too little (it must be split and recompressed) or too much (the leftover space in its fixed-size compressed page is wasted). InnoDB’s compression woes are well documented.
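To see why block size matters, here is a minimal Python sketch that compresses the same data in independent 16 KiB blocks and in independent 1 MiB blocks and compares the resulting ratios. It uses the standard-library `zlib` as a stand-in (this post does not say which algorithm TokuDB uses), and `make_rows`, `compressed_size`, and `ratio` are hypothetical helpers built on made-up sample rows, not the benchmark data.

```python
import zlib

def make_rows(n: int) -> bytes:
    """Deterministic, table-like sample data (hypothetical; not the benchmark data)."""
    return "".join(
        f"{i},user{i % 1000},city{i % 50},status{i % 3},{(i * 7919) % 100000}\n"
        for i in range(n)
    ).encode("ascii")

def compressed_size(data: bytes, block_size: int, level: int = 6) -> int:
    """Compress each block_size-byte block independently and sum the output sizes."""
    return sum(
        len(zlib.compress(data[start:start + block_size], level))
        for start in range(0, len(data), block_size)
    )

def ratio(data: bytes, block_size: int) -> float:
    """Compression ratio: uncompressed bytes divided by compressed bytes."""
    return len(data) / compressed_size(data, block_size)

if __name__ == "__main__":
    data = make_rows(50_000)
    for block_size in (16 * 1024, 1024 * 1024):
        print(f"{block_size // 1024:>5} KiB blocks: {ratio(data, block_size):.1f}x")
```

Each independent block starts with an empty history and pays its own header overhead, so small blocks lose matches that large blocks can exploit. Because zlib only looks back 32 KiB, the gain here beyond that size comes mostly from fewer restarts; compressors with larger dictionaries benefit more from megabyte-sized blocks.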
In TokuDB v6.0, you can choose between two types of compression by setting ROW_FORMAT in a CREATE TABLE or ALTER TABLE statement. The “standard” setting uses less CPU. The “aggressive” setting uses more CPU but usually compresses better, sometimes much better.
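The CPU-versus-ratio tradeoff between the two settings can be illustrated outside the database. The sketch below is an analogy only: it times a cheap codec (`zlib` at level 1) against a heavier one (`lzma` at preset 6) from Python's standard library on hypothetical sample rows. The post does not say which algorithms the “standard” and “aggressive” settings use, so the codec names here are assumptions chosen for illustration.

```python
import lzma
import time
import zlib

def make_rows(n: int) -> bytes:
    """Deterministic, table-like sample data (hypothetical; not the benchmark data)."""
    return "".join(
        f"{i},user{i % 1000},city{i % 50},status{i % 3},{(i * 7919) % 100000}\n"
        for i in range(n)
    ).encode("ascii")

# Stand-in codecs only: the post does not say which algorithms TokuDB's
# "standard" and "aggressive" settings use.
CODECS = {
    "zlib-1": lambda b: zlib.compress(b, 1),         # cheap, lighter compression
    "lzma-6": lambda b: lzma.compress(b, preset=6),  # more CPU, usually smaller output
}

def measure(data: bytes) -> dict:
    """Return {codec name: (compression ratio, seconds spent compressing)}."""
    results = {}
    for name, compress in CODECS.items():
        start = time.perf_counter()
        out = compress(data)
        results[name] = (len(data) / len(out), time.perf_counter() - start)
    return results

if __name__ == "__main__":
    for name, (r, secs) in measure(make_rows(20_000)).items():
        print(f"{name}: {r:.1f}x in {secs * 1000:.0f} ms")
```

Expect the heavier codec to produce a higher ratio and take noticeably longer; the exact numbers depend on the data and the machine.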
Let’s look at the numbers.
On this data set, we’ve achieved 29x compression!
So when should you use the standard compressor, and when should you use the aggressive one? Compression is done entirely in the background, so the answer mostly depends on how many cores you have. If you have enough idle cores, the aggressive compressor will not slow down your database; in fact, our benchmark results show that TokuDB’s aggressive compressor can improve overall database performance.
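Why idle cores matter can be sketched with a pool of background compression threads. This is a minimal illustration under stated assumptions, not TokuDB's implementation: `spare_cores` and `flush_blocks` are hypothetical helpers, and `zlib` stands in for whichever compressor is configured. CPython's `zlib` releases the GIL while it compresses, so the worker threads can actually run on separate cores while the foreground keeps working.

```python
from __future__ import annotations

import os
import zlib
from concurrent.futures import Future, ThreadPoolExecutor

def spare_cores(reserved_for_queries: int) -> int:
    """Hypothetical helper: cores left over after reserving some for foreground work."""
    return max(1, (os.cpu_count() or 1) - reserved_for_queries)

def flush_blocks(pool: ThreadPoolExecutor, blocks: list[bytes]) -> list[Future]:
    """Hand blocks to background workers; returns futures immediately without waiting."""
    return [pool.submit(zlib.compress, block, 6) for block in blocks]

if __name__ == "__main__":
    blocks = [bytes([i % 7]) * (1 << 20) for i in range(8)]  # eight 1 MiB blocks
    with ThreadPoolExecutor(max_workers=spare_cores(reserved_for_queries=2)) as pool:
        futures = flush_blocks(pool, blocks)
        # The foreground could keep serving queries here while workers compress.
        compressed = [f.result() for f in futures]
    print(sum(map(len, blocks)), "->", sum(map(len, compressed)), "bytes")
```

If the pool has more busy workers than there are idle cores, compression competes with query threads for CPU, which is the contention that makes a cheaper compressor attractive on smaller machines.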
If you don’t have enough spare cores, the standard compressor may be the better choice, because the aggressive compressor may then contend with other parts of the system for CPU. The exact cutoff depends on the particulars of your system, but a simple rule of thumb is to use standard if you have 6 or fewer cores, and aggressive otherwise.
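The rule of thumb above fits in a few lines of Python. This sketch is only a starting point: `choose_compression` is a hypothetical helper, it returns the setting name rather than the exact ROW_FORMAT keyword (which this post does not spell out), and the cutoff of 6 is the rule of thumb, not a measured threshold for your system.

```python
import os
from typing import Optional

# Rule of thumb from the post: standard for 6 or fewer cores, aggressive otherwise.
DEFAULT_CUTOFF = 6

def choose_compression(cores: Optional[int] = None, cutoff: int = DEFAULT_CUTOFF) -> str:
    """Return "standard" or "aggressive" based on the core count."""
    if cores is None:
        # os.cpu_count() reports logical CPUs and may return None.
        cores = os.cpu_count() or 1
    if cores < 1:
        raise ValueError("cores must be positive")
    return "standard" if cores <= cutoff else "aggressive"

if __name__ == "__main__":
    print(choose_compression())
```

Note that `os.cpu_count()` counts logical CPUs, including hyperthreads, and says nothing about how busy they are; a machine with many cores that are already saturated should still be treated as having few spare ones.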
In either case, you get great compression. Compression performance depends strongly on many factors, and we are always on the lookout for interesting use cases, so please post any results you get with the two settings.