Compression Benchmarking: Size vs. Speed (I want both)

I’m creating a library of benchmarks and test suites that will run as part of a Continuous Integration (CI) process here at Tokutek. My goal is to regularly measure several aspects of our storage engine over time: performance, correctness, memory/CPU/disk utilization, etc. I’ll also be running tests against InnoDB and other databases for comparative analysis. I plan to post a series of blog entries as my CI framework evolves; for now, I have the results of my first benchmark.

Compression is an always-on feature of TokuDB. There are no server/session variables to enable compression or change the compression level (one goal of TokuDB is to have as few tuning parameters as possible). My compression benchmark uses iiBench to measure the insert performance and compression achieved by TokuDB and InnoDB. I tested InnoDB compression with two values of key_block_size (4k and 8k) and with compression disabled.

As you can see in the above graph, compression allows the database to use significantly less disk space. TokuDB achieved 51% compression, while InnoDB achieved 50% with key_block_size=4 and 47% with key_block_size=8. [Note: The random nature of iiBench data makes it difficult to compress.]
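Percent savings and compression ratios are two views of the same number. Assuming “X% compression” here means the compressed data takes X% less space than the uncompressed data, a short conversion sketch (the function names are illustrative):

```python
def savings_to_ratio(savings_pct: float) -> float:
    """Percent of space saved -> compression ratio (uncompressed / compressed)."""
    if not 0 <= savings_pct < 100:
        raise ValueError("savings_pct must be in [0, 100)")
    return 1 / (1 - savings_pct / 100)


def ratio_to_savings(ratio: float) -> float:
    """Compression ratio (uncompressed / compressed) -> percent of space saved."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1 - 1 / ratio) * 100


if __name__ == "__main__":
    for label, pct in [("TokuDB", 51),
                       ("InnoDB key_block_size=4", 50),
                       ("InnoDB key_block_size=8", 47)]:
        print(f"{label}: {pct}% savings = {savings_to_ratio(pct):.2f}x")
    # TokuDB: 51% savings = 2.04x
    # InnoDB key_block_size=4: 50% savings = 2.00x
    # InnoDB key_block_size=8: 47% savings = 1.89x
```

Run the other way, ratio_to_savings(18) is about 94.4%, so the 18x result for the log data corresponds to roughly 94% less disk space.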

Traditionally there is a “size versus speed” trade-off when compressing data. Data compression utilities have long offered variable levels of aggressiveness: spending more time compressing usually results in smaller files. The InnoDB benchmarks bear this out; as the compression level increases, insert performance declines. TokuDB, on the other hand, achieves the highest level of compression while outperforming InnoDB in every scenario, even InnoDB without compression. TokuDB runs 33.4x faster than InnoDB configured to achieve a similar level of compression. Note that “Inserts per Second” was measured as the exit velocity of the benchmark run (the average over the last million inserts).
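The classic size-versus-speed trade-off is easy to see with zlib, the library InnoDB’s compressed tables are built on. A minimal sketch using Python’s standard zlib module on hypothetical, generated log-style text; InnoDB’s key_block_size is a different knob (the target compressed page size), so this only illustrates the general trade-off, and timings will vary by machine:

```python
import time
import zlib


def sample_log_data(rows: int = 20_000) -> bytes:
    """Build deterministic log-style text (hypothetical stand-in data)."""
    lines = [
        f"proc=usp_get_orders db=app{i % 4} start={1_300_000_000 + i} "
        f"duration_ms={i % 250} rows={i % 37}\n"
        for i in range(rows)
    ]
    return "".join(lines).encode("ascii")


def measure_levels(data: bytes, levels=(1, 6, 9)) -> dict:
    """Return {level: (compressed_bytes, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        results[level] = (len(compressed), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    data = sample_log_data()
    for level, (size, secs) in measure_levels(data).items():
        print(f"level {level}: {size} bytes "
              f"({size / len(data):.1%} of original) in {secs * 1000:.2f} ms")
```

Higher levels search harder for matches, which typically costs more CPU time in exchange for smaller output.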

How much compression can be achieved?

To answer this, I loaded some web application performance data (log-style data with stored procedure names, database instance names, beginning and ending execution timestamps, durations, row counts, and parameter values). TokuDB achieved 18x compression, far more than InnoDB. It also loaded the data much faster, but that is a blog entry for another day…
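Why does log-style data compress so much better than iiBench’s random rows? Repeated names and slowly changing timestamps give a compressor long matches to exploit, while uniformly random values give it almost nothing. A sketch using Python’s zlib on hypothetical generated data (the procedure and instance names are made up); it will not reproduce the 18x figure, which depends on the real data and on TokuDB’s own compression:

```python
import random
import struct
import zlib


def random_rows(n: int, seed: int = 42) -> bytes:
    """Rows of uniformly random numbers packed as binary (rough stand-in for random benchmark data)."""
    rng = random.Random(seed)
    return b"".join(
        struct.pack("<iiid", rng.randrange(2**31), rng.randrange(2**31),
                    rng.randrange(2**31), rng.random())
        for _ in range(n)
    )


def log_rows(n: int, seed: int = 42) -> bytes:
    """Log-style CSV rows: few distinct names, slowly increasing timestamps."""
    rng = random.Random(seed)
    procs = ["usp_get_orders", "usp_update_cart", "usp_login"]  # hypothetical
    instances = ["db01", "db02"]                                # hypothetical
    t = 1_300_000_000
    lines = []
    for _ in range(n):
        duration = rng.randint(1, 500)
        lines.append(f"{rng.choice(procs)},{rng.choice(instances)},{t},"
                     f"{t + duration},{duration},{rng.randint(0, 100)}\n")
        t += rng.randint(0, 3)
    return "".join(lines).encode("ascii")


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size (default level)."""
    return len(data) / len(zlib.compress(data))


if __name__ == "__main__":
    for name, data in [("random rows", random_rows(50_000)),
                       ("log rows", log_rows(50_000))]:
        print(f"{name}: {compression_ratio(data):.2f}x")
```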

Benchmark details


  • iiBench, insert 25 million rows, 1000 rows per commit
  • Intel Core i7-920 @ 3.6GHz, 12GB DDR3 @ 1600MHz, 2 x SATA II
  • Ubuntu 11.04, TokuDB 5.0.4, MySQL 5.1.52, InnoDB plug-in 1.0.13
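The benchmark’s shape (a fixed number of rows, committed in batches of 1000, with throughput reported as the average over the final million rows) can be sketched as an insert loop that records a checkpoint after each commit, plus an exit-velocity calculation over those checkpoints. This is a hypothetical illustration, not iiBench itself: Python’s built-in sqlite3 stands in for the MySQL engines, and the simplified table is made up.

```python
import random
import sqlite3
import time
from bisect import bisect_right


def run_insert_benchmark(total_rows=25_000_000, rows_per_commit=1000, seed=1):
    """Insert random rows, committing every `rows_per_commit` rows.

    Returns (cumulative_rows, elapsed_seconds) checkpoints, one per commit.
    In-memory sqlite3 and the simplified table are stand-ins only.
    """
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (id INTEGER PRIMARY KEY, "
                 "customer_id INTEGER, product_id INTEGER, price REAL)")
    conn.execute("CREATE INDEX idx_price_customer ON purchases (price, customer_id)")
    checkpoints = [(0, 0.0)]
    inserted = 0
    start = time.perf_counter()
    while inserted < total_rows:
        batch = min(rows_per_commit, total_rows - inserted)
        rows = [(rng.randrange(100_000), rng.randrange(10_000), rng.random() * 500)
                for _ in range(batch)]
        conn.executemany("INSERT INTO purchases (customer_id, product_id, price) "
                         "VALUES (?, ?, ?)", rows)
        conn.commit()  # one commit per batch
        inserted += batch
        checkpoints.append((inserted, time.perf_counter() - start))
    conn.close()
    return checkpoints


def exit_velocity(checkpoints, window_rows=1_000_000):
    """Average inserts/second over the final `window_rows` rows (whole run if shorter)."""
    if window_rows <= 0 or len(checkpoints) < 2:
        raise ValueError("need window_rows > 0 and at least two checkpoints")
    end_rows, end_time = checkpoints[-1]
    rows = [r for r, _ in checkpoints]
    # Index of the last checkpoint at or before the start of the window.
    idx = max(bisect_right(rows, end_rows - window_rows) - 1, 0)
    start_rows, start_time = checkpoints[idx]
    return (end_rows - start_rows) / (end_time - start_time)


if __name__ == "__main__":
    # Far smaller than 25 million rows so the sketch finishes quickly.
    cps = run_insert_benchmark(total_rows=200_000)
    print(f"exit velocity: {exit_velocity(cps, window_rows=50_000):,.0f} inserts/s")
```

Measuring only the tail of the run matters for insert benchmarks because throughput usually degrades as indexes outgrow the cache; for example, exit_velocity([(0, 0.0), (1_000_000, 1000.0), (2_000_000, 1100.0)]) returns 10000.0 even though the whole-run average is much lower. A speedup figure such as 33.4x would then be the ratio of two engines’ exit velocities.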

Server/Session Variables

  • unique_checks=1
  • tokudb_commit_sync=0
  • tokudb_cache_size=2G
  • innodb_buffer_pool_size=2G
  • innodb_flush_method=O_DIRECT
  • innodb_doublewrite=false
  • innodb_flush_log_at_trx_commit=0
  • innodb_log_file_size=1000M
  • innodb_file_per_table=true
  • innodb_log_buffer_size=16M
  • innodb_file_format=barracuda
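A few of these variables relax crash durability in exchange for speed. A small sketch that flags them; the helper is hypothetical (not a MySQL or Tokutek tool), and its messages paraphrase the general documented behavior of each variable:

```python
# Server/session variables as listed above (values kept as strings).
BENCHMARK_SETTINGS = {
    "unique_checks": "1",
    "tokudb_commit_sync": "0",
    "tokudb_cache_size": "2G",
    "innodb_buffer_pool_size": "2G",
    "innodb_flush_method": "O_DIRECT",
    "innodb_doublewrite": "false",
    "innodb_flush_log_at_trx_commit": "0",
    "innodb_log_file_size": "1000M",
    "innodb_file_per_table": "true",
    "innodb_log_buffer_size": "16M",
    "innodb_file_format": "barracuda",
}

OFF_VALUES = {"0", "false", "off"}


def durability_warnings(settings: dict) -> list:
    """Flag settings that trade crash durability for speed.

    Missing variables are treated as their durable defaults.
    """
    warnings = []
    if settings.get("innodb_flush_log_at_trx_commit", "1") != "1":
        warnings.append("innodb_flush_log_at_trx_commit != 1: the InnoDB redo log is not "
                        "flushed to disk at every commit, so a crash can lose recent commits")
    if settings.get("innodb_doublewrite", "on").lower() in OFF_VALUES:
        warnings.append("innodb_doublewrite disabled: torn (partially written) pages cannot "
                        "be repaired from the doublewrite buffer after a crash")
    if settings.get("tokudb_commit_sync", "1").lower() in OFF_VALUES:
        warnings.append("tokudb_commit_sync off: the TokuDB recovery log is not fsynced at "
                        "commit, so a crash can lose recent commits")
    return warnings


if __name__ == "__main__":
    for warning in durability_warnings(BENCHMARK_SETTINGS):
        print("WARNING:", warning)
```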

Tokutek, TokuView

  • Hi Tim,

    I guess I’m a bit late in adding a comment to this post, but given there’s been quite a bit of noise regarding TokuDB at the moment, I thought I’d have a look.

    I’m surprised that benchmarks like this don’t use “safe” settings.

    Of the values mentioned, the following settings stand out as not being safe (or _possibly_ not being safe):


    and to some extent I think that partially invalidates the numbers provided. A database has to be safe. A DBA wants the data to be on disk and recoverable, so disabling features to get better numbers, even if you do this partially on both engines, seems like a bad idea.

    I also noticed, when looking at some of the configuration settings, that if you are running in a replication environment you may need to use tokudb_pk_insert_mode = 2, as otherwise the settings are not replication-safe. I don’t see any mention of this requirement, and it obviously slows things down: this setting is described as the slowest option.

    Maybe this is mentioned in other blog posts, but it looks like a real environment may behave significantly differently from what a post like this implies.

    All that said, the post is interesting: it does make TokuDB look attractive and worth evaluating. I’ll have to see how the numbers vary on the hardware I’m using and whether the gains match what you imply.
