Compression Benchmarking: Size vs. Speed (I want both)

September 15, 2011

Author

Tim.Callaghan

MySQL

I’m creating a library of benchmarks and test suites that will run as part of a Continuous Integration (CI) process here at Tokutek. My goal is to regularly measure several aspects of our storage engine over time: performance, correctness, memory/CPU/disk utilization, etc. I’ll also be running tests against InnoDB and other databases for comparative analysis. I plan on posting a series of blog entries as my CI framework evolves. For now, I have the results of my first benchmark.

Compression is an always-on feature of TokuDB. There are no server/session variables to enable compression or change the compression level (one goal of TokuDB is to have as few tuning parameters as possible). My compression benchmark uses iiBench to measure the insert performance and compression achieved by TokuDB and InnoDB. I tested InnoDB compression with two values of key_block_size (4k and 8k) and with compression disabled.

As the graph above shows, compression lets the database use significantly less disk space. TokuDB achieved 51% compression, while InnoDB achieved 50% with key_block_size=4 and 47% with key_block_size=8. [Note: The random data that iiBench generates is difficult to compress.]
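To relate these percentages to "Nx" style compression figures, here is a minimal Python sketch. It assumes that "51% compression" means the compressed database is 51% smaller than the uncompressed baseline, which the post does not spell out. The function names are illustrative and are not part of iiBench or any benchmark tool.

```python
def space_saving(uncompressed_bytes: float, compressed_bytes: float) -> float:
    """Fraction of disk space saved relative to the uncompressed size."""
    return 1.0 - compressed_bytes / uncompressed_bytes


def saving_to_ratio(saving: float) -> float:
    """Convert a space saving (0.51 for 51%) into an 'Nx' compression ratio."""
    return 1.0 / (1.0 - saving)


def ratio_to_saving(ratio: float) -> float:
    """Convert an 'Nx' compression ratio into a space saving fraction."""
    return 1.0 - 1.0 / ratio


if __name__ == "__main__":
    # Assumed reading of the reported percentages: space saved vs. uncompressed.
    for label, saving in [("TokuDB", 0.51),
                          ("InnoDB key_block_size=4", 0.50),
                          ("InnoDB key_block_size=8", 0.47)]:
        print(f"{label}: {saving:.0%} saved, about {saving_to_ratio(saving):.2f}x")
```

Under that assumption, a 51% saving corresponds to a ratio of roughly 2x, and an 18x ratio corresponds to a saving of about 94%.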

Traditionally there is a “size versus speed” trade-off when compressing data. Data compression utilities have long offered variable levels of aggressiveness; spending more time compressing files usually results in smaller files. The InnoDB benchmarks bear this out: as the compression level increases, insert performance declines. TokuDB, on the other hand, achieves the highest level of compression while outperforming InnoDB in all scenarios, even InnoDB without compression. TokuDB runs 33.4x faster than InnoDB configured to achieve a similar level of compression. Note: “Inserts per Second” was measured as the exit velocity of the benchmark run (the average of the last million inserts).
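The exit velocity measurement can be sketched in Python as follows. This is not the iiBench implementation. It assumes a hypothetical log of `(cumulative_rows, elapsed_seconds)` checkpoints recorded during the run, and it averages the insert rate over the final window of rows.

```python
from typing import Sequence, Tuple

Checkpoint = Tuple[int, float]  # (cumulative rows inserted, elapsed seconds)


def exit_velocity(checkpoints: Sequence[Checkpoint],
                  window_rows: int = 1_000_000) -> float:
    """Average inserts per second over the last `window_rows` rows.

    `checkpoints` must be sorted by cumulative row count. If no checkpoint
    falls exactly on the window boundary, the latest checkpoint at or before
    the boundary is used, so the window may be slightly longer than requested.
    """
    end_rows, end_secs = checkpoints[-1]
    boundary = max(end_rows - window_rows, 0)
    # Start of the run is an implicit checkpoint at (0 rows, 0 seconds).
    candidates = [(0, 0.0)] + [cp for cp in checkpoints if cp[0] <= boundary]
    start_rows, start_secs = max(candidates, key=lambda cp: cp[0])
    return (end_rows - start_rows) / (end_secs - start_secs)


if __name__ == "__main__":
    # Made-up timings for illustration only: 24 million rows at a steady
    # pace, then a much slower final million.
    log = [(i * 1_000_000, i * 10.0) for i in range(1, 25)]
    log.append((25_000_000, 290.0))
    print(exit_velocity(log))   # rate over the last million rows
    print(25_000_000 / 290.0)   # whole-run average, for comparison
```

Exit velocity reflects how the engine performs once the table is large, whereas a whole-run average would be inflated by the fast early inserts into a small table.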

How much compression can be achieved?

To answer this, I decided to load some web application performance data (log-style data with stored procedure names, database instance names, begin and end execution timestamps, duration, row counts, and parameter values). TokuDB achieved 18x compression, far more than InnoDB. It also loaded the data much faster, but that is a blog entry for another day…

Benchmark details

Application

iiBench, inserting 25 million rows, 1000 rows per commit

Environment

Intel Core-i7/920 @ 3.6GHz, 12GB DDR3 @ 1600MHz, 2 x SATA II

Ubuntu 11.04, TokuDB 5.0.4, MySQL 5.1.52, InnoDB plug-in 1.0.13

Server/Session Variables

unique_checks=1

tokudb_commit_sync=0

tokudb_cache_size=2G

innodb_buffer_pool_size=2G

innodb_flush_method=O_DIRECT

innodb_doublewrite=false

innodb_flush_log_at_trx_commit=0

innodb_log_file_size=1000M

innodb_file_per_table=true

innodb_log_buffer_size=16M

innodb_file_format=barracuda

1 Comment

Simon Mudd

11 years ago

Hi Tim,

I guess I’m a bit late in adding a comment to this post but given there’s been quite a bit of noise regarding TokuDB at the moment I thought I’d have a look.

I’m surprised that benchmarks like this do not use “safe” settings.

Of the values mentioned the following settings stand out as not being safe (or _possibly_ not being safe):

tokudb_commit_sync=0
innodb_doublewrite=false
innodb_flush_log_at_trx_commit=0

and to some extent I think that partially invalidates the numbers provided. A database has to be safe. A DBA wants the data to be on disk and recoverable, so disabling features to get better numbers, even if you partially do this on both engines, seems like something which is not a good idea.

I also noticed when looking at some of the configuration settings that if you are running in a replication environment you may need to use tokudb_pk_insert_mode = 2, as otherwise the settings are not replication safe. I don’t see mention of this requirement, and that obviously slows things down. This setting is mentioned as being the slowest setting.

Maybe there are other blogs where this is mentioned but it looks like a real environment may behave significantly differently to what’s implied from a blog post like this.

All that said the post is interesting: it does make TokuDB look attractive and worth evaluating. I’ll have to see how the numbers vary on the hardware I’m using and if the gains match what you imply.
