Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

InnoDB and TokuDB on AWS

February 1, 2016

Author

Fernando Ipar

Benchmarks

Share this Post:

In a recent post, Vadim compared the performance of Amazon Aurora and Percona Server on AWS. This time, I am comparing write throughput for InnoDB and TokuDB, using the same workload (sysbench oltp/update/update_non_index) and a similar set-up (r3.xlarge instance, with general purpose ssd, io2000 and io3000 volumes) to his experiments.

All the runs used 16 threads for sysbench, and the following MySQL configuration files for InnoDB and TokuDB respectively:

[mysqld]

table-open-cache-instances=32
table_open_cache=8000


innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 16G
innodb-flush-log-at-trx-commit = 1
innodb_log_compressed_pages     =0

innodb-file-per-table          = 1
innodb-buffer-pool-size        = 20G

innodb_write_io_threads        = 8
innodb_read_io_threads         = 32
innodb_open_files              = 1024

innodb_old_blocks_pct           =10
innodb_old_blocks_time          =2000

innodb_checksum_algorithm = crc32

innodb_file_format              =Barracuda

innodb_io_capacity=1500
innodb_io_capacity_max=2000
metadata_locks_hash_instances=256
innodb_max_dirty_pages_pct=90
innodb_flush_neighbors=1
innodb_buffer_pool_instances=8
innodb_lru_scan_depth=4096
innodb_sync_spin_loops=30
innodb-purge-threads=16

[mysqld]

table-open-cache-instances=32

table_open_cache=8000

innodb-flush-method = O_DIRECT

innodb-log-files-in-group = 2

innodb-log-file-size = 16G

innodb-flush-log-at-trx-commit = 1

innodb_log_compressed_pages =0

innodb-file-per-table = 1

innodb-buffer-pool-size = 20G

innodb_write_io_threads = 8

innodb_read_io_threads = 32

innodb_open_files = 1024

innodb_old_blocks_pct =10

innodb_old_blocks_time =2000

innodb_checksum_algorithm = crc32

innodb_file_format =Barracuda

innodb_io_capacity=1500

innodb_io_capacity_max=2000

metadata_locks_hash_instances=256

innodb_max_dirty_pages_pct=90

innodb_flush_neighbors=1

innodb_buffer_pool_instances=8

innodb_lru_scan_depth=4096

innodb_sync_spin_loops=30

innodb-purge-threads=16

[mysqld]

tokudb_read_block_size=16K
tokudb_fanout=128

table-open-cache-instances=32
table_open_cache=8000

metadata_locks_hash_instances=256

[mysqld_safe]
thp-setting=never

[mysqld]

tokudb_read_block_size=16K

tokudb_fanout=128

table-open-cache-instances=32

table_open_cache=8000

metadata_locks_hash_instances=256

[mysqld_safe]

thp-setting=never

You can see the full set of graphs here, and the complete results here.

Let me start illustrating the results with this summary graph for the io2000 volume, showing how write throughput varies over time, per engine and workload (for all graphs, size is in 1k rows, so 1000 is actually 1M):

We can see a few things already:

- InnoDB has better throughput for smaller table sizes.

- The reverse is true as size becomes big enough (after 10M rows here).

- TokuDB’s advantage is not noticeable on the oltp workload, though it is for InnoDB.

Let’s dig in a bit more and look at the extreme ends in terms of table size, starting with 1M rows:

and ending in 50M:

In the first case, we can see that not only does InnoDB show better write throughput, it also shows less variance. In the second case, we can confirm that the difference does not seem significant for oltp, but it is for the other workloads.

This should come as no surprise, as one of the big differences between TokuDB’s Fractal trees and InnoDB’s B-tree implementation is the addition of message buffers to nodes, to handle writes (the other big difference, for me, is node size). For write-intensive workloads, TokuDB needs to do a lot less tree traversing than InnoDB (in fact, this is done only to validate uniqueness constraints when required, otherwise writes are just injected into the message buffer and the buffer is flushed to lower levels of the tree asynchronously. I refer you to this post for more details).

For oltp, InnoDB is at advantage at smaller table sizes, as it does not need to scan message buffers all across the search path when reading (nothing is free in life, and this is the cost for TokuDB’s advantage for writes). I suspect this advantage is lost for high enough table sizes because at that point, either engine will be I/O bound anyway.

My focus here was write throughput, but as a small example see how this is reflected on response time if we pick the 50M table size and drop oltp from the mix:

At this point, you may be wondering why I focused on the io2000 results (and if you’re not, bear with me please!). The reason is the results for io3000 and the general purpose ssd showed characteristics that I attribute to latency on the volumes. You can see what I mean by looking at the io3000 graph:

I say “I attribute” because, unfortunately, I do not have any metrics other than sysbench’s output to go with (an error I will amend on future benchmarks!). I have seen the same pattern while working on production systems on AWS, and in those cases I was able to correlate it with increases in stime and/or qtime on diskstats. The fact that this is seen on the lower and higher capacity volumes for the same workload, but not the io2000 one, increases my confidence in this assumption.

Conclusion

I would not consider TokuDB a general purpose replacement for InnoDB, by which I mean I would never blindly suggest someone to migrate from one to the other, as the performance characteristics are different enough to make this risky without a proper assessment.

That said, I believe TokuDB has great advantages for the right scenarios, and this test highlights some of its strengths:

- It has a significant advantage over InnoDB on slower devices and bigger data sets.

- For big enough data sets, this is even the case on fast devices and write intensive workloads, as the B-tree becomes I/O bound much faster

Other advantages of TokuDB over InnoDB, not directly evidenced from these results, are:

- Better compression (helped by the much larger block size).

- Better SSD lifetime, due to less and more sequential writes (sequential writes have, in theory at least, no write amplification compared to random ones, so even though the sequential/random difference should not matter for SSDs for performance, it does for lifetime).