EmergencyEMERGENCY? Get 24/7 Help Now!

Benchmarking Percona Server TokuDB vs InnoDB

 | May 7, 2013 |  Posted In: Benchmarks, MySQL, Percona Server


After compiling Percona Server with TokuDB, of course I wanted to compare InnoDB performance vs TokuDB.
I have a particular workload I’m interested in testing – it is an insert-intensive workload (which is TokuDB’s strong suit) with some roll-up aggregation, which should produce updates in-place (I will use INSERT .. ON DUPLICATE KEY UPDATE statements for that), so it will produce all good amount of reads.

A few words about the hardware: I am going to use new the Dell PowerEdge R420 with two Intel(R) Xeon(R) CPU E5-2450 0 @ 2.10GHz, 48GB of RAM and SATA SSD: Kingston HyperX 3K 240 GB.

Workload: I will use two different schemas. The first schema is from sysbench, and the table looks like:

CREATE INDEX k on sbtest$I(k)

and tables sbtest$I_r10, sbtest$I_r100, sbtest$I_r1000, with roll-up sum for 10, 100, 1000 records in the main table.

and transactions for this workload are:

So this workload produces SEQUENTIAL inserts into Primary Key, that this is quite suitable for InnoDB, and in it we have random inserts into SECONDARY KEYS (k),
which is not so good for InnoDB.

So let’s see what results we have. The results are in TPS (more is better) and we start with empty tables.

Now, before looking at the graph, please do not jump to conclusions, as the graph is MISLEADING.


So we see that InnoDB performance steadily declines from 24000 tps to 18000 tps, but InnoDB can’t make 5h run. After 3h the disk is full, and InnoDB data size is about 210GB with 234.238.440 inserted records.
While TokuDB averages around 14000 tps mark with some periodical drops into 10000 tps area.
TokuDB datasize after 5h of run is about 50GB with 276.934.863 records.

So why do I say that the graph is misleading?
Obviously we can say that InnoDB is faster, but you should look into the steady decline of InnoDB throughput. Eventually it will drop to the level 14000 tps and below. I do not have enough space on this SSD to run this experiment that long. So there we see the strong side of TokuDB: it has more than 4x data compression on this dataset.
We can easily fill TokuDB tables with 1bln of rows on this SSD, and projected InnoDB performance on this size will be the same or worse, but will require 1TB in size.

Now to see this point of intersection, let’s review different workload (which actually is closer to what I need).

Tables looks like:

and transactions are:

That is, our PRIMARY KEY is not sequential anymore, which is bad for InnoDB, but this is what I need for my setup (I still can have synthetic auto_inc PK, but in that case I still will need SECONDARY KEY (hid,mid,id) ).

Also please note the transaction produces 4 INSERTs and workload is very write intensive.

So what are results in this case:


InnoDB gradually declines as data growth (which is expected) and by the end of 5 hours averages at 2700 tps.
With TokuDB we also see a drop, and by end of 5 hours the average throughput is 7800 tps.
Something to take into account: TokuDB results are not quite stable, that is why I also show 5-minute moving averages to TokuDB.

So TokuDB shows about 2.8x better throughput, and on data size:

  • InnoDB table: 58GB and 244.980.192 records
  • TokuDB table: 15GB and 232.927.460 records

So TokuDB looks better in this workload, however the sparse throughput is worrisome to me. Let’s zoom in to 10 min intervals and see throughput:
We can see periodical drops, which I believe are related to 60-sec checkpoint interval, as TokuDB does time-based checkpoints.
These drops are quite concerning, and it might be a problem for some users.

Now, I understand that my PRIMARY KEY (hid,mid,id) where id is sequential, and hid,mid is low selectivity is not good for fast inserts, but it is suitable for range selects by id. However it will interesting how both InnoDB and TokuDB performs if PK is (id,hid,mid). This also will affect select performance, so we will need to measure that also.

And, if you want to repeat this benchmark, the sysbench code is on Launchpad lp:~vadim-tk/sysbench/insert-roll-2,
command line to run:

and InnoDB options are:

TokuDB-related options are all defaults, as I understand from documentation TokuDB comes with good settings out-of-box, but I am ready to tune something if there are suggestions.

Vadim Tkachenko

Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming help optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products. He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication 3rd Edition.


  • The point at 685 sec when InnoDB performance drops, is that when data doesn’t fit all in RAM anymore? I would in general expect to see benefits from TokuDB when workload is diskbound. You could artificially reduce InnoDB buffer pool (and corresponding Tokutek setting) to have more SSD vs RAM.

  • The main performance win of TokuDB is indexes. If you add multiple multi-column indexes the difference with InnoDB will be even more notable. Also it allows multiple clustering indexes that give additional huge win on selects.

    We used TokuDB in the past (versions 3.x and 4.x). It really shines on large tables and index scan like queries.

  • Strikes me as odd that they actually take a previous InnoDB checkpoint benchmark from you as a reference for having outstanding checkpoint performance (eg http://www.tokutek.com/2012/04/tokudb-v6-0-frequent-checkpoints-with-no-performance-hit )

    Also, which TokuDB row compressor was used? Default or lzma? If you used the default one it would be highly interesting to see the performance with lzma compression.

  • In your my.cnf conf I don’t see barracuda file format and did you create your table using row_format = compressed with InnoDB which enables tablespace compression ?

  • Then why comparing ompressed tables and uncompressed table on size ? That’s clearly not relevant !
    Could you redo the same test with innodb compression ?


  • Gregory,

    I am comparing workload that I am interested in.
    I am not going to use InnoDB compression in production, as it introduces more problems than it solves.
    So this comparison is relevant for me.

  • Vadim,

    Thanks for a great performance report.

    Not sure I understand why InnoDB used 210 GB for 234M rows when loaded in PK order. What am I doing wrong with my hand-wavy estimates?

    234M * (200 + 20) for PK index +
    234M * (12 + 7) * 1.5 for fragmented secondary index +
    234M * (12 + 20) * 1.5 for fragmented sbtest_r10

    I assume:
    * 20 bytes for per-row metadata overhead in PK rows
    * PK index pages are 15/16 full as they are loaded in order
    * 7 bytes per-row metadata overhead for secondary index entries
    * fragmented pages are 2/3 full

    So my guess is:
    * 50G for PK index
    * 6G for secondary index on base table
    * 11G for sbtest_r10
    * much less for other rollup tables

  • Mark,

    In this case roll-up tables are not really roll-up… I need to fix workload, and I will show numbers with real roll-up tables.
    so in this case the size of roll-up table is almost the same as of main table.

  • 1) What concurrency did you use for the INSERT test? Don’t know if TokuDB handle concurrent workload as well as InnoDB

    2) Can you also run benchmark on SELECT by PK or some range query by a secondary index? I heard that TokuDB is slower than InnoDB on SELECT but it’d be interesting to see by how much.

  • Hello Vadim,

    TokuDB supports an upsert optimization that is applicable to your second test. Typically, the “insert on dup key update” statements do a point query to read the row. If the row exists, then the row is updated. Otherwise, the row is inserted. TokuDB has an optimization in which the point query is avoided and replaced by the injection of an upsert message into the root of the fractal tree. This can result in dramatic performance increase since the cost of reads is amortized.

    TokuDB currently detected update expressions like x=x+c, where c is a constant. We could support x=x+value(x). Let me know if you would like to run this test.

  • Rich,

    Was this upsert optimization used in Vadim’s tests? If not how do you turn this optimization on?

  • RIch,

    Like I said, unfortunately upsert part did not work in this benchmark, so it was mostly INSERT benchmark.
    I fixed workload and right now I am running a different test, which uses “x=x+value(x)” a lot.
    So I am very interested to have it optimized and would like to run it.

  • 2 svar:

    I can confirm that I never was able to get good results with InnoDB compression. Usually it was worse than without compression.
    So far I can say that I’ve seen only one specific use case useful with InnoDB compression: i.e. you have a separate table with TEXT column, which stores data, like this comment. In this case it might be helpful to compress this table to save some storage and do not have a big performance hit.

    That’s why Facebook has their own InnoDB compression, which is available in MySQL 5.6. So we may test MySQL 5.6 more to see what kind of performance it shows with compression.

  • Vadim,

    TokuDB does not use DirectIO with default setting. Hence, it maybe use more RAM than InnoDB in your benchmark.

  • Thanks Vadim,

    I did the test on 5.6 and load data infile concurrently results for two threads is 20 times slower then 1 file after the other. This could not be related to primary key worth distribution, memory or CPU as repeating the same test on uncompressed table scale as expected.

    Mark did you cover this case in your patches ? Add my vote on compression as we are facing more and more big data challenges where innoDB and TokuDB will have to be compare with LevelDB at some point

  • Vadim,

    > unfortunately upsert part did not work in this benchmark, so it was mostly INSERT benchmark

    What did you mean by that? In your tests you ran many INSERT .. ON DUPLICATE KEY UPDATE queries. That’s upsert, no?

    TokuDB seems to work pretty well (better than InnoDB in the 2nd test) for these upsert queries. So why did you say the upsert part did not work in this benchmark?

  • TokuDB supports an optimization for upserts that avoids the point query in some cases. This optimization was not used in the benchmark. Since TokuDB does not currently support the VALUES update expression, the optimization was turned off. We are adding this to our code and will let you know when it is supported.

  • svar,
    I have done some tests with concurrent load but did not see performance drops. It is reasonable that N threads doing concurrent load will be a lot slower than 1 for some N. Eventually you change from having all index blocks for all tables in memory to not and then disk read stalls occur. Not sure that was your problem or something else. PMP stacks or perf schema output might help debut it.

  • Mark thanks for the tip, this give me the motivation to put more efforts to explain what i saw. The same data load inside uncrompressed was good so PMP result need to follow, the only difference is the compression and range partitions. I’ll keep posting

  • Hi Vadim,

    I can confirm the steady downfall of InnoDB throughput down to ~1500 insert/sec (and counting) while TokuDB remained steadily around 20000 insert/sec.

    I inserted around 100Million rows using Tokutek’s iibench.py for the benchmark on a 2-way Supermicro server with 2 Xeon X5650 @2.7GHz, 96GB of RAM and a very fast LSI MegaRaid backed storage with SSD Cache.

    Aurimas worked on that machine a month ago.

  • Does Percona have improvements to keep the InnoDB insert buffer from getting full (as in do background reads for ibuf merges when it is getting full)? That is the key to sustaining good throughput for InnoDB on iibench.

  • From looking at the command line options for sysbench I’m assuming you modified the source. Can I get the modified sysbench source?

  • Vadim,

    I hadn’t heard of launchpad, but I figured it out. I’m running some TokuDB vs InnoDB performance tests in a sharded environment. This should be a big help. Thanks.

  • I don’t know how you’re getting those kinds of TPS numbers with either database. I’m benchmarking InnoDB (XtraDB) running on Percona 5.6 and getting only 2100 TPS, and that even using sysbench in read-only mode (no inserts).

    The server is a Dell PowerEdge R610, 12 x 3GHz Cores, 64GB RAM, 6 x Samsung SSD 840EVO 1TB drives (RAID 5).

    Check out this non-awesomeness…

    # sysbench –test=oltp –oltp-table-size=1000000 –mysql-db=test_innodb –db-driver=mysql –mysql-table-engine=innodb prepare
    # sysbench –test=oltp –oltp-table-size=1000000 –max-time=60 –oltp-read-only=on –max-requests=0 –num-threads=8 –db-driver=mysql –mysql-db=test_innodb run

    sysbench 0.4.12: multi-threaded system evaluation benchmark

    Running the test with following options:
    Number of threads: 8

    Doing OLTP test.
    Running mixed OLTP test
    Doing read-only test
    Using Special distribution (12 iterations, 1 pct of values are returned in 75 pct cases)
    Using “BEGIN” for starting transactions
    Using auto_inc on the id column
    Threads started!
    Time limit exceeded, exiting…
    (last message repeated 7 times)

    OLTP test statistics:
    queries performed:
    read: 1782676
    write: 0
    other: 254668
    total: 2037344
    transactions: 127334 (2122.15 per sec.)
    deadlocks: 0 (0.00 per sec.)
    read/write requests: 1782676 (29710.12 per sec.)
    other operations: 254668 (4244.30 per sec.)

    Test execution summary:
    total time: 60.0023s
    total number of events: 127334
    total time taken by event execution: 479.3998
    per-request statistics:
    min: 1.96ms
    avg: 3.76ms
    max: 851.53ms
    approx. 95 percentile: 4.28ms

    Threads fairness:
    events (avg/stddev): 15916.7500/209.42
    execution time (avg/stddev): 59.9250/0.00

Leave a Reply