October 20, 2014

Benchmarking Percona Server TokuDB vs InnoDB

After compiling Percona Server with TokuDB, of course I wanted to compare InnoDB performance vs TokuDB.
I have a particular workload I’m interested in testing – it is an insert-intensive workload (which is TokuDB’s strong suit) with some roll-up aggregation, which should produce updates in-place (I will use INSERT .. ON DUPLICATE KEY UPDATE statements for that), so it will produce all good amount of reads.

A few words about the hardware: I am going to use new the Dell PowerEdge R420 with two Intel(R) Xeon(R) CPU E5-2450 0 @ 2.10GHz, 48GB of RAM and SATA SSD: Kingston HyperX 3K 240 GB.

Workload: I will use two different schemas. The first schema is from sysbench, and the table looks like:

CREATE TABLE sbtest$I (
id BIGINT UNSIGNED NOT NULL,
k INTEGER UNSIGNED DEFAULT '0' NOT NULL,
c CHAR(120) DEFAULT '' NOT NULL,
pad CHAR(60) DEFAULT '' NOT NULL,
PRIMARY KEY (id)
)
CREATE INDEX k on sbtest$I(k)

and tables sbtest$I_r10, sbtest$I_r100, sbtest$I_r1000, with roll-up sum for 10, 100, 1000 records in the main table.

and transactions for this workload are:

So this workload produces SEQUENTIAL inserts into Primary Key, that this is quite suitable for InnoDB, and in it we have random inserts into SECONDARY KEYS (k),
which is not so good for InnoDB.

So let’s see what results we have. The results are in TPS (more is better) and we start with empty tables.

Now, before looking at the graph, please do not jump to conclusions, as the graph is MISLEADING.

TokuDB-1

So we see that InnoDB performance steadily declines from 24000 tps to 18000 tps, but InnoDB can’t make 5h run. After 3h the disk is full, and InnoDB data size is about 210GB with 234.238.440 inserted records.
While TokuDB averages around 14000 tps mark with some periodical drops into 10000 tps area.
TokuDB datasize after 5h of run is about 50GB with 276.934.863 records.

So why do I say that the graph is misleading?
Obviously we can say that InnoDB is faster, but you should look into the steady decline of InnoDB throughput. Eventually it will drop to the level 14000 tps and below. I do not have enough space on this SSD to run this experiment that long. So there we see the strong side of TokuDB: it has more than 4x data compression on this dataset.
We can easily fill TokuDB tables with 1bln of rows on this SSD, and projected InnoDB performance on this size will be the same or worse, but will require 1TB in size.

Now to see this point of intersection, let’s review different workload (which actually is closer to what I need).

Tables looks like:

and transactions are:

That is, our PRIMARY KEY is not sequential anymore, which is bad for InnoDB, but this is what I need for my setup (I still can have synthetic auto_inc PK, but in that case I still will need SECONDARY KEY (hid,mid,id) ).

Also please note the transaction produces 4 INSERTs and workload is very write intensive.

So what are results in this case:

TokuDB-2

InnoDB gradually declines as data growth (which is expected) and by the end of 5 hours averages at 2700 tps.
With TokuDB we also see a drop, and by end of 5 hours the average throughput is 7800 tps.
Something to take into account: TokuDB results are not quite stable, that is why I also show 5-minute moving averages to TokuDB.

So TokuDB shows about 2.8x better throughput, and on data size:

  • InnoDB table: 58GB and 244.980.192 records
  • TokuDB table: 15GB and 232.927.460 records

So TokuDB looks better in this workload, however the sparse throughput is worrisome to me. Let’s zoom in to 10 min intervals and see throughput:
TokuDB-3
We can see periodical drops, which I believe are related to 60-sec checkpoint interval, as TokuDB does time-based checkpoints.
These drops are quite concerning, and it might be a problem for some users.

Now, I understand that my PRIMARY KEY (hid,mid,id) where id is sequential, and hid,mid is low selectivity is not good for fast inserts, but it is suitable for range selects by id. However it will interesting how both InnoDB and TokuDB performs if PK is (id,hid,mid). This also will affect select performance, so we will need to measure that also.

And, if you want to repeat this benchmark, the sysbench code is on Launchpad lp:~vadim-tk/sysbench/insert-roll-2,
command line to run:

and InnoDB options are:

TokuDB-related options are all defaults, as I understand from documentation TokuDB comes with good settings out-of-box, but I am ready to tune something if there are suggestions.

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Clould Tools, the Percona Server, Percona XraDB Cluster and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. The point at 685 sec when InnoDB performance drops, is that when data doesn’t fit all in RAM anymore? I would in general expect to see benefits from TokuDB when workload is diskbound. You could artificially reduce InnoDB buffer pool (and corresponding Tokutek setting) to have more SSD vs RAM.

  2. Alexander Zaitsev says:

    The main performance win of TokuDB is indexes. If you add multiple multi-column indexes the difference with InnoDB will be even more notable. Also it allows multiple clustering indexes that give additional huge win on selects.

    We used TokuDB in the past (versions 3.x and 4.x). It really shines on large tables and index scan like queries.

  3. Michael Schmidt says:

    Strikes me as odd that they actually take a previous InnoDB checkpoint benchmark from you as a reference for having outstanding checkpoint performance (eg http://www.tokutek.com/2012/04/tokudb-v6-0-frequent-checkpoints-with-no-performance-hit )

    Also, which TokuDB row compressor was used? Default or lzma? If you used the default one it would be highly interesting to see the performance with lzma compression.

  4. Michael,

    I still see quite drops during checkpoints, so I think the issue is not fixed fully yet.
    I used default TokuDB compression in this case.

  5. Gregory Schiano says:

    In your my.cnf conf I don’t see barracuda file format and did you create your table using row_format = compressed with InnoDB which enables tablespace compression ?

  6. Gregory,

    I do not use compression for InnoDB.

  7. Gregory Schiano says:

    Then why comparing ompressed tables and uncompressed table on size ? That’s clearly not relevant !
    Could you redo the same test with innodb compression ?

    Thanks

  8. Gregory,

    I am comparing workload that I am interested in.
    I am not going to use InnoDB compression in production, as it introduces more problems than it solves.
    So this comparison is relevant for me.

  9. Vadim,

    Thanks for a great performance report.

    Not sure I understand why InnoDB used 210 GB for 234M rows when loaded in PK order. What am I doing wrong with my hand-wavy estimates?

    234M * (200 + 20) for PK index +
    234M * (12 + 7) * 1.5 for fragmented secondary index +
    234M * (12 + 20) * 1.5 for fragmented sbtest_r10

    I assume:
    * 20 bytes for per-row metadata overhead in PK rows
    * PK index pages are 15/16 full as they are loaded in order
    * 7 bytes per-row metadata overhead for secondary index entries
    * fragmented pages are 2/3 full

    So my guess is:
    * 50G for PK index
    * 6G for secondary index on base table
    * 11G for sbtest_r10
    * much less for other rollup tables

  10. Mark,

    In this case roll-up tables are not really roll-up… I need to fix workload, and I will show numbers with real roll-up tables.
    so in this case the size of roll-up table is almost the same as of main table.

  11. The Storage part is interesting, but what about the access part. What happens when you change the ratios to include writes into the mix?

  12. Andy says:

    1) What concurrency did you use for the INSERT test? Don’t know if TokuDB handle concurrent workload as well as InnoDB

    2) Can you also run benchmark on SELECT by PK or some range query by a secondary index? I heard that TokuDB is slower than InnoDB on SELECT but it’d be interesting to see by how much.

  13. Rich Prohaska says:

    Hello Vadim,

    TokuDB supports an upsert optimization that is applicable to your second test. Typically, the “insert on dup key update” statements do a point query to read the row. If the row exists, then the row is updated. Otherwise, the row is inserted. TokuDB has an optimization in which the point query is avoided and replaced by the injection of an upsert message into the root of the fractal tree. This can result in dramatic performance increase since the cost of reads is amortized.

    TokuDB currently detected update expressions like x=x+c, where c is a constant. We could support x=x+value(x). Let me know if you would like to run this test.

  14. Andy says:

    Rich,

    Was this upsert optimization used in Vadim’s tests? If not how do you turn this optimization on?

  15. Andy,

    I used 32 concurrent threads to insert data.
    I plan benchmarks for SELECT, but it is later.

  16. RIch,

    Like I said, unfortunately upsert part did not work in this benchmark, so it was mostly INSERT benchmark.
    I fixed workload and right now I am running a different test, which uses “x=x+value(x)” a lot.
    So I am very interested to have it optimized and would like to run it.

  17. svar says:

    Vadim, can you confirm negative scalability with InnoDB compression and multi value inserts ?

  18. 2 svar:

    I can confirm that I never was able to get good results with InnoDB compression. Usually it was worse than without compression.
    So far I can say that I’ve seen only one specific use case useful with InnoDB compression: i.e. you have a separate table with TEXT column, which stores data, like this comment. In this case it might be helpful to compress this table to save some storage and do not have a big performance hit.

    That’s why Facebook has their own InnoDB compression, which is available in MySQL 5.6. So we may test MySQL 5.6 more to see what kind of performance it shows with compression.

  19. A big use-case for compression is to reduce the amount of fast storage you need to buy when fast storage is expensive.

  20. …so don’t expect it to not decrease performance but if a small loss of performance reduces your storage cost by more than half this can be a good compromise.

  21. Mark,

    So this exactly why I am looking for TokuDB.
    There we see compression much better than 2x with potentially the same or even better performance.

  22. Better compression and write-patterns that let you get longer lifetimes from lower-endurance (less expensive) flash devices — TokuDB can make a big difference

  23. David says:

    Vadim,

    TokuDB does not use DirectIO with default setting. Hence, it maybe use more RAM than InnoDB in your benchmark.

  24. svar says:

    Thanks Vadim,

    I did the test on 5.6 and load data infile concurrently results for two threads is 20 times slower then 1 file after the other. This could not be related to primary key worth distribution, memory or CPU as repeating the same test on uncompressed table scale as expected.

    Mark did you cover this case in your patches ? Add my vote on compression as we are facing more and more big data challenges where innoDB and TokuDB will have to be compare with LevelDB at some point

  25. Andy says:

    Vadim,

    > unfortunately upsert part did not work in this benchmark, so it was mostly INSERT benchmark

    What did you mean by that? In your tests you ran many INSERT .. ON DUPLICATE KEY UPDATE queries. That’s upsert, no?

    TokuDB seems to work pretty well (better than InnoDB in the 2nd test) for these upsert queries. So why did you say the upsert part did not work in this benchmark?

  26. Rich Prohaska says:

    TokuDB supports an optimization for upserts that avoids the point query in some cases. This optimization was not used in the benchmark. Since TokuDB does not currently support the VALUES update expression, the optimization was turned off. We are adding this to our code and will let you know when it is supported.

  27. svar,
    I have done some tests with concurrent load but did not see performance drops. It is reasonable that N threads doing concurrent load will be a lot slower than 1 for some N. Eventually you change from having all index blocks for all tables in memory to not and then disk read stalls occur. Not sure that was your problem or something else. PMP stacks or perf schema output might help debut it.

  28. svar says:

    Mark thanks for the tip, this give me the motivation to put more efforts to explain what i saw. The same data load inside uncrompressed was good so PMP result need to follow, the only difference is the compression and range partitions. I’ll keep posting

  29. 2 Andy.

    I mean that in most queries (99%) data was only inserted, not updated.
    I inserted data with mostly unique PK, That is part ON DUPLICATE KEY was not triggered.

  30. Reda NOUSHI says:

    Hi Vadim,

    I can confirm the steady downfall of InnoDB throughput down to ~1500 insert/sec (and counting) while TokuDB remained steadily around 20000 insert/sec.

    I inserted around 100Million rows using Tokutek’s iibench.py for the benchmark on a 2-way Supermicro server with 2 Xeon X5650 @2.7GHz, 96GB of RAM and a very fast LSI MegaRaid backed storage with SSD Cache.

    Aurimas worked on that machine a month ago.

  31. Does Percona have improvements to keep the InnoDB insert buffer from getting full (as in do background reads for ibuf merges when it is getting full)? That is the key to sustaining good throughput for InnoDB on iibench.

  32. Mark,

    We do not have this improvement in Percona Server yet.

  33. Rob Turpin says:

    From looking at the command line options for sysbench I’m assuming you modified the source. Can I get the modified sysbench source?

  34. Rob,

    I usually use sysbench from source code on launchpad.
    lp:sysbench

  35. Rob Turpin says:

    Vadim,

    I hadn’t heard of launchpad, but I figured it out. I’m running some TokuDB vs InnoDB performance tests in a sharded environment. This should be a big help. Thanks.

Speak Your Mind

*