Thoughts on Small Datum – Part 2

If you did not read my first blog post about Mark Callaghan’s (@markcallaghan) benchmarks as documented in his blog, Small Datum, you may want to skim through it now for a little context.


On March 11th, Mark, a former Google and now Facebook database guru, published an insertion rate benchmark comparing MySQL outfitted with the InnoDB storage engine with two NoSQL alternatives — basic MongoDB and TokuMX (the Tokutek high-performance distribution of MongoDB).  In these particular tests Mark uses flash storage media. Here are my cliff notes (a shoutout to @mipsytipsy for the apt description) and my thoughts on the business implications.

If your big data applications are write-intensive you may already know their performance characteristics are primarily governed by insertion rate, which in turn is governed by write efficiency. In this benchmark Mark compares the insertion rates for the aforementioned databases using the open-source Indexed Insertion Benchmark (aka iiBench).  He generates his data using a 100 million row database, then runs the benchmark again adding an additional 400 million rows.
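
For anyone who has never run an insertion benchmark, here is a minimal sketch of the idea: insert rows in batches into a table that carries secondary indexes, and report the rows per second for each batch. This is not iiBench and it is not Mark's setup. It uses Python's built-in sqlite3 module as a stand-in database, and the table name, columns, and row counts are hypothetical values I picked for illustration.

```python
import random
import sqlite3
import time


def measure_insert_rate(total_rows=20000, batch_size=5000, seed=0):
    """Insert rows into an indexed table; return (rows_so_far, rows_per_sec) per batch."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")  # stand-in database, not MySQL/MongoDB/TokuMX
    conn.execute(
        "CREATE TABLE purchases ("
        "id INTEGER PRIMARY KEY, customer INTEGER, product INTEGER, price REAL)"
    )
    # Secondary indexes must be maintained on every insert, which is
    # what makes this an *indexed* insertion test.
    conn.execute("CREATE INDEX idx_customer ON purchases (customer)")
    conn.execute("CREATE INDEX idx_product_price ON purchases (product, price)")

    results = []
    inserted = 0
    while inserted < total_rows:
        n = min(batch_size, total_rows - inserted)
        rows = [
            (rng.randrange(100000), rng.randrange(10000), rng.random() * 500)
            for _ in range(n)
        ]
        start = time.perf_counter()
        with conn:  # one transaction per batch, committed on exit
            conn.executemany(
                "INSERT INTO purchases (customer, product, price) VALUES (?, ?, ?)",
                rows,
            )
        elapsed = time.perf_counter() - start
        inserted += n
        rate = n / elapsed if elapsed > 0 else float("inf")
        results.append((inserted, rate))
    conn.close()
    return results


if __name__ == "__main__":
    for rows_so_far, rate in measure_insert_rate():
        print(f"{rows_so_far:>8} rows: {rate:,.0f} inserts/sec")
```

The per-batch numbers matter more than a single average, because the interesting question in a test like this is how the insertion rate holds up as the table and its indexes grow.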

Mark’s tests clearly show that MySQL (with the InnoDB storage engine) and TokuMX both outperform basic MongoDB by a wide margin.  In fact, in these tests TokuMX is at least twice as fast as basic MongoDB.

I’ve graphed some of these insertion rate results for those of us who tend toward visual learning. I used the second / larger of the two tests for the graph and included just two of his MySQL tests for simplicity (he tried a larger number of MySQL configurations with similar results).  I’ve labeled the two I am using with “(c)” for compressed MySQL and “(u)” for uncompressed.


I show uncompressed MySQL results because they show a far better insertion rate than either TokuMX or MongoDB (not trying to hide from it).  But size really does matter.  MySQL without compression has better insertion rates, but its rate of database growth and its write amplification characteristics are undesirable. That is why I feel the MySQL results with compression are the apples-to-apples comparison. You should also check out my footnote at the bottom of this post.

Bottom line: Mark’s insertion rate tests clearly show that MySQL with InnoDB and TokuMX significantly outperform basic MongoDB. If you run a write-intensive NoSQL application, it will perform significantly better with TokuMX than with basic MongoDB.  In fact, real-world customer results and other benchmark data suggest this is just the tip of the insertion rate iceberg.  With TokuMX you are more likely to see a 20x – 80x improvement.

But you don’t have to take our word for it.  You can try these tests, or, even better, test your own MongoDB applications running on TokuMX in your own environment by downloading the free community version of TokuMX (or TokuDB) here. If you need it, the iiBench benchmark is available here. If you run your own tests, I’d love to hear from you.

One footnote: TokuDB (the Tokutek high-performance storage engine alternative to InnoDB) is not covered in Mark’s benchmark.  It delivers better performance, smaller database size, and better write amplification characteristics than MySQL with InnoDB.  But that’s a story for another blog.

You can see all the gory details on Mark’s insertion rate benchmark here.

As always, your thoughts and comments are welcome.  You can also reach me on Twitter via @dcrosenlund.

Next time, in Thoughts on Small Datum – Part 3, I’ll share this marketer’s summary and graphs for Mark’s insert benchmark of TokuMX, MongoDB and InnoDB, this time run on disks.
