
268x Query Performance Increase for MongoDB with Fractal Tree Indexes, SAY WHAT?

August 30, 2012 | Posted In: Tokutek, TokuView


Last week I wrote about our 10x insertion performance increase with MongoDB. We’ve continued our experimental integration of Fractal Tree® Indexes into MongoDB, adding support for clustered indexes. A clustered index stores all non-indexed fields as the “value” portion of the index, as opposed to a standard MongoDB index, which stores a pointer to the document data. The benefit is that indexed lookups can immediately return any requested values instead of needing an additional lookup (and potential disk I/Os) to fetch the requested fields.
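To make the difference concrete, here is a minimal in-memory Python sketch, assuming a sorted list stands in for the on-disk tree. A pointer-style secondary index has to go back to the document store for every match, while a clustered index returns the fields straight from its own entries. The class names are hypothetical, and this is not how MongoDB or the Fractal Tree implementation lays out data on disk.

```python
from bisect import bisect_left, insort


class PointerIndex:
    """Standard secondary index: sorted (key, doc_id) pairs pointing into a document store."""

    def __init__(self, field, store):
        self.field, self.store, self.entries = field, store, []

    def insert(self, doc_id, doc):
        insort(self.entries, (doc[self.field], doc_id))

    def range_query(self, start, limit):
        """Up to `limit` documents with key >= start, plus the number of extra fetches."""
        i = bisect_left(self.entries, (start,))  # (start,) sorts before (start, any_id)
        hits = self.entries[i:i + limit]
        # One additional lookup per match; on disk this can be a random I/O.
        return [self.store[doc_id] for _, doc_id in hits], len(hits)


class ClusteredIndex:
    """Clustered index: each index entry carries a copy of the rest of the document."""

    def __init__(self, field):
        self.field, self.entries = field, []

    def insert(self, doc_id, doc):
        # doc_id breaks ties, so two equal keys never force a comparison of the dicts.
        insort(self.entries, (doc[self.field], doc_id, dict(doc)))

    def range_query(self, start, limit):
        i = bisect_left(self.entries, (start,))
        # Values come straight out of the index: no second lookup.
        return [doc for _, _, doc in self.entries[i:i + limit]], 0


# Usage: index the same four documents both ways and run the same range query.
store = {}
by_uri = PointerIndex("uri", store)
clustered = ClusteredIndex("uri")
for doc_id, uri in enumerate(["c", "a", "d", "b"]):
    doc = {"uri": uri, "name": f"name-{doc_id}"}
    store[doc_id] = doc
    by_uri.insert(doc_id, doc)
    clustered.insert(doc_id, doc)

print(by_uri.range_query("b", 2))     # same documents, 2 extra fetches
print(clustered.range_query("b", 2))  # same documents, 0 extra fetches
```

The only thing the clustered version saves is the per-match trip back to the document store; that trip is what becomes expensive once the data no longer fits in memory.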

To create a clustered index, you just need to add the option “clustering:true” when creating the index (note that version 2 indexes are Fractal Tree Indexes).
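As a hedged sketch, the same two options could be passed from Python through pymongo's `Collection.create_index`, which forwards extra keyword arguments as index options. The helper name is hypothetical, and `clustering` is only meaningful to the experimental Tokutek build described here; clustering indexes are not part of stock MongoDB.

```python
def create_clustered_index(collection, field):
    """Hypothetical helper: request a clustered, version-2 index on `field`.

    Assumption: the server is the experimental Tokutek build, where `v=2`
    selects a Fractal Tree Index and `clustering=True` makes it clustered.
    pymongo passes both keyword arguments through as index options.
    """
    return collection.create_index([(field, 1)], clustering=True, v=2)
```

With a running server of that build, this would be called as `create_clustered_index(MongoClient().test.docs, "URI")`.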

In this benchmark I measured the performance of a single-threaded insertion workload combined with a range query that retrieves 1,000 documents whose URI is greater than or equal to a randomly chosen URI. The range query runs on a separate thread and sleeps 60 seconds after each completed query.
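A minimal sketch of the workload's shape, assuming Python threads and an in-memory sorted list in place of MongoDB: one thread inserts random URIs, while a second runs a range query for up to 1,000 URIs at or above a random starting URI and then pauses. The URI format, the function names, and the shortened pause are hypothetical; this models only the concurrency pattern, not the storage engines being compared.

```python
import bisect
import random
import threading
import time


def random_uri(rng):
    # Hypothetical URI format; the post does not specify one.
    return f"http://example.com/{rng.getrandbits(32):08x}"


def range_query(sorted_uris, start, limit=1000):
    """Up to `limit` URIs that are >= start, in index order."""
    i = bisect.bisect_left(sorted_uris, start)
    return sorted_uris[i:i + limit]


def run_workload(duration_s, query_pause_s, limit=1000, seed=42):
    """One insert thread plus one query thread that pauses after each query."""
    uris = []                  # stands in for the URI index
    lock = threading.Lock()    # guards `uris` between the two threads
    stop = threading.Event()
    latencies = []

    def inserter():
        rng = random.Random(seed)
        while True:
            with lock:
                bisect.insort(uris, random_uri(rng))
            if stop.is_set():
                break

    def querier():
        rng = random.Random(seed + 1)
        while True:
            start = random_uri(rng)
            t0 = time.perf_counter()
            with lock:
                range_query(uris, start, limit)
            latencies.append(time.perf_counter() - t0)
            # The post's client sleeps 60 s here; wait() returns True once stopped.
            if stop.wait(query_pause_s):
                break

    threads = [threading.Thread(target=inserter), threading.Thread(target=querier)]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
    return len(uris), latencies
```

For example, `run_workload(duration_s=1.0, query_pause_s=0.1)` returns the number of inserted URIs and the per-query latencies in seconds.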

The inserted documents contained the following: URI (character), name (character), origin (character), creation date (timestamp), and expiration date (timestamp).  We created a total of four secondary indexes: URI (clustered), name, origin, and creation date.
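The sketch below generates one such document and lists the four index entries each insert has to maintain: a full copy of the document in the clustered URI index and a bare document id in each of the other three. The field names, value formats, and 30-day expiry are hypothetical (the post gives only the field types); note that the expiration date is stored but not indexed.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical field names mirroring the four secondary indexes in the post.
INDEXES = [
    {"key": [("uri", 1)], "clustering": True},  # clustered: entry holds the document
    {"key": [("name", 1)]},
    {"key": [("origin", 1)]},
    {"key": [("creation_date", 1)]},
]


def make_document(rng, now):
    created = now - timedelta(seconds=rng.randrange(86_400))
    return {
        "uri": f"http://example.com/{rng.getrandbits(32):08x}",
        "name": f"name-{rng.randrange(10_000)}",
        "origin": rng.choice(["us", "eu", "asia"]),
        "creation_date": created,
        "expiration_date": created + timedelta(days=30),  # assumed expiry rule
    }


def index_entries(doc, doc_id):
    """(field, key, value) for every index an insert of `doc` must update."""
    entries = []
    for spec in INDEXES:
        field = spec["key"][0][0]
        value = dict(doc) if spec.get("clustering") else doc_id
        entries.append((field, doc[field], value))
    return entries


rng = random.Random(1)
doc = make_document(rng, datetime(2012, 8, 30, tzinfo=timezone.utc))
for field, key, value in index_entries(doc, doc_id=7):
    print(field, "->", "full document" if isinstance(value, dict) else f"doc_id {value}")
```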

We ran the benchmark with journaling disabled and with the WriteConcern left at its default of disabled, so writes were not acknowledged.

My benchmark client is available here.

Benchmark Environment

  • Sun x4150, 2x Xeon 5460, 8GB RAM, StorageTek controller (256MB cache, write-back), 4x 10K RPM SAS drives in RAID 0
  • Ubuntu 10.04 Server (64-bit), ext4 filesystem
  • MongoDB v2.2.RC0

Benchmark Results

The exit velocity (the insertion rate at the end of the run) of standard MongoDB was 1,092 inserts per second after 38 million document insertions, versus 12,241 inserts per second after 49 million document insertions for MongoDB with Fractal Tree Indexes: an improvement of 1,020%.
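As a quick check of the arithmetic, computing the improvement as (new - old) / old from the two exit velocities gives about 1,021%, which agrees with the 1,020% above up to rounding and corresponds to roughly an 11.2x speedup. The function name is illustrative.

```python
def percent_improvement(old, new):
    """For throughput, higher is better: improvement = (new - old) / old."""
    return (new - old) / old * 100


standard_ips, fractal_ips = 1_092, 12_241  # exit velocities from the post (inserts/s)
speedup = fractal_ips / standard_ips
print(f"{speedup:.2f}x, {percent_improvement(standard_ips, fractal_ips):.0f}% improvement")
# prints "11.21x, 1021% improvement"
```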

More interesting is the query performance. Note that the query graph plots latency, so lower is better, and that its Y-axis uses a log scale to make the comparison easier. At exit, standard MongoDB averaged 16,668 milliseconds per query versus 62 milliseconds for MongoDB with Fractal Tree Indexes: a 26,816% (roughly 268x) improvement.
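Because lower is better for latency, the speedup is the ratio old / new rather than a difference. A short Python check using the rounded averages quoted above gives about 269x, close to the headline 268x, and shows why a log-scale axis helps: the two curves sit about 2.4 orders of magnitude apart. The function name is illustrative.

```python
import math


def latency_speedup(old_ms, new_ms):
    """For latency, lower is better, so the speedup is old / new."""
    return old_ms / new_ms


standard_ms, fractal_ms = 16_668, 62  # average ms per range query at exit, from the post
ratio = latency_speedup(standard_ms, fractal_ms)
decades = math.log10(ratio)  # vertical gap between the two curves on a log-scale axis
print(f"{ratio:.1f}x lower latency, {decades:.2f} orders of magnitude apart")
# prints "268.8x lower latency, 2.43 orders of magnitude apart"
```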

As I said in my last post, we’re not MongoDB experts by any stretch, but we wanted to share these results with the community and hear people’s thoughts on applications where this might help, suggestions for next steps, and any other feedback. Also, if you are interested in learning more about TokuDB, please stop by to hear us speak at Strange Loop, MySQL Connect, or Percona Live, or join our introductory webinar next week.

By the way, MongoDB also supports covered indexes, which I will talk about in my next post.  Covered indexes can provide some of the benefits of a clustered index, but can have significant drawbacks as well.



  • Hi,

    great information!
    Do you know if there is any way to get these improvements when using the MongoDB C# Driver?


    • Benjamin,

      Our indexing performance improvements are within MongoDB itself so they are available to all clients, regardless of the driver language.


  • I’ve looked around on the MongoDB site and haven’t found any documentation for clustering indexes. Is this something Tokutek has developed as a plugin/upgrade?

    • Tyler,

      MongoDB supports covered indexes as is discussed in their documentation at http://docs.mongodb.org/manual/tutorial/create-indexes-to-support-queries/. Clustering indexes are exclusive to our implementation. Please let us know if you’d like to evaluate it.

  • The MongoDB documentation recommends sizing indexes to fit in memory. How does performance drop off once your database exceeds your machine’s memory? And what about the other collections in the same database? Their indexes will also be pushed out of memory.

    • The point of this experiment was to show how the two products behave on a mixed workload (inserts plus queries). In addition, the secondary index on the TokuMX collection was created as a clustered index, since this type of index allows for optimal range query performance. Because the queries are completely random and the data set is constantly growing, there is no way the indexes will fit in memory.
