268x Query Performance Increase for MongoDB with Fractal Tree Indexes, SAY WHAT?

Last week I wrote about our 10x insertion performance increase with MongoDB. We’ve continued our experimental integration of Fractal Tree® Indexes into MongoDB, adding support for clustered indexes. A clustered index stores all non-index fields as the “value” portion of the index, whereas a standard MongoDB index stores a pointer to the document data. The benefit is that an indexed lookup can immediately return any requested fields, instead of needing an additional lookup (and potential disk I/Os) to fetch them.
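To make the difference concrete, here is a toy in-memory model in Python. It is a sketch, not how MongoDB or Fractal Tree Indexes actually store data: both index types are plain sorted lists, and the class names `PointerIndex` and `ClusteredIndex` are hypothetical. The only difference between them is whether a matching entry carries the document itself or a record id that needs a second lookup into the collection.

```python
from bisect import bisect_left, insort


class PointerIndex:
    """Standard-style secondary index: stores (key, record_id), fetches documents separately."""

    def __init__(self, collection):
        self.collection = collection  # record_id -> full document
        self.entries = []             # sorted list of (key, record_id)
        self.fetches = 0              # extra lookups into the collection

    def insert(self, key, record_id):
        insort(self.entries, (key, record_id))

    def lookup(self, key):
        i = bisect_left(self.entries, (key,))  # first entry with this key
        results = []
        while i < len(self.entries) and self.entries[i][0] == key:
            self.fetches += 1  # second lookup per match (a potential disk I/O)
            results.append(self.collection[self.entries[i][1]])
            i += 1
        return results


class ClusteredIndex:
    """Clustered index: stores (key, record_id, document), so matches need no second lookup."""

    def __init__(self):
        self.entries = []  # sorted list of (key, record_id, document); record_id must be unique
        self.fetches = 0   # never incremented: the values live in the index itself

    def insert(self, key, record_id, document):
        insort(self.entries, (key, record_id, document))

    def lookup(self, key):
        i = bisect_left(self.entries, (key,))  # first entry with this key
        results = []
        while i < len(self.entries) and self.entries[i][0] == key:
            results.append(self.entries[i][2])  # document comes straight from the index
            i += 1
        return results


docs = {1: {"URI": "a", "name": "x"}, 2: {"URI": "b", "name": "y"}}
plain, clustered = PointerIndex(docs), ClusteredIndex()
for rid, doc in docs.items():
    plain.insert(doc["URI"], rid)
    clustered.insert(doc["URI"], rid, doc)

assert plain.lookup("b") == clustered.lookup("b") == [{"URI": "b", "name": "y"}]
print(plain.fetches, clustered.fetches)  # 1 0
```

Both lookups return the same document, but the pointer-style index pays one extra fetch per matching entry. On disk, each of those fetches can be a random I/O, which is the cost a clustered index avoids.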

To create a clustered index, you just need to add the “clustering:true” option when creating the index. Note that version 2 indexes are Fractal Tree Indexes.

In this benchmark I measured the performance of a single-threaded insertion workload combined with a range query that retrieves 1,000 documents whose URI is greater than or equal to a random URI. The range query runs on a separate thread and sleeps 60 seconds after each completed query.
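The shape of this workload can be sketched in Python as follows. This is a simplified in-memory stand-in, not the actual benchmark client: a sorted Python list plays the role of the URI index, and `random_uri`, its URI format, and `run_workload` are hypothetical names chosen for illustration. One thread inserts continuously, while a second thread runs a range query for up to 1,000 entries at or after a random URI and then sleeps before the next query.

```python
import random
import string
import threading
import time
from bisect import bisect_left, insort


def range_query(sorted_uris, start_uri, limit=1000):
    """Return up to `limit` URIs that are >= start_uri, in index order."""
    i = bisect_left(sorted_uris, start_uri)
    return sorted_uris[i:i + limit]


def random_uri(rng):
    # Hypothetical URI format; the post does not say how URIs were generated.
    return "http://example.com/" + "".join(rng.choices(string.ascii_lowercase, k=12))


def run_workload(duration_s, query_sleep_s=60.0, seed=42):
    """One insert thread plus one range-query thread that sleeps after each query."""
    uris, lock = [], threading.Lock()
    stop = threading.Event()
    stats = {"inserts": 0, "queries": 0}

    def inserter():
        rng = random.Random(seed)
        while not stop.is_set():
            with lock:
                insort(uris, random_uri(rng))  # keep the "index" sorted
            stats["inserts"] += 1

    def querier():
        rng = random.Random(seed + 1)
        while not stop.is_set():
            with lock:
                range_query(uris, random_uri(rng))
            stats["queries"] += 1
            stop.wait(query_sleep_s)  # sleep, but wake early if the run ends

    threads = [threading.Thread(target=inserter), threading.Thread(target=querier)]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
    return stats
```

With the post's 60-second sleep, a short run such as `run_workload(5)` performs many inserts but only a single range query; the real benchmark ran long enough to insert tens of millions of documents.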

The inserted documents contained the following fields: URI (character), name (character), origin (character), creation date (timestamp), and expiration date (timestamp). We created a total of four secondary indexes: URI (clustered), name, origin, and creation date.
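A single document and the four index definitions might look like the following Python sketch. The field names `creation_date` and `expiration_date`, the value formats, and the 30-day expiration are assumptions, since the post gives only the field types. The `(keys, options)` pairs simply restate which index is clustered; they are not a driver API.

```python
import random
import string
from datetime import datetime, timedelta, timezone


def random_text(rng, length):
    return "".join(rng.choices(string.ascii_lowercase, k=length))


def make_document(rng, now=None):
    """Build one benchmark-style document; field names and value formats are assumptions."""
    now = now or datetime.now(timezone.utc)
    created = now - timedelta(seconds=rng.randint(0, 86_400))
    return {
        "URI": "http://example.com/" + random_text(rng, 12),  # character
        "name": random_text(rng, 8),                          # character
        "origin": random_text(rng, 6),                        # character
        "creation_date": created,                             # timestamp
        "expiration_date": created + timedelta(days=30),      # timestamp (30 days is assumed)
    }


# The four secondary indexes from the post, as hypothetical (keys, options) pairs.
# Only the URI index is clustered.
SECONDARY_INDEXES = [
    ({"URI": 1}, {"clustering": True}),
    ({"name": 1}, {}),
    ({"origin": 1}, {}),
    ({"creation_date": 1}, {}),
]

doc = make_document(random.Random(7))
print(sorted(doc))  # ['URI', 'creation_date', 'expiration_date', 'name', 'origin']
```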

We ran the benchmark with journaling disabled and with WriteConcern left at its default setting (disabled).

My benchmark client is available here.

Benchmark Environment

  • Sun x4150, (2) Xeon 5460, 8GB RAM, StorageTek Controller (256MB, write-back), 4x10K SAS/RAID 0
  • Ubuntu 10.04 Server (64-bit), ext4 filesystem
  • MongoDB v2.2.RC0

Benchmark Results

The exit velocity (the insertion rate at the end of the run) of standard MongoDB was 1,092 inserts per second after 38 million document insertions, versus 12,241 inserts per second after 49 million document insertions for MongoDB with Fractal Tree Indexes: an improvement of 1,020%.
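The improvement figure is the relative change in insertion rate, where higher is better: (new - old) / old. A quick check in Python using the exit velocities above:

```python
def throughput_improvement_pct(baseline_rate, new_rate):
    """Relative improvement when higher is better (e.g. inserts per second)."""
    return (new_rate - baseline_rate) / baseline_rate * 100


# Exit velocities reported above, in inserts per second.
standard, fractal_tree = 1_092, 12_241
pct = throughput_improvement_pct(standard, fractal_tree)
print(f"{pct:,.1f}% improvement ({fractal_tree / standard:.2f}x the baseline rate)")
# prints: 1,021.0% improvement (11.21x the baseline rate)
```

The result, about 1,021% (roughly 11.2 times the baseline rate), agrees with the reported 1,020% up to rounding. Keep in mind that the two rates were measured at different document counts (38 million versus 49 million).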

More interesting is the query performance. Note that this is a latency graph, where lower is better, and that the Y-axis uses a log scale to make comparison easier. Standard MongoDB exited with an average of 16,668 milliseconds per query, versus an average of 62 milliseconds for MongoDB with Fractal Tree Indexes: a 26,816% improvement.
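For latency, lower is better, so the natural comparison is the ratio old / new, which gives the speedup in the title; the percentage improvement is that ratio minus one. A small Python check using the averages above:

```python
def latency_improvement(baseline_ms, new_ms):
    """Lower is better: speedup is old/new, and percent improvement is (old - new) / new."""
    speedup = baseline_ms / new_ms
    return speedup, (speedup - 1) * 100


# Average query latencies reported above, in milliseconds.
speedup, pct = latency_improvement(16_668, 62)
print(f"{speedup:.1f}x faster, {pct:,.0f}% improvement")
# prints: 268.8x faster, 26,784% improvement
```

With the rounded 62 ms average this gives about 268.8x and 26,784%. The slightly higher 26,816% in the post would correspond to an unrounded average of roughly 61.93 ms, so the difference presumably comes from rounding.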

As I said in my last post, we’re not MongoDB experts by any stretch, but we wanted to share these results with the community and get people’s thoughts on applications where this might help, suggestions for next steps, and any other feedback. Also, if you are interested in learning more about TokuDB, please stop by to hear us speak at StrangeLoop, MySQL Connect, or Percona Live, or join our introductory webinar next week.

By the way, MongoDB also supports covered indexes, which I will talk about in my next post.  Covered indexes can provide some of the benefits of a clustered index, but can have significant drawbacks as well.


Comments (10)

  • Benjamin Abt

    Great information!
    Do you know if there is any way to get these improvements by using the MongoDB C# Driver?

    October 17, 2012 at 1:29 pm
    • Tim Callaghan

      Our indexing performance improvements are within MongoDB itself, so they are available to all clients, regardless of the driver language.

      October 17, 2012 at 2:27 pm
  • Fulano Tal

    Thanks for sharing these benchmarks. Do you also have data for the standard deviation, etc.?

    April 22, 2013 at 10:43 pm
    • Tim Callaghan

      The raw data used for the graphs is available here.

      April 23, 2013 at 1:49 pm
      • Fulano Tal

        Thank you. 🙂

        May 7, 2013 at 2:34 pm
  • Tyler

    I’ve looked around on the MongoDB site and haven’t found any documentation for clustering indexes. Is this something Tokutek has developed as a plugin/upgrade?

    May 16, 2013 at 3:27 pm
  • Bobo

    The MongoDB documentation recommends sizing indexes to fit in memory. How does performance drop off as your database exceeds your machine’s memory size? And what about the other collections in the same database? Their indexes will also be pushed out of memory.

    November 10, 2014 at 8:09 pm
    • Tim Callaghan

      The point of this experiment was to show how the two products behave on a mixed workload (inserts plus queries). In addition, the secondary index on the TokuMX collection is created as a clustered index, since this type of index allows for optimal range query performance. Since the queries are completely random and the data set is constantly growing, there is no way the indexes will fit in memory.

      November 11, 2014 at 6:46 am
