The challenge of handling massive data processing workloads has spawned many innovations and techniques in the database world, from indexing advances like our Fractal Tree® technology to a myriad of “NoSQL” solutions (here is our Chief Scientist’s perspective). MongoDB is among the most popular and widely adopted NoSQL solutions, and we became curious whether our Fractal Tree indexing could offer some advantage when combined with it. The answer seems to be a strong “yes”.
Earlier in the summer we kicked off a small side project. Here’s what we did: we implemented a “version 2” IndexInterface as a Fractal Tree index and ran some benchmarks. Note that our integration only affects MongoDB’s secondary indexes; primary indexes continue to rely on MongoDB’s indexing code. All the changes we made to the MongoDB source are available here. Caveat: this was a quick and dirty project – the code is experimental grade, so none of it is supported or has gone through any careful design analysis.
For our initial benchmark we measured the performance of a single-threaded insertion workload. Each inserted document contained five fields: URI (character), name (character), origin (character), creation date (timestamp), and expiration date (timestamp). We created a total of four secondary indexes: URI, name, origin, and creation date. The point of the benchmark is to insert enough documents that the indexes grow larger than main memory, and to show how insertion performance changes from an empty database to one that is largely dependent on disk IO. We ran the benchmark with journaling disabled, then again with journaling enabled, on the following setup:
- Sun x4150, (2) Xeon 5460, 8GB RAM, StorageTek Controller (256MB, write-back), 4x10K SAS/RAID 0
- Ubuntu 10.04 Server (64-bit), ext4 filesystem
- MongoDB v2.2.RC0
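The workload described above can be sketched in a few lines of Python. This is only an illustration of the document shape and the measurement loop, not the benchmark code we ran: the field key spellings, the random value generator, the `run_inserts` helper, and the `insert_one` callable are all hypothetical. Reading "exit velocity" as the insert rate over the final reporting interval is likewise an assumption of this sketch.

```python
import random
import string
import time
from datetime import datetime, timedelta

# The four indexed fields from the benchmark description (key spellings assumed).
INDEXED_FIELDS = ["uri", "name", "origin", "creation_date"]


def random_text(rng, length):
    """Return a random lowercase string of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_document(rng):
    """Build one document with three character fields and two timestamps."""
    created = datetime(2012, 1, 1) + timedelta(seconds=rng.randrange(10**7))
    return {
        "uri": "http://" + random_text(rng, 20) + ".example/" + random_text(rng, 10),
        "name": random_text(rng, 16),
        "origin": random_text(rng, 16),
        "creation_date": created,
        "expiration_date": created + timedelta(days=30),
    }


def run_inserts(insert_one, total, report_every, seed=0):
    """Insert `total` documents one at a time from a single thread.

    Returns a list of (documents inserted so far, inserts/sec over the
    most recent interval). The last entry approximates the exit velocity.
    """
    rng = random.Random(seed)
    samples = []
    window_start = time.perf_counter()
    for i in range(1, total + 1):
        insert_one(make_document(rng))
        if i % report_every == 0:
            now = time.perf_counter()
            elapsed = now - window_start
            rate = report_every / elapsed if elapsed > 0 else float("inf")
            samples.append((i, rate))
            window_start = now
    return samples


# Stand-in sink so the sketch runs without a database; against a real server,
# `insert_one` would be whatever single-document insert call the driver offers.
sink = []
samples = run_inserts(sink.append, total=1000, report_every=250)
```

Passing the insert call in as a parameter keeps the measurement loop independent of any particular driver, so the same loop could be pointed at either build being compared.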
Without journaling, the exit velocity of standard MongoDB was 1,045 inserts per second at 54 million document insertions, versus 13,304 inserts per second at 198 million document insertions for MongoDB with Fractal Tree Indexes: an improvement of 1,173%. With journaling, the exit velocities were 763 inserts per second for standard MongoDB and 6,951 for MongoDB with Fractal Tree Indexes: an improvement of 811%.
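For readers who want to check the arithmetic, the percentages are relative gains over the standard MongoDB rate. A minimal sketch, using only the four rates quoted above (the function name is ours, for illustration):

```python
def percent_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


no_journal = percent_improvement(1045, 13304)
journal = percent_improvement(763, 6951)

print(f"{no_journal:.0f}%")  # 1173%
print(f"{journal:.0f}%")     # 811%
```

Expressed as plain ratios, those same figures are roughly 12.7 times and 9.1 times the standard MongoDB insert rate.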
At this point there are several technical directions we could take. Here are some we’ll likely be looking at more closely:
- Adding support for clustered indexes so keyed lookups can be fully satisfied by the index
- Implementing the primary key as a clustered index so that all indexing is handled by Fractal Tree indexes, not just secondary keys
- Running additional benchmarks for interesting use-cases
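To make the clustered index idea concrete, here is a toy Python model. It is a sketch only: plain dictionaries stand in for real index structures, the `Collection` class and its method names are hypothetical, and names are assumed unique to keep it short. A non-clustered secondary index stores a pointer back to the document, so a keyed lookup needs a second fetch; a clustered index stores the document alongside the key, so the index alone answers the query.

```python
class Collection:
    def __init__(self):
        self.docs = {}               # primary storage: _id -> document
        self.by_name = {}            # non-clustered index: name -> _id
        self.by_name_clustered = {}  # clustered index: name -> copy of document
        self.fetches = 0             # trips to primary storage made by lookups

    def insert(self, doc):
        self.docs[doc["_id"]] = doc
        self.by_name[doc["name"]] = doc["_id"]
        self.by_name_clustered[doc["name"]] = dict(doc)

    def find_secondary(self, name):
        doc_id = self.by_name[name]  # step 1: index lookup yields only the _id
        self.fetches += 1            # step 2: extra fetch from primary storage
        return self.docs[doc_id]

    def find_clustered(self, name):
        # The index entry already holds the whole document; no second fetch.
        return self.by_name_clustered[name]


coll = Collection()
coll.insert({"_id": 1, "name": "alpha", "origin": "x"})
via_secondary = coll.find_secondary("alpha")
via_clustered = coll.find_clustered("alpha")
```

The trade-off visible even in this toy is that the clustered index keeps a second copy of each document, spending space and insert work to save a fetch on reads.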
We’re not MongoDB experts by any stretch, but we wanted to share these results with the community and get people’s thoughts on applications where this might help, suggestions for next steps, and any other feedback.