Former AOL chief database architect, now independent database consultant, John Schulz guest blogs about his experience with basic MongoDB (from MongoDB, Inc.) and TokuMX™. If you’d like to speak with John about your own big data challenges you can contact him via email at john_schulz at aol dot com.
Before leaving AOL to start my own consulting practice earlier this year, I helped AOL investigate a significant performance issue with a MongoDB application. We quickly discovered the root cause: a significant drop-off in performance when operating at scale (the application database in question had quickly grown to billions of small documents). Even after efforts to minimize the issue, it was clear the problem was not going to go away at these scales. What we needed was a way to compress the database to fit into memory and/or a different approach to storage that would greatly reduce storage I/O.
Our investigation pointed us here, to Tokutek, and their distribution of MongoDB – called TokuMX.
Tokutek claims that, without any modification to application code, their MongoDB implementation will compress the typical MongoDB database by 4X or more. We also read that TokuMX supports clustering indexes, which minimize storage I/O in a way that isn’t possible with basic MongoDB. If their claims were true, we’d be able to attack the problem from both ends. As TokuMX is open source, we were able to freely obtain the community edition and run our own benchmarks. For our tests, we simply used our existing MongoDB application without modification.
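For readers curious what a clustering index looks like in practice: TokuMX exposes it as an extra option to the standard `ensureIndex` call, so the full documents are stored alongside the index entries and a range scan can be satisfied from the index alone. Here is a minimal mongo-shell sketch against a running TokuMX instance; the collection and field names are hypothetical, chosen only for illustration:

```javascript
// "events", "userId", and "ts" are hypothetical names for illustration.
// The { clustering: true } option is TokuMX-specific; basic MongoDB
// would reject or ignore it.
db.events.ensureIndex(
    { userId: 1, ts: 1 },
    { clustering: true }  // store whole documents within the index
);

// A query covered by the clustering index can now be answered without
// per-document lookups back to the primary key, cutting storage I/O.
db.events.find({ userId: 42 }).sort({ ts: 1 });
```

Because the documents live inside the index itself, the trade-off is extra space per clustering index (offset in practice by TokuMX’s compression), in exchange for far fewer random reads on range queries.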
The benchmark results demonstrated to us that the Tokutek claims were more than marketing hyperbole. For example, with TokuMX, documents scanned per second did not degrade as the database grew in size, as they did with basic MongoDB. On disk writes per second, TokuMX consistently outperformed basic MongoDB by a factor of 2X or better. And from a storage I/O point of view, TokuMX I/O rates remained consistently low as the database grew. Basic MongoDB I/O requirements were significantly higher from the outset, and degraded as the database grew.
I encourage you to read my full report here, in the May 22nd edition of Database Journal. Feel free to contact me with questions or if you’d like my assistance tackling your own big data challenges.