With TokuMX 1.4 coming out soon, bringing (teaser) wonderful improvements to sharding and updates (and plenty of other goodies), I’ve recently been reminiscing about how we got TokuMX to this point. We (actually, really John) started dabbling with integrating Fractal Tree® indexes into MongoDB in the summer of 2012, when we (really, he) prototyped using Fractal Tree indexes only for secondary indexes. As cool as that prototype was, it wasn’t production-ready, and we knew that creating a usable product would be a challenge.
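Back in early 2013, when we were analyzing internally how we wanted to (actually) integrate Fractal Tree indexes into MongoDB, we saw three choices:
1. Use Fractal Tree indexes only for secondary indexes
2. Use Fractal Tree indexes for entire collections
3. Use Fractal Tree indexes for everything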
Needless to say, we chose option #3, and TokuMX was born soon after (with a lot of hard work). What I would like to do now is give some insight into our reasoning. But really, it comes down to these reasons:
Here are the benefits and drawbacks we saw to each approach.
Using Fractal Tree indexes only for secondary indexes
The big advantage we saw was the ability to non-intrusively inject our technology into MongoDB. Users could opt in only for the secondary indexes that were likely to benefit. Theoretically, we could slide ourselves in and not worry about the rest of the MongoDB stack, such as the query planner, replication, and sharding. All we would care about are the places where our index is modified and where it is queried. Theoretically, this would be small in scope.
Unfortunately, the disadvantages we saw were quite big:
Using Fractal Tree indexes for entire collections
The big benefits to this approach over the “just a secondary index” approach were the following:
Unfortunately, we reasoned that the disadvantages of the “just a secondary index” approach also applied here:
Using Fractal Tree Indexes for EVERYTHING
The third option was to “take control over more of the stack”: use Fractal Tree indexes for everything and completely replace the MongoDB storage code. The challenge (but not downside) was that this approach was a lot of work. However, it was arguably less work than getting either of the above options working. We had to become experts in how replication and sharding work and, in some cases, replace existing algorithms with new ones that better utilize Fractal Tree indexes. The benefits we saw were:
But really, the BIGGEST benefit to this approach was the following: we could innovate on more of the MongoDB core server stack in ways the other approaches would not allow. Prior to TokuMX 1.4, such innovations include (but are not limited to):
For these reasons, we chose this option, and after some hard work, TokuMX was born.
What really has me excited about TokuMX 1.4 and beyond is that we have taken these innovations further. We have improved sharding and updates in ways that would be impossible had we taken another approach. Also, we have plans to improve other areas of the system in similar ways. So stay tuned.