How we built TokuMX
Leif.Walsh
When I get to talk to people about TokuMX and how it’s an optimized MongoDB, I sometimes get follow-up questions like:
- “Is it an in-memory proxy?”
- “Write optimized? So you buffer all of the writes in memory and lose them on crash?”
- “Did you re-implement the server and match the protocol?”
None of these things describe TokuMX, but the questions demonstrate that there are many schools of thought on how to optimize databases, and MongoDB in particular. I’d like to elaborate more on what TokuMX really is and how we built it. First, let’s talk about what MongoDB is.
MongoDB consists of several pieces: mongod, a server process that stores data and executes queries; mongos, a sharding router process; a wire protocol for interacting with these servers; and clients and language drivers that implement this protocol. On top of this, there is a community of developers and users, along with admin and monitoring tools that use the protocol. Together these make up the MongoDB ecosystem.
TokuMX takes advantage of almost all of this ecosystem. The important change in TokuMX is inside mongod, the data storage and query execution process: we replaced its storage engine with our own Fractal Tree indexing library, which now stores the data and indexes for MongoDB. There are a few changes in the router process to take advantage of some simplifications, but by and large the rest of the ecosystem (the router, wire protocol, query language, drivers, and tools) is unchanged. Why did we replace the storage engine in mongod? Storage engines are important because they control things like concurrency and compression.
So how did we build TokuMX? It’s fairly simple: we forked MongoDB v2.2 and systematically replaced all of the storage code with calls into our core fractal tree indexing API. In particular, we store both documents and secondary indexes inside fractal tree indexes, which means our storage footprint is small and performance is excellent. One might think that a change this big must make the server unstable. Fortunately, we’ve been testing and supporting the fractal tree index for years through our MySQL product (TokuDB), so it’s very mature.
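To make the idea of swapping the storage layer underneath an unchanged query layer concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: `StorageEngine`, `DictEngine`, `SortedEngine`, and `Collection` are made-up names, not TokuMX or MongoDB code, and `SortedEngine` is a plain sorted list, not a fractal tree. It only shows that code written against a narrow storage interface does not need to change when the engine behind it does.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left, insort


class StorageEngine(ABC):
    """Hypothetical interface between the query layer and storage."""

    @abstractmethod
    def put(self, key, doc):
        """Store doc under key, overwriting any existing value."""

    @abstractmethod
    def get(self, key):
        """Return the doc stored under key, or None if absent."""

    @abstractmethod
    def scan(self, lo, hi):
        """Yield (key, doc) pairs with lo <= key < hi, in key order."""


class DictEngine(StorageEngine):
    """Stand-in for the 'original' storage code: an unordered hash map."""

    def __init__(self):
        self._data = {}

    def put(self, key, doc):
        self._data[key] = doc

    def get(self, key):
        return self._data.get(key)

    def scan(self, lo, hi):
        # Sorts all keys on every scan, then filters to the range.
        for key in sorted(self._data):
            if lo <= key < hi:
                yield key, self._data[key]


class SortedEngine(StorageEngine):
    """Stand-in for a 'replacement' engine that keeps keys in sorted order.

    Assumption: a sorted list is used here purely for illustration.
    """

    def __init__(self):
        self._keys = []
        self._data = {}

    def put(self, key, doc):
        if key not in self._data:
            insort(self._keys, key)  # keep the key list sorted
        self._data[key] = doc

    def get(self, key):
        return self._data.get(key)

    def scan(self, lo, hi):
        # Binary search to the start of the range, then walk forward.
        i = bisect_left(self._keys, lo)
        while i < len(self._keys) and self._keys[i] < hi:
            key = self._keys[i]
            yield key, self._data[key]
            i += 1


class Collection:
    """Query layer: the same code runs on top of either engine."""

    def __init__(self, engine):
        self._engine = engine

    def insert(self, doc):
        self._engine.put(doc["_id"], doc)

    def find_one(self, _id):
        return self._engine.get(_id)

    def find_range(self, lo, hi):
        return [doc for _, doc in self._engine.scan(lo, hi)]
```

A caller can build `Collection(DictEngine())` or `Collection(SortedEngine())` and use `insert`, `find_one`, and `find_range` in exactly the same way; only the behavior under the interface differs.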
The big takeaway is that TokuMX is the MongoDB you know and love but built on top of Fractal Tree indexes from Tokutek. As an engineering team, we’re interested in developing and delivering a performant, reliable, and polished experience for serious MongoDB deployments. If you haven’t already, try out TokuMX and see what a great storage engine can do for your database.