As I get ready for tomorrow’s MongoDB Boston conference (come say hi if you see us!), I’ve been spending some time thinking about a post from last week by Bryce Nyeggen: The Genius and Folly of MongoDB. It hits home in a lot of ways for me and the whole TokuMX team, because it describes exactly the impetus we had for building TokuMX.
Bryce’s main point is that “MongoDB is easy to make fun of.” And that’s true, but only because one vocal group of people made a bunch of claims about MongoDB that took it way past its strengths. Those people got laughed out of town by more experienced users, and MongoDB was left stuck with a reputation for making bolder claims than it could back up, even though the people behind MongoDB weren’t necessarily the ones making them.
From our perspective, MongoDB gets a lot of things right. Their critical innovation seems to be the design of a data modeling and manipulation language that is easy to use, semantically powerful and expressive, and trivially scalable to a sharded and replicated architecture. One thing everyone (including me) seems to agree on: writing applications using MongoDB is a lot of fun.
However, MongoDB still has problems. Bryce’s post, and most other criticisms of MongoDB, point to deeper flaws. There are architectural deficiencies under the surface that you won’t notice until you start running it at a large scale: on large data sets, or with high concurrency, or with low latency requirements. In our view, all these deep problems come from one specific place: MongoDB’s storage layer.
One can easily imagine the excitement with which the original authors wrote the early MongoDB code. The storage layer, especially if you look at it before journaling was introduced, is extremely simple. It was the fastest path to something that would get data on disk so they could start playing with the abstractions, where their real innovations were. There’s no shame in that, but it puts MongoDB in a fairly shallow position: it’s easy to get started, so it gets a lot of early traction, but really big applications invariably struggle to grow.
The way they did journaling, if you squint, is an absolutely brilliant hack that got them what they needed without spending more time than they needed to on storage, and without changing the file format for existing databases. But it left them in an awkward state where serialization, locking, and crash safety are tied together and very hard to incrementally improve.
By contrast, Tokutek’s key innovation is doing durable storage right, and by “right” we mean fast, small, concurrent, and reliable. In the same way that 10gen put all their efforts into aspects of the database other than the storage code, we’ve put all our efforts into our storage engine, but haven’t spent any time making its interface easy to use (it’s still just a C API).
When we looked at MongoDB’s situation, we found a great opportunity. We decided to take the best parts of the MongoDB ecosystem (the data model, query language(s), cluster management, and widespread driver support) and just replace its meager storage code with our robust storage engine. Their storage code allowed some parts of their system to make assumptions that aren’t true for TokuMX. For example, replication assumes all the writes on the primary are effectively single-threaded, whereas we’re fully concurrent for writes. So there were some other subsystems that we needed to tweak or rewrite, but we’ve been able to keep all the good parts of MongoDB that make it easy and fun to develop with.
What we’ve got now in TokuMX is the enjoyable development experience you get when you’re playing with 100 documents in MongoDB for the first time, in a polished, rock-solid database system that stays fresh and responsive no matter how big your application or your team grows. Whether you’ve never tried MongoDB before, or you use it and love it, or you wish you could use it but can’t look past its weaknesses, we think TokuMX is a great database system from top to bottom, and you’ll love using it.