October 21, 2014

MongoDB Approach to Availability

Another thing I find interesting about MongoDB is its approach to durability, data consistency and availability. It is very relaxed, which rules it out for some applications, but for others it is usable in its current form. Let me explain some of the concepts and compare them to technologies in the MySQL space.

First, I think MongoDB is best compared not to MySQL Server but to MySQL Cluster, especially in newer versions which implement “sharding”. Just as a commit to the NDB storage engine does not normally mean a commit to disk but rather a commit to the network, a commit in MongoDB does not mean a commit to disk either. Furthermore, MongoDB uses asynchronous replication, meaning it may take some time before the data is on more than one node. You can also use getLastError() to ensure the data has propagated to the slave. So you can see it as a hybrid between MySQL Cluster and innodb_flush_log_at_trx_commit=2 mode. The second difference, of course, is that MongoDB is not crash safe: similar to MyISAM, the database will need to be repaired if it crashes. Still, I find the behavior somewhat similar: you’re not expected to run MySQL Cluster without replication, and MongoDB is practically the same.
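To make the "commit to network, not to disk" point concrete, here is a minimal toy model in Python. It is not MongoDB code: the `Node` and `ToyCluster` classes and the `wait_for_slave` flag are hypothetical names invented for illustration, with the flag standing in for checking getLastError() after a write. It only shows why an acknowledged write can vanish when the master crashes before the data has been shipped or flushed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy storage node: writes sit in memory; nothing here ever reaches disk."""
    memory: list = field(default_factory=list)
    disk: list = field(default_factory=list)

    def crash(self):
        # After a crash only what was on disk comes back.
        self.memory = list(self.disk)


@dataclass
class ToyCluster:
    master: Node = field(default_factory=Node)
    slave: Node = field(default_factory=Node)
    unshipped: list = field(default_factory=list)  # written, not yet replicated

    def write(self, doc, wait_for_slave=False):
        """Acknowledge once the master holds the write in memory.

        wait_for_slave=True (hypothetical flag) also ships pending writes
        to the slave before acknowledging.
        """
        self.master.memory.append(doc)
        self.unshipped.append(doc)
        if wait_for_slave:
            self.ship()
        return "ack"

    def ship(self):
        """Asynchronous replication step: copy pending writes to the slave."""
        self.slave.memory.extend(self.unshipped)
        self.unshipped.clear()

    def crash_master(self):
        """Crash the master and return what the slave still holds."""
        self.master.crash()
        self.unshipped.clear()
        return list(self.slave.memory)


fast = ToyCluster()
fast.write({"_id": 1})                       # acknowledged immediately
print(fast.crash_master())                   # the write is gone

safe = ToyCluster()
safe.write({"_id": 1}, wait_for_slave=True)  # acknowledged after replication
print(safe.crash_master())                   # the slave still has it
```

The trade-off is the usual one: waiting for the slave costs a network round trip per write, which is why it is optional rather than the default behavior.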

Second, if we look at Replica Sets, we find them very similar to MySQL Cluster, though designed to work over a wide area network and therefore with asynchronous replication. Voting is required to pick the master node in case of node failure, and at least 3 servers are recommended; some of them can be voting-only servers that cast their votes and hold no data. The other difference is that there is only one master rather than multiple. This is because multi-master with asynchronous replication requires conflict resolution, which can be tricky in the general case, and MongoDB wants simplicity of operation for developers and administrators.
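The voting rule is easier to see in a few lines of Python. This is a simplified sketch, assuming one vote per server and a strict-majority rule; the `Member` class and `can_elect_master` function are hypothetical names, not part of any MongoDB API, and real elections consider more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    holds_data: bool = True   # False models a vote-only server
    reachable: bool = True


def can_elect_master(members):
    """True if the reachable servers hold a strict majority of all votes
    and at least one of them holds data (so it can become master)."""
    votes = sum(1 for m in members if m.reachable)
    has_candidate = any(m.reachable and m.holds_data for m in members)
    return has_candidate and votes * 2 > len(members)


# Two data servers plus one vote-only server; the old master is down.
three = [
    Member("db1", reachable=False),
    Member("db2"),
    Member("voter", holds_data=False),
]
print(can_elect_master(three))  # True: 2 of 3 votes

# With only two servers, losing one leaves 1 of 2 votes: no majority.
two = [Member("db1", reachable=False), Member("db2")]
print(can_elect_master(two))    # False
```

This is why 3 servers are recommended: with 2, the survivor cannot tell a dead peer from a network split, so it cannot safely promote itself.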

Third, if we look at how failover happens, it is handled at the driver level, the same as with NDB (native API). When you connect to a Replica Set you connect to a set of servers, not to one of them, and if the master fails the driver fails over to a different master. Things are again tuned to deal with asynchronous replication. Consistency is maintained, but at the expense that certain changes may be thrown away (“rolled back”) in case of failover.
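Here is a rough Python sketch of what "rolled back" means in this situation. It assumes each server's replication log is just an ordered list of operations and that a rejoining old master discards everything after the point where its log diverges from the new master's. The function names are hypothetical and the real mechanism is more involved.

```python
def common_prefix_length(a, b):
    """Number of leading operations the two logs agree on."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def rejoin(old_master_log, new_master_log):
    """Return (resynced_log, rolled_back) for an old master rejoining the set.

    Operations the old master accepted but never replicated are dropped,
    and its log is replaced by the new master's log.
    """
    keep = common_prefix_length(old_master_log, new_master_log)
    rolled_back = old_master_log[keep:]
    return list(new_master_log), rolled_back


# The old master accepted a, b, c, d but only a, b reached the slave
# before the failure. The slave was promoted and then accepted x.
old_log = ["a", "b", "c", "d"]
new_log = ["a", "b", "x"]

resynced, lost = rejoin(old_log, new_log)
print(resynced)  # ['a', 'b', 'x']
print(lost)      # ['c', 'd']
```

All servers end up with the same log, so consistency holds, but c and d were acknowledged to some client and are now gone.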

This approach is not as clean as the best possible “no committed data loss with almost instant failover”, but it makes sense for a large number of applications. In fact, when using MySQL Replication for failover we’re operating in a similar situation, just with a lot less automation.

The good question, of course, is how robust these features are in MongoDB. Many of them are new, and Replica Sets are still in development. It may take time for them to stabilize and for tools to be developed around them. How do you check whether 2 MongoDB nodes are indeed in sync? How do you do hot backups with point-in-time recovery? These and many similar questions need to be answered and the bugs worked out. One good example of the early stage of MongoDB replication is a bug mentioned during a presentation today: replication breaks if the time on the master server is changed (MongoDB uses timestamps to identify events in the replication log). As I understood it, this was fixed just last month.

At the same time many things, including replication, are a lot simpler with MongoDB, and there is a lot less old baggage, so I hope it will be able to stabilize and mature quickly.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Bill Getas says:

    All well and good, but from a tiny IT shop perspective:

    MySQL sold out to Oracle, indirectly, possibly unknowingly. But it was still a sell out. Oracle will ruin it, not out of meanness or evil (though it is most definitely an evil multinational, also funding the same as Bill Gates, literally: let’s figure out how to get genetically modified mosquitos to vaccinate the unwitting masses) but out of simple association. We know Oracle by its fruits already. So MySQL is on its way out, and this is an inevitability if you ask anyone with a passion for and understanding of technology. An unmistakably crucial part of it very much is the David and not the Goliath.

    That leaves Percona in a bad place. Hence the Percona server, the closing association with MariaDB, and the SQL + NoSQL HandlerSocket thingy, plus whatever else is coming. This is A Good Plan.

    One way to short circuit Mongo is to simply get the MySQL layer out of the way. Incorporate the ideas of Drizzle (speed, speed, and little else). Include the SphinxSearch for God’s sake! Not so much as module this, module that, but as a sensible, open system that can be gotten in one download and upgraded as pieces. Make this new Percona Beast the end-all, be-all for little DB admins and tech shops. Help us help ourselves, and benefit from the uprising of goodness, charging for custom stuff, support, etc. Show Oracle for the oaf that it is. Show Mongo for the newcomer that it is. The playing field is in a weird place now, with a particular opportunity for someone to really take over with an excellent, sensible open source offering.

    Just my opinion. Also, what prompted this is that the post above seemed like it was written in some kind of desperation or ‘looking down’ upon Mongo, and this doesn’t really do much beyond showing the obvious shortcomings of how Mongo doesn’t do what it should versus MySQL (or Maria). Mongo’s doing a great job, showing people the way that it can be. We’ve found their product, despite it being based heavily upon javascript (HAHAHA), to be rather eye opening in terms of speed. It’s up to Percona to show us all the Right Way it should be. Include a NoSQL + SQL + Sphinx in your new product, the Percona Beast!

    All the best from Maryland, Bill

  2. Nick says:

    Just an update on this, MongoDB now has journaling (like innodb), enabled by default: http://www.mongodb.org/display/DOCS/Journaling
