MySQL Master-Master replication (often in active-passive mode) is a popular pattern used by many companies for scaling out MySQL. Most of these companies have internal scripts to handle tasks such as automatic failover and slave cloning, but no Open Source solution has been available.
A few months ago we were asked to implement such a solution for one of our customers, who kindly agreed to let us release it under the GPL2 license; in return, we gave them a reduced rate for being Open Source friendly.
So what does this tool do, and how does it work?
Currently it is implemented on top of Linux IP management tools and LVM for snapshot creation, but we hope support for other operating systems will be added in the future.
It can manage a master-master pair as well as other configurations, such as a master-master pair plus a bunch of slaves.
Typically you define “roles”, for example READER and WRITER roles in the simplest case, and assign them to the servers. For a master-master pair, you might say both servers can be readers at the same time, but only one of them may be writable at any given moment.
Each role has a set of IPs associated with it, and the tool makes sure all of these IPs are always handled by some live server, so you can use DNS for load balancing without worrying about TTLs and similar issues.
If the active server in the pair fails, both its READER and WRITER roles are taken over by the passive node, and depending on the monitoring configuration this can happen within a few seconds.
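To make the role/IP handling concrete, here is a minimal sketch of the idea. The role names, IP addresses, and round-robin logic below are illustrative assumptions, not the tool's actual code:

```python
# Hypothetical sketch of MMM-style role handling: role names, IPs and
# the failover logic are illustrative assumptions, not the real tool.

ROLES = {
    "writer": ["192.0.2.10"],                # exactly one writable IP
    "reader": ["192.0.2.11", "192.0.2.12"],  # reader IPs spread across nodes
}

def assign_roles(alive_nodes):
    """Map every role IP onto some live node so no IP is ever unserved."""
    assignment = {}
    for role, ips in ROLES.items():
        for i, ip in enumerate(ips):
            # Round-robin the IPs over whatever nodes are still alive.
            assignment[ip] = (role, alive_nodes[i % len(alive_nodes)])
    return assignment

# Both nodes up: reader IPs are spread across the pair.
print(assign_roles(["db1", "db2"]))
# db1 fails: the surviving node takes over every IP, including the writer's.
print(assign_roles(["db2"]))
```

The key invariant is simply that every IP in every role is always bound to some live server; failover is just re-running the assignment with a shorter list of live nodes.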
Such IP-based high availability and load balancing requires no extra hardware or software, and it works well when you have many applications implemented in different languages, which makes application-level failover problematic.
The tool also takes extra care to prevent application mistakes. What happens if you write to the slave because of an application error? You will often break replication without knowing about it. You can partially solve this by using a read-only user for slave connections, but what if your application is simply misconfigured and thinks a server is the master when it is not? To take care of this, the Master-Master Manager makes sure only one of the nodes is writable at all times and the other is set to read-only (–read-only), so unless you use a user with the SUPER privilege you should be safe.
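The single-writer invariant can be sketched as follows. The `SET GLOBAL read_only` statement is standard MySQL; the surrounding logic and node names are illustrative assumptions:

```python
# Hedged sketch of the single-writer invariant: only the node holding the
# WRITER role gets read_only disabled; every other node is forced read-only.

def read_only_statements(nodes, writer):
    """Return the SET GLOBAL statement each node should execute."""
    stmts = {}
    for node in nodes:
        flag = 0 if node == writer else 1
        # Users without the SUPER privilege cannot write while read_only = 1.
        stmts[node] = f"SET GLOBAL read_only = {flag}"
    return stmts

print(read_only_statements(["db1", "db2"], writer="db1"))
# {'db1': 'SET GLOBAL read_only = 0', 'db2': 'SET GLOBAL read_only = 1'}
```

Note that `read_only` does not stop users with the SUPER privilege, which is exactly why the original text warns about them.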
It has some other neat features; for example, you can configure it to remove the READER role from a server if it falls too far behind in replication (or if replication breaks), so you do not have to implement this check in each application.
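The lag check might look roughly like this. The field names mimic the output of MySQL's `SHOW SLAVE STATUS`, while the threshold value and function are assumptions for illustration:

```python
# Illustrative sketch of the lag check: field names mimic SHOW SLAVE STATUS
# output; the threshold is an assumed configuration value, not MMM's default.

MAX_LAG_SECONDS = 60  # assumed threshold; in practice this comes from config

def reader_eligible(slave_status):
    """Decide whether a node may keep its READER role."""
    if slave_status["Slave_SQL_Running"] != "Yes":
        return False          # replication is broken: pull the role
    lag = slave_status["Seconds_Behind_Master"]
    if lag is None or lag > MAX_LAG_SECONDS:
        return False          # too far behind (or lag unknown)
    return True

print(reader_eligible({"Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 5}))    # True
print(reader_eligible({"Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 300}))  # False
```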
We also took extra care to make sure things can’t run out of sync silently. For example, as you may know, if a slave server reboots (say, power goes down) you can’t be sure about data consistency, because replication may be restarted from the wrong position. Sometimes this shows up as errors, but for some query patterns it will not. MMM detects this situation and holds the server until the administrator decides what to do.
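One simple way to detect such a restart is to notice the server's uptime going backwards between monitoring polls. This is a hypothetical sketch of that idea (the state names and detection method are assumptions, not MMM's actual mechanism):

```python
# Hypothetical sketch of the "hold after reboot" check: if a slave's uptime
# decreased since the last poll, it must have restarted, so its replication
# position can no longer be trusted until an administrator verifies it.

def check_restart(last_uptime, current_uptime):
    """Return the state the monitor should put the node into."""
    if last_uptime is not None and current_uptime < last_uptime:
        # Server rebooted between polls: hold it for the administrator
        # instead of silently letting replication continue.
        return "AWAITING_RECOVERY"
    return "ONLINE"

print(check_restart(last_uptime=3600, current_uptime=3660))  # ONLINE
print(check_restart(last_uptime=3600, current_uptime=42))    # AWAITING_RECOVERY
```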
One-command LVM-based sync is also implemented, so restoring broken replication in a safe way becomes very easy.
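Conceptually, an LVM-based resync boils down to a short sequence of shell steps. The device names, mount points, and options below are illustrative assumptions, and the real tool also has to handle table locking, binlog positions, and error recovery:

```python
# Rough sketch of the command sequence behind an LVM-based resync; device
# names, mount points and sizes are illustrative, not the tool's defaults.

def lvm_sync_commands(vg="vg0", lv="mysql", snap_size="10G"):
    """Build the shell steps to snapshot a master and restore a slave."""
    return [
        # (Before this, writes are frozen briefly and the binlog
        # position recorded via SQL, so the snapshot is consistent.)
        f"lvcreate --snapshot --size {snap_size} --name {lv}_snap /dev/{vg}/{lv}",
        f"mount /dev/{vg}/{lv}_snap /mnt/snap",
        # Copy the consistent snapshot to the slave, then clean up.
        "rsync -a /mnt/snap/ slave:/var/lib/mysql/",
        "umount /mnt/snap",
        f"lvremove -f /dev/{vg}/{lv}_snap",
    ]

for cmd in lvm_sync_commands():
    print(cmd)
```

The snapshot is cheap to create, so the master is only briefly locked; the bulk copy then runs against the frozen snapshot while the master keeps serving traffic.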
Besides simple “cloning” of nodes, you can use the same tool to create backups, with a number of methods supported, including compressed backups and incremental backups with rdiff.
Finally, we have implemented safe role switching, meaning you can move the WRITER role to the other node in a clean way (making sure replication is still in sync). Too often I see this switch done the “dirty” way, potentially risking replication inconsistencies. Safe switching is very handy if you want to restart one of the servers in the cluster to upgrade the OS or add more RAM.
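The essence of a clean switch is ordering: stop writes first, wait for the replica to catch up, and only then move the WRITER role. This outline is an assumption about the procedure, not the tool's exact implementation:

```python
# Illustrative outline of a clean WRITER switch; the step names are an
# assumption about the procedure, not the tool's exact implementation.

def safe_writer_switch(old_master, new_master):
    """Return the ordered steps for moving the WRITER role cleanly."""
    return [
        f"set {old_master} read-only",          # stop new writes first
        f"wait until {new_master} catches up",  # replication fully in sync
        f"move writer IP from {old_master} to {new_master}",
        f"set {new_master} writable",           # only now allow writes again
    ]

for step in safe_writer_switch("db1", "db2"):
    print(step)
```

A “dirty” switch skips the first two steps, which is exactly how writes committed on the old master but not yet replicated get silently lost.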
This tool works well for a number of customers and users, but it is surely an early software version, so try it at your own risk, and make sure to provide your feedback and suggestions.