After having reviewed the definition my the previous post (The definitions), the next step is to respond to some questions.
That question is quite obvious but some times, it is skipped. It can also be formulated “What is the downtime cost of the service?”. In the cost, you need to include lost revenue from the service and you also need to consider less direct impact like loss of corporate image and other marketing costs. If your downtime cost is under $10/h, you can stop reading this document, you don’t need HA. For the others, let’s move on!
What is really tricky with MySQL is the number of possible HA solutions. From the simplest the most complex let’s list the most common ones:
– MySQL replication with manual failover
– Master-Master with MMM manager
– NDB Cluster
These technologies are by far, not a one size fits all and many deployments use combination of solutions. I will not cover ScaleDB and Continuent because I know almost nothing of these solutions. There are many more questions you need to ask yourself before being able to pick the right one. Below, I listed the most common questions, I might have missed some.
Since all the technologies do not offer the same level of availability, this is a first important sorting factor. Here are estimates of the level of availability offered by the various solutions.
|Level of availability|
|Simple replication||98 to 99.9+%|
|Master-Master with MMM manager||99%|
|Heartbeat/SAN (depends on SAN)||99.5% to 99.9%|
From the table, if your requirements are for 99.99%, you are restricted to NDB Cluster while if it is only 99% you have more options. I recall that the level of availability is hard to estimate and subject to debate. These are the usually accepted level of availability for these technologies.
Obviously, if you are concerned about loss of data, you are most likely using the InnoDB storage engine, since MyISAM is not transactional and do not sync data to disk. Similarly, MySQL replication is an asynchronous process and although it is fairly fast at transferring data between the master and the slaves, there is a window of time where data loss is possible.
If you can afford to lose some data, you can consider “MySQL replication” and “Master-Master with MMM manager” otherwise, you can only consider the other three solutions.
|Data 100% safe|
|Master-Master with MMM manager||no|
|Heartbeat/SAN (depends on SAN)||yes|
There are some features like Full text indexes and GIS indexes that are supported only by MyISAM. The HA solutions that work well with MyISAM are “MySQL replication” and “Master-Master with MMM manager”. Depending on the application, the MyISAM Full text indexes might be replaced by another search engine like Sphinx in order to remove the restriction. There is no HA solution other than the ones based on replication that handles GIS indexes.
|Need MyISAM Full text or GIS indexes||“MySQL replication” and “Master-Master with MMM manager”|
|Don’t use any special MyISAM feature||All|
|Can change MyISAM Full text to Sphinx||All|
The HA solutions we present are not equal in term of their write capacity. Due to the way replication is implemented, only one thread on the slave can handle the write operations. If the replication master is multi-cores servers and is heavily writing using multiple threads, the slaves will likely not be able to keep up. Replication is not the only technology that put a strain on the write capacity, DRBD, a shared storage emulator for Linux, also reduce by about 30% (very dependent on hardware) the write capacity of a database server. In term of write capacity here are you choices.
|Master-Master with MMM manager||Fair|
|Heartbeat/SAN (depends on SAN)||Excellent|
Since NDB Cluster is an integrated sharding environment, if you are planning for a growth that will need sharding (splitting the database over multiple servers), then you might need to take a serious at that solution. If not, then, apart from the write capacity, all the solutions are approximately equal.
There is a quite direct relationship between the level of availability and the complexity of the solution. In order to reach the promised level of availability, the staff maintaining the HA setup, either internal or external, must have the required level of expertise. The required expertise level is summarized in the table below.
|MySQL replication||Typical, average MySQL DBA + some Sysadmin skills|
|Master-Master with MMM manager||Good, average MySQL DBA + good Sysadmin skills|
|Heartbeat/SAN (depends on SAN)||High, Good MySQL DBA + strong Sysadmin skills|
|Heartbeat/DRBD||High, Good MySQL DBA + strong Sysadmin skills|
|NDB Cluster||Very high, Specific NDB knowledge, strom MySQL skills and strong Sysadmin skills|
The last aspect that needs to be considered is the budget, complexity is expensive. We will consider two types of setup. The first one is a basic proof of concept of the technology with the hardware tested, the data imported and basic testing and documentation. A proof of concept setup is a good way to get used to a technology and experiment with it in a test environment. The other type of setup we will consider is a full production setup that includes extensive testing, fire drills, full documentation, monitoring, alerting, backups, migration to production and post migration monitoring. Of course, it is the safest way to migrate an HA solution to production. All the times here are estimates based on field experience, the values presented here are fairly typical and contains some buffers for unexpected problems. Although an HA solution can be built remotely through a KVM over IP and adequate remote power management, an on site intervention with physical access to the servers is the preferred way, especially for the most complex solutions.
|Proof of concept||Migration to Production|
|MySQL replication||4 hours||12 hours|
|Master-Master with MMM manager||8 hours||24 hours|
|Heartbeat/SAN (depends on SAN)||32 hours||120 hours|
|Heartbeat/DRBD||40 hours||120 hours|
|NDB Cluster||40 hours||120 hours+|
Editor’s Note: We’ve gotten many questions about the time estimates mentioned here. The above estimates shouldn’t be used to compare against any specific situation. Time will vary greatly depending on your project. For example, “setting up replication” can be as simple as CHANGE MASTER TO, and can take as little as a few minutes in some circumstances. Yves’s estimate is for a project to create a replication slave for HA purposes, not for “setting up replication.” There is a big difference between an HA project and a DBA task. — Baron Schwartz