
MHA Quick Start Guide

September 2, 2016 | Posted In: High-availability, MySQL


MHA (Master High Availability Manager and tools for MySQL) is one of the most important pieces of our managed services. When properly set up, it can check replication health, move writer and reader virtual IPs, perform failovers, and have its output constantly monitored by Nagios. It is easy to deploy and follows the KISS (Keep It Simple, Stupid) philosophy that I love so much.

This blog post is a quick start guide to try it out and play with it in your own testing environment. I assume that you already know how to install software, deal with SSH keys and set up replication in MySQL. The post just covers MHA configuration.

Testing environment

Taken from /etc/hosts

mysql-server1: Our master MySQL server with 5.6
mysql-server2: Slave server
mysql-server3: Slave server
mha-manager: The server that monitors replication and from where we manage MHA. It also needs some Perl dependencies installed.
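For reference, the relevant /etc/hosts entries could look something like this (the IP addresses below are made up for the example; use whatever fits your network):

    192.168.56.101   mysql-server1
    192.168.56.102   mysql-server2
    192.168.56.103   mysql-server3
    192.168.56.104   mha-manager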

We just introduced some new concepts, the MHA Node and MHA Manager:

MHA Node

It is installed and runs on each MySQL server. This is the piece of software that is invoked by the manager every time we want to do something, for example a failover or a check.

MHA Manager

As explained before, this is our operations center. The manager monitors the services and the replication, and includes several administrative command-line tools.

Pre-requisites

  • Replication must already be running. MHA manages replication and monitors it, but it is not a tool to deploy it. So MySQL and replication need to be running already.
  • All hosts should be able to connect to each other using public SSH keys.
  • All nodes need to be able to connect to each other’s MySQL servers.
  • All nodes should have the same replication user and password.
  • In the case of multi-master setups, only one writable node is allowed. All others need to be configured with read_only.
  • MySQL version has to be 5.0 or later.
  • Candidates for master failover should have binary log enabled. The replication user must exist there too.
  • Binary log filtering variables should be the same on all servers (replicate-wild%, binlog-do-db…).
  • Disable automatic relay-log purge and do it regularly from a cron task. You can use an MHA-included script called “purge_relay_logs” (an example cron entry is shown below).

While that is a long list of prerequisites, I think they are pretty standard and logical.
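For the last point, a cron entry along these lines on each slave does the job (the schedule, credentials, paths and log file are just placeholders; purge_relay_logs ships with MHA Node):

    # /etc/cron.d/purge_relay_logs -- run on every slave, ideally staggered per host
    0 */4 * * * root /usr/bin/purge_relay_logs --user=root --password=PASSWORD --disable_relay_log_purge >> /var/log/purge_relay_logs.log 2>&1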

MHA installation

As explained before, the MHA Node needs to be installed on all the nodes. You can download it from this Google Drive link.

This post shows you how to install it using the source code, but there are RPM packages available (Deb packages too, but only for older versions). Use the installation method you prefer. This is how to compile it:
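If you build from source, it is the usual Perl module procedure, something like this (the version number is only an example; the node package also needs DBD::mysql installed):

    tar -xzf mha4mysql-node-0.57.tar.gz
    cd mha4mysql-node-0.57
    perl Makefile.PL
    make
    sudo make install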

The commands included in the node package are save_binary_logs, filter_mysqlbinlog, purge_relay_logs and apply_diff_relay_logs. These are mostly tools that the manager calls in order to perform a failover while trying to minimize or avoid any data loss.

On the manager server, you need to install MHA Node plus MHA Manager. This is because MHA Manager depends on a Perl library that comes with MHA Node. The installation process is just the same.
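The manager tarball builds the same way (again, the version is only an example; the manager additionally needs Perl modules such as Config::Tiny, Log::Dispatch and Parallel::ForkManager):

    tar -xzf mha4mysql-manager-0.57.tar.gz
    cd mha4mysql-manager-0.57
    perl Makefile.PL
    make
    sudo make install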

Configuration

We only need one configuration file on the Manager node. The example below is a good starting point:
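Something along these lines, saved for example as /etc/app1.cnf (users, passwords and paths are placeholders; the hostnames match the testing environment above):

    [server default]
    manager_workdir=/var/log/masterha/app1
    manager_log=/var/log/masterha/app1/app1.log
    user=mha
    password=PASSWORD
    ssh_user=root
    repl_user=repl
    repl_password=PASSWORD
    ping_interval=3

    [server1]
    hostname=mysql-server1
    candidate_master=1

    [server2]
    hostname=mysql-server2
    candidate_master=1

    [server3]
    hostname=mysql-server3
    no_master=1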

Pretty straightforward: it specifies that there are three servers, two that can become master and one that can't be promoted to master.

Let’s check if we meet some of the pre-requisites. We are going to test whether replication is working and can be monitored, and whether SSH connectivity works.
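The SSH connectivity check is run from the manager node against the config file (the path matches the one used later in the post):

    masterha_check_ssh --conf=/etc/app1.cnf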

It works. Now let’s check MySQL:
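The replication check is the analogous command, again run from the manager node:

    masterha_check_repl --conf=/etc/app1.cnf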

Start the manager and operations

Everything is set up and we meet the pre-requisites, so we can start our manager:
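A minimal invocation from the manager host looks roughly like this (whether you run it in the foreground, under nohup or under a supervisor is up to you):

    masterha_manager --conf=/etc/app1.cnf --remove_dead_master_conf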

The manager found our master and is now actively monitoring it using a SELECT command. --remove_dead_master_conf tells the manager that if the master goes down, it must edit the config file and remove the dead master’s configuration from it after a successful failover. This avoids the “there is a dead slave” error when you restart the manager. All servers listed in the conf should be part of the replication and in good health, or the manager will refuse to work.

Automatic and manual failover

Good, everything is running as expected. What happens if the MySQL master dies!?!

First, it tries to connect to the dead master by SSH to read its binary log and save it. MHA can then apply the missing binary log events to the remaining slaves so they are up to date with everything that happened before the failover. Nice!

These are the different phases that follow:

The phases are pretty self-explanatory. MHA tries to get all the data possible from the master’s binary log and from the relay log of the most advanced slave, to avoid losing data or promoting a slave that was far behind the master. In other words, it tries to promote the slave with the most current data possible. We see that server2 has been promoted to master, because in our configuration we specified that server3 shouldn’t be promoted.

After the failover, the manager service stops itself. If we check the config file, the failed server is not there anymore. Now the recovery is up to you. You need to get the old master back in the replication chain, then add it again to the config file and start the manager.

It is also possible to perform a manual failover (if, for example, you need to do some maintenance on the master server). To do that you need to:

  • Stop masterha_manager.
  • Run masterha_master_switch --master_state=alive --conf=/etc/app1.cnf. This says that you want to switch the master, but the current master is still alive, so there is no need to mark it as dead or remove it from the conf file.

And that’s it. Here is part of the output. It shows the tool making the decision on the new topology and asking the user for confirmation:

You can also employ some extra parameters that are really useful in some cases:

--orig_master_is_new_slave: if you want to make the old master a slave of the new one.

--running_updates_limit: if write queries on the current master run for longer than this many seconds, or if any of the slaves is delayed behind the master by more than this many seconds, the master switch aborts. By default it is 1 (one second). All these checks are there for safety reasons.

--interactive=0: if you want to skip all the confirmation requests and questions masterha_master_switch could ask.
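Putting it together, a fully non-interactive planned switch that demotes the old master to a slave could look roughly like this (--new_master_host is not discussed above, but it lets you pick the target explicitly; the hostname comes from the testing environment):

    masterha_master_switch --master_state=alive --conf=/etc/app1.cnf \
        --new_master_host=mysql-server2 --orig_master_is_new_slave \
        --running_updates_limit=10 --interactive=0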

Check this link in case you use GTID and want to avoid problems with errant transactions during the failover:

https://www.percona.com/blog/2015/12/02/gtid-failover-with-mysqlslavetrx-fix-errant-transactions/

Custom scripts

Since this is a quick guide to start playing around with MHA, I won’t cover advanced topics in detail. But I will mention a few:

    • Custom scripts. MHA can move IPs around, shut down a server and send you a report in case something happens, but it needs custom scripts to do so. MHA comes with some example scripts, but you will most likely need to write ones that fit your environment. The directives are master_ip_failover_script, shutdown_script and report_script. With them configured, MHA will send you an email or a message to your mobile device in the case of a failover, shut down the server and move IPs between servers. Pretty nice!
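As a rough sketch, those directives live in the [server default] section of the config file; the script paths below are placeholders for scripts you would write yourself or adapt from the samples shipped with MHA:

    [server default]
    master_ip_failover_script=/usr/local/bin/master_ip_failover
    shutdown_script=/usr/local/bin/power_manager
    report_script=/usr/local/bin/send_report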

Hope you found this quick start guide useful for your own tests. Remember one of the most important things: don’t overdo automation!  😉 These tools are good for checking health and performing the initial failover, but you still need to investigate what happened and why, fix it, and work to avoid it happening again. In high availability (HA) environments, automating everything can end up making them less highly available.

Have fun!

Miguel Angel Nieto

Miguel joined Percona in October 2011. He has worked as a System Administrator for a Free Software consultancy and in the support department of the biggest hosting company in Spain. His current focus is improving MySQL and helping the Free Software community grow.

4 Comments

  • Hi Miguel Angel,

    A nice write-up on MHA.

    I looked at MHA some time ago and found it to work quite well. Setting up the required SSH and MySQL access was more of a nuisance, especially if the number of servers grows, but the main thing that put a brake on MHA was the fact that it could not fix broken intermediate masters. That is, if the topology is only a master with directly connected slaves, everything is OK. If you have a master connected to an intermediate master which has a slave, MHA can not handle this, or that’s my recollection. Perhaps it has changed now. If you have servers spread over multiple datacentres then it is much more likely that intermediate masters exist, and that’s also true if you migrate from one major version of MySQL to another and can not take downtime. In that situation you’re likely to have a new intermediate master of the higher version running under the older master, and this new master will have slaves of the newer version.

    This led me to look at Orchestrator, which is able to resolve the issues mentioned and also only requires MySQL access to the servers to do its work. Percona have had a couple of blogs on Orchestrator so that’s covered elsewhere, but I think this “limitation” I saw in MHA, if it is still present, should be mentioned.

    • Simon, Thanks for the post. We are going to give Orchestrator a run in one of our environments seeking to replace MMM Monitor.

  • Thanks for the comment Simon.

    Multi-tier environments are not really supported:

    https://code.google.com/p/mysql-master-ha/wiki/Requirements#For_managing_three_or_more_tier_replication_environment..

    But master-master environments are. So, as usual in free software, you have to use the tool that suits you best 🙂
