October 31, 2014

Failover with the MySQL Utilities: Part 2 – mysqlfailover

In the previous post of this series we saw how you could use mysqlrpladmin to perform manual failover/switchover when GTID replication is enabled in MySQL 5.6. Now we will review mysqlfailover (version 1.4.3), another tool from the MySQL Utilities that can be used for automatic failover.

Summary

  • mysqlfailover can perform automatic failover if GTID replication (MySQL 5.6) is enabled.
  • All slaves must use --master-info-repository=TABLE.
  • The monitoring node is a single point of failure: don’t forget to monitor it!
  • Detection of errant transactions works well, but you have to use the --pedantic option to make sure failover will never happen if there is an errant transaction.
  • There are a few limitations such as the inability to only fail over once, or excessive CPU utilization, but they are probably not showstoppers for most setups.

Setup

We will use the same setup as last time: one master and two slaves, all using GTID replication. The topology can be displayed using mysqlfailover with the health command.

Note that --master-info-repository=TABLE needs to be configured on all slaves; otherwise the tool exits with an error message.
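
Since a misconfigured slave makes the tool exit, it can be handy to check the prerequisites up front. Here is a minimal sketch in Python, assuming you have already collected each slave's global variables into a dict (for instance from SHOW GLOBAL VARIABLES). The helper name and the dict layout are mine, not something provided by the MySQL Utilities:

```python
# Hypothetical helper: the two settings this post relies on for every slave.
REQUIRED_SLAVE_SETTINGS = {
    "gtid_mode": "ON",
    "master_info_repository": "TABLE",
}


def missing_prerequisites(variables):
    """Return a list of human-readable problems for one slave.

    `variables` maps variable names to values, e.g. built from the rows
    returned by SHOW GLOBAL VARIABLES on that slave (assumed input format).
    """
    problems = []
    for name, expected in REQUIRED_SLAVE_SETTINGS.items():
        actual = str(variables.get(name, "")).upper()
        if actual != expected:
            problems.append(f"{name} is {actual or 'unset'}, expected {expected}")
    return problems


if __name__ == "__main__":
    # Made-up values for illustration only.
    slave = {"gtid_mode": "ON", "master_info_repository": "FILE"}
    for problem in missing_prerequisites(slave):
        print(problem)
```

An empty list means the slave passes these two checks; it says nothing about the other conditions the tool may verify.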

Failover

You can use two commands to trigger automatic failover:

  • auto: the tool tries to find a candidate in the list of servers specified with --candidates, and if no good server is found in this list, it looks at the other slaves to see if one can be a good candidate. This is the default command.
  • elect: same as auto, but if no good candidate is found in the list of candidates, other slaves will not be checked and the tool will exit with an error.
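
To make the difference between the two commands concrete, here is a rough Python sketch of the selection rule as described above. It is not the tool's actual code: the function name, the is_good callback (a stand-in for whatever health checks the tool runs) and the error raised when auto finds nothing are all assumptions of mine.

```python
def choose_new_master(mode, candidates, other_slaves, is_good):
    """Pick the slave to promote, following the auto/elect rules above.

    mode          -- "auto" or "elect"
    candidates    -- servers listed in --candidates, in order of preference
    other_slaves  -- remaining slaves in the topology
    is_good       -- callable deciding whether a slave is promotable
                     (hypothetical stand-in for the tool's own checks)
    """
    if mode not in ("auto", "elect"):
        raise ValueError(f"unsupported mode: {mode}")

    # Both modes look at the candidate list first.
    for server in candidates:
        if is_good(server):
            return server

    # Only auto falls back to the slaves outside the candidate list.
    if mode == "auto":
        for server in other_slaves:
            if is_good(server):
                return server

    # elect stops here; raising for auto as well is an assumption.
    raise RuntimeError("no suitable candidate found")


if __name__ == "__main__":
    healthy = {"slave2"}  # made-up server names
    print(choose_new_master("auto", ["slave1"], ["slave2"], lambda s: s in healthy))
```

With the same inputs, elect would raise instead of returning slave2, because slave2 is not in the candidate list.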

Let’s start the tool with auto. The monitoring console is displayed and refreshed every --interval seconds (default: 15). Its output is similar to what you get when using the health command.

Then let’s kill -9 the master to see what happens once it is detected as down: the tool promotes one of the slaves and reconfigures replication around the new master.

Looks good! The tool is then ready to fail over to another slave if the new master becomes unavailable.

You can also run custom scripts at several points of execution with the --exec-before, --exec-after, --exec-fail-check, --exec-post-failover options.
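
If you launch the tool from a wrapper, assembling the command line in one place keeps the hooks easy to review. The following Python sketch only builds the argument list and does not run anything. The connection string and script path are placeholders, and I am using the --failover-mode option from the mysqlfailover documentation to select the mode; treat the whole thing as an illustration rather than a recommended invocation.

```python
import shlex

# Hook options mentioned above, in the order they are emitted.
HOOK_OPTIONS = ("exec-before", "exec-after", "exec-fail-check", "exec-post-failover")


def build_failover_command(master, discover_login, mode="auto",
                           candidates=None, interval=None, hooks=None):
    """Build an argv list for mysqlfailover (nothing is executed here).

    hooks maps an option name without leading dashes (e.g. "exec-after")
    to the path of the script to run.
    """
    hooks = hooks or {}
    unknown = set(hooks) - set(HOOK_OPTIONS)
    if unknown:
        raise ValueError(f"unknown hook option(s): {sorted(unknown)}")

    argv = [
        "mysqlfailover",
        f"--master={master}",
        f"--discover-slaves-login={discover_login}",
        f"--failover-mode={mode}",
    ]
    if candidates:
        argv.append("--candidates=" + ",".join(candidates))
    if interval is not None:
        argv.append(f"--interval={interval}")
    for name in HOOK_OPTIONS:
        if name in hooks:
            argv.append(f"--{name}={hooks[name]}")
    return argv


if __name__ == "__main__":
    # Placeholder connection string and script path.
    argv = build_failover_command(
        "root@localhost:3306", "root",
        hooks={"exec-after": "/usr/local/bin/after_failover.sh"},
    )
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The resulting list can be passed to subprocess.call() as is, which avoids shell quoting problems with script paths.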

However, it would be great to have a --failover-and-exit option to avoid flapping: the tool would detect the master failure, promote one of the slaves, reconfigure replication and then exit (this is what MHA does, for instance).

Tool registration

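When the tool is started, it registers itself on the master by writing a few things in a dedicated table (mysql.failover_console).

This is nice because it prevents you from starting several instances of mysqlfailover to monitor the same master. If we try, the second instance reports that another instance is already registered for this master and switches to the fail mode.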

With the fail command, mysqlfailover will monitor replication health and exit in the case of a master failure, without actually performing failover.

Running in the background

In all previous examples, mysqlfailover was running in the foreground. This is fine for a demo, but in a production environment you are likely to prefer running it in the background. This can be done with the --daemon=start option, and the daemon can be stopped with --daemon=stop.

Errant transactions

If we create an errant transaction on one of the slaves, it will be detected and reported.

However, this does not prevent failover from occurring! To block failover when an errant transaction exists, you have to use --pedantic.
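
For readers who want to see what detecting an errant transaction boils down to, here is a small Python sketch: a transaction is errant when it appears in the slave's executed GTID set but not in the master's. The parsing is deliberately simplified (ranges are expanded into sets of integers), the UUIDs are made up, and failover_allowed only mirrors the behaviour described above. On a real server, the built-in GTID_SUBTRACT() function does this comparison for you.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: {1, 2, 3, 4, 5, 7}, uuid2: {3}}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            # A single number like "7" has no upper bound: treat it as 7-7.
            numbers.update(range(int(first), int(last or first) + 1))
    return result


def errant_transactions(slave_executed, master_executed):
    """Return {uuid: [sequence numbers]} executed on the slave only."""
    slave = parse_gtid_set(slave_executed)
    master = parse_gtid_set(master_executed)
    errant = {}
    for uuid, numbers in slave.items():
        extra = numbers - master.get(uuid, set())
        if extra:
            errant[uuid] = sorted(extra)
    return errant


def failover_allowed(errant, pedantic):
    """Errant transactions only block failover when pedantic is set."""
    return not (pedantic and errant)


if __name__ == "__main__":
    # Made-up UUIDs: the second one stands for a write done directly on the slave.
    master = "3e11fa47-71ca-11e1-9e33-c80aa9429562:1-10"
    slave = master + ",4f22ab58-82db-22f2-af44-d91bb0530673:1"
    found = errant_transactions(slave, master)
    print(found)
    print(failover_allowed(found, pedantic=True))
```

Expanding ranges into sets is fine for a demo but would be wasteful on servers with millions of transactions, which is one more reason to rely on GTID_SUBTRACT() in practice.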

Limitations

  • As with mysqlrpladmin, the slave election process is not very sophisticated and it cannot be tuned.
  • The server on which mysqlfailover is running is a single point of failure.
  • Excessive CPU utilization: once it is running, mysqlfailover hogs one core. This is quite surprising.

Conclusion

mysqlfailover is a good tool to automate failover in clusters using GTID replication. It is flexible and looks reliable. Its main drawback is that there is no easy way to make it highly available itself: if mysqlfailover crashes, you will have to manually restart it.

About Stephane Combaudon

Stéphane joined Percona in July 2012, after working as a MySQL DBA for leading French companies such as Dailymotion and France Telecom.

In real life, he lives in Paris with his wife and their twin daughters. When not in front of a computer or not spending time with his family, he likes playing chess and hiking.

Comments

  1. The mysqlfailover process is not a SPOF. If it fails the system as a whole continues to run. I think the same is true for MHA. This and also the manual restart issue can be fixed by running mysqlfailover with Solaris SMF, systemd, etc. or by running it on a cluster.

  2. Stephane Combaudon says:

    Daniël,

    I may not have used the right word: what I meant is that setting up mysqlfailover is not enough to guarantee automated failover for your database cluster. Is it acceptable? Sometimes it is, sometimes it’s not.
    Of course this can be fixed by adding another layer, but it adds some complexity.

  3. Gurbrinder Singh says:

    Hi

    Thanks for such a lovely explanation.
    Really useful!
    Can you please elaborate on the other layer we could add on top of mysqlfailover, which adds complexity but gives a stronger guarantee?

    Many thanks!

  4. Stephane Combaudon says:

    Gurbrinder,

    You could use for instance Pacemaker and Corosync to make mysqlfailover highly available. However there is some glue to write for it to work.
    Another solution could be to use Pacemaker to detect a master failure and let it trigger mysqlrpladmin to perform failover at the MySQL level.

  5. Bhavesh says:

    What are the steps to bring the original master back in service as a slave?

  6. Stephane Combaudon says:

    Bhavesh,

    You can use the change master to statement:
    mysql> change master to master_host='new_master_ip', master_user='your_repl_user', master_password='your_repl_pwd', master_auto_position=1;
    mysql> start slave;

    As GTIDs are used it’s not necessary to specify binlog coordinates.

  7. Gurbrinder Singh says:

    Hi

    Thanks a ton!
    We use a VIP. Is there any code or mechanism by which the VIP also fails over at the same time as the mysqlrpladmin command does its switchover magic?

  8. Stephane Combaudon says:

    Gurbrinder,

    mysqlrpladmin will only take care of reconfiguring MySQL replication, but you can use the --exec-after and --exec-before options to run external scripts that will move the VIP.
