
Simple MySQL Master HA with mysqlnd_ms

July 14, 2014 | Posted In: Insight for Developers, MySQL


A few days ago I had the pleasure of presenting mysqlnd_ms to the PHP Users Group Philippines. The plugin, MySQL Master Slave, is a transparent layer on top of the mysqlnd extension. It allows you to do read-write splitting and load balancing of slave reads without changing anything in your application. But did you know you can also achieve a form of high availability with this plugin? I shared two forms in my presentation: the first uses asynchronous MySQL replication in either a master-slave or a master-master configuration, while the second uses an all-primary cluster where you can write to all nodes.

This first part demonstrates how you can achieve a simple HA solution using the first form. All the sample code here can be found in my GitHub repository. The mysqlnd_ms plugin uses an additional external configuration file in JSON format. This file defines your master and slave nodes, failover properties, and any filters (connection selection methods) that dictate how the plugin picks a connection for you.

Let’s start with the mysqlnd_ms configuration I used, mysqlnd_ms_ms.ini.

In it, I have two applications defined, one called “primary” and another called “standby”. I have not defined any slaves, for simplicity. The two MySQL instances, running on ports 33001 and 33002, are in a master-master configuration.
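The original listing is not shown here, so the following is only a minimal sketch of what such a file could look like. It assumes both instances listen on 127.0.0.1 and uses master_1 as an arbitrary server label. Only the section names and the ports come from the description above, and the empty slave sections reflect that no slaves are defined.

```json
{
    "primary": {
        "master": {
            "master_1": { "host": "127.0.0.1", "port": "33001" }
        },
        "slave": {}
    },
    "standby": {
        "master": {
            "master_1": { "host": "127.0.0.1", "port": "33002" }
        },
        "slave": {}
    }
}
```

With a file like this, the application picks a section by passing its name ("primary" or "standby") as the host when it creates the mysqli object.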

For the tests I used a custom INI file, master-slave.ini. Its first line simply enables the plugin. The second line, mysqlnd_ms.disable_rw_split, instructs the plugin to send all queries to the master, because I only have masters in this test.
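As a rough sketch of such an INI file (not the author's exact listing): the first two directives are the ones described above, while the mysqlnd_ms.config_file line and its path are assumptions added so the plugin can find the JSON configuration.

```ini
; Sketch only: the config_file path is a placeholder, not the author's actual path.
mysqlnd_ms.enable = 1
mysqlnd_ms.disable_rw_split = 1
mysqlnd_ms.config_file = /path/to/mysqlnd_ms_ms.ini
```

One way to load such a file for a single run is the PHP CLI's -c option, for example php -c master-slave.ini test.php, where test.php is a hypothetical script name.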

As for the PHP script, the full copy is in the same GitHub repository. Since it is a bit lengthy, I will just explain the logic of what it does.

  1. To start the test, it bootstraps the test table via DROP and then CREATE queries.
  2. It then enters a for loop where it executes an INSERT followed by a SELECT to validate the newly inserted row, along with additional information such as the current active server id and the connection id.
  3. For every iteration of the loop, a new mysqli object is created to simulate non-persistent connections to the database server.
  4. To create the new connection, a call is made to the function connect_mysql, which returns a mysqli object when successful. An important thing to remember here is that mysqlnd_ms uses lazy connections by default. This means that when the mysqli object is created, it is not really connected to the server yet. One has to issue a statement like 'SELECT 1' to start the connection manually, or call mysqli::real_connect. Not even mysqli::ping works without one of these; I’ve opened a bug for this.
  5. After the mysqli object is returned, the INSERT statement triggers mysqlnd_ms to actually establish the connection and then execute the statement. This is the good part: if the connection cannot be made, the query_write_mysql function detects it and requests the connection again from connect_mysql. This time, within connect_mysql, the connection to the primary is retried up to 10 times if the previous failure was a connection-related error such as error numbers 2002 and 2003. If the connection still cannot be established after 10 retries, the application creates a sentinel file, /tmp/PRIMARY_HAS_FAILED, and retries the connection against the secondary (slave or passive master).
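The retry and failover steps above can be sketched roughly as follows. This is not the author's script: choose_connection and open_section are hypothetical names, the credentials and schema are placeholders, and two behaviours are assumptions (skipping the primary while the sentinel file exists, and giving up immediately on errors other than 2002 and 2003).

```php
<?php
// Sketch only. choose_connection() and open_section() are hypothetical names;
// the author's script uses connect_mysql and query_write_mysql instead.

/**
 * Try the "primary" section up to $maxRetries times, then fall back to "standby".
 *
 * $open is any callable that takes a section name and returns
 * array($connection_or_null, $errno).
 */
function choose_connection($open, $sentinel = '/tmp/PRIMARY_HAS_FAILED', $maxRetries = 10)
{
    // 2002 and 2003 are the client errors for "cannot connect to server".
    $connectionErrors = array(2002, 2003);

    // Assumption: once the sentinel exists, the primary is skipped until a
    // human deletes the file.
    if (!file_exists($sentinel)) {
        for ($attempt = 1; $attempt <= $maxRetries; $attempt++) {
            list($conn, $errno) = call_user_func($open, 'primary');
            if ($conn !== null) {
                return array('primary', $conn);
            }
            if (!in_array($errno, $connectionErrors, true)) {
                // Assumption: not a connection problem, so retrying will not help.
                throw new RuntimeException("primary failed with error $errno");
            }
        }
        touch($sentinel); // record the failover for later iterations
    }

    list($conn, $errno) = call_user_func($open, 'standby');
    if ($conn === null) {
        throw new RuntimeException("standby failed with error $errno");
    }
    return array('standby', $conn);
}

/**
 * Opener for mysqli with mysqlnd_ms. Credentials and schema are placeholders.
 */
function open_section($section)
{
    mysqli_report(MYSQLI_REPORT_OFF); // report failures via errno, not exceptions
    $conn = mysqli_init();
    // With mysqlnd_ms, the "host" argument is the section name from the JSON file.
    if (!@$conn->real_connect($section, 'user', 'password', 'test')) {
        return array(null, mysqli_connect_errno());
    }
    // Connections are lazy, so run a trivial statement to force the real connect.
    if (!@$conn->query('SELECT 1')) {
        return array(null, $conn->errno);
    }
    return array($conn, 0);
}
```

A caller would use it as list($role, $mysqli) = choose_connection('open_section'). Passing the opener as a callable keeps the retry logic separate from mysqli, so it can be exercised with a fake opener when no database is available.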

In my example run, the primary has a server id of 101 while the standby is 102.

This is not a perfect setup and it has a number of limitations. However, it shows that if you have a simple HA requirement, for example an application that is not very critical but that you still do not want to be woken up at night for, preferring to deal with issues in the morning, this might just fit. Here are some more notes:

    • If you have a master-slave configuration, you basically just shot your primary (master) in the foot during the failover. You may need to rebuild its data in the morning.
    • If instead you have master-master, you might just be able to bring the primary master back online, let it catch up in replication, and then delete the /tmp/PRIMARY_HAS_FAILED file to switch your application back to it.
    • The /tmp/PRIMARY_HAS_FAILED sentinel file is rudimentary, and it is not the only way. You should consider sending notifications to yourself when failover happens, because this method requires human intervention to put the primary master back in rotation.
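As one possible way to combine the sentinel file with a notification, here is a minimal sketch. The helper mark_primary_failed is a hypothetical name, and the notifier is whatever alerting you already have; neither is part of the author's script.

```php
<?php
// Sketch only: mark_primary_failed() is a hypothetical helper, and $notify is
// any callable that accepts a message string (mail, chat webhook, pager).

/**
 * Create the sentinel file and notify once, on the first failover only.
 * Returns true if this call performed the failover, false if the sentinel
 * already existed.
 */
function mark_primary_failed($sentinel, $notify)
{
    // Mode "x" fails if the file already exists, so only one process
    // gets to create it and send the notification.
    $fh = @fopen($sentinel, 'x');
    if ($fh === false) {
        return false;
    }
    fwrite($fh, date('c') . " primary marked as failed\n");
    fclose($fh);
    call_user_func($notify, "Failover: primary marked as failed, sentinel $sentinel created");
    return true;
}
```

For example, mark_primary_failed('/tmp/PRIMARY_HAS_FAILED', 'error_log') would write the message to the PHP error log the first time failover happens and stay silent afterwards.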

The same effect can be achieved without the plugin with a little more coding, but with the plugin you get there with less.

I’ve also tested the plugin on the second form where you can write to multiple masters using Percona XtraDB Cluster. I’ve found a few interesting issues there so stay tuned.

Jervin Real

As Senior Consultant, Jervin partners with Percona's customers on building reliable and highly performant MySQL infrastructures while also doing other fun stuff like watching cat videos on the internet. Jervin joined Percona in Apr 2010.


  • Jervin, we are trying to use this code and are having an issue that has us scratching our heads. We have three nodes set up in the cluster. We can stop the MySQL process on any one or two nodes and the application fails over correctly to the active node. But when we shut down the entire node, the application won’t fail over. We’ve tried this with multiple nodes as the primary and get the same result. Any suggestions for where we might have made an error? Thanks for your help.

  • Tim,

    Are you running a Galera cluster? When you say you shut down the entire node, do you mean you shut down one of the physical servers in the cluster? What error do you get when it does not fail over?

    • Thank you for the response, Jervin. Yes, we are using a Galera cluster and for the most part it has worked really well; it has met all our expectations. And frankly, this does not seem like an issue with the cluster itself. All of the nodes (3 of them) are virtual, so when I say shut down the node I mean shut down the virtual server.

      I don’t get an error; the application just locks up. We are doing more troubleshooting today to look at network traffic and see what the application is doing; if anything.

      Any help or suggestions you can provide would be appreciated. Thanks!

  • Tim,

    I would suggest adding some more instrumentation to the script you are using. By outputting the node it is currently connected to before you shut down a cluster node, you will know which specific node to look into and collect data from. Once you have that, collect SHOW GLOBAL STATUS at a minimum from the cluster node.

    You can create a forum thread for this so it does not get too long in the comments section 🙂
