High Availability with mysqlnd_ms on Percona XtraDB Cluster

This is the second part of my series on High Availability with mysqlnd_ms. In my first post, “Simple MySQL Master HA with mysqlnd_ms,” I showed a simple HA solution using asynchronous MySQL replication. This time we will see how to leverage an all-primary cluster where you can write to all nodes. In this post I used Percona XtraDB Cluster, but you should also be able to do the same with MySQL NDB Cluster or Tungsten Replicator.

To start with, here is the mysqlnd_ms configuration I used: mysqlnd_ms_mm.ini . All of these files are available from my Github repository. Below, I have three Percona XtraDB Cluster nodes, all defined as masters and no slaves. I've configured a roundrobin filter, so all connections go to the first node, in this case 192.168.56.44 . If the first node fails, the second node is used, and so forth, until no more nodes are available. Another interesting configuration option here is the loop_before_master failover strategy: if a connection or a statement to the current server fails, it is retried silently on the remaining nodes before an error is returned to the user. More on this below.
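For reference, a mysqlnd_ms master-master configuration along these lines would look roughly like the sketch below. This is not the exact file from my repository, just the general shape of the JSON that mysqlnd_ms expects; the section name "myapp" and the third node's address are placeholders:

```json
{
    "myapp": {
        "master": {
            "master_0": { "host": "192.168.56.44", "port": "3306" },
            "master_1": { "host": "192.168.56.43", "port": "3306" },
            "master_2": { "host": "192.168.56.42", "port": "3306" }
        },
        "slave": {},
        "filters": { "roundrobin": [] },
        "failover": { "strategy": "loop_before_master", "remember_failed": true }
    }
}
```

Despite the .ini extension used by convention in these examples, the plugin configuration file itself is JSON. The "failover" block is what enables the silent retry behavior described above.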

Similar to my previous post, I also used a custom INI file for PHP, this time aptly named master-master.ini :
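The relevant php.ini directives would look something like this (the config_file path is a placeholder; disable_rw_split is optional but common in all-master setups since there is nothing to split):

```ini
mysqlnd_ms.enable = 1
mysqlnd_ms.config_file = /path/to/mysqlnd_ms_mm.ini
mysqlnd_ms.multi_master = 1
mysqlnd_ms.disable_rw_split = 1
```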

A new addition to this configuration is mysqlnd_ms.multi_master . When enabled, it allows mysqlnd_ms to use all of the configured master nodes; when disabled, only the first master is used and the others are treated as passive. The PHP script I used this time is called master-master.php . It is largely similar to master-slave-ng.php , with a few differences:

  1. There is no need for the  /tmp/PRIMARY_HAS_FAILED  sentinel, since all nodes are writable.
  2. There is no need for the  /*ms=master*/  SQL hint when validating a connection from the connect_mysql function, since all nodes act as masters.
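To illustrate the second point, a simplified connection check might look like the sketch below. This is not the actual code from master-master.php, just a minimal illustration; the section name "myapp" and credentials are placeholders. With mysqlnd_ms, passing the config section name as the host lets the plugin pick a node according to the configured filter:

```php
<?php
// Hypothetical helper, not the exact function from the repository.
function connect_mysql() {
    // "myapp" must match the section name in the mysqlnd_ms config file;
    // mysqlnd_ms resolves it to one of the configured masters.
    $link = mysqli_connect('myapp', 'user', 'password', 'test');
    if (!$link) {
        printf("ERROR: %s\n", mysqli_connect_error());
        return false;
    }
    // A plain query is enough to validate the connection -- no /*ms=master*/
    // hint is needed, because every node accepts writes.
    mysqli_query($link, 'SELECT 1');
    return $link;
}
```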

So here is a quick test, first with the roundrobin filter: after 4 INSERTs, I shut down  192.168.56.44 , which sends my connection to the next server in the configuration,  192.168.56.43 . When I brought  192.168.56.44  back up, the script resumed connections there. Pretty cool, right?

Here’s another test using the random filter, which allows you to write to all nodes; in my mysqlnd_ms_mm.ini above, I just changed  roundrobin  to  random . As you can see, all three nodes were being used, in random order of course. You can also see where  connect_mysql  reported errors when I shut down  192.168.56.44 , and, near the bottom, that the server was used again after I started it back up. Still pretty cool, right?
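The switch is a one-line change in the filters section of the configuration sketched earlier (the section name "myapp" is still a placeholder):

```json
{
    "myapp": {
        "filters": { "random": [] }
    }
}
```

With roundrobin, the node list is walked in order; with random, mysqlnd_ms picks a node at random for each connection, spreading writes across the whole cluster.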

So here are some issues I’ve observed during these tests:

  1. remember_failed  during failover does not work as advertised. Supposedly, a failed node should not be retried on every connection request, but in my tests this was not the case. See more in this bug. This means that if 2 out of 3 nodes have failed in this scenario, the overhead of testing both dead connections becomes significant. Perhaps some sort of shared in-memory TTL could be used to overcome this? I’m not sure.
  2. If you look closely around line 7 of my last output above, the error displayed is somewhat misleading. In particular, it says ERROR: 192.168.56.43 via TCP/IP , but it was not  192.168.56.43  that failed, it was  192.168.56.44 . This is because, under the hood, mysqlnd_ms cycles to the next node immediately after a failure, especially since we have loop_before_master configured. I sure do have a bug on the script that should capture the