Simple MySQL Master HA with mysqlnd_ms

July 14, 2014

Author: Jervin Real

I had the pleasure of presenting mysqlnd_ms to the PHP Users Group Philippines a few days ago. The mysqlnd plugin MySQL Master Slave (mysqlnd_ms) is a transparent layer on top of the mysqlnd extension. It lets you do read-write splitting and load balancing of slave reads without changing anything in your application. But did you know you can also achieve a form of high availability with this plugin? I shared two forms in my presentation: the first uses asynchronous MySQL replication in either a master-slave or master-master configuration, while the second uses an all-primary cluster where you can write to every node.

This first part demonstrates how you can achieve a simple HA solution using the first form. All the sample code here can be found on my GitHub repository. The mysqlnd_ms plugin uses an additional external configuration file in JSON format. This file defines your master and slave nodes, failover properties, and any filters (connection selection methods) that dictate how the plugin chooses the connection it gives you.

Let’s start with the mysqlnd_ms configuration I used, mysqlnd_ms_ms.ini:

```json
{
  "primary": {
    "master": {
      "master_1": {
        "host": "127.0.0.1",
        "port": "33001"
      }
    },
    "slave": {
    }
  },
  "standby": {
    "master": {
      "master_1": {
        "host": "127.0.0.1",
        "port": "33002"
      }
    },
    "slave": {
    }
  }
}
```

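Here I have defined two applications, one called “primary” and another called “standby”; for simplicity, I have not defined any slaves. With mysqlnd_ms, these section names are used in place of a real host name when connecting (for example new mysqli('primary', ...)), and the plugin maps them to the servers listed. The two MySQL instances, running on ports 33001 and 33002, are in a master-master configuration.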
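If you want to double-check which servers each application name resolves to, a small PHP sketch like the one below can decode the configuration and list the masters per section. The helper describe_mysqlnd_ms_config() is hypothetical and not part of mysqlnd_ms; it assumes the object form used above, with host and port keys for every server, requires PHP 7.3+ for JSON_THROW_ON_ERROR, and embeds the same JSON so it runs on its own.

```php
<?php
// Hypothetical helper, not part of mysqlnd_ms: map each application
// (section) name in a mysqlnd_ms JSON config to its masters as "host:port".
// Assumes every server entry is an object with "host" and "port" keys.
function describe_mysqlnd_ms_config(string $json): array
{
    // Throws JsonException if the file is not valid JSON.
    $config = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $apps = [];
    foreach ($config as $section => $settings) {
        $masters = [];
        foreach ($settings['master'] ?? [] as $name => $server) {
            $masters[$name] = $server['host'] . ':' . $server['port'];
        }
        $apps[$section] = $masters;
    }
    return $apps;
}

// Same content as mysqlnd_ms_ms.ini above; in practice you would read
// the file with file_get_contents().
$json = <<<'JSON'
{
  "primary": {
    "master": { "master_1": { "host": "127.0.0.1", "port": "33001" } },
    "slave": {}
  },
  "standby": {
    "master": { "master_1": { "host": "127.0.0.1", "port": "33002" } },
    "slave": {}
  }
}
JSON;

foreach (describe_mysqlnd_ms_config($json) as $app => $masters) {
    echo $app, ' => ', implode(', ', $masters), PHP_EOL;
}
```

Run as is, it prints one line per application: primary => 127.0.0.1:33001 and standby => 127.0.0.1:33002.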

```ini
mysqlnd_ms.enable = 1
mysqlnd_ms.disable_rw_split = 1
mysqlnd_ms.force_config_usage = 1
mysqlnd_ms.config_file = /home/revin/git/demo-me/phpugph201407/mysqlnd_ms_ms.ini
```

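This is the custom INI file I used for the tests, master-slave.ini. The first line simply enables the plugin. The second line, mysqlnd_ms.disable_rw_split, disables read-write splitting so that all queries are sent to the master, because I only have masters in this test. The third line, mysqlnd_ms.force_config_usage, makes the plugin block any connection whose host does not match a section name in the configuration file, and the last line points the plugin at that file.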
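Since the demo only works when all four directives are in effect, a quick pre-flight check can help when running scripts with php -c master-slave.ini. The sketch below reads the directives with PHP's ini_get() and reports anything that does not match the INI file above. check_mysqlnd_ms_ini() is a hypothetical name, and the sketch assumes you want all three boolean switches enabled, as in this test.

```php
<?php
// Hypothetical pre-flight check, not part of mysqlnd_ms: compare the current
// settings with the ones used in master-slave.ini. Returns a list of problems.
function check_mysqlnd_ms_ini(array $ini): array
{
    $problems = [];
    $switches = ['mysqlnd_ms.enable', 'mysqlnd_ms.disable_rw_split', 'mysqlnd_ms.force_config_usage'];
    foreach ($switches as $key) {
        // ini_get() returns false for unknown directives (e.g. extension not
        // loaded); FILTER_VALIDATE_BOOLEAN maps false, "", "0" and "off" to false.
        if (!filter_var($ini[$key] ?? false, FILTER_VALIDATE_BOOLEAN)) {
            $problems[] = "$key should be 1";
        }
    }
    $file = $ini['mysqlnd_ms.config_file'] ?? false;
    if (!is_string($file) || $file === '' || !is_readable($file)) {
        $problems[] = 'mysqlnd_ms.config_file is missing or unreadable';
    }
    return $problems;
}

// Collect the live values and log anything that looks wrong.
$current = [];
foreach (['mysqlnd_ms.enable', 'mysqlnd_ms.disable_rw_split',
          'mysqlnd_ms.force_config_usage', 'mysqlnd_ms.config_file'] as $key) {
    $current[$key] = ini_get($key);
}
foreach (check_mysqlnd_ms_ini($current) as $problem) {
    error_log($problem);
}
```

When the settings match, it logs nothing.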

As for the PHP script, the full copy can be found here. As it is a bit lengthy, I will just explain its logic.

1. To start the test, the script bootstraps the test table with DROP and then CREATE statements.

1. It then enters a for loop that executes an INSERT followed by a SELECT, to validate the newly inserted row and to fetch additional information such as the currently active server ID and the connection ID.

1. For every iteration of the loop, a new mysqli object is created to simulate non-persistent connections to the database server.

1. To create the new connection, the script calls the function connect_mysql, which returns a mysqli object on success. An important thing to remember here is that mysqlnd_ms uses lazy connections by default: when the mysqli object is created, it is not yet connected to the server. You have to issue a statement like 'SELECT 1' to start the connection manually, or call mysqli::real_connect. Not even mysqli::ping works without the former; I’ve opened this bug.

1. After the mysqli object is returned, the INSERT statement triggers mysqlnd_ms to actually establish the connection and then execute the statement. This is where the good part is: if the connection cannot be made, the query_write_mysql function detects it and re-requests the connection from connect_mysql. This time, connect_mysql retries the connection to the primary up to 10 times, provided the previous failure was connection-related, such as error numbers 2002 and 2003. If the connection still cannot be established after 10 retries, the application creates a sentinel file, /tmp/PRIMARY_HAS_FAILED, and retries the connection against the secondary (slave or passive master).
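The actual connect_mysql and query_write_mysql functions are in the script linked above; what follows is only a minimal sketch of the same retry-then-failover idea, not that code. connect_with_failover() and mysqli_connector() are hypothetical names. The sketch assumes a connector callable that returns [connection or null, error number], which keeps the logic testable without a server, and it only retries on the connection errors 2002 and 2003 mentioned above.

```php
<?php
// Client error codes treated as "server unreachable":
// 2002 (CR_CONNECTION_ERROR) and 2003 (CR_CONN_HOST_ERROR).
const CONNECTION_ERRNOS = [2002, 2003];

/**
 * Hypothetical sketch of connect_mysql-style failover.
 * $connect(string $host) must return [?object $link, int $errno].
 * Returns [string $hostUsed, ?object $link, int $errno].
 */
function connect_with_failover(
    callable $connect,
    string $primary,
    string $standby,
    string $sentinel,
    int $retries = 10,
    int $delaySeconds = 3,
    ?callable $sleep = null
): array {
    $sleep = $sleep ?? 'sleep';

    // Once failed over, stay on the standby until a human deletes the sentinel.
    if (file_exists($sentinel)) {
        [$link, $errno] = $connect($standby);
        return [$standby, $link, $errno];
    }

    [$link, $errno] = $connect($primary);
    $attempt = 0;
    // Retry only while the failure looks like a connection problem.
    while ($link === null && in_array($errno, CONNECTION_ERRNOS, true) && $attempt < $retries) {
        $attempt++;
        error_log("Connection to '$primary' failed [$errno], retrying ($attempt of $retries) in {$delaySeconds}s");
        $sleep($delaySeconds);
        [$link, $errno] = $connect($primary);
    }

    // Success, or an error that retrying would not fix (e.g. bad credentials).
    if ($link !== null || !in_array($errno, CONNECTION_ERRNOS, true)) {
        return [$primary, $link, $errno];
    }

    // Primary considered dead: leave the sentinel so later requests skip it.
    // This is also a natural place to send yourself a notification.
    touch($sentinel);
    error_log("Primary '$primary' failed after $retries retries, failing over to '$standby'");
    [$link, $errno] = $connect($standby);
    return [$standby, $link, $errno];
}

// Example connector for real use (not called here). mysqli_init() plus
// real_connect() forces the connection instead of relying on a lazy one.
function mysqli_connector(string $user, string $password, string $database): callable
{
    return function (string $host) use ($user, $password, $database): array {
        mysqli_report(MYSQLI_REPORT_OFF); // PHP 8.1+ throws by default; return errors instead
        $link = mysqli_init();
        if (@$link->real_connect($host, $user, $password, $database)) {
            return [$link, 0];
        }
        return [null, $link->connect_errno];
    };
}
```

In an application you might call connect_with_failover(mysqli_connector('user', 'pass', 'test'), 'primary', 'standby', '/tmp/PRIMARY_HAS_FAILED'), with placeholder credentials. With the defaults of 10 retries and a 3 second delay, a write can stall for roughly 10 × 3 = 30 seconds before failover, which matches the "failed after 30 seconds" line in the run below.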

Here is an example run; my primary has a server ID of 101, while my standby's is 102:

```
[revin@forge phpugph201407]$ php -c master-slave.ini master-slave-ng.php
Last value 0001 from server id 101 thread id 7
Last value 0003 from server id 101 thread id 8
37: [2002] Connection refused
Connection to host 'primary' failed: [0] Connection refused, retrying (1 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (2 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (3 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (4 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (5 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (6 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (7 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (8 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (9 of 10) in 3 seconds
Connection to host 'primary' failed: [0] Connection refused, retrying (10 of 10) in 3 seconds
The primary host 'primary' has failed after 30 seconds, failing over to standby!
52: [2002] Connection refused
Last value 0004 from server id 102 thread id 635
Last value 0006 from server id 102 thread id 636
Last value 0008 from server id 102 thread id 637
[...]
```
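The skipping values in this output (0001 and 0003 on server 101, then 0004, 0006 and 0008 on server 102) are consistent with a common master-master setup in which each server generates its own AUTO_INCREMENT series. Assuming, since the post does not show the server configuration, auto_increment_increment = 2 on both servers with auto_increment_offset = 1 on server 101 and 2 on server 102, the sketch below applies MySQL's documented rule: the next value is the smallest value of the form offset + N × increment that is greater than the current maximum. next_auto_increment() is a hypothetical helper.

```php
<?php
// Hypothetical helper: next AUTO_INCREMENT value given the current maximum
// in the column, following the series offset + N * increment (N >= 0).
function next_auto_increment(int $currentMax, int $increment, int $offset): int
{
    if ($currentMax < $offset) {
        return $offset;
    }
    // Smallest member of the series strictly greater than $currentMax.
    return $offset + (intdiv($currentMax - $offset, $increment) + 1) * $increment;
}

// Server 101 (offset 1) starting from an empty table, two inserts.
$first  = next_auto_increment(0, 2, 1);       // 1
$second = next_auto_increment($first, 2, 1);  // 3
// After failover, server 102 (offset 2) continues from the replicated maximum.
$third  = next_auto_increment($second, 2, 2); // 4
printf("%04d %04d %04d\n", $first, $second, $third); // prints "0001 0003 0004"
```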

This is not a perfect setup and it has a number of limitations. However, if you have a simple HA requirement, for example you are not running a very critical application but still do not want to be woken up at night and would rather deal with issues in the morning, this might just fit. Here are some more notes:

- If you have a master-slave configuration, you have basically shot your primary (master) in the foot during the failover: new writes now go to the slave and never reach the old master. You may need to rebuild its data in the morning.
- If instead you have master-master, you might just be able to bring the primary master back online, let it catch up in replication, and then delete the /tmp/PRIMARY_HAS_FAILED file to switch your application back to it.
- The /tmp/PRIMARY_HAS_FAILED sentinel file is rudimentary and not the only way to do this. You should consider sending notifications to yourself when a failover happens, because this method requires human intervention to put the primary master back in rotation.
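For the master-master case, the failback step in the second note can be partly scripted. The sketch below assumes you fetch the recovered primary's replication status as an associative array (for example with mysqli_result::fetch_assoc() on SHOW SLAVE STATUS, or SHOW REPLICA STATUS on MySQL 8.0.22 and later) and only deletes the sentinel once both replication threads are running and the reported lag is zero. replication_caught_up() and fail_back_if_ready() are hypothetical names, and Seconds_Behind_Master is only a rough signal, so treat this as an aid to the human decision rather than a replacement for it.

```php
<?php
// Hypothetical helpers for failing back to the primary in a master-master pair.
// $status is one row of SHOW SLAVE STATUS (or SHOW REPLICA STATUS) as an
// associative array, e.g. from mysqli_result::fetch_assoc().
function replication_caught_up(array $status, int $maxLagSeconds = 0): bool
{
    $io  = $status['Slave_IO_Running'] ?? $status['Replica_IO_Running'] ?? 'No';
    $sql = $status['Slave_SQL_Running'] ?? $status['Replica_SQL_Running'] ?? 'No';
    // NULL lag means the lag is unknown, e.g. a replication thread is not running.
    $lag = $status['Seconds_Behind_Master'] ?? $status['Seconds_Behind_Source'] ?? null;
    return $io === 'Yes' && $sql === 'Yes' && $lag !== null && (int) $lag <= $maxLagSeconds;
}

// Delete the sentinel (switching the application back to the primary) only
// when replication looks caught up. Returns true if the sentinel was removed.
function fail_back_if_ready(string $sentinel, array $status): bool
{
    if (!file_exists($sentinel) || !replication_caught_up($status)) {
        return false;
    }
    return unlink($sentinel);
}
```

Such a check would likely run from a separate admin script that connects to 127.0.0.1:33001 directly and is started without master-slave.ini, since mysqlnd_ms.force_config_usage would otherwise block a host that is not a section name.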

The same effect can be achieved with a little more coding, but with the plugin you can already take advantage of it with less.

I’ve also tested the plugin with the second form, where you can write to multiple masters, using Percona XtraDB Cluster. I found a few interesting issues there, so stay tuned.

Comments

Tim Toennies

9 years ago

Jervin, we are trying to use this code and are having an issue that has us scratching our heads. We have three nodes set up in the cluster; we can stop the MySQL process on any one or two nodes and the application fails over correctly to the active node. But when we shut down the entire node, the application won’t fail over. We have tried this with multiple nodes as the primary and get the same result. Any suggestions for where we might have made an error? Thanks for your help.

Jervin Real (author)

9 years ago

Tim,

Are you running a Galera cluster? When you say shut down the entire node, do you mean shutting down one of the physical servers in the cluster? What error do you get when it does not fail over?

Tim Toennies

9 years ago

Reply to Jervin Real

Thank you for the response, Jervin. Yes, we are using a Galera cluster, and for the most part it has worked really well and met all our expectations. And frankly, this does not seem like an issue with the cluster itself. All of the nodes (3 of them) are virtual, so when I say shut down the node, I mean shut down the virtual server.

I don’t get an error; the application just locks up. We are doing more troubleshooting today to look at network traffic and see what the application is doing; if anything.

Any help or suggestions you can provide would be appreciated. Thanks!

Jervin Real (author)

9 years ago

Tim,

I would suggest adding some more instrumentation to the script you are using. If you output the node it is currently connected to before shutting down a cluster node, you will know which specific node to look into and collect data from. Once you have that, collect SHOW GLOBAL STATUS at a minimum from that cluster node.

You can create a forum thread for this so it does not get too long in the comments section 🙂