The content of this article is outdated; look here for more up-to-date information.
Over the last year, the frustration of many of us at Percona with MMM has grown to the point where we started looking at other ways of achieving higher availability using MySQL replication. One of the weaknesses of MMM is its communication layer, so instead of reinventing a flat tire, Baron Schwartz and I decided to develop a solution using Pacemaker, a well-known and established cluster manager with a bulletproof communication layer. One of the great things about Pacemaker is its flexibility, but flexibility may result in complexity. With the help of people from the Pacemaker community, namely Florian Haas and Raoul Bhatia, I have been able to modify the existing MySQL Pacemaker resource agent so that it survives our replication tests and offers behavior very similar to MMM regarding the management of virtual IP addresses (VIPs). We decided to call this solution PRM, for Percona Replication Manager. All the parts are open source and available under the GPL license.
Keep in mind that this solution is hot off the press; consider it alpha. As I said above, it survived testing in a very controlled environment, but it is young and many issues and bugs are likely to be found. Also, it is different from Yoshinori Matsunobu’s MHA solution and is in fact quite complementary to it. One of my near-term goals is to integrate with MHA for master promotion.
The solution is basically made of 3 pieces:
- The Pacemaker cluster manager
- A Pacemaker configuration
- A MySQL resource agent
Here I will not cover the Pacemaker installation since this is fairly straightforward and covered elsewhere. I’ll discuss the MySQL resource agent and the supporting configuration while assuming basic knowledge of Pacemaker.
But before we start, what does this solution offer?
- Reader and writer VIPs behaviors similar to MMM
- If the master fails, a new master is promoted from the slaves; no master-to-master setup is needed. The new master is selected based on scores published by the slaves: the more up-to-date slaves have higher promotion scores
- Some nodes can be dedicated to be only slaves or less likely to become master
- A node can be the preferred master
- If replication on a slave breaks or lags beyond a defined threshold, its reader VIP(s) are removed. MySQL is not restarted.
- If no slaves are ok, all VIPs, readers and writer, will be located on the master
- During a master switch, connections are killed on the demoted master to avoid replication conflicts
- All slaves are in read_only mode
- Simple administrative commands can remove the master role from a node
- Pacemaker stonith devices are supported
- No logical limit on the number of nodes
- Easy to add nodes
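The promotion behavior described in the list above can be pictured with a small sketch. This is not the resource agent’s code; the `Node` class, the `score` values, and the `slave_only` flag are hypothetical names chosen only to illustrate "highest score wins, slave-only nodes are never promoted".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    score: int                # hypothetical promotion score; higher = more up to date
    slave_only: bool = False  # hypothetical flag: node must never become master


def pick_new_master(nodes: List[Node]) -> Optional[str]:
    """Return the name of the eligible node with the highest score, or None."""
    candidates = [n for n in nodes if not n.slave_only]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.score).name


nodes = [
    Node("testvirtbox2", score=90),
    Node("testvirtbox3", score=100, slave_only=True),  # most up to date, but excluded
    Node("testvirtbox1", score=95),
]
print(pick_new_master(nodes))  # testvirtbox1
```

A preferred master or a "less likely" master could be modeled in the same way by adding or subtracting a fixed amount from a node’s score before the comparison.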
To set up the solution, you’ll need my version of the MySQL resource agent, which has not yet been pushed to the main Pacemaker resource agents branch. More testing and cleanup will be needed before that happens. You can get the resource agent from here:
You can also get the whole branch from here:
On my Ubuntu Lucid VM, this file goes in the /usr/lib/ocf/resource.d/heartbeat/ directory.
To use this agent, you’ll need a Pacemaker configuration. As a starting point, I’ll discuss the configuration I used during my tests.
node testvirtbox1 \
node testvirtbox2 \
node testvirtbox3 \
primitive p_mysql ocf:heartbeat:mysql \
        params config="/etc/mysql/my.cnf" pid="/var/run/mysqld/mysqld.pid" \
        socket="/var/run/mysqld/mysqld.sock" replication_user="root" \
        replication_passwd="rootpass" max_slave_lag="15" evict_outdated_slaves="false" \
        binary="/usr/bin/mysqld_safe" test_user="root" \
        op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
        op monitor interval="2s" role="Slave" OCF_CHECK_LEVEL="1"
primitive reader_vip_1 ocf:heartbeat:IPaddr2 \
        params ip="10.2.2.171" nic="eth0"
primitive reader_vip_2 ocf:heartbeat:IPaddr2 \
        params ip="10.2.2.172" nic="eth0"
primitive reader_vip_3 ocf:heartbeat:IPaddr2 \
        params ip="10.2.2.173" nic="eth0"
primitive writer_vip ocf:heartbeat:IPaddr2 \
        params ip="10.2.2.170" nic="eth0" \
ms ms_MySQL p_mysql \
        meta master-max="1" master-node-max="1" clone-max="3" clone-node-max="1" notify="true" globally-unique="false" target-role="Master" is-managed="true"
location No-reader-vip-1-loc reader_vip_1 \
        rule $id="No-reader-vip-1-rule" -inf: readerOK eq 0
location No-reader-vip-2-loc reader_vip_2 \
        rule $id="No-reader-vip-2-rule" -inf: readerOK eq 0
location No-reader-vip-3-loc reader_vip_3 \
        rule $id="No-reader-vip-3-rule" -inf: readerOK eq 0
location No-writer-vip-loc writer_vip \
        rule $id="No-writer-vip-rule" -inf: writerOK eq 0
colocation reader_vip_1_dislike_reader_vip_2 -200: reader_vip_1 reader_vip_2
colocation reader_vip_1_dislike_reader_vip_3 -200: reader_vip_1 reader_vip_3
colocation reader_vip_2_dislike_reader_vip_3 -200: reader_vip_2 reader_vip_3
property $id="cib-bootstrap-options" \
property $id="mysql_replication" \
rsc_defaults $id="rsc-options" \
Let’s review the configuration. It begins with 3 node entries defining the 3 nodes in my cluster. One attribute is required for each node: the IP address that will be used for replication. This is a real IP address, not a reader or writer VIP. This attribute allows the use of a private network for replication if needed.
Next is the mysql primitive resource declaration. This primitive defines the mysql resource on each node and has many parameters; here are the ones I had to define:
- config: The path of the my.cnf file. Remember that Pacemaker will start MySQL, not the regular init.d script
- pid: The pid file. This is used by Pacemaker to know if MySQL is already running. It should match the my.cnf pid_file setting.
- socket: The MySQL unix socket file
- replication_user: The user to use when setting up replication. It is also currently used for the ‘CHANGE MASTER TO’ command, something that should/will change in the future
- replication_passwd: The password for the above user
- max_slave_lag: The maximum allowed slave lag in seconds. If a slave lags by more than this value, it will lose its reader VIP(s)
- evict_outdated_slaves: It is mandatory to set this to false; otherwise Pacemaker will stop MySQL on a slave that lags behind, which will absolutely not help its recovery.
- test_user and test_passwd: The credentials used to test MySQL. The default test runs select count(*) on the mysql.user table, so the given user should at least have the select privilege on that table.
- op monitor: An entry is needed for each role, Master and Slave. Intervals must not be the same.
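To make the effect of max_slave_lag concrete, here is a minimal sketch of the kind of check a monitor operation could perform on a slave. It is an illustration under assumptions, not the agent’s actual logic: the function name `reader_ok` is hypothetical, and its arguments simply mirror the Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master columns of SHOW SLAVE STATUS.

```python
from typing import Optional


def reader_ok(io_running: bool, sql_running: bool,
              seconds_behind: Optional[int], max_slave_lag: int = 15) -> int:
    """Return 1 if the slave may keep a reader VIP, 0 otherwise (hypothetical helper)."""
    if not (io_running and sql_running):
        return 0  # replication is broken
    if seconds_behind is None:
        return 0  # lag is unknown, so treat the slave as not OK
    # The slave loses its reader VIP only when it lags by MORE than the threshold.
    return 1 if seconds_behind <= max_slave_lag else 0


print(reader_ok(True, True, 3))    # 1
print(reader_ok(True, True, 16))   # 0
print(reader_ok(True, False, 0))   # 0
```

Note that the result only changes a node attribute; nothing in this decision stops or restarts MySQL, which is why evict_outdated_slaves is kept at false.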
Following the mysql primitive declaration, the primitives for the 3 reader VIPs and one writer VIP are defined. Those are straightforward, so I’ll skip a detailed description. The next interesting element is the master-slave “ms” declaration. This is how Pacemaker defines an asymmetrical resource having a master and slaves. The only thing that may change here is clone-max="3", which should match the number of database nodes you have.
The handling of the VIPs is the truly new thing in the resource agent. I am grateful to Florian Haas, who told me to use node attributes to keep Pacemaker from overreacting. The availability of reader or writer VIPs on a node is controlled by the readerOK and writerOK attributes and the location rules. An infinite negative weight is given when a VIP should not be on a host. I also added a few colocation rules to help spread the reader VIPs across all the nodes.
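The interplay of the -inf location rules and the -200 colocation rules can be approximated with a toy model. This is a deliberately simplified greedy sketch, not Pacemaker’s real placement algorithm; the function `place_reader_vips` and its arguments are hypothetical.

```python
NEG_INF = float("-inf")


def place_reader_vips(vips, reader_ok, penalty=-200):
    """Greedy toy model: give each VIP to the node with the best score.

    reader_ok maps node name -> 0 or 1 (the readerOK node attribute).
    A node with readerOK == 0 scores -inf, like the location rules.
    Each reader VIP already on a node costs `penalty`, like the colocation rules.
    """
    placement = {}
    for vip in vips:
        scores = {}
        for node, ok in reader_ok.items():
            if ok == 0:
                scores[node] = NEG_INF
            else:
                already_there = sum(1 for n in placement.values() if n == node)
                scores[node] = penalty * already_there
        best = max(scores, key=scores.get, default=None)
        if best is None or scores[best] == NEG_INF:
            placement[vip] = None  # no node may host this VIP
        else:
            placement[vip] = best
    return placement


vips = ["reader_vip_1", "reader_vip_2", "reader_vip_3"]
print(place_reader_vips(vips, {"testvirtbox1": 1, "testvirtbox2": 1, "testvirtbox3": 0}))
```

With testvirtbox3 marked readerOK=0, no reader VIP lands on it, and the three VIPs are shared between the two remaining nodes instead of piling up on one.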
As a final thought on the Pacemaker configuration, remember that for Pacemaker to run correctly on a 2-node cluster, you should set the quorum policy to ignore. Also, this example configuration has no stonith devices defined, so stonith is disabled. At the end of the configuration, you’ll notice the replication_info cluster attribute. You don’t have to define this; the mysql RA will add it automatically when the first node is promoted to master.
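The reason a 2-node cluster needs the quorum policy set to ignore (the no-quorum-policy cluster property) comes down to simple arithmetic: a partition has quorum only when it holds more than half of the expected votes. The helper below is a hypothetical illustration of that rule, assuming one vote per node.

```python
def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """A partition has quorum when it holds more than half of the expected votes."""
    return 2 * votes_present > expected_votes


print(has_quorum(1, 2))  # False: the survivor of a 2-node cluster never has quorum
print(has_quorum(2, 3))  # True: a 3-node cluster tolerates the loss of one node
```

In a 2-node cluster, the surviving node holds exactly half of the votes, so without ignoring quorum it would stop its resources at the very moment it is needed.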
There are not many requirements regarding the MySQL configuration; Pacemaker will automatically add “skip-slave-start” for saner behavior. One important setting is “log_slave_updates = OFF” (the default value). In some cases, if slaves are logging replication updates, it may cause failover issues. Also, the solution relies on the read_only setting on the slaves, so make sure the application database user doesn’t have the SUPER privilege, which overrides read_only.
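One way to verify the SUPER requirement is to inspect the output of SHOW GRANTS for the application user. The sketch below works on grant statements passed in as plain strings; the function name and the sample grants are hypothetical, and it assumes the usual "GRANT ... ON *.* TO ..." layout of that output.

```python
import re

# Matches global grants only, e.g. "GRANT SUPER, SELECT ON *.* TO 'u'@'h'".
GLOBAL_GRANT = re.compile(r"^GRANT\s+(.*?)\s+ON\s+\*\.\*\s+TO\s", re.IGNORECASE)


def grants_override_read_only(grants):
    """Return True if any global grant includes SUPER or ALL PRIVILEGES."""
    for line in grants:
        match = GLOBAL_GRANT.match(line.strip())
        if not match:
            continue  # database- or table-level grant: cannot carry SUPER
        privileges = {p.strip().upper() for p in match.group(1).split(",")}
        if privileges & {"SUPER", "ALL PRIVILEGES", "ALL"}:
            return True
    return False


app_grants = [
    "GRANT USAGE ON *.* TO 'app'@'%'",
    "GRANT SELECT, INSERT, UPDATE, DELETE ON `appdb`.* TO 'app'@'%'",
]
admin_grants = ["GRANT ALL PRIVILEGES ON *.* TO 'admin'@'localhost' WITH GRANT OPTION"]

print(grants_override_read_only(app_grants))    # False
print(grants_override_read_only(admin_grants))  # True
```

A user for which this returns True could still write to a slave despite read_only, which is exactly the situation to avoid for application accounts.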
As I mentioned above, this project is young. In the future, I’d like to integrate MHA to benefit from its ability to bring all the nodes to a consistent level. Also, the security around the solution should be improved, a fairly easy task I believe. Of course, I’ll work with the maintainers of the Pacemaker resource agents to include it in the main branch once it has matured a bit.
Finally, if you are interested in this solution but have problems setting it up, just contact us at Percona; we’ll be pleased to help.