Actively monitoring replication connectivity with MySQL’s heartbeat
Miguel Angel Nieto
Until MySQL 5.5, the only variable that could reveal a network connectivity problem between Master and Slave was slave-net-timeout. It specifies the number of seconds the slave waits for more binary log events from the master before aborting the connection and establishing it again. Its default value is 3600 seconds, and it has historically been badly configured, so stalled connections or latency peaks went undetected for a long time, or were never detected at all. Conversely, if it is set to a low value, say 30 seconds, and the master has no events to send, the slave resets the connection after 30 seconds even though the connection is healthy.
Therefore, before the heartbeat feature, we had no way to check the connection status between the servers. We needed an active master/slave connection check, and this is where replication’s heartbeat can help.
This feature was introduced in 5.5 as a new option of the CHANGE MASTER TO command. Once it is enabled, whenever the MASTER is idle (it has no events to send to the slave), it sends a “beat” packet (of 106 bytes) every X seconds, where X is a period you define.
Now, let’s say that slave-net-timeout=30. If the master is idle, with no events to send, it starts sending those beats. The connection reset is therefore not triggered after those 30 seconds (as long as the heartbeat period is shorter than 30 seconds), because the beats tell the slave that the connection is still alive.
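To make the timing concrete, here is a small Python simulation. It is a simplified model, not MySQL’s actual code: it assumes the master sends every binary log event immediately and, with heartbeats enabled, a beat after every `period` seconds without sending anything, while the slave reconnects whenever nothing at all arrives for more than `slave_net_timeout` seconds. The function names are hypothetical.

```python
def packets_sent(event_times, period, end):
    """Times of everything the master sends up to `end` seconds.

    Simplified model: binlog events are sent immediately; if period > 0,
    a heartbeat is also sent after each `period` seconds of idleness.
    period=0 means heartbeats are disabled.
    """
    sent = []
    last = 0.0
    for t in sorted(event_times):
        while period > 0 and last + period < t:
            last += period
            sent.append(last)  # heartbeat filling an idle gap
        sent.append(t)  # real binary log event
        last = t
    while period > 0 and last + period <= end:
        last += period
        sent.append(last)  # heartbeats after the last event
    return sent


def slave_reconnects(arrivals, slave_net_timeout, end):
    """Times at which the slave gives up and reconnects because nothing
    arrived for more than slave_net_timeout seconds."""
    resets = []
    last = 0.0
    for t in sorted(arrivals) + [end]:
        while t - last > slave_net_timeout:
            last += slave_net_timeout  # timer restarts after each reconnect
            resets.append(last)
        last = t
    return resets


events = [5.0]  # one write at t=5 s, then the master stays idle
print(slave_reconnects(packets_sent(events, 0, 120), 30, 120))  # -> [35.0, 65.0, 95.0]
print(slave_reconnects(packets_sent(events, 1, 120), 30, 120))  # -> []
```

Without heartbeats the healthy but idle link is reset every 30 seconds; with a 1-second period it never is. In this model a period longer than the timeout (for example 40 seconds against 30) still produces resets, which is why the period should stay below slave-net-timeout.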
How can I configure replication’s heartbeat?
It is very easy to set up, and the overhead is negligible:
mysql_slave > STOP SLAVE;
mysql_slave > CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD=1;
mysql_slave > START SLAVE;
MASTER_HEARTBEAT_PERIOD is a value in seconds, ranging from 0 to 4294967, with millisecond resolution. Setting it to 0 disables the heartbeat.
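If you script this change, a small guard can check the value before it reaches the server. The sketch below is a hypothetical helper (`heartbeat_statement` is not part of any MySQL tool): it enforces the 0 to 4294967 range quoted above, rounds to milliseconds (an assumption about how finer values should be handled), and rejects non-zero periods that are not shorter than slave-net-timeout, since beats arriving less often than the timeout cannot prevent the reset described earlier.

```python
from decimal import Decimal, ROUND_HALF_UP

MAX_PERIOD = Decimal("4294967")  # upper bound of MASTER_HEARTBEAT_PERIOD, in seconds


def heartbeat_statement(period_seconds, slave_net_timeout):
    """Build a CHANGE MASTER TO statement for a validated heartbeat period.

    Hypothetical helper: rounds to millisecond resolution (assumption) and
    requires a non-zero period to be shorter than slave_net_timeout.
    A period of 0 (heartbeat disabled) is always accepted.
    """
    period = Decimal(str(period_seconds)).quantize(
        Decimal("0.001"), rounding=ROUND_HALF_UP
    )
    if period < 0 or period > MAX_PERIOD:
        raise ValueError(f"period {period} outside 0..{MAX_PERIOD} seconds")
    if period != 0 and period >= slave_net_timeout:
        raise ValueError("period must be shorter than slave_net_timeout")
    return f"CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD={period};"


print(heartbeat_statement(1, 30))  # -> CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD=1.000;
```

The generated statement still has to run between STOP SLAVE and START SLAVE, as in the session above.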
It is interesting to note that a 5.5 slave with replication’s heartbeat enabled, connected to a 5.1 master, doesn’t break replication. Of course, the heartbeat will not work in this case, because the master doesn’t know what a beat is or how to send it 🙂
What status variables do I have?
Two: the configured heartbeat period and the number of beats received so far.
mysql_slave > SHOW STATUS LIKE '%heartbeat%';
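+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| Slave_heartbeat_period    | 1.000 |
| Slave_received_heartbeats | 1476  |
+---------------------------+-------+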
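If a monitoring script scrapes this output, turning it into numbers is straightforward. A minimal Python sketch, assuming the pipe-delimited table format shown above (in batch mode, `mysql -B`, the client prints tab-separated output instead, which this parser does not handle); `parse_status_table` is a hypothetical name.

```python
def parse_status_table(text):
    """Turn mysql client table output into a {Variable_name: Value} dict."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the prompt line and +-----+ borders
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue  # skip the header row and anything unexpected
        status[cells[0]] = cells[1]
    return status


output = """
| Variable_name | Value |
| Slave_heartbeat_period | 1.000 |
| Slave_received_heartbeats | 1476 |
"""
status = parse_status_table(output)
period = float(status["Slave_heartbeat_period"])     # seconds between beats
received = int(status["Slave_received_heartbeats"])  # cumulative counter
print(period, received)  # -> 1.0 1476
```

Because Slave_received_heartbeats is a cumulative counter, a single reading says little; what matters is whether it grows between two readings taken more than one period apart.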
How can we check if the connection is down?
– If the master’s binary log position is ahead of the one the slave has read, but the slave is not receiving those new events, the connection is down.
– If the master is idle but the number of received heartbeats keeps increasing, the connection is up.
– If the master is idle and the number of received heartbeats does not increase, the connection is down.
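These three rules can be turned into a check that compares two readings taken some time apart (longer than the heartbeat period, otherwise an idle but healthy link may not have received a beat yet). This is a Python sketch under stated assumptions: the `Sample` fields are filled from SHOW MASTER STATUS on the master and from SHOW SLAVE STATUS and Slave_received_heartbeats on the slave by code not shown, and both `Sample` and `connection_state` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring reading (hypothetical structure)."""
    master_file: str   # File from SHOW MASTER STATUS on the master
    master_pos: int    # Position from SHOW MASTER STATUS
    read_file: str     # Master_Log_File from SHOW SLAVE STATUS on the slave
    read_pos: int      # Read_Master_Log_Pos from SHOW SLAVE STATUS
    heartbeats: int    # Slave_received_heartbeats on the slave

    def master_coords(self):
        return (self.master_file, self.master_pos)

    def read_coords(self):
        return (self.read_file, self.read_pos)


def connection_state(before, after):
    """Apply the three rules to two readings; returns "up" or "down"."""
    # Comparing (file, pos) tuples assumes zero-padded binlog names,
    # e.g. mysql-bin.000009 < mysql-bin.000010.
    master_ahead = after.master_coords() > after.read_coords()
    slave_progressed = after.read_coords() > before.read_coords()
    master_idle = after.master_coords() == before.master_coords()
    if master_ahead and not slave_progressed:
        return "down"  # new events exist but none are arriving
    if master_idle:
        # nothing to send, so only heartbeats prove the link is alive
        return "up" if after.heartbeats > before.heartbeats else "down"
    return "up"  # master is busy and the slave keeps reading


t0 = Sample("mysql-bin.000003", 1000, "mysql-bin.000003", 1000, 10)
t1 = Sample("mysql-bin.000003", 1000, "mysql-bin.000003", 1000, 15)
print(connection_state(t0, t1))  # -> up
```

Note that the master and slave readings are never taken at exactly the same instant, so a real check would likely want to see the same result on a couple of consecutive comparisons before alerting.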