Tools for Monitoring and Managing MySQL Replication

MySQL Replication is very simple to set up. In this post I’ll discuss its importance and five handy tools for monitoring and managing MySQL replication.

What is MySQL Replication? It’s the process of copying the (real-time events) data from one master instance to another slave instance and maintaining the redundant consistent data in a different machine. This enables a distributed database system that shares the same level of information.

In MySQL the replication works based on the three threads as shown below.

1) I/O thread on the slave server:  To start on receiving replication events, an I/O thread starts on the slave server and connects to the master server.

2) Master connection handler thread:  As a connection handier, master starts a thread whenever a replication slave connects to a master. The master server sends the events from its binary log file to the slave I/O thread, notifying slave about newly written events to its binary log. The slave I/O thread which records them to in the slave’s relay log file.

3) Slave SQL thread:  When it starts, immediately reads the events from the relay log and applies on the Slave DB. Once it finishes the processing of every relay log and if the I/O thread is writing the events to a new relay log file then it deletes the processed one. Suppose if the  I/O thread is writing  the events on a relay log and which is the same file SQL thread is reading then the SQL thread pauses until more events are available in the relay log.

MySQL replication (slave) redundant instances is an excellent method of improving data performance and stability. It ensures the availability of another copy of a database whenever there arises any issues with the master server. One of the other advantages is the report query (select) offloading to a slave server, which is a common practice to reduce the workload of the master DB server as there are multiple servers that are able to respond to the queries. The third advantage is to schedule the backup from the slave server, etc.

All the benefits discussed above are smart and effective only if replication is up and running and the data is in sync with the master.

Here are useful tools from Percona Toolkit which help you in monitoring and managing the MySQL replication (Slaves).

tools for monitoring and managing MySQL replication1) pt-heartbeat: Tool measures/monitor replication lag on a MySQL in real time. It is important to have a replication monitoring system to confirm that replication is up and running and lag is current.

In typical way of monitoring, we use “SHOW SLAVE STATUS” to find out the information like Slave_IO_Running: Yes, Slave_SQL_Running: Yes and Seconds_Behind_Master: 0 etc, but is not reliable as  Seconds_Behind_Master shows difference between last timestamp read in the binlogs and current time. Many reasons like slow network, long running queries, blocking operations or a 2nd level slaves(Master > slave1> slave2) etc. can produce an irrelevant value for the variable.

So I recommend to use pt-heartbeat, which depends only on the heartbeat record being replicated to slave instead of the above said unreliable method of finding the lag. pt-heartbeat will insert/update a row in the master and the time delay is calculated depending on when the data was inserted and when it became available to read in the slave. It works at any depth in the replication hierarchy. For example, it will reliably report how far a slave lags its original master (master’s master).

Example :

We used to schedule the backup from the slave to avoid the additional load with the master server. In this case it is important to confirm the slave is current with the master to ascertain the backup is having the recent data. Here is a simple script you can use to verify the replication status on a periodical basis(cron) and to know the status just before the backup scheduled.

2) pt-slave-find: Finds and prints replication hierarchy of the slaves – shows you the topology and replication hierarchy of your MySQL replication instances.

Example :

3) pt-slave-restart: Watches the MySQL replication slaves for any error and tries to restart the replication.

The tool is very useful for skipping statements that cause errors and continuing replication. If you use this carelessly, the slave will be having the inconsistent data. However  when you use the tool, I recommended you to confirm the consistency of data between master and slave with help of pt-table-checksum.

Example : Restart the slave for error-numbers=1062 (Duplicate entry ‘1’ for key ‘PRIMARY’)

4) pt-table-checksum: Performs an online replication consistency check by executing checksum queries on the master, which produces different results on replicas that are inconsistent with the master.

Example :

Note: It  is important to run the pt-table-checksum tool regardless of whether or not you’ve ever skipped an event with pt-slave-restart to make sure we are having the identical data on the slave side.

5) pt-table-sync: Sync the slave with their master (synchronizes data efficiently between MySQL tables.)

Example :