Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

pmp-check-pt-table-checksum Percona Monitoring Plugin

September 14, 2016

Author

Muhammad Irfan

MySQL

Share this Post:

Recently, I worked on a customer case where the customer needed to monitor the checksum via Nagios monitoring. The pmp-check-pt-table-checksum plugin from Percona Monitoring Plugins for MySQL achieves this goal. I thought it was worth a blogpost.

pmp-check-pt-table-checksum alerts you when the pt-table-checksum tool from Percona Toolkit finds data drifts on a replication slave. pmp-checksum-pt-table-checksum monitors data differences on the slave from the checksum table as per information in the last checksum performed by the pt-table-checksum tool. By default, the plugin queries the percona.checksum table to fetch information about data discrepancies. You can override this behavior with the “-T” option. You can check the pmp-check-pt-table-checksum documentation for details.

Let’s demonstrate checksum monitoring via Nagios. My setup contains a master with two slave(s) connected, as follows:

Host 10.0.3.131 is master.

Host 10.0.3.83 is slave1

Host 10.0.3.36 is slave2

I intentionally generated more data on the master so pt-table-checksum can catch the differences on the slave(s). Here’s what it looks like:

mysql-master> SELECT * FROM test.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
+------+
10 rows in set (0.00 sec)

mysql-slave1> SELECT * FROM test.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+
5 rows in set (0.00 sec)

mysql-slave2> SELECT * FROM test.t1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+
5 rows in set (0.00 sec)

mysql-master> SELECT * FROM test.t1;

+------+

| id |

+------+

| 1 |

| 2 |

| 3 |

| 4 |

| 5 |

| 6 |

| 7 |

| 8 |

| 9 |

| 10 |

+------+

10 rows in set (0.00 sec)

mysql-slave1> SELECT * FROM test.t1;

+------+

| id |

+------+

| 1 |

| 2 |

| 3 |

| 4 |

| 5 |

+------+

5 rows in set (0.00 sec)

mysql-slave2> SELECT * FROM test.t1;

+------+

| id |

+------+

| 1 |

| 2 |

| 3 |

| 4 |

| 5 |

+------+

5 rows in set (0.00 sec)

As you can see, slave1 and slave2 are different from the master: the master has ten rows while the slave(s) have five rows each (table t1).

Then, I executed pt-table-checksum from the master to check for data discrepancies:

[root@master]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password
 TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
08-25T04:57:10 0 1 10 1 0 0.018 test.t1

[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password
Differences on slave1
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.t1 1 -5 1

Differences on slave2
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.t1 1 -5 1

[root@master]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE

08-25T04:57:10 0 1 10 1 0 0.018 test.t1

[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password

Differences on slave1

TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY

test.t1 1 -5 1

Differences on slave2

TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY

test.t1 1 -5 1

pt-table-checksum correctly identifies the differences for the test.t1 table on slave1 and slave2. Now, you can use the pmp-check-pt-table-checksum Percona checksum monitoring plugin. Let’s try to run it locally (via CLI) from the Nagios host.

[root@nagios]# pmp-check-pt-table-checksum -H slave1 -l checksum_user -p checksum_password -P 3306
WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1

[root@nagios]# pmp-check-pt-table-checksum -H slave2 -l checksum_user -p checksum_password -P 3306
WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1]

[root@nagios]# pmp-check-pt-table-checksum -H slave1 -l checksum_user -p checksum_password -P 3306

WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1

[root@nagios]# pmp-check-pt-table-checksum -H slave2 -l checksum_user -p checksum_password -P 3306

WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1]

NOTE: The checksum_user database user needs SELECT privileges on both the checksum table (Percona.checksums) and the slave(s) in order for SQL to alert for checksum differences on slave(s).

On the Nagios monitoring server, you need to add the pmp-check-pt-table-checksum command to the commands.cfg file:

define command{
command_name            pmp-check-pt-table-checksum
command_line            $USER1$/pmp-check-pt-table-checksum -H $HOSTADDRESS$ -c $ARG1$
        }

define command{

command_name pmp-check-pt-table-checksum

command_line $USER1$/pmp-check-pt-table-checksum -H $HOSTADDRESS$ -c $ARG1$

}

NOTE: I used “-c” option for pmp-check-pt-table-checksum, which raises a critical error instead of a warning.

And, on the existing hosts.cfg file (i.e., slave1.cfg and slave2.cfg), you need to add a monitoring command accordingly as below:

define service{
        use                             generic-service
        host_name                       slave1
        service_description             Checksum Status
        check_command                   pmp-check-pt-table-checksum!1
}

define service{

use generic-service

host_name slave1

service_description Checksum Status

check_command pmp-check-pt-table-checksum!1

}

In this command “1” is an argument to command “-c $ARG1$” so pmp-check-pt-table-checksum will raise a critical error when one or more chunks on the slave(s) are different from the master.

Last but not least, restart the Nagios daemon on the monitoring host to make the change.

Below is how it looks like on the Nagios monitoring on the web:

pmp-check-pt-table-checksum

I also think the “INTERVAL” option is useful:

-i INTERVAL Interval over which to ensure pt-table-checksum was run, in days; default - not to check.

1

-i INTERVAL Interval over which to ensure pt-table-checksum was run, in days; default - not to check.

It makes sure that chunks are recent on the checksum table. Used the other way around, it checks on how old your chunks are. This option ensures the checksum cron executes at a defined number of days. Let’s say you have pt-table-checksum cron running once per week. In that case, setting INTERVAL 14 or 21 alerts you if chunks are older then defined number of days (i.e., the INTERVAL number).

Conclusion:

Percona Monitoring plugins for MySQL are very useful and easy to embed in your centralize monitoring dashboard. You can schedule pt-table-checksum via a cron job, and get reports regarding master/slave(s) data drifts (if any) from one global dashboard on the monitoring host. There are various plugins available from Percona, e.g. processlist plugin, replication delay plugin, etc. Along with that, Percona offers Cacti and Zabbix templates to graph various MySQL activities.