EmergencyEMERGENCY? Get 24/7 Help Now!

MySQL replication primer with pt-table-checksum and pt-table-sync

 | August 12, 2015 |  Posted In: MySQL

PREVIOUS POST
NEXT POST

MySQL replication is a process that allows you to easily maintain multiple copies of MySQL data by having them copied automatically from a master to a slave database.

It’s essential to make sure the slave servers have the same set of data as the master to ensure data is consistent within the replication stream. MySQL slave server data can drift from the master for many reasons – e.g. replication errors, accidental direct updates on slave, etc.

Here at Percona Support we highly recommend that our customers periodically run the pt-table-checksum tool to verify data consistency within replication streams. Specifically, after fixing replication errors on slave servers to ensure that the slave has identical data as its master. As you don’t want to put yourself in a situation where you need to failover to a slave server for some reason and you find different data on that slave server.

In this post, I will examine the pt-table-checksum and pt-table-sync tools usage from Percona Toolkit on different replication topologies. We often receive queries from customers about how to run these tools and I hope this post will help.

Percona Toolkit is a free collection of advanced command-line tools to perform a variety of MySQL server and system tasks that are too difficult or complex to perform manually.

One of those tools is pt-table-checksum, which works by dividing table rows into chunks of rows. The size of a chunk changes dynamically during the operation to avoid overloading the server. pt-table-checksum has many safeguards including variation into chunk size to make sure queries run in a desired amount of time.

pt-table-checksum verifies chunk size by running EXPLAIN query on each chunk. It also monitors slave server’s continuously in order to make sure replicas not falls too far behind and in this case tool pauses itself to allow slave to catch up. Along with that there are many other safeguards builtin and you can find all the details in this documentation

In my first example case, I am going to run pt-table-checksum against pair of replication servers – i.e. master having only one slave in replication topology. We will run pt-table-checksum tool on master server to verify data integrity on slave and in case If differences found by pt-table-checksum tool we will sync those changes on slave server via pt-table-sync tool.

I have created a dummy table under test database and inserted 10 records on master server as below:

Then I intentionally deleted a few records from the slave server to make it inconsistent with the master for the purpose of this post.

Now, in this case the master server has 10 records on the dummy table while the slave server has only 5 records missing records from id>5 – we will run pt-table-checksum at this point on the master server to see if the pt-table-checksum tool catches those differences.

This needs to be executed on the master. The user and password you specify will be used to not only connect to the master but the slaves as well. You need the following privileges for the pt-table-checksum mysql user:

Earlier, in pt-table-checksum command, I used –replicate option which writes replication queries to mentioned table percona.checksums. Next I passed  –ignore-databases option which accepts comma separated list of databases to ignore. Moreover, –create-replicate-table and —empty-replicate-table options are “Yes” by default and you can specify both options explicitly if you want to create database table different then percona.checksums.

pt-table-checksum reported 1 DIFF which is number of chunks which are different from master on one or more slaves. You can find details about tabular columns e.g. TS, ERRORS and so on on documentation of pt-table-checksum. After that, I ran next command to identify which table has difference on slave.

In this command I used –replicate-check-only option which only reports the tables with having differences vice versa only checksum differences on detected replicas are printed. It doesn’t checksum any tables. It checks replicas for differences found by previous checksumming, and then exits.

You may also login to the slave and also execute below query to find out which tables have inconsistencies.

pt-table-checksum identified test.dummy table is different on slave now we are going to use pt-table-sync tool to synchronize table data between MySQL servers.

I ran the pt-table-sync tool from an opposite host this time i.e. from the slave as I used the –sync-to-master option which treats DSN as slave and syncs to master. Again, pt-table-sync will use the mysql username and password you specify to connect to the slave as well as to its master. –replicate option here examines the specified table to find out the data differences and –print just prints the SQL  (REPLACE queries) not actually executes it.

You may audit the queries before executing to sync data between master/slave.  You may see it printed only missing records on the slave. Once you are happy with the results, you can substitute –print with –execute to do actual synchronization.

As a reminder, these queries always executed on the master as this is the only safe way to do the changes on slave. However, on the master it’s no-op changes as these records already exists on master but then falls to slave via replication stream to sync it with master.

If you find lots of differences on your slave server it may lag during synchronization of those changes. As I mentioned earlier, you can use –print option to go through your queries which are going to be executed to sync slave with master server. I found this post useful if you see a huge difference in the table between master/slave(s).

Note, you may use the –dry-run option initially which only analyzes print information about the sync algorithm and then exits. It shows verbose output; it doesn’t do any changes though. –dry-run parameter will basically instruct pt-table-sync to not actually do the sync, but just perform some checks.

Let me present another replication topology, where the master has two slaves where slave2 is running on non-default port 3307 while master and slave1 running on port 3306. Further, slave2 is out of sync with master and I will show you how to sync slave2 which running on port 3307 with master.

Let’s run pt-table-checksum tool on master database server.

I used –recursion-method parameter this time which is method to use find slaves in replication stream and it’s pretty useful when your servers run on non-standard port i.e. other than 3306. I created dsns table under percona database with following entries. You may find dsns table structure in documentation.

Next I ran below pt-table-checksum command to identify which slave server has differences on test.dummy table.

This shows that slave2 has different data on test.dummy table as compared to the master. Now let’s run pt-table-sync tool to sync those differences and make slave2 identical as the master.

It shows 2 rows are different on slave2. Substituting –print with –execute synchronized the differences on slave2 and re-running pt-table-checksum tool shows no more differences.

Conclusion:
pt-table-checksum and pt-table-sync are the finest tools from Percona Toolkit to validate data between master/slave(s). With the help of these tools you can easily identify data drifts and fix them. I mentioned a couple of replication topologies above about how to check replication consistency and how to fix it in case of data drift. You may script pt-table-checksum / pt-table-sync steps and cron checksum script to periodically check the data consistency within replication stream.

This procedure is only safe for a single level master-slave(s) hierarchy. I will discuss the procedure for other topologies in future posts – i.e. I will describe more complex scenarios on how to use these tools in chain replication i.e. master -> slave1 -> slave2 pair and in Percona XtraDB Cluster setup.

PREVIOUS POST
NEXT POST
Muhammad Irfan

Muhammad Irfan is vastly experienced in LAMP Stack. Prior to joining Percona Support, he worked in the role of MySQL DBA & LAMP Administrator, maintained high traffic websites, and worked as a Consultant. His professional interests focus on MySQL scalability and on performance optimization.

10 Comments

  • Nice one. The article is explained in simple words. I would rather call it “pt-table-checksum & pt-table-sync for dummies” 🙂

  • It is worth mentioning that pt-table-checksum works only with replication-method=statement.

    I also recommend a tool, i have written
    https://github.com/KredytyChwilowki/MySQLReplicaIntegrityCheck

    It is not perfect, but works good in all 3 replication modes – statement, row, based.

  • Hi Irfan

    Great post on using PT checksum and . Wanted to check if this works for Galera clusters as well (codership version) .

    if it does, can you please share the process like you did in this post.

    Thanks

  • Great information. It would be nice to see some of it on the pt-table-checksum manual page as well: like the privileges explanation.

    We have a SLAVE server for backups that is not exposed to the external network, (listening on 127.0.0.1 – localhost), and, because of that, running pt-table-checksum on the MASTER will never be able to find this SLAVE.

    To verify the replication integrity from within the SLAVE I figured that using –recursion-method=dsn with ‘h=localhost’ as the DSN argument and adding the MASTER DSN as the command argument should do it.

    mysql-master> USE percona;
    mysql-master> CREATE TABLE dsns (
    id int(11) NOT NULL AUTO_INCREMENT,
    parent_id int(11) DEFAULT NULL,
    dsn varchar(255) NOT NULL,
    PRIMARY KEY (id)
    );
    mysql-master> INSERT INTO dsns SET dsn = ‘h=localhost,u=percona,p=p4s5w0rd’;

    [root@slave]# pt-table-checksum –ignore-databases mysql h=master,u=percona,p=p4s5w0rd
    [root@slave]# echo $?

  • [root@slave]# pt-table-checksum –ignore-databases mysql –recursion-method=dsn=h=localhost,D=percona,t=dsns h=master,u=percona,p=p4s5w0rd
    [root@slave]# echo $?

    Sorry about that, please feel free to edit the comment above.

  • Hi Irfan,

    The checksum_user does not seem have permission to execute changes as you have in the article. Also, you mention that we should execute the change queries on Master server, but there does not seem to be option to execute change from master server. Please advise.

  • How to do a similar fix but in case of the 2 servers are slaves and masters at the same time, master-Master replication for HA.

Leave a Reply