How to Handle pt-table-checksum Errors

In this blog post, we’ll look at how to approach pt-table-checksum errors.

pt-table-checksum is one of the most popular tools in Percona Toolkit, and it is widely used to identify data differences between masters and slaves. As Percona Support Engineers, we often get customer questions about the errors and warnings pt-table-checksum produces. Below we address the most common issues raised, and how to mitigate the related warnings or errors.

Unable to detect slaves

It’s possible that the tool cannot connect to the slaves because the information found on the master is not specific enough. By default, the tool looks for slaves based on the replica threads visible in the master’s processlist. This can be a problem if, for example, the slave’s MySQL runs on a different TCP port, the hostname does not resolve correctly, both the master and slave are on the same host, or the replication is Galera-based. In these cases, the --recursion-method option lets you try different discovery methods: ‘hosts’ or ‘cluster’. If all of them fail, you can specify each slave’s details manually using the ‘dsn’ method.

An example using this option for the cluster looks like this:
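A minimal sketch, where the host and credentials are placeholders rather than values from the original example:

    pt-table-checksum --recursion-method=cluster h=node1.example.com,u=checksum_user,p=checksum_pass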

and when a DSN is needed (as for MySQL Sandbox instances), we have to add the slave(s) details to a table, similar to below:
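A minimal sketch of the DSN table setup; the database name, host, and port below are placeholders (the table schema follows the Percona Toolkit documentation):

    -- Create the DSN table on the master
    CREATE TABLE percona.dsns (
      id int(11) NOT NULL AUTO_INCREMENT,
      parent_id int(11) DEFAULT NULL,
      dsn varchar(255) NOT NULL,
      PRIMARY KEY (id)
    );

    -- Register each slave's connection details
    INSERT INTO percona.dsns (dsn) VALUES ('h=127.0.0.1,P=19224');

Then point the tool at that table:

    pt-table-checksum --recursion-method=dsn=h=localhost,D=percona,t=dsns u=checksum_user,p=checksum_pass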

ROW format on slave

This message warns that the slave is using binlog_format=ROW, as the tool needs the STATEMENT format to calculate the diffs separately on the slave and the master. It does this by replicating the checksum command (e.g., INSERT INTO percona.checksum SELECT CRC32 ... WHERE ...) as the original statement, not as a row copy of CRC values already computed on the master. That is possible because the tool sets binlog_format=STATEMENT in its own session. This session setting does not propagate into the slave’s own binary log, though, so second-level and further replicas (in a chained replication topology) will not calculate the diffs as expected. This is not a problem when all the slaves replicate directly from the master, and in such cases we can ignore the message and use the --no-check-binlog-format option.
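When all slaves do replicate directly from the master, a run along these lines (host and credentials are placeholders) skips the check:

    pt-table-checksum --no-check-binlog-format h=master.example.com,u=checksum_user,p=checksum_pass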

By the way, the warning message is misleading in its claim about breaking replication, hence the bug reported.

Unable to switch session binlog_format to STATEMENT

or:

This can be an issue if the STATEMENT mode is unsupported in the MySQL variant or special edition of it (Amazon RDS, for example), or when switching is prohibited, either by the lack of the SUPER privilege (a limitation of Amazon Aurora) or by the Percona XtraDB Cluster Strict Mode safety precaution, as seen in the example above. To work around it in Percona XtraDB Cluster, temporarily relaxing the strict mode (be careful, as this may be dangerous) will work:
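A minimal sketch, assuming pxc_strict_mode was at its default of ENFORCING before the change:

    SET GLOBAL pxc_strict_mode=PERMISSIVE;
    -- run pt-table-checksum now, then restore the previous mode:
    SET GLOBAL pxc_strict_mode=ENFORCING;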

For Aurora, though (only when asynchronous replication is used between Aurora clusters, or from Aurora to non-Aurora MySQL), you will have to change binlog_format globally to STATEMENT using a custom DB cluster parameter group.
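If you manage Aurora with the AWS CLI rather than the console, the change looks roughly like this (the parameter group name is a placeholder, and the cluster must already use a custom DB cluster parameter group):

    aws rds modify-db-cluster-parameter-group \
        --db-cluster-parameter-group-name my-aurora-cluster-params \
        --parameters ParameterName=binlog_format,ParameterValue=STATEMENT,ApplyMethod=pending-reboot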

Too large chunk size or no good index

or

Instead of examining each table with a single big query, pt-table-checksum splits tables into chunks to ensure that the checksum is non-intrusive and doesn’t cause too much replication lag or load on the server. To create these chunks, it needs an index of some sort (preferably a primary key or unique index). If there is no index and the table contains a suitably small number of rows, the tool tries to checksum the table in a single chunk.

Skipping the table, as in the second message example, is a common issue with pt-table-checksum and can be caused by different or outdated table statistics on the master or the slave side. To alleviate this issue, make sure all your tables contain a primary key or unique key; pt-table-checksum requires one to divide a table into chunks effectively. We also suggest checking that these messages are not related to real differences in the table (for example, a significantly different row count). Executing pt-table-checksum with PTDEBUG is also a good idea, as it captures a lot of debugging info and provides better insight into what is causing the issue.
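PTDEBUG is an environment variable, and the debug output goes to STDERR; a typical capture (path and credentials are placeholders) looks like:

    PTDEBUG=1 pt-table-checksum h=master.example.com,u=checksum_user,p=checksum_pass 2>/tmp/pt-table-checksum-debug.log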

Tables can also be skipped seemingly at random across many tool runs, which usually comes down to two variables. One of them is innodb_stats_on_metadata. Turn it off, at least while the checksum is running, so that InnoDB index stats don’t change so often; it’s a dynamic variable, which means you can change it without restarting the MySQL server. On the other hand, if constantly changing table statistics are the problem (even with innodb_stats_on_metadata=0, statistics change with each significant amount of writes), you may want to disable automatic statistics updates for the duration of the checksum. Check the innodb_stats_auto_update option in Percona Server for MySQL for details.
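A sketch of the change around a checksum run, assuming the defaults were in effect beforehand:

    SET GLOBAL innodb_stats_on_metadata=OFF;
    -- on Percona Server for MySQL, optionally also:
    -- SET GLOBAL innodb_stats_auto_update=0;
    -- run pt-table-checksum, then restore the previous values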

pt-table-checksum uses an EXPLAIN query to estimate the number of rows in a chunk, so ever-changing table statistics are the most likely reason for skipped tables: this estimate is what the tool uses to decide whether to skip a chunk. It avoids the scenario where a table has few rows on the master but many on a replica and is checksummed in a single large query, causing a very long delay in replication. The decision is also affected by --chunk-size-limit, which defaults to 2. Try setting a higher --chunk-size-limit or --chunk-time so that pt-table-checksum allows larger chunks, but do it during off-peak periods. Of course, allowing too big a chunk makes the server suffer from heavy selects, and slave lag may also become a problem, although --chunk-time adjusts the chunk size dynamically so that the checksum query executes in a defined amount of time.
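An illustrative run allowing larger and longer chunks (the values are examples to tune for your workload, not recommendations):

    pt-table-checksum --chunk-time=1 --chunk-size-limit=4 h=master.example.com,u=checksum_user,p=checksum_pass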

For tables that can’t be chunked and must be checksummed in a single run, the allowed chunk size must be sufficiently large, and sometimes it is not. That’s where --chunk-size-limit comes into play: it is a multiplier for --chunk-size and allows larger chunks. To make sure your server is not heavily loaded, you can set a threshold at which pt-table-checksum pauses itself. This can be done with the --max-load parameter, so that --chunk-time and --chunk-size-limit won’t noticeably impact your server. We suggest starting with the default --chunk-size-limit and increasing it gradually until the run succeeds. Higher values of --chunk-size-limit give a better chance of a successful run, but there’s no way to tell if it will always succeed, because the number of rows processed is only an estimate. It’s also worth mentioning that you can try running ANALYZE TABLE on the skipped tables before running pt-table-checksum, to make sure statistics are up to date. This may or may not help, since statistics are estimates and might still be inaccurate.
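A sketch combining both suggestions; the table name, threshold, and credentials are placeholders:

    mysql -e "ANALYZE TABLE mydb.skipped_table"
    pt-table-checksum --max-load Threads_running=50 --chunk-size-limit=4 h=master.example.com,u=checksum_user,p=checksum_pass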

Also, scripting retries of skipped chunks can be a good approach. You can redirect the pt-table-checksum output to a log file and parse that log to find out which tables need to be re-tried separately. You can re-try a single table as many times as necessary; the checksum result for a particular table in the checksums table gets overwritten without affecting other results.
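A rough shell sketch of such a retry loop; note that the position of the SKIPPED column in the summary output differs between Percona Toolkit versions (3.x adds a DIFF_ROWS column), so verify it against your own log first:

    # Run the checksum and keep the per-table summary
    pt-table-checksum h=master.example.com,u=checksum_user,p=checksum_pass | tee /tmp/checksum.log

    # Re-try every table that had skipped chunks (SKIPPED assumed to be
    # column 6 here; TABLE is the last column)
    awk 'NR > 1 && $6 > 0 {print $NF}' /tmp/checksum.log | while read tbl; do
        pt-table-checksum --tables "$tbl" h=master.example.com,u=checksum_user,p=checksum_pass
    done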

None of the problems described above will occur when a table has a primary key on a single auto_increment integer column.

Suboptimal query plan