As of Percona Toolkit v2.1.5, pt-table-checksum works correctly with Percona XtraDB Cluster, but it doesn’t work quite like a traditional replication setup because cluster nodes are not like traditional replicas. In this post I demonstrate how to use pt-table-checksum with Percona XtraDB Cluster.
First, you’ll need Percona Toolkit v2.1.5 or newer and Percona XtraDB Cluster 5.5.27-23.6 or newer (these are the current versions at the time of writing). Second, I presume that you already know how to use pt-table-checksum and that your cluster is already setup. If the later presumption is false, then read the Percona XtraDB Cluster documentation or watch the Percona XtraDB Cluster – Installation and Setup webinar. My setup has 3 nodes called “node300”, “node4000”, and “node6000” on ports 4000, 5000, and 6000 respectively.
The first thing you must do is create a “DSNs table” because pt-table-checksum cannot currently auto-detect cluster nodes. My DSNs table contains:
|
1 |
node4000> SELECT * FROM percona.dsns;<br>+----+-----------+-----------------------+<br>| id | parent_id | dsn |<br>+----+-----------+-----------------------+<br>| 1 | NULL | h=127.1,P=4000,u=root |<br>| 2 | NULL | h=127.1,P=5000,u=root |<br>| 3 | NULL | h=127.1,P=6000,u=root |<br>+----+-----------+-----------------------+ |
There is a row for each node in the cluster, specifying the node’s DSN. (The id and parent_id columns are not used by pt-table-chekcusm yet, so their values don’t matter.) Since all these nodes are in a cluster, this table should exist on all nodes. This means you can run pt-table-checksum on any node. I’ll run the tool on node4000.
The second thing you must do is specify “–recursion-method dsn=DSN-TABLE” when running pt-table-checksum, where “DSN-TABLE” is a DSN specifying the above DSNs table. For example:
|
1 |
--recursion-method dsn=h=127.1,P=4000,u=root,D=percona,t=dsns |
That makes pt-table-checksum connect to the given host (127.0.0.1:4000), and select all rows from the given table (percona.dsns), and check those nodes for differences. Without this option, the tool may throw an error like:
|
1 |
node4000 is a cluster node but no other nodes or regular replicas were found.<br>Use --recursion-method=dsn to specify the other nodes in the cluster. |
If the tool detects that the master host (the main DSN specified on the command line) is a cluster node but no other cluster nodes were specified and no regular replicas were detected, then it could checksum but it could not detect any differences, so it throws that error rather than doing only half its job. Furthermore, not being able to detect differences might be misinterpreted as there being no differences, so it’s better to error than potentially mislead users.
The aforementioned hints at something else you should know: pt-table-checksum can work with cluster nodes and regular replicas at the same time, but since cluster nodes require –recursion-method=dsn, you must also specify any regular replicas in the DSN table, else pt-table-checksum will not check them for differences.
Here’s the abbreviated output of a real run:
|
1 |
$ ./pt-table-checksum h=127.1,P=4000,u=root <br>--recursion-method dsn=h=127.1,P=4000,u=root,D=percona,t=dsns<br>Not checking replica lag on node4000 because it is a cluster node.<br>Not checking replica lag on node5000 because it is a cluster node.<br>Not checking replica lag on node6000 because it is a cluster node.<br> TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE<br>10-05T14:02:17 0 0 0 1 0 0.025 mysql.columns_priv<br>10-05T14:02:17 0 0 0 1 0 0.019 mysql.db<br>...<br>10-05T14:02:18 0 0 3 1 0 0.023 percona.dsns<br>10-05T14:02:18 0 0 10 1 0 0.032 test.t |
The tool works as usual: checking every database and table. So far there are no differences, but we’ll run it again with a difference on one node to prove that it works. Right now, though, notice the three warnings at the beginning of the output: pt-table-checksum cannot and will not check the replica lag on cluster nodes because SHOW SLAVE STATUS on a cluster node doesn’t work (because a node isn’t a slave) and cluster nodes should not be out of sync.
Now let’s make a difference on node5000 and verify that pt-table-checksum detects it when running on node4000:
|
1 |
node5000> SET SQL_LOG_BIN=0;<br>Query OK, 0 rows affected (0.00 sec)<br><br>node5000> UPDATE test.t SET val=val+1 WHERE id=30;<br>Query OK, 1 row affected (0.00 sec)<br>Rows matched: 1 Changed: 1 Warnings: 0<br><br>node5000> SELECT * FROM test.t WHERE id=30;<br>+----+------+<br>| id | val |<br>+----+------+<br>| 30 | 11 |<br>+----+------+ |
Ok, so row 30 has value 11 on node5000. Let’s double-check that it’s different (the original value) on the other nodes:
|
1 |
node4000> SELECT * FROM test.t WHERE id=30;<br>+----+------+<br>| id | val |<br>+----+------+<br>| 30 | 10 |<br>+----+------+<br><br>node6000> SELECT * FROM test.t WHERE id=30;<br>+----+------+<br>| id | val |<br>+----+------+<br>| 30 | 10 |<br>+----+------+ |
Ok, so we know there’s a difference on node5000, now let’s see if pt-table-checksum detects it:
|
1 |
$ ./pt-table-checksum h=127.1,P=4000,u=root -d test <br>--recursion-method dsn=h=127.1,P=4000,u=root,D=percona,t=dsns<br>Not checking replica lag on node4000 because it is a cluster node.<br>Not checking replica lag on node5000 because it is a cluster node.<br>Not checking replica lag on node6000 because it is a cluster node.<br> TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE<br>10-05T14:47:52 0 1 10 1 0 0.023 test.t |
Success! The difference was detected. Since we’re still “PXC-approving” pt-table-sync, it’s best that a DBA manually investigate any differences.
In summary:
Please submit bugs if found.
Resources
RELATED POSTS