Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

MySQL replication primer with pt-table-checksum / pt-table-sync, part 2

March 17, 2016

Author

Muhammad Irfan

MySQL

Share this Post:

This is the second and last tutorial blog post on how to use pt-table-checksum / pt-table-sync tools for MySQL replication.

In the first post, I showed you how to use the pt-table-checksum / pt-table-sync tools in a typical MySQL replication setup. In this post, we’ll discuss MySQL replication for more advanced topologies. I will show you how to use these tools in a chained master/slave replication setup in general, and in Percona XtraDB Cluster specifically. (Here is another good post about how to use the pt-table-checksum tool in Percona XtraDB Cluster.)

Let me first show you an example where I tried to run pt-table-checksum / pt-table-sync on a chained master/slave setup. In this example, the setup is Master -> Slave1 -> Slave2, where the master has a binary log format set to STATEMENT, while the chained Slave1 has a binary log format set to ROW.

Master:
mysql> SHOW GLOBAL VARIABLES LIKE 'binlog_format';
+---------------+-----------+
| Variable_name | Value     |
+---------------+-----------+
| binlog_format | STATEMENT |
+---------------+-----------+

Slave1:
mysql> SHOW VARIABLES LIKE 'binlog_format';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+

[root@master ~]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password
Replica slave1 has binlog_format ROW which could cause pt-table-checksum to break replication.  Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation.  If you understand the risks, specify --no-check-binlog-format to disable this check.

Master:

mysql> SHOW GLOBAL VARIABLES LIKE 'binlog_format';

+---------------+-----------+

| Variable_name | Value |

+---------------+-----------+

| binlog_format | STATEMENT |

+---------------+-----------+

Slave1:

mysql> SHOW VARIABLES LIKE 'binlog_format';

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| binlog_format | ROW |

+---------------+-------+

[root@master ~]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password

Replica slave1 has binlog_format ROW which could cause pt-table-checksum to break replication. Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation. If you understand the risks, specify --no-check-binlog-format to disable this check.

As soon as I tried to execute pt-table-checksum on this replication topology, it failed because the binary logs use different formats. The pt-table-checksum documentation mentions this issue in the Limitations section.

The problem here is that Slave1 receives changes from its Master, which will be STATEMENT ( pt-table-checksum will force this into STATEMENT binlog_format in its session). Slave1 executes the checksum queries, and all seems good until that point. But then Slave1 will also write the changes to its binary log to replicate to Slave2. pt-table-checksum only works properly when STATEMENT-based formatting is used to generate the checksum. If it’s ROW-based, the row changes to the checksums table are just replicated from the chain master, and no checksumming happens. In this case, this is what happens with Slave2, which is why pt-table-checksum doesn’t perform its checksum.

To remedy this problem we need to ignore this checking by using –no-check-binlog-format, or we need to change the binary format to STATEMENT on chain master (Slave1).

In our second example, I’m going to run pt-table-checksum on a three-node Percona XtraDB Cluster. Data inconsistencies can occur in Galera Cluster because of human errors or bugs. For this purpose, I inserted a few rows in Node1, and the Node1/Node2 test.dummy table data look like the following:

Node1 & Node2:
mysql> SELECT * FROM test.dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)

Node1 & Node2:

mysql> SELECT * FROM test.dummy;

+----+------+

| id | name |

+----+------+

| 1 | a |

| 2 | b |

| 3 | c |

| 4 | d |

| 5 | e |

+----+------+

5 rows in set (0.00 sec)

Whereas Node3 is missing last two records, as shown below (I intentionally deleted the last couple rows with the setting wsrep_on off/on in between):

Node3:
mysql> SELECT * FROM dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
+----+------+
3 rows in set (0.00 sec)

Node3:

mysql> SELECT * FROM dummy;

+----+------+

| id | name |

+----+------+

| 1 | a |

| 2 | b |

| 3 | c |

+----+------+

3 rows in set (0.00 sec)

I have filled the DSN table accordingly with checksum user credentials:

mysql> INSERT INTO percona.dsns (dsn) VALUES ('h=10.0.3.10,u=checksum_user,p=checksum_password,P=3306');
mysql> INSERT INTO percona.dsns (dsn) VALUES ('h=10.0.3.14,u=checksum_user,p=checksum_password,P=3306');
mysql> INSERT INTO percona.dsns (dsn) VALUES ('h=10.0.3.169,u=checksum_user,p=checksum_password,P=3306');

mysql> INSERT INTO percona.dsns (dsn) VALUES ('h=10.0.3.10,u=checksum_user,p=checksum_password,P=3306');

mysql> INSERT INTO percona.dsns (dsn) VALUES ('h=10.0.3.14,u=checksum_user,p=checksum_password,P=3306');

mysql> INSERT INTO percona.dsns (dsn) VALUES ('h=10.0.3.169,u=checksum_user,p=checksum_password,P=3306');

pt-table-checksum on Percona XtraDB Cluster works with a single cluster, where all nodes are cluster nodes and not regular replicas. The example above is a good one – Node1, Node2 and Node3 all belong to a single cluster. Cluster-to-cluster setups aren’t supported. (Refer to documentation for more details.)

Let’s try to run pt-table-checksum from authoritative node Node1:

root@node1:# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns
Not checking replica lag on node2 because it is a cluster node.
Not checking replica lag on node3 because it is a cluster node.
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
08-21T08:27:12      0      1        5       1       0   0.045 test.dummy

root@node1:~# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns
Not checking replica lag on node2 because it is a cluster node.
Not checking replica lag on node3 because it is a cluster node.
Differences on node3
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -2 1

root@node1:# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns

Not checking replica lag on node2 because it is a cluster node.

Not checking replica lag on node3 because it is a cluster node.

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE

08-21T08:27:12 0 1 5 1 0 0.045 test.dummy

root@node1:~# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns

Not checking replica lag on node2 because it is a cluster node.

Not checking replica lag on node3 because it is a cluster node.

Differences on node3

TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY

test.dummy 1 -2 1

pt-table-checksum prints messages during the checksum process saying it can’t check the replica lag. It’s obvious why: SHOW SLAVE STATUS is not supported on cluster nodes as it would be in a traditional master -> slave(s) replication setup. Further, pt-table-checksum found the difference on the test.dummy table. Now, pt-table-checksum also supports “recursion-method=cluster,” which will auto-discover cluster nodes by querying the variable wsrep_incoming_addresses status. This new recursion-method “cluster” works equally well in Percona XtraDB Cluster. You may use “recursion-method=cluster” in place of “recursion-method=dsn” Let’s find the differences again using recursion-method “cluster”:

root@node1:~# pt-table-checksum --empty-replicate-table --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=cluster
Not checking replica lag on node2 because it is a cluster node.
Not checking replica lag on node3 because it is a cluster node.
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
02-24T02:19:48      0      0        3       1       0   0.017 percona.dsns
02-24T02:19:48      0      1        5       1       0   0.017 test.dummy
root@node1:~# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=cluster
Not checking replica lag on node2 because it is a cluster node.
Not checking replica lag on node3 because it is a cluster node.
Differences on node3
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -2 1

root@node1:~# pt-table-checksum --empty-replicate-table --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=cluster

Not checking replica lag on node2 because it is a cluster node.

Not checking replica lag on node3 because it is a cluster node.

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE

02-24T02:19:48 0 0 3 1 0 0.017 percona.dsns

02-24T02:19:48 0 1 5 1 0 0.017 test.dummy

root@node1:~# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password --recursion-method=cluster

Not checking replica lag on node2 because it is a cluster node.

Not checking replica lag on node3 because it is a cluster node.

Differences on node3

TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY

test.dummy 1 -2 1

The differences are found on Node3. Now the next big challenge is to synchronize Node3 with the other nodes in the cluster. pt-table-sync is the tool to synchronize MySQL table data differences, but unfortunately, the pt-table-sync tool is not supported on Galera Cluster. I tried few hacks with pt-table-sync in Percona XtraDB Cluster, but no luck (as shown below):

Sync test.dummy on node1 to node3:
root@node1:~# pt-table-sync --dry-run h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10
# NOTE: --dry-run does not show if data needs to be synced because it
#       does not access, compare or sync data.  --dry-run only shows
#       the work that would be done.
# Syncing D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user in dry-run mode, without accessing or comparing data
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 Chunk     09:11:31 09:11:31 0    test.dummy

root@node1:~# pt-table-sync --print h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10
INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:48372 user:root host:node1*/;
INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:48372 user:root host:node1*/;

Sync test.dummy on node1 to node3:

root@node1:~# pt-table-sync --dry-run h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10

# NOTE: --dry-run does not show if data needs to be synced because it

# does not access, compare or sync data. --dry-run only shows

# the work that would be done.

# Syncing D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user in dry-run mode, without accessing or comparing data

# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE

# 0 0 0 0 Chunk 09:11:31 09:11:31 0 test.dummy

root@node1:~# pt-table-sync --print h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10

INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:48372 user:root host:node1*/;

INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:48372 user:root host:node1*/;

I tried to run pt-table-sync from Node1 to sync the differences on Node3. I first used –dry-run to make sure everything was good. Later, I passed the –print option to review queries before actually syncing. In XtraDB Cluster, pt-table-sync prints INSERT/UPDATE/DELETE queries instead of REPLACE queries. Executing pt-table-sync with the –execute option will break the cluster, as those changes will be replicated to all the cluster nodes. Since Node1 and Node2 already have those records in their table, the cluster will break with a “Duplicate Key” error. To workaround this problem, use the pt-table-sync –replace option (as shown below):

root@node1:/var/lib/mysql# pt-table-sync --print --replace h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:942 user:root host:node1*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:942 user:root host:node1*/;

root@node1:/var/lib/mysql# pt-table-sync --print --replace h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy h=10.0.3.10

REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:942 user:root host:node1*/;

REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=10.0.3.10,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:942 user:root host:node1*/;

When I run pt-table-sync with “–execute” and “–replace”, it breaks with deadlock errors (see here: https://bugs.launchpad.net/percona-toolkit/+bug/1250450). Let’s try to run those REPLACE statements from one of the authoritative nodes (i.e., Node1):

mysql> SHOW BINLOG EVENTS IN 'node4-bin.000002';
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
| Log_name         | Pos | Event_type  | Server_id | End_log_pos | Info                                          |
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
| node4-bin.000002 |   4 | Format_desc |         1 |         120 | Server ver: 5.6.24-72.2-56-log, Binlog ver: 4 |
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
1 row in set (0.03 sec)

mysql> source replace.sql;
Query OK, 1 row affected (0.02 sec)

Query OK, 1 row affected (0.00 sec)

mysql> SHOW BINLOG EVENTS IN 'node4-bin.000002';
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
| Log_name         | Pos | Event_type  | Server_id | End_log_pos | Info                                          |
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
| node4-bin.000002 |   4 | Format_desc |         1 |         120 | Server ver: 5.6.24-72.2-56-log, Binlog ver: 4 |
+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+
1 row in set (0.00 sec)

mysql> SHOW BINLOG EVENTS IN 'node4-bin.000002';

+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+

+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+

| node4-bin.000002 | 4 | Format_desc | 1 | 120 | Server ver: 5.6.24-72.2-56-log, Binlog ver: 4 |

+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+

1 row in set (0.03 sec)

mysql> source replace.sql;

Query OK, 1 row affected (0.02 sec)

Query OK, 1 row affected (0.00 sec)

mysql> SHOW BINLOG EVENTS IN 'node4-bin.000002';

+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+

+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+

| node4-bin.000002 | 4 | Format_desc | 1 | 120 | Server ver: 5.6.24-72.2-56-log, Binlog ver: 4 |

+------------------+-----+-------------+-----------+-------------+-----------------------------------------------+

1 row in set (0.00 sec)

As you can see, pt-table-sync tool prints REPLACE INTO statements, which seems correct as those will be no-op changes for nodes that already have that data, and replication will sync unsynchronized nodes. But here is the problem: Galera Cluster works through a ROW binary log format, and REPLACE statements won’t get to an unsynchronized node since there won’t be a binary log event recorded for REPLACE statements into the binary log if executed from a node that is already synchronized (Node1 and Node2 in this case) since a ROW image change didn’t occur.

So how do we synchronized those changes in Galera Cluster? I’ve found a couple of workarounds to synchronize nodes in Galera Cluster!

Approach # 1:
Generate a .sql script via pt-table-sync tool with –print option, and review the changes that need to occur on the unsynchronized node. Once you are satisfied, you can execute this script directly on the problematic node with binary logging disabled for that particular execution (you can push SET SQL_LOG_BIN=0 at first and enable SQL_LOG_BIN=1 again at last line of script). This ensures those changes don’t synchronize to other nodes of the cluster.

NOTE: This could be risky if you forgot to disable binary logging (i.e., SET SQL_LOG_BIN=0 for the synchronization process) because as previously explained changes will replicate to all nodes, and all the remaining nodes in the cluster will go down (as those rows already exist on the other nodes). So, be careful with this method!

Also, it’s important to mention that when manually syncing with an SQL file, it may not be an atomic operation. Between the time it takes to generate the SQL file and execute it, there is a chance that the rows in question could change. To deal with this, you should engage LOCK TABLE tablename WRITE followed by UNLOCK TABLES to release the lock after the syncing process. In fact, the most secure method is to stop writing to the table that needs to be synchronized on all the nodes before running pt-table-sync. You can resume writes on the table after the sync operation for the target table completes. One other important thing is if there are a vast number of table rows to sync via this method, it makes sense to enable wsrep_desync (set global wsrep_desync=on ) until the node gets synchronized, and then disable wsrep_desync again.

As during this synchronization process, this node may lag behind and flow control will no longer take care of the desynced node.

mysql> SHOW STATUS LIKE 'wsrep_cluster_size%';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+

mysql> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)

mysql> LOCK TABLE dummy WRITE;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=10.0.3.14,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:3800 user:root host:node3*/;
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=10.0.3.14,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:3800 user:root host:node3*/;
Query OK, 1 row affected (0.00 sec)

mysql> select * from test.dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)

mysql> SHOW STATUS LIKE 'wsrep_cluster_size%';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.01 sec)

mysql> UNLOCK TABLES;
Query OK, 0 rows affected (0.00 sec)

mysql> SET SQL_LOG_BIN=1;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW STATUS LIKE 'wsrep_cluster_size%';

+--------------------+-------+

| Variable_name | Value |

+--------------------+-------+

| wsrep_cluster_size | 3 |

+--------------------+-------+

mysql> SET SQL_LOG_BIN=0;

Query OK, 0 rows affected (0.00 sec)

mysql> LOCK TABLE dummy WRITE;

Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=10.0.3.14,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:3800 user:root host:node3*/;

Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=10.0.3.14,p=...,t=dummy,u=checksum_user lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:3800 user:root host:node3*/;

Query OK, 1 row affected (0.00 sec)

mysql> select * from test.dummy;

+----+------+

| id | name |

+----+------+

| 1 | a |

| 2 | b |

| 3 | c |

| 4 | d |

| 5 | e |

+----+------+

5 rows in set (0.00 sec)

mysql> SHOW STATUS LIKE 'wsrep_cluster_size%';

+--------------------+-------+

| Variable_name | Value |

+--------------------+-------+

| wsrep_cluster_size | 3 |

+--------------------+-------+

1 row in set (0.01 sec)

mysql> UNLOCK TABLES;

Query OK, 0 rows affected (0.00 sec)

mysql> SET SQL_LOG_BIN=1;

Query OK, 0 rows affected (0.00 sec)

Approach # 2:
Another approach is to rebuild database table in question. I got this idea from Jervin’s impressive post. You need to use the pt-online-schema-change tool and a NOOP ALTER. You need to run it through one of the authoritative synchronization nodes of the cluster, as below:

root@node2:~# pt-online-schema-change --execute --alter 'ENGINE=INNODB' D=test,t=dummy
No slaves found.  See --recursion-method if host node4 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `test`.`dummy`...
Creating new table...
Created new table test._dummy_new OK.
Altering new table...
Altered `test`.`_dummy_new` OK.
2016-02-11T01:23:22 Creating triggers...
2016-02-11T01:23:22 Created triggers OK.
2016-02-11T01:23:22 Copying approximately 5 rows...
2016-02-11T01:23:22 Copied rows OK.
2016-02-11T01:23:22 Swapping tables...
2016-02-11T01:23:22 Swapped original and new tables OK.
2016-02-11T01:23:22 Dropping old table...
2016-02-11T01:23:22 Dropped old table `test`.`_dummy_old` OK.
2016-02-11T01:23:22 Dropping triggers...
2016-02-11T01:23:22 Dropped triggers OK.
Successfully altered `test`.`dummy`.

root@node2:~# pt-online-schema-change --execute --alter 'ENGINE=INNODB' D=test,t=dummy

No slaves found. See --recursion-method if host node4 has slaves.

Not checking slave lag because no slaves were found and --check-slave-lag was not specified.

Operation, tries, wait:

copy_rows, 10, 0.25

create_triggers, 10, 1

drop_triggers, 10, 1

swap_tables, 10, 1

update_foreign_keys, 10, 1

Altering `test`.`dummy`...

Creating new table...

Created new table test._dummy_new OK.

Altering new table...

Altered `test`.`_dummy_new` OK.

2016-02-11T01:23:22 Creating triggers...

2016-02-11T01:23:22 Created triggers OK.

2016-02-11T01:23:22 Copying approximately 5 rows...

2016-02-11T01:23:22 Copied rows OK.

2016-02-11T01:23:22 Swapping tables...

2016-02-11T01:23:22 Swapped original and new tables OK.

2016-02-11T01:23:22 Dropping old table...

2016-02-11T01:23:22 Dropped old table `test`.`_dummy_old` OK.

2016-02-11T01:23:22 Dropping triggers...

2016-02-11T01:23:22 Dropped triggers OK.

Successfully altered `test`.`dummy`.

Once the table is rebuild, ROW images will be copied to the node needing synchronization. It might be a good idea to use the –dry-run option before doing the actual changes with –execute.

You’ll need to decide which of these approaches to choose: while this second approach seems a much safer option, it may take some time to complete. If the database table is in gigabytes and terabytes, and the differences in rows are few compared to the actual table size, then the first approach could be faster (but riskier).

In next example, I have added an async node as a replica of Node1, where Node1 acts as the master server for the async slave. The setup looks like:

node1 <-> node2 <-> node3
 |         
 |         
 +-> replica (async)

node1 <-> node2 <-> node3

+-> replica (async)

pt-table-checksum can only detect differences on a replica from the master node (i.e., Node1). In this case, the tool needs to be executed from the master node (Node1). If you run pt-table-checksum from Node2 or Node3, it won’t detect diffs on the replica because its master node is Node1. You can find details about this in the documentation.

Let’s try to see all this in below examples. In this test, I’m going to run the pt-table-checksum tool from Node1 (master) of the async node (slave) to see if it can find the differences on the async replica node.

root@node1:~# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
09-03T03:02:48      0      0        3       1       0   0.022 percona.dsns
09-03T03:02:48      0      1        5       1       0   0.041 test.dummy

root@node1:~# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password
Differences on async1
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -2 1

root@node1:~# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE

09-03T03:02:48 0 0 3 1 0 0.022 percona.dsns

09-03T03:02:48 0 1 5 1 0 0.041 test.dummy

root@node1:~# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password

Differences on async1

TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY

test.dummy 1 -2 1

The pt-table-checksum tool can find the differences on async node test.dummy table. Let’s try to sync those diffs with the help of pt-table-sync tool:

root@async1:~# pt-table-sync --dry-run --sync-to-master h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy
# NOTE: --dry-run does not show if data needs to be synced because it
#       does not access, compare or sync data.  --dry-run only shows
#       the work that would be done.
# Syncing D=test,h=localhost,p=...,t=dummy,u=checksum_user in dry-run mode, without accessing or comparing data
# DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
#      0       0      0      0 Chunk     08:38:55 08:38:55 0    test.dummy

root@async1:~# pt-table-sync --print --sync-to-master h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,P=3306,h=10.0.3.14,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:49118 user:root host:async1*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,P=3306,h=10.0.3.14,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:49118 user:root host:async1*/;

root@async1:~# pt-table-sync --execute --sync-to-master h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy
Deadlock found when trying to get lock; try restarting transaction at line 6115 while doing test.dummy on localhost

root@async1:~# pt-table-sync --dry-run --sync-to-master h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy

# NOTE: --dry-run does not show if data needs to be synced because it

# does not access, compare or sync data. --dry-run only shows

# the work that would be done.

# Syncing D=test,h=localhost,p=...,t=dummy,u=checksum_user in dry-run mode, without accessing or comparing data

# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE

# 0 0 0 0 Chunk 08:38:55 08:38:55 0 test.dummy

root@async1:~# pt-table-sync --print --sync-to-master h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy

REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,P=3306,h=10.0.3.14,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:49118 user:root host:async1*/;

REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn_D=test,P=3306,h=10.0.3.14,p=...,t=dummy,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn_D=test,h=localhost,p=...,t=dummy,u=checksum_user lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:49118 user:root host:async1*/;

root@async1:~# pt-table-sync --execute --sync-to-master h=localhost,u=checksum_user,p=checksum_password,D=test,t=dummy

Deadlock found when trying to get lock; try restarting transaction at line 6115 while doing test.dummy on localhost

As you can see, I executed pt-table-sync tool from the async node to find the exact differences, and it printed the two rows that were missing on the dummy table on the async replica. Further, when executed with the –execute option, the tool fails with a “Deadlock” error. In reality, there is no deadlock found from the InnoDB status. In order to sync the slave with its master, you can take one of the approaches described above: either use the .sql script to execute into slave server directly, or rebuild the table on the master Galera node and it will rebuild across the slaves.

Let me show you one final example where I tried to run pt-table-checksum from Node3 to find the differences on the async slave with Node1 as its master host:

root@node3:~# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password
Diffs cannot be detected because no slaves were found.  Please read the --recursion-method documentation for information.
node3 is a cluster node but no other nodes or regular replicas were found.  Use --recursion-method=dsn to specify the other nodes in the cluster.

root@node3:~# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=localhost,u=checksum_user,p=checksum_password

Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.

node3 is a cluster node but no other nodes or regular replicas were found. Use --recursion-method=dsn to specify the other nodes in the cluster.

Here, Node3 refused to find differences because there is no direct replica attached to it. The documentation mentions this behavior, as cluster environment, differences can only be detected on the master node directly connected to async slave (as is the case here with Node1 to async replica). In order to run a checksum from Node3 to check for differences between cluster nodes, use the –recursion-method mentioned in previous examples.

Conclusion:
Galera Cluster doesn’t support pt-table-sync in general. But here we’ve found a couple of workarounds to solve this problem:

- You can sync those changes manually by redirecting the pt-table-sync --print output to file to generate a .SQL template, review changes carefully and execute it on nodes with the missing data. Again, this approach is fast but possibly dangerous.

- Another option is the noop alter method using pt-online-schema-change to rebuild the table. It’s the recommended method, but it could be slower.

Finally, after synchronizing the data changes it is a good idea to re-run the checksum tool to verify any discrepancies.

0 0 votes

Article Rating

3 Comments

Oldest

Newest Most Voted

Aftab Khan

10 years ago

Nice article! I think there is one more way to synchronise inconsistent node using latest incremental backup. Quoting from Jay’s blog:
“but this method will only work if and only if:

* You have a large gcache
* You can restore a backup faster than it takes for your gcache to overflow ”

https://www.percona.com/blog/2015/07/16/bypassing-sst-pxc-incremental-backups/