Live MySQL Slave Rebuild with Percona Toolkit

PREVIOUS POST
NEXT POST

MySQL slave data out of syncRecently, we had an edge case where a MySQL slave went out-of-sync but it couldn’t be rebuilt from scratch. The slave was acting as a master server to some applications and it had data was being written to it. It was a design error, and this is not recommended, but it happened. So how do you synchronize the data in this circumstance? This blog post describes the steps taken to recover from this situation. The tools used to recover the slave were pt-slave-restartpt-table-checksum, pt-table-sync and mysqldiff.

Scenario

To illustrate this situation, it was built a master x slave configuration with sysbench running on the master server to simulate a general application workload. The environment was set with a Percona Server 5.7.24-26 and sysbench 1.0.16.

Below are the sysbench commands to prepare and simulate the workload:

With the environment set, the slave server was stopped, and some operations to desynchronize the slave were performed to reproduce the problem.

Fixing the issue

With the slave desynchronized, a restart on the replication was executed. Immediately, the error below appeared:

To recover the slave from this error, we had to point the slave to an existing binary log with a valid binary log position. To get a valid binary log position, the command shown below had to be executed on the master:

Then, a CHANGE MASTER command was run on the slave:

Now the slave had a valid binary log file to read, but since it was inconsistent, it hit another error:

Working past the errors

Before fixing the inconsistencies, it was necessary to keep the replication running and to skip the errors. For this, the pt-slave-restart tool will be used. The tool needs to be run on the slave server:

The tool skips errors and starts the replication threads. Below is an example of the output of the pt-slave-restart:

Finding the inconsistencies

With the tool running on one terminal, the phase to check the inconsistencies began. First things first, an object definition check was performed using mysqldiff utility. The mysqldiff tool is part of MySQL utilities. To execute the tool:

And below are the differences found between the master and the slave:

1-) A table that doesn’t exist

2-) A wrong table structure

Performing the recommendations on the slave (creating the table and the table modification) the object’s definition was now equal. The next step was to check data consistency. For this, the pt-table-checksum was used to identify which tables are out-of-sync. This command was run on the master server.

And an output example:

Fixing the data inconsistencies

Analyzing the DIFFS column it is possible to identify which tables were compromised. With this information, the pt-table-sync tool was used to fix these inconsistencies. The tool synchronizes MySQL table data efficiently, performing non-op changes on the master so they can be replicated and applied on the slave. The tools need to be run on the slave server. Below is an example of the tool running:

It is possible to perform a dry-run of the tool before executing the changes to check what changes the tool will apply:

After executing the pt-table-sync, we recommend that you run the pt-table-checksum again and check if the DIFFS column shows the value of 0.

Conclusion

This blog post was intended to cover all possible issues that could happen on a slave when it goes out-of-sync such as DDL operations, binary log purge and DML operations. This process involves many steps and it could take a long time to finish, especially in large databases. Note that this process might take longer than the backup/restore process. However, in situations like the one mentioned above, it might be the only solution to recover a slave.


Image based on Photo by Randy Fath on Unsplash

 

PREVIOUS POST
NEXT POST

Share this post

Comments (3)

  • Ralf Buescher Reply

    It might be an idea to have apps use dedicated credentials and priviledges. This way an account can read but not write – it might help to make a developer to revonsider his options, when hit with a a „No Insert/ Update/ Create/ Drop/ Alter Permission Granted“ error. Errors are not in any case a human error, but rather the result of a lack of information.

    March 13, 2019 at 8:30 am
  • spine care centre Reply

    Perfect tool kit

    March 14, 2019 at 12:05 am
  • Neo Anderson Reply

    In place of pt-slave-restart you can also use inbuilt MySQL parameter slave_exec_mode=IDEMPOTENT

    This will keep the replication running even in case of errors, these errors, like you described, can be fixed using pt-table-checksum and pt-table-sync.

    March 14, 2019 at 8:03 am

Leave a Reply