Syncing MySQL slave table with pt-online-schema-change

Syncing MySQL slave table with pt-online-schema-change

PREVIOUS POST
NEXT POST

I recently encountered a situation in which after running Percona Toolkit’s pt-table-checksum on a customer system, 95% of the table on the MySQL master was different on the MySQL slave. Although this table was not a critical part of the infrastructure, from time to time, writes to the table from the master would break replication. Additionally, this table has about 6 million rows, and running pt-table-sync would take sometime. Rebuilding the slave from backup of the master would not be an easy option as well since the slave acts as an archive where it has a lot more data than the master.

So how did we solve it? With pt-online-schema-change and a NOOP ALTER.

How is it possible? pt-online-schema-change works by creating a shadow copy of the original table and start copying the rows from the latter to the former. An additional set of TRIGGERs also ensures that any additional changes to existing rows after they have been copied to the shadow version will propagate.

There is little one caveat though, the  binlog_format  on the master would have to be ROW so the actual ROW images from the master would be copied to the slave. If your binlog_format is set to STATEMENT, you’d still end up with the same inconsistency. Since change statements will be logged as STATEMENT, the NOOP ALTER will operate on the slave table instead of copying from the master. You can configure the binlog_format for the alter process via the –set-vars option i.e.  --set-vars 'binlog_format=ROW'  to make it more easier!

PREVIOUS POST
NEXT POST

Share this post

Comments (4)

  • Alex Reply

    Could you please clarify why pt-online-schema-change would be faster and safer than pt-table-sync?

    September 18, 2014 at 2:12 am
  • Karl Reply

    What binlog format do you guys generally recommend? Is MIXED stable enough to use?

    September 18, 2014 at 5:03 am
  • Flavian Reply

    That’s a Good technique for VERY BIG DB’s

    September 19, 2014 at 9:16 am
  • Jervin Real Reply

    @Alex,

    pt-table-sync relies on the same code as pt-table-checksum, as such for any chunks that are calculated as different, each row is inspected for possible change correction. Not in all case pt-online-schema-change is faster, for example, if you only have a few rows difference then pt-table-sync is still the way to go, however if its a large table and simply a large percentage has drifted, the method described could be faster.

    @Karl,

    It depends really, I’d say go for ROW if you can, but if you have certain dependencies on STATEMENT then choose MIXED or STATEMENT altogether.

    September 20, 2014 at 9:31 pm

Leave a Reply