Multi-threaded replication with MySQL 5.6: Use GTIDs!

Multi-threaded replication with MySQL 5.6: Use GTIDs!

PREVIOUS POST
NEXT POST

MySQL 5.6 allows you to execute replicated events in parallel as long as data is split across several databases. This feature is named “Multi-Threaded Slave” (MTS) and it is easy to enable by setting slave_parallel_workers to a > 1 value. However if you decide to use MTS without GTIDs, you may run into annoying issues. Let’s look at two of them.

Skipping replication errors

When replication stops with an error, a frequent approach is to “ignore now and fix later.” This means you will run SET GLOBAL sql_slave_skip_counter=1 to be able to restart replication as quickly as possible and later use pt-table-checksum/pt-table-sync to resync data on the slave.

Then the day when I hit:

I tried to use the trick:

But:

Note that the position reported with Exec_Master_Log_Pos has moved forward, but I still have my duplicate key error. What’s wrong?

The issue is that the positions reported by SHOW SLAVE STATUS are misleading when using MTS. Quoting the documentation about Exec_Master_Log_Pos:

When using a multi-threaded slave (by setting slave_parallel_workers to a nonzero value in MySQL 5.6.3 and later), the value in this column actually represents a “low-water” mark, before which no uncommitted transactions remain. Because the current implementation allows execution of transactions on different databases in a different order on the slave than on the master, this is not necessarily the position of the most recently executed transaction.

So the solution to my problem is first to make sure that there is no execution gap, and only then to skip the offending event. There is a specific statement for the first part:

And now I can finally skip the error and restart replication:

The last thing to do is of course to resync the slave.

Backups

If you cannot trust the output of SHOW SLAVE STATUS to get the current binlog position, it means that taking a backup from a slave with parallel replication is tricky.

For instance, if you run mysqldump --dump-slave=2 to get the binlog position of the master, mysqldump will first run STOP SLAVE and then SHOW SLAVE STATUS. Is stopping the slave sufficient to avoid execution gaps? Actually, no.

The only option then seems to be: run STOP SLAVE followed by START SLAVE UNTIL SQL_AFTER_MTS_GAPS, followed by mysqldump while replication is stopped. Not very handy!

GTIDs to the rescue!

The solution for both issues is to use GTIDs.

They help when you want to skip an event because when using GTIDs, you must explicitly specify the transaction you will be skipping. It doesn’t matter whether there are execution holes.

They also help for backups because mysqldump takes the position from gtid_executed which is updated at each transaction commit (XtraBackup does that too).

Conclusion

If your application uses several databases and if you’re fighting with replication lag, MTS can be a great feature for you. But although GTIDs are not technically necessary, you’ll be exposed to tricky situations if you don’t use them.

Is everything rosy when using both GTIDs and MTS? Not exactly… But that will be the topic for a separate post!

By the way, if you are in the Brussels area this weekend, come see me and other great speakers at the MySQL and friends devroom at FOSDEM!

PREVIOUS POST
NEXT POST

Share this post

Comments (9)

  • tekslate Reply

    Thank you, its very useful

    January 29, 2015 at 6:26 am
  • voja Reply

    How does the MTS feature fit with Percona XtraDB Cluster and parallel apply? Complementary/conflicting/not supported?

    February 12, 2015 at 6:22 am
  • Stephane Combaudon Reply

    Voja,

    MTS is only used by MySQL replication, not by Galera replication, so parallel apply in XtraDB Cluster and MTS are independent features.

    If you have a cluster and if one of the nodes has an async replica, it can make sense to enable the MTS feature on the async replica if you have several databases in your schema. But it won’t have any impact on what happens inside the cluster.

    February 12, 2015 at 7:33 am
  • Jayaram Reply

    Thanks for the use full info,

    Is it suggestable ? , to usage MTS without GTID while having mixed database engines, i.e. some of our databases are in MyISAM, and the remaining are in InnoDB.

    March 27, 2015 at 2:52 am
  • Ian W Reply

    We’ve enabled parallel slave workers on a DB and are seeing some log messages, which, I don’t think are bad but can’t find anything in the documentation about.

    2015-05-14 12:30:07 23709 [Note] Multi-threaded slave statistics: seconds elapsed = 123; events assigned = 646145; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; slept when Workers occupied = 150007

    Could you shed some light / point me at documentation that explains these?

    May 14, 2015 at 7:34 am
  • Khan Reply

    Q Stephane,

    is it recommended to use Multi-threaded replication if we have 1 database in a M-M scenario. Note that we are only writing to active. Or will this impact -vely?

    August 5, 2015 at 5:47 pm
  • Flavian Reply

    Current MTS setup is per database connection so if you have only 1 database MTS wont be of much use.

    October 12, 2015 at 7:45 am
  • Bala Guntipalli Reply

    It’s A Great Pleasure reading your Article Bala Guntipalli Thanks for posting.

    June 26, 2018 at 1:11 am
  • Bala Guntipalli Reply

    Excellent article, Cool, Looking ahead to reading a lot. Sensible article Bala Guntipalli Thanks for posting.

    June 26, 2018 at 1:14 am

Leave a Reply