Estimating potential for MySQL 5.7 parallel replication

parallel replicationUnlike MySQL 5.6, where parallel replication can only be used when replicas have several schemas, MySQL 5.7 replicas can read binlog group commit information coming from the master to replicate transactions in parallel even when a single schema is used. Now the question is: how many replication threads should you use?

A simple benchmark

Let’s assume we have one master and three slaves, all running MySQL 5.7.

One slave is using regular single-threaded replication (the control slave in the graph below), one is using 20 parallel workers (MTS 20 workers) and the last one is using 100 parallel workers (MTS 100 workers).

As a reminder, here is the settings that need to be adjusted for 5.7 parallel replication:

GTID replication is also highly recommended, if you don’t want to face annoying issues.

Now let’s run a simple sysbench workload, inserting records in 25 separate tables in the same database with 100 concurrent threads:

Because we’re using 100 concurrent threads on the master, we can expect that some parallelization is possible. This means that if we see replication lag with the control slave, we’ll probably see less lag with the 20-worker slave and even less with the 100-worker slave.

This is not exactly what we get:


Parallel replication is indeed useful, but the 100-worker slave doesn’t provide any benefits compared to the 20-worker slave. Replication lag is even slightly worse.

What happened?

Some instrumentation with performance_schema

To have a better understanding of how efficiently the parallel replication threads are used, let’s enable some instrumentation on slaves (in other words, recording executed transactions):

For better readability of the results, let’s create a new view (tracking how many transactions are executed by each replication thread):

Now, after running sysbench again, let’s do some math to see how often each replication thread was run:

We can see that the workload has a limited potential for parallelism – therefore, it’s not worth configuring more than 3-4 replication threads.

The slight performance degradation with 100 replication threads is probably due to the overhead of the coordinator thread.


Estimating the optimal number of replication threads with MySQL 5.7 parallel replication is quite difficult if you’re just guessing. The performance_schema provides a simple way to understand how the workload is handled by the replication threads.

It also allows you to see if tuning binlog_group_commit_sync_delay provides more throughput on slaves without too much impact on the master’s performance.

Share this post

Comments (7)

  • Vadim Tkachenko


    You use 25 tables.
    Why can slave can do only 4 threads in parallel ?

    February 10, 2016 at 2:21 pm
  • Stephane Combaudon


    I was surprised by the result too and I don’t have a good explanation. The workload may be too simple with INSERTs running too fast to get any chance to get committed in parallel on the master.

    If I have time, I’ll try to run the same test but with a different workload (like UPDATEs with a secondary key).

    February 10, 2016 at 3:15 pm
  • Kenny Gryp

    Hi Stephane,

    Maybe some more tuning is possible to improve performance.

    – Tuning binlog_group_commit_sync_delay and binlog_group_commit_sync_no_delay_count (and keeping sync_binlog=1) on the master could increase group commits and increase the potential to execute queries in parallel. In MariaDB you can actually see status of how good much group commit is happening by looking at Binlog_commits and Binlog_group_commits (don’t know yet how to do that in MySQL 5.7).
    – Having a larger slave_pending_jobs_size_max on the slave might increase performance as well, allthough I think that’s not the case in your test.

    February 13, 2016 at 8:53 am
  • Wagner Bianchi

    Stéphan, very nice blog post, congrats!

    Can you do the same test but having master configured with:

    binlog_group_commit_sync_delay=1000000 (default value)

    and then, slaves with:

    slave_parallel_type = LOGICAL_CLOCK
    slave_parallel_workers = 20

    I think the big improvement as well as tested by Vadim before is the writeback cache


    February 19, 2016 at 2:42 pm
    • John Xu

      How many cores do the slaves have? Looks like 4? This might be the answer to why the config 4 SQL threads on slave is optimal.

      December 27, 2016 at 10:21 pm
  • Scott Noyes

    See also for another way to glean similar information from the binary log.

    July 11, 2017 at 11:37 am
  • Oliver Tappin

    Typo: “quite difficult if your just guessing.”, should be “you’re”.

    May 22, 2019 at 6:06 am

Comments are closed.

Use Percona's Technical Forum to ask any follow-up questions on this blog topic.