MySQL version 5.6 and later allows you to execute replicated events using parallel threads. This feature is called Multi-Threaded Slave (MTS), and to enable it you need to modify the slave_parallel_workers variable to a value greater than 1.
Recently, a few customers asked about the meaning of some new statistics printed in their error log files when they enable MTS. These error messages look similar to the example stated below:
[Note] Multi-threaded slave statistics for channel '': seconds elapsed = 123; events assigned = 57345; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 0 waited (count) when Workers occupied = 0 waited when Workers occupied = 0
The MySQL reference manual doesn’t show information about these statistics. I’ve filled a bug report asking Oracle to add information about these statistics in the MySQL documentation. I reported this bug as #85747.
Before they update the documentation, we can use the MySQL code to get insight as to the statistics meaning. We can also determine how often these statistics are printed in the error log file. Looking into the rpl_slave.cc file, we find that when you enable MTS – and log-warnings variable is greater than 1 (log-error-verbosity greater than 2 for MySQL 5.7) – the time to print these statistics in MySQL error log is 120 seconds. It is determined by a hard-coded constant number. The code below shows this:
Statistics go to the error log every # of seconds when --log-warnings > 1
const long mts_online_stat_period= 60 * 2;
Does this mean that every 120 seconds MTS prints statistics to your MySQL error log (if enabled)? The answer is no. MTS prints statistics in the mentioned period depending on the level of activity of your slave. The following line in MySQL code verifies the level of the slave’s activity to print the statistics:
if (rli->is_parallel_exec() && rli->mts_events_assigned % 1024 == 1)
From the above code, you need MTS enabled and the modulo operation between the mts_events_assigned variable and 1024 equal to 1 in order to print the statistics. The mts_events_assigned variable stores the number of events assigned to the parallel queue. If you’re replicating a low level of events, or not replicating at all, MySQL won’t print the statistics in the error log. On the other hand, if you’re replicating a high number of events all the time, and the mts_events_assigned variable increased its value until the remainder from the division between this variable and 1024 is 1, MySQL prints MTS statistics in the error log almost every 120 seconds.
You can find the explanation these statistics below (collected from information in the source code):
- Worker queues filled over overrun level: MTS tends to load balance events between all parallel workers, and the slave_parallel_workers variable determines the number of workers. This statistic shows the level of saturation that workers are suffering. If a parallel worker queue is close to full, this counter is incremented and the worker replication event is delayed in order to avoid reaching worker queue limits.
- Waited due to a Worker queue full: This statistic is incremented when the coordinator thread must wait because of the worker queue gets overfull.
- Waited due to the total size: This statistic shows the number of times that the coordinator thread slept due to reaching the limit of the memory available in the worker queue to hold unapplied events. If this statistic keeps increasing when printed in your MySQL error log, you should resize the slave_pending_jobs_size_max variable to a higher value to avoid the coordinator thread waiting time.
- Waited at clock conflicts: In the case of possible dependencies between transactions, this statistic shows the wait time corresponding to logical timestamp conflict detection and resolution.
- Waited (count) when used occupied: A counter of how many times the coordinator saw Workers filled up with “enough” with assignments. The enough definition depends on the scheduler type (per-database or clock-based).
- Waited when workers occupied: These are statistics to compute coordinator thread waiting time for any worker available, and applies solely to the Commit-clock scheduler.
Multi-threaded slave is an exciting feature that allows you to replicate events faster, and keep in sync with master instances. By changing the log-warnings variable to a value greater than 1, you can get information from the slave error log file about how multi-threaded performance.