In this blog, I will cover important aspects which you need to test when benchmarking replication setup. MySQL has great tools that could be used to test its performance. They include:
sysbench – https://github.com/akopytov/sysbench
LinkBench – https://github.com/facebookarchive/linkbench
However, most of the instructions take into account standalone MySQL servers. If you want to test how your replication setup performs, you may want to perform additional checks.
Can the replica catch up to the source server?
To answer this question, you need to run a load on the source machine and monitor the replica delay. You can do it by examining the SHOW REPLICA STATUS ( SHOW SLAVE STATUS) output, particularly the value of the Seconds_Behind_Source ( Seconds_Behind_Master ) parameter. The same parameter is monitored by the “MySQL Replication Lag” graph under the “MySQL Replication Summary” dashboard in Percona Monitoring and Management (PMM).
The more advanced check would be to compare Exec_Source_Log_Pos ( Exec_Master_Log_Pos ), Relay_Log_File with Read_Source_Log_Pos ( Read_Master_Log_Pos ), Source_Log_File pairs since Seconds_Behind_Source may be affected by long-running commands and return wrong values.
You may tune your replica server by adding parallelization if you see increasing lag. Check also how option binlog_transaction_dependency_tracking works: https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_transaction_dependency_tracking
Can the replica run queries while applying updates from the source server?
Once you are happy with the replication lag, you may start adding the load you plan to run on this replica and test its performance. For example, if you plan to use the replica for handling read-only queries, you can run a read-only benchmark on it. You need to watch the performance of the read-only benchmark and replication lag.
If you plan to run analytical queries on the replica: test them. If you plan to use this replica for taking backups: take the backup and measure how it will affect the performance of the replication.
Synchronous replication solutions such as Galera, Percona XtraDB Cluster (PXC), and Group Replication are affected by the flow control paused time. You can use PMM to monitor it.
For PXC/Galera clusters, watch graph “Flow Control Paused Time” under “PXC/Galera Cluster Summary” dashboard.
Or you can watch the value of the wsrep_flow_control_paused variable. See Galera Flow Control in Percona XtraDB Cluster for MySQL for more details.
For Group Replication, you need to check the value of the COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE column in the table performance_schema.replication_group_member_stats . That shows how many transactions are waiting in the queue on the secondary node to apply.
You may also watch “Replication Lag” graph under “MySQL Group Replication” dashboard in PMM
Your best test is your production
As said by Dimitri Kravtchuk at Fosdem 2020:
While it is hard to create such a benchmark on the server that accepts writes from users, for the replica, it is comparatively easy. All you need to do is to create a replica of your production server and not make it available for the application yet. First, watch if it can handle replication lag while idle, then start adding load to it. You may also use the mirroring feature of ProxySQL Note; however, you need to check if some of the limitations described in this post, still apply.
How fast will the replica catch up?
One more thing you may want to test is how fast the replica will recover after a delay in replicating updates from the source server. To perform this test, stop the replica while running the source server’s load, restart it after a certain amount of time, and watch replication lag. If you are testing a replica that serves read-only load, let the load test continue.
It is better to perform stop, wait, and start in one series of commands, so you will have control over how long the replica was stopped. For example, to stop replica for one hour, use the following commands:
Then wait until the replica is running:
SELECT SERVICE_STATE FROM performance_schema.replication_applier_status;
Once this query returns ON, start monitoring the value of Seconds_Behind_Source in the SHOW REPLICA STATUS output until it becomes 0. Once it happens, record the time. This way, you will know how much time will be needed to apply one hour of changes.
You can use the same benchmarking tools that are recommended for standalone MySQL servers to perform performance tests against replication setups. You only need to monitor replication lag together with other performance characteristics.