I was interested to hear about semi-sync replication improvements in MySQL’s 5.7.4 DMR release and decided to check it out. I previously blogged about poor semi-sync performance, and back then I was pretty disappointed with semi-sync’s performance across WAN distances, particularly with many client threads.
The basic environment of these tests was:
- AWS EC2 m3.medium instances
- Master in us-east-1, slave in us-west-1 (~78ms ping RTT)
- CentOS 6.5
- Semi-sync replication plugin installed and enabled.
- GTIDs enabled (except on 5.5)
- sysbench 0.5 update_index.lua test, 60 seconds, 250k table size.
- MySQL 5.7 was tested with both AFTER_SYNC and AFTER_COMMIT settings for rpl_semi_sync_master_wait_point
- I tested Percona XtraDB Cluster 5.6 / Galera 3.5 as well by means of comparison
Without further ado, here are the TPS results I got for a single client thread:
These graphs are interactive, so mouse-over for more details. I’m using log scales to better highlight the differences.
The blue bars represent transactions per second (more is better). The red bars represent average latency per transaction per client (less is better). Remember these transactions are synchronously being copied across the US before the client can execute another.
The first test is our control: async replication allows ~273 TPS on a single thread. Once we introduce synchronicity, we clearly see that the bulk of each transaction’s time is spent on that cross-country round trip. Note that MySQL 5.5 and 5.6 are a bit higher than MySQL 5.7 and Percona XtraDB Cluster, the latter two of which show pretty similar results.
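As a rough sanity check on why the round trip dominates, the ceiling for a single thread can be estimated from the ~78ms RTT and the ~273 TPS async figure above. This is only a back-of-the-envelope sketch: it assumes each commit waits for exactly one round trip and that the local commit cost equals the async per-transaction time. The function name is made up for illustration, and the output is an estimate, not one of the measured results.

```python
def max_single_thread_tps(rtt_ms: float, local_commit_ms: float = 0.0) -> float:
    """Upper bound on TPS for one client thread when every commit
    blocks for one network round trip plus the local commit time."""
    return 1000.0 / (rtt_ms + local_commit_ms)


RTT_MS = 78.0      # ping RTT between us-east-1 and us-west-1 (from the post)
ASYNC_TPS = 273.0  # single-thread async result (from the post)

# Assumption: local work per transaction is whatever async needed per transaction.
local_ms = 1000.0 / ASYNC_TPS

network_only = max_single_thread_tps(RTT_MS)
with_local = max_single_thread_tps(RTT_MS, local_ms)

print(f"network-only ceiling: {network_only:.1f} TPS")
print(f"ceiling including local commit time: {with_local:.1f} TPS")
```

Under these assumptions a single synchronous thread cannot get much past a dozen transactions per second, regardless of which implementation is doing the waiting.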
This gets more interesting when we redo the same tests with 32 test threads:
In the MySQL 5.5 and 5.6 tests, we can clearly see nasty serialization: neither really allows more throughput than single-threaded sysbench. I was happy to see, however, that this seems to be dramatically improved in MySQL 5.7. Nice job, Oracle!
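To make the effect of that serialization concrete, here is a toy throughput model. It is not how MySQL implements semi-sync internally; it simply assumes that in the serialized case only one transaction at a time can be waiting on the slave, while in the concurrent case every client thread can wait in parallel. The local capacity cap is a hypothetical number, not something measured in these tests.

```python
def semi_sync_tps(threads: int, rtt_ms: float, serialized: bool,
                  local_cap_tps: float) -> float:
    """Estimate TPS when every commit waits one round trip for a slave ACK.

    serialized=True  -> only one transaction may be waiting at a time
    serialized=False -> each client thread waits independently
    """
    per_waiter_tps = 1000.0 / rtt_ms
    in_flight = 1 if serialized else threads
    # Throughput can never exceed what the master could do locally.
    return min(in_flight * per_waiter_tps, local_cap_tps)


RTT_MS = 78.0           # from the post
LOCAL_CAP_TPS = 2000.0  # hypothetical local ceiling, chosen for illustration

for threads in (1, 32):
    s = semi_sync_tps(threads, RTT_MS, True, LOCAL_CAP_TPS)
    c = semi_sync_tps(threads, RTT_MS, False, LOCAL_CAP_TPS)
    print(f"{threads:>2} threads: serialized ~{s:.0f} TPS, concurrent ~{c:.0f} TPS")
```

In this model, adding threads does nothing in the serialized case, which matches the shape of the 5.5 and 5.6 results, while the concurrent case scales with the thread count until some local limit is reached.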
The AFTER_SYNC and AFTER_COMMIT results vary, but AFTER_SYNC is the default and is preferred over AFTER_COMMIT. The reasoning is that AFTER_SYNC forces the semi-sync wait BEFORE the transaction is committed on the master. With AFTER_COMMIT the client still must wait for the semi-sync acknowledgment, but other transactions may see its data on the master BEFORE we confirm the semi-sync slave has received it. This is potentially bad because if the master crashed at that instant, clients may have read data from the master that did not make it to a failover slave. This is a type of ‘phantom read’ and Yoshinori explains it in more detail here.
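The difference between the two wait points comes down to the order of steps on the master. The sketch below models that order as plain lists; the step names are labels invented for illustration and are not MySQL identifiers. Only the two wait-point names correspond to real values of rpl_semi_sync_master_wait_point.

```python
# Order of events on the master for each wait point (simplified model).
STEPS = {
    "AFTER_SYNC": [
        "binlog_sync",         # transaction written and synced to the binary log
        "wait_for_slave_ack",  # semi-sync wait happens here
        "engine_commit",       # data becomes visible to other sessions
        "reply_to_client",
    ],
    "AFTER_COMMIT": [
        "binlog_sync",
        "engine_commit",       # visible to other sessions before the ACK
        "wait_for_slave_ack",
        "reply_to_client",
    ],
}


def visible_before_ack(wait_point: str) -> bool:
    """True if other sessions can read the data before the slave has ACKed it."""
    steps = STEPS[wait_point]
    return steps.index("engine_commit") < steps.index("wait_for_slave_ack")


for wp in STEPS:
    risk = "possible" if visible_before_ack(wp) else "not possible"
    print(f"{wp}: reading un-ACKed data on the master is {risk}")
```

In both orders the committing client waits for the acknowledgment before it gets a reply; what changes is whether other sessions can read the data during that wait.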
What about Percona XtraDB Cluster?
I also want to discuss the Percona XtraDB Cluster results. Galera here is somewhat slower than MySQL 5.7 semi-sync. There may be some enhancements to Galera that can be made (competition is a good thing), but there are still some significant differences, along with a few details of my test setup, to keep in mind:
- Galera allows for writing on any and all nodes; semi-sync does not
- Galera introduces the certification process to check for conflicts; semi-sync does not
- Galera is not 2-phase commit and transactions are not committed synchronously anywhere except the node originating the transaction. So, it is similar to semi-sync in this way.
- I ran the Galera tests with no log-bin (Galera does not require it)
- I ran the Galera tests with innodb_flush_log_at_trx_commit=1
- I set the fc_limit on the second node really high to eliminate flow control as a bottleneck. In a live cluster, flow control would typically be needed.
- Galera provides parallel slave threads for faster apply, but that doesn’t matter here because I set the fc_limit so high
Semi-sync in MySQL 5.7 looks like a great improvement. Any form of synchronicity is always going to be expensive, particularly over tens or hundreds of milliseconds of latency. With MySQL 5.7, I’d be much more apt to recommend semi-sync as an option than in previous releases. Thanks to Oracle for investing here.