I have a customer who is considering Percona XtraDB Cluster (PXC) in a two-colo WAN environment. They wanted me to run a test comparing PXC against semi-synchronous replication to see how the two stack up against each other.
The test environment included AWS EC2 nodes in US-East and US-West (Oregon). The ping RTT latency between these nodes was right around 100ms.
All environments used Percona Server or Percona XtraDB Cluster server 5.5.24. InnoDB durability was disabled for all tests (innodb_flush_log_at_trx_commit=2). All other settings were kept the same unless otherwise noted.
I tested against the following setups:
Control: simply a master with log-bin enabled and one slave connected (not semi-sync).
Semi-sync: same as the control, but with semi-sync enabled on the slave (the slave was in the other colo).
PXC 1-colo: 3 nodes in one datacenter. Writes were only done on a single node. The cluster contained the tuning
`wsrep_provider_options = "gcs.fc_limit = 256; gcs.fc_factor = 0.99; gcs.fc_master_slave = yes"`
based on Codership’s recommended tuning for single-node writing. I was running wsrep_slave_threads=16 and all other required wsrep settings. Also note that I disabled log-bin and innodb_support_xa on these nodes.
PXC 2-colo: just like PXC 1-colo, except with 3 more nodes in the second colo. I ignored quorum arbitration for the purposes of this test. Writes were only done on one node in one colo.
I did two different tests to illustrate the differences between these technologies.
For the first test, I simply created a table, inserted about 10 rows, and calculated the average time for each INSERT to autocommit.
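The measurement described above can be sketched roughly as follows. This is a minimal illustration, not the script used for the test: `average_commit_ms` and `fake_insert` are hypothetical names, and `fake_insert` only sleeps to stand in for a real autocommitted INSERT issued through a database driver.

```python
import time
from statistics import mean


def average_commit_ms(run_insert, rows=10, clock=time.perf_counter):
    """Return the mean wall-clock time, in milliseconds, of `rows` calls to run_insert.

    run_insert is any callable that performs one autocommitted INSERT; with
    autocommit on, it returns only once the commit has been acknowledged.
    """
    samples = []
    for i in range(rows):
        start = clock()
        run_insert(i)
        samples.append((clock() - start) * 1000.0)
    return mean(samples)


def fake_insert(i, delay_s=0.005):
    """Hypothetical stand-in for a real INSERT: sleeps to imitate commit latency."""
    time.sleep(delay_s)


if __name__ == "__main__":
    print(f"average commit time: {average_commit_ms(fake_insert):.2f} ms")
```

To use this against a real server, `fake_insert` would be replaced by a function that executes a single INSERT on a connection with autocommit enabled.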
| Environment | Results (ms) |
|---|---|
| Control | 0.25 |
| Semi-sync | 102 |
| PXC 1-colo | 2 |
| PXC 2-colo | 108 |
From this I made the following observations:
For the second test, I used the latest sysbench with 20 tables (~5G of data) and 32 test clients. The results are simply the average transactions per second I got from a 60-second run. I do not believe these tests were disk-bound in any way: I used Percona Server’s buffer pool dump/restore feature to preserve the caches across server restarts, and a Gaussian distribution in sysbench (i.e., non-uniform).
| Environment | Results (TPS) |
|---|---|
| Control | 840 |
| Semi-sync | 10 |
| PXC 1-colo | 856 |
| PXC 2-colo | 224 |
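As a back-of-envelope check, the per-commit latencies from the first test can be turned into rough throughput ceilings and set beside the measured TPS. This sketch assumes the commit wait dominates each transaction and ignores statement execution time, so the figures are only rough bounds; `commits_per_second` is a hypothetical helper, and the numbers are copied from the two results tables.

```python
# Measured values copied from the two results tables above.
commit_ms = {"Control": 0.25, "Semi-sync": 102, "PXC 1-colo": 2, "PXC 2-colo": 108}
measured_tps = {"Control": 840, "Semi-sync": 10, "PXC 1-colo": 856, "PXC 2-colo": 224}
CLIENTS = 32  # sysbench test clients


def commits_per_second(latency_ms, concurrent=1):
    """Commit rate if each of `concurrent` streams waits latency_ms per commit.

    Assumption: the commit wait is the only cost of a transaction.
    """
    return concurrent * 1000.0 / latency_ms


for env, latency in commit_ms.items():
    one_at_a_time = commits_per_second(latency)
    all_clients = commits_per_second(latency, CLIENTS)
    print(
        f"{env:<11} one at a time: {one_at_a_time:9.1f}  "
        f"{CLIENTS} in parallel: {all_clients:10.1f}  "
        f"measured: {measured_tps[env]}"
    )
```

Under these assumptions, the semi-sync result (10 TPS) sits next to the one-commit-at-a-time figure of about 9.8 per second, while the PXC 2-colo result (224 TPS) is much closer to the 32-way figure of about 296 per second. This is only a reading of the arithmetic, not a measured explanation.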
This to me was really interesting:
Percona XtraDB Cluster, and Galera, on which it is based, offer a much more realistic multi-datacenter system for high availability and disaster recovery than semi-synchronous replication in stock MySQL.