Comparing Percona XtraDB Cluster with Semi-Sync replication Cross-WAN

I have a customer who is considering Percona XtraDB Cluster (PXC) in a two-colo WAN environment.  They asked me to run a test comparing PXC against semi-synchronous replication to see how the two stack up against each other.

Test Environment

The test environment included AWS EC2 nodes in US-East and US-West (Oregon).  The ping RTT latency between these nodes was right around 100ms.

All environments used Percona Server or Percona XtraDB Cluster server 5.5.24.  InnoDB durability was relaxed for all tests (innodb_flush_log_at_trx_commit=2).  All other settings were kept the same unless otherwise noted.

I tested against the following setups:


Control

The control environment was simply a master with log-bin enabled and one slave connected (not semi-sync).

Semi-sync

Same as the control, but with semi-sync enabled on the slave (the slave was in the other colo).

XtraDB Cluster 1-colo

3 nodes in one datacenter.  Writes were only done on a single node.  The cluster was tuned based on Codership’s recommended tuning for single-node writing, and I was running wsrep_slave_threads=16 along with all other required wsrep settings.

Also note that I disabled log-bin and innodb_support_xa on these nodes.
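As a sketch, the PXC node settings described in this post amount to something like the following my.cnf fragment; the exact Codership-recommended provider options from their tuning page are not reproduced here:

```ini
[mysqld]
# Relaxed InnoDB durability, used in all test environments
innodb_flush_log_at_trx_commit = 2

# 16 parallel Galera applier threads, as described above
wsrep_slave_threads = 16

# XA support disabled on the PXC nodes; log-bin was simply
# left unset (disabled) on these nodes as well
innodb_support_xa = 0
```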

XtraDB Cluster 2-colo

Just like PXC 1-colo, except with 3 more nodes in the second colo.  I ignored quorum arbitration for the purposes of this test.  Writes were only done on one node in one colo.


I did two different tests to illustrate the differences between these technologies.

Single-write latency test

For this test, I simply created a table, inserted about 10 rows, and calculated an average time for each INSERT to autocommit.
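The measurement itself is simple: time each autocommitted INSERT and average. A minimal sketch, where `run_insert` is a hypothetical callable that issues one INSERT (autocommit on) against the server under test:

```python
import time

def avg_commit_latency_ms(run_insert, rows=10):
    """Average wall-clock time in ms for each autocommitted INSERT.

    `run_insert` is a hypothetical callable that executes a single
    INSERT against the server under test.
    """
    total = 0.0
    for i in range(rows):
        start = time.perf_counter()
        run_insert(i)  # one autocommit INSERT; commit latency dominates
        total += time.perf_counter() - start
    return total / rows * 1000.0
```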

Environment    Results (ms)
PXC 1-colo     2
PXC 2-colo     108


From this I made the following observations:

  • The overhead of Galera replication is approximately 10x longer than local binary log writing.  It seems conceivable to me that this may be improvable by some Galera tuning. UPDATE: latency between EC2 nodes, even within the same availability zone seems to average > 1ms, so I don’t think this result is necessarily all that significant.
  • Semi-sync, as expected, takes approximately 1 RTT for the slave to verify to the master that it got the transaction.
  • PXC 2-colo (with 6 nodes instead of 2) does very well considering how much work it is doing, still approximately 1 RTT for commit.
  • Semi-sync does not actually enforce the transaction is committed on the slave, just that the slave received it.
  • PXC 2-colo committed the data on all 6 nodes across two datacenters in approximately the same amount of time as one Semi-sync transaction (a whole lot more work).

Sysbench 32-client test

For this test I used the latest sysbench with 20 tables (~5GB of data) and 32 test clients.  The results are simply the average transactions per second from a 60-second run.  I do not believe these tests were disk-bound in any way: I used Percona Server’s buffer pool dump/restore feature to preserve the caches across server restarts, and a Gaussian distribution in sysbench (i.e., non-random access).

Environment    Results (TPS)
PXC 1-colo     856
PXC 2-colo     224


This to me was really interesting:

  • PXC 1-colo beat the control.  I attribute this to log-bin and innodb_support_xa being disabled in the PXC environments.
  • Semi-sync performs dismally under concurrency.  Think about it: our single inserts took ~100ms, so we are effectively serializing our 32 clients with semi-sync replication.  Each client must wait for its turn writing to the binlog and waiting for confirmation from the slave; there can be no parallelization here.
  • I could not find any way to parallelize semi-synchronous replication (at least in 5.5), I could be missing something.
  • PXC’s true parallel replication shines here, applying up to 16 writesets at a time based on the number of threads I configure.   Even though the individual commits were approximately the same speed as semi-sync, parallelism allows PXC to scale much further across high latency environments.
  • If each transaction took 100ms to commit, then I’d expect 16 threads to handle about 160 TPS at best.  There must be some other optimization here that I can’t explain.
  • I did not try optimizing the PXC 2-colo result, but I intend to do so soon.  I would not be surprised to get better results with the right tuning, even just setting wsrep_slave_threads=32 to have one applier thread per test client on each node.  More on that in a future post.
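The serialization arithmetic behind these observations can be checked directly, using the values from this test (~100ms RTT, 16 applier threads):

```python
RTT_MS = 100  # round trip between the two colos, ~100ms

# Semi-sync in 5.5 effectively serializes commits: each client waits its
# turn for the binlog write plus the slave's acknowledgement, so aggregate
# throughput is roughly one commit per RTT regardless of client count.
semi_sync_ceiling_tps = 1000 / RTT_MS        # ~10 TPS

# With 16 parallel applier threads, a naive ceiling would be 16
# in-flight commits per RTT.
pxc_naive_ceiling_tps = 16 * 1000 / RTT_MS   # ~160 TPS

print(semi_sync_ceiling_tps, pxc_naive_ceiling_tps)
```

Note that the measured 224 TPS for PXC 2-colo exceeds even the naive 160 TPS ceiling, which is exactly why the post suspects some additional, unexplained optimization in the replication pipeline.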


Percona XtraDB Cluster, and the Galera library on which it is based, offer a much more realistic multi-datacenter solution for high availability and disaster recovery than semi-synchronous replication in stock MySQL.


Comments (13)

  • Alex

    Hi Jay,

    About 2ms latency in the first test: in us-east-1 zone ping RTT seems to be ~0.5 ms, so it is really hard to improve on that 2ms result when running PXC in EC2. You probably can’t go below 1ms.


    June 14, 2012 at 1:41 pm
  • Justin Swanhart

    I think you have a typo. You tested 5.5.24 right? Not 5.1.24?

    Are there risks of innodb_flush_log_at_trx_commit=2 on XtraDB cluster? Can data get out of sync?

    June 14, 2012 at 1:49 pm
  • Henrik Ingo

    Jay: Really surprised about the semi-sync result. I never tested it over WAN, but over LAN I’ve found it to be really fast. (Only when disk-bound, of course, do you get the same slave-lag issues as with async MySQL replication.)

    Justin: There are no risks other than what you already know: if a node crashes, some transactions may not have been written to disk. However, when the node comes back up, it will do whatever is necessary to sync up with the other nodes anyway, including the option of wiping out its own data and taking a full copy from another node. Or to put it another way, once a crashed node comes back up, it is out of sync anyway, so it doesn’t matter at which point the data on disk got out of sync.

    So relaxing InnoDB flushing is essentially based on the idea that a commit is durable due to the replication, not due to flushing to disk. (MySQL NDB Cluster has the same approach.)

    June 14, 2012 at 3:25 pm
  • Andy


    For the control what settings did you use for sync_binlog and innodb_support_xa?

    June 14, 2012 at 9:01 pm
  • Henrik Ingo

    Interesting. On the codership mailing list Erkan has also made tests where semi-sync proved unsuitable for WAN replication:!msg/codership-team/Qfjg4VL2OEY/Swkwb2dCoCUJ

    Seems like it is true then.

    June 14, 2012 at 11:59 pm
  • Jay Janssen

    To be honest, I was a little surprised too, but the math makes sense for binary log serialization.

    Justin: 5.5.24, yes 🙂 Risk in my opinion is based on the likelihood of all nodes going offline at once.

    Andy: sync_binlog was off in all tests. innodb_support_xa was ON in the tests where I was using log-bin.

    June 15, 2012 at 5:22 am
  • Peter Zaitsev


    Sometimes people miss the “game changers” which change the performance picture completely. WAN vs LAN is one of those, as a round trip can be 100+ times longer. Another is HDD without a RAID controller with BBU compared to one with: the difference would be some 200 fsync/sec vs 10,000+, again close to a 100x difference. Similar numbers apply to SSD vs HDD.

    The point is if you’re testing on LAN I would not make any assumptions about WAN performance because it is likely to be a lot different.

    Interesting to see, though, that group commit essentially does not work with semi-sync in 5.5. I wonder if it is going to be resolved in 5.6, as it can be a serious bottleneck.

    June 15, 2012 at 2:36 pm
  • Vincent van Scherpenseel


    The link for label “Codership’s recommended tuning” is outdated. It should be changed to

    February 13, 2015 at 10:03 am
  • james

    Thanks a lot, Jay.

    Any updated figures for newer MySQL (e.g. 5.7) please?
    Oracle claimed that 5.7 largely improved replication.

    Also, does the new “Codership’s recommended tuning” make any differences please?


    March 31, 2016 at 6:45 am
  • james

    also, if RTT = 10ms, max writes = 856 – (856 – 244)/10 = 793?

    March 31, 2016 at 11:05 am
  • james

    I shall set up Galera on WAN for testing soon. Just wondering if you guys have already done the benchmarking for RTT 10ms. Thanks

    March 31, 2016 at 11:21 am
  • James

    Hi Jay,
    My stand-alone server (binlog enabled) on VMware with kernel 2.6.32-504.23.4.el6.x86_64, 4GB RAM and 4 CPUs can achieve > 62k writes/s


    April 6, 2016 at 7:50 am
  • Kenny Gryp

    In MySQL 5.7, semi-sync no longer suffers from serialization with more than one thread.

    When there is 100ms between master and semi-sync slave:

    [root@perconaserver vagrant]# sysbench --db-ps-mode=disable --mysql-host=localhost --mysql-user=root --mysql-db=semi --num-threads=32 --report-interval=1 /usr/share/sysbench/oltp_insert.lua run
    [ 1s ] thds: 32 tps: 256.22 qps: 256.22 (r/w/o: 0.00/256.22/0.00) lat (ms,95%): 211.60 err/s: 0.00 reconn/s: 0.00
    [ 2s ] thds: 32 tps: 286.30 qps: 286.30 (r/w/o: 0.00/286.30/0.00) lat (ms,95%): 114.72 err/s: 0.00 reconn/s: 0.00
    [ 3s ] thds: 32 tps: 289.18 qps: 289.18 (r/w/o: 0.00/289.18/0.00) lat (ms,95%): 112.67 err/s: 0.00 reconn/s: 0.00
    [ 4s ] thds: 32 tps: 290.83 qps: 290.83 (r/w/o: 0.00/290.83/0.00) lat (ms,95%): 112.67 err/s: 0.00 reconn/s: 0.00
    [ 5s ] thds: 32 tps: 319.08 qps: 319.08 (r/w/o: 0.00/319.08/0.00) lat (ms,95%): 110.66 err/s: 0.00 reconn/s: 0.00
    [ 6s ] thds: 32 tps: 286.84 qps: 286.84 (r/w/o: 0.00/286.84/0.00) lat (ms,95%): 112.67 err/s: 0.00 reconn/s: 0.00
    [ 7s ] thds: 32 tps: 289.76 qps: 289.76 (r/w/o: 0.00/289.76/0.00) lat (ms,95%): 112.67 err/s: 0.00 reconn/s: 0.00
    [ 8s ] thds: 32 tps: 289.30 qps: 289.30 (r/w/o: 0.00/289.30/0.00) lat (ms,95%): 110.66 err/s: 0.00 reconn/s: 0.00

    October 9, 2017 at 7:42 am
