
Investigating MySQL Replication Latency in Percona XtraDB Cluster

March 4, 2013
Author
Peter Zaitsev

I was curious to check how Percona XtraDB Cluster behaves in terms of replication latency (or data propagation latency). Specifically, I wanted to see if stale reads could occur on other nodes immediately after a write.

To test this, I wrote a simple script (included at the end) that:

  • Writes to one node
  • Immediately reads from another node
  • Retries until data propagates
  • Measures latency
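The steps above can be sketched as a small measurement loop. This is a minimal sketch and not the original test script: the `write` and `read` callables, the polling interval, and the timeout are all assumptions made for illustration. In a real test, `write` would commit a value on one node and `read` would fetch it from another.

```python
import time
from typing import Callable, List, Optional


def measure_propagation(
    write: Callable[[int], None],
    read: Callable[[], Optional[int]],
    trials: int = 1000,
    poll_interval: float = 0.0001,
    timeout: float = 5.0,
) -> List[float]:
    """Return, per trial, the seconds until the reader saw the new value.

    A result of 0.0 means the very first read was already up to date.
    """
    delays: List[float] = []
    for value in range(1, trials + 1):
        write(value)                      # commit on the write node
        start = time.perf_counter()
        first_read = True
        while read() != value:            # stale read on the other node
            first_read = False
            if time.perf_counter() - start > timeout:
                raise TimeoutError(f"value {value} never propagated")
            time.sleep(poll_interval)     # retry until the data shows up
        delays.append(0.0 if first_read else time.perf_counter() - start)
    return delays
```

Passing the two operations in as callables keeps the loop independent of any particular MySQL driver, so the same function can be pointed at any pair of nodes.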

The setup included 3 cluster nodes (DPE1, DPE2, DPE3) connected via a 1Gbit network, with tests run from a separate client server.

Baseline (No Load)

Key observations:

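  • Replication is asynchronous from a data propagation standpoint: a change committed on one node takes a short time to become visible on the others
  • Stale reads occurred in fewer than 1% of tests
  • Average delay was under 1ms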
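For reference, here is one way figures like these could be derived from raw per-test delays. This is a sketch under assumptions: it treats a delay of 0.0 as a consistent first read, and it averages only over the stale tests. The article does not say which of the two averaging conventions was used, so the function name and the output keys are illustrative only.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize(delays: Sequence[float]) -> Dict[str, float]:
    """Summarize per-test propagation delays given in seconds."""
    # A test counts as inconsistent if the first read was stale (delay > 0).
    stale = [d for d in delays if d > 0.0]
    return {
        "inconsistency_rate": len(stale) / len(delays) if delays else 0.0,
        "avg_delay_ms": mean(stale) * 1000 if stale else 0.0,
        "max_delay_ms": max(stale) * 1000 if stale else 0.0,
    }
```

Reporting the maximum alongside the average matters here, because a handful of large outliers can hide behind a low mean.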

With Load on Write Node (DPE1)

Results:

  • ~40% inconsistency rate
  • Average delay still only a few milliseconds

Load on Read Node (DPE2)

Observation:

  • Similar inconsistency frequency
  • Higher latency, which suggests that load on the read node affects propagation more than load on the write node

Write-Heavy Workload

Surprisingly, the write-heavy workload fared better than the mixed workload, with an inconsistency rate of about 11%.

Write Load on Read Node

Worst-case scenario:

  • Over 50% inconsistency
  • Outliers up to 500ms+

Load on Unused Node (DPE3)

Minimal impact as expected.

Synchronous Reads Option

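Setting wsrep_causal_reads=1 makes reads wait until replication has caught up on the node being read, which provides full consistency at the cost of added read latency.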
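As an illustration, the option can be enabled per session, so that only the reads which need it pay the extra latency. The sketch below assumes a DB-API 2.0 (PEP 249) style connection object; the helper names `enable_causal_reads` and `read_consistent` are hypothetical and not part of any library. Note that later Galera-based releases deprecate wsrep_causal_reads in favor of wsrep_sync_wait, so check which variable your version expects.

```python
from typing import Any, List, Sequence, Tuple


def enable_causal_reads(connection: Any) -> None:
    """Turn on wsrep_causal_reads for this session only."""
    cursor = connection.cursor()
    try:
        cursor.execute("SET SESSION wsrep_causal_reads = 1")
    finally:
        cursor.close()


def read_consistent(
    connection: Any, query: str, params: Sequence[Any] = ()
) -> List[Tuple]:
    """Run a read that waits for replication to catch up before executing."""
    enable_causal_reads(connection)
    cursor = connection.cursor()
    try:
        cursor.execute(query, params)
        return list(cursor.fetchall())
    finally:
        cursor.close()
```

Keeping the setting at session scope leaves other connections on the default, faster behavior.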

Large Transaction Impact

Key issue:

  • Large transactions cause massive stalls (up to ~45 seconds)
  • The stalls are caused by certification and replication overhead

Observed Stall Behavior

Certification can stall unrelated operations across nodes.

Summary

  • Excellent performance for small transactions
  • Low latency under normal conditions
  • Synchronous reads available when needed
  • Large transactions can cause severe stalls

Recommendation: Understand transaction size and tolerance for latency when designing applications on Percona XtraDB Cluster.
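One common way to act on this is to split a large change into many small transactions. The article itself does not prescribe this, so treat the following as a sketch under stated assumptions: a DB-API 2.0 style connection, the `%s` parameter style used by the common MySQL drivers, and a hypothetical table named `test_table` with an integer primary key `id`.

```python
from typing import Any, Iterator, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of ids with at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def delete_in_batches(
    connection: Any, ids: Sequence[int], batch_size: int = 1000
) -> int:
    """Delete rows by primary key, committing after each batch.

    Returns the number of transactions committed.
    """
    batches = 0
    cursor = connection.cursor()
    try:
        for batch in chunked(ids, batch_size):
            placeholders = ", ".join(["%s"] * len(batch))
            # test_table is a hypothetical name used for illustration.
            cursor.execute(
                f"DELETE FROM test_table WHERE id IN ({placeholders})", batch
            )
            connection.commit()           # one small write set per batch
            batches += 1
    finally:
        cursor.close()
    return batches
```

Each commit then replicates as its own small write set rather than one very large one. The trade-off is that the overall change is no longer atomic, so this only fits operations that can safely be applied piecemeal.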

Appendix: Test Script

Appendix: PXC Configuration

16 Comments
Tobias
13 years ago

Are there any benchmarks showing the difference in TPS or commit time between setting wsrep_causal_reads to 0 and to 1? I have not seen any benchmark that states which value of wsrep_causal_reads was used, including the benchmarks done by Percona.

Robert Hodges
13 years ago

Hi Peter, thanks so much for this article. I was planning to do exactly the same sort of test as part of preparing for my multi-master talk at the Percona Live conference. This behavior, especially for long transactions, is very much expected so it’s great to see some proof from such a simple benchmark. Another interesting thing to try is generating distributed deadlock through a table hotspot that creates a rollback when you update simultaneously across multiple nodes. I would expect this to become a bigger problem with workloads that (a) have a lot of updates on a small set of rows and (b) as the overall load increases, since this makes the deadlock window bigger when update propagation is delayed in any way. I’ll post my results unless you beat me to it.

mike morse
13 years ago

Hi Peter,

Great you’re doing tests on the intricacies of Galera, we use it in production, and overall very satisfied with it, but more use cases and advanced testing is needed.

Could you clarify your meaning in your statement,

“First Replication by default in Percona XtraDB Cluster is Asynchronous from Data Propagation Standpoint – it takes time (though short one in this case) for changes committed on the one node to become visible to the other.”

I want to confirm: are you saying your test shows a delay simply between the local commit (not visibility) of the update on one node and its visibility on the second node (all nodes, however, would see exactly the same commit-to-visibility lag once certification and then the cluster-wide commit are issued)?

Or are you saying it shows data inconsistency between the two nodes in terms of the actual visibility (inconsistency in the cluster wide commit)?

Robert Hodges
13 years ago

What do you mean by a cluster-wide commit? Galera just ensures the transactions get to the DBMS nodes, which certify and commit independently. In this case you can get some differences between when transactions show up on different nodes. In fact, it should be possible for transactions to commit and become visible on replicas before they are visible on the originating master if the master commits more slowly. This could happen if there are differences in the speed of file systems across hosts. (Another interesting case to test.)

mike morse
13 years ago

Thanks Peter, so indeed you were referring to actual visibility differences on the nodes, when you said “..Asynchronous from Data Propagation Standpoint…” I thought perhaps you meant the time from when the update was first issued (including the time to replicate over and certify).

Robert – when I said cluster-wide commit, I meant at the point at which a global transaction ID is issued, then certification takes place, when I said local commit, I meant when the statement was initiated (meant to write ‘local commit issuance’) just before replication and certification. No argument the commits happen independently (good clarification to bring up) and it’s possible for them to happen at different times, just trying to figure out what was being measured here.

Robert Hodges
13 years ago

Thanks for the clarification. One of the issues with Galera overall is that the terminology is sometimes not very clear from the user docs, though of course it’s pretty clear in the heads of the Codership developers. Cheers, Robert

Vadim Tkachenko
Admin
13 years ago

Peter,

The script got totally broken,
so there is no good way to copy-paste it.

Jason
13 years ago

If you’re interested in using Percona XtraDB Cluster for production, I would highly recommend you use something else instead. Trust me on this, it will cost you dearly.

Manish
13 years ago

Hi Peter,

Nice article. We recently moved to the new dedicated server described below. From the first day we faced performance issues: MySQL load was very high, although the site did not go down.
Then we did some R&D on Percona MySQL server and found that Percona needs some configuration settings for memory allocation, so we adjusted the memory allocation settings in my.cnf. On the first day after that change, site performance was good.

But from the second day performance began to degrade again, and now the site is slow again. MySQL load is not going very high. We have also noticed that many processes, around 200-300, stay in the sleeping state for a long time.

Current Server Details (Dedicated Server):

Software:
Apache Version – 2.2.15 (CentOS)
PHP Version – 5.3.3
MySQL – 5.1.66 [Percona XtraDB Cluster (GPL) (5.5.30)] (InnoDB Engine)
PHP My Admin – 5.3.3

Hardware:
Node:2
CPU = Quad Core
RAM = 12 GB
HDD = 1 TB
IP Addresses = 2

Please help me if there is something we missed.

nv
12 years ago

Jason – Could you clarify what you mean? Why will it cost dearly and what would you propose instead? Percona Server?

In my experience all Percona products are quite stable and are pretty well supported (even if you don’t have the premium consulting services)

Sing
9 years ago

Running a Sysbench test against a standalone MySQL server we got 3000 transactions/sec, and with Percona cluster we got 400/sec. Is that expected?
