Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Investigating MySQL Replication Latency in Percona XtraDB Cluster

March 4, 2013

Author

Share this Post:

I was curious to check how Percona XtraDB Cluster behaves in terms of replication latency (or data propagation latency). Specifically, I wanted to see if stale reads could occur on other nodes immediately after a write.

To test this, I wrote a simple script (included at the end) that:

Writes to one node
Immediately reads from another node
Retries until data propagates
Measures latency

The setup included 3 cluster nodes (DPE1, DPE2, DPE3) connected via a 1Gbit network, with tests run from a separate client server.

Baseline (No Load)

Summary: 94 out of 10000 rounds (0.94%)  Delay distribution: Min: 0.71 ms; Max: 2.16 ms Avg: 0.89 ms

1	Summary: 94 out of 10000 rounds (0.94%) Delay distribution: Min: 0.71 ms; Max: 2.16 ms Avg: 0.89 ms

Key observations:

Replication is asynchronous from a propagation standpoint
Less than 1% inconsistency observed
Average delay under 1ms

With Load on Write Node (DPE1)

sysbench --test=oltp --mysql-user=root --mysql-password="" --oltp-table-size=1000000 --num-threads=32 --init-rng=on --max-requests=0 --oltp-auto-inc=off --max-time=3000 run

1	sysbench --test=oltp --mysql-user=root --mysql-password="" --oltp-table-size=1000000 --num-threads=32 --init-rng=on --max-requests=0 --oltp-auto-inc=off --max-time=3000 run

Summary: 3901 out of 10000 rounds (39.01%) Delay distribution: Min: 0.66 ms; Max: 201.36 ms Avg: 3.81 ms
Summary: 3893 out of 10000 rounds (38.93%) Delay distribution: Min: 0.66 ms; Max: 42.9 ms Avg: 3.76 ms

1 2	Summary: 3901 out of 10000 rounds (39.01%) Delay distribution: Min: 0.66 ms; Max: 201.36 ms Avg: 3.81 ms Summary: 3893 out of 10000 rounds (38.93%) Delay distribution: Min: 0.66 ms; Max: 42.9 ms Avg: 3.76 ms

Results:

~40% inconsistency rate
Average delay still only a few milliseconds

Load on Read Node (DPE2)

Summary: 3747 out of 10000 rounds (37.47%) Delay distribution: Min: 0.86 ms; Max: 108.15 ms Avg: 8.62 ms
Summary: 3721 out of 10000 rounds (37.21%) Delay distribution: Min: 0.81 ms; Max: 291.81 ms Avg: 8.54 ms

1 2	Summary: 3747 out of 10000 rounds (37.47%) Delay distribution: Min: 0.86 ms; Max: 108.15 ms Avg: 8.62 ms Summary: 3721 out of 10000 rounds (37.21%) Delay distribution: Min: 0.81 ms; Max: 291.81 ms Avg: 8.54 ms

Observation:

Similar inconsistency frequency
Higher latency → read-side load impacts propagation more

Write-Heavy Workload

sysbench --test=oltp --oltp-test-mode=nontrx --oltp-nontrx-mode=update_key --mysql-user=root --mysql-password="" --oltp-table-size=1000000 --num-threads=32 --init-rng=on --max-requests=0 --max-time=3000 run

1	sysbench --test=oltp --oltp-test-mode=nontrx --oltp-nontrx-mode=update_key --mysql-user=root --mysql-password="" --oltp-table-size=1000000 --num-threads=32 --init-rng=on --max-requests=0 --max-time=3000 run

Summary: 1062 out of 10000 rounds (10.62%) Delay distribution: Min: 0.71 ms; Max: 285.07 ms Avg: 3.21 ms
Summary: 1113 out of 10000 rounds (11.13%) Delay distribution: Min: 0.81 ms; Max: 275.94 ms Avg: 5.06 ms

1 2	Summary: 1062 out of 10000 rounds (10.62%) Delay distribution: Min: 0.71 ms; Max: 285.07 ms Avg: 3.21 ms Summary: 1113 out of 10000 rounds (11.13%) Delay distribution: Min: 0.81 ms; Max: 275.94 ms Avg: 5.06 ms

Surprisingly better than mixed workload (~11% inconsistency).

Write Load on Read Node

Summary: 5349 out of 10000 rounds (53.49%) Delay distribution: Min: 0.81 ms; Max: 519.61 ms Avg: 5.02 ms
Summary: 5355 out of 10000 rounds (53.55%) Delay distribution: Min: 0.81 ms; Max: 526.95 ms Avg: 5.06 ms

1 2	Summary: 5349 out of 10000 rounds (53.49%) Delay distribution: Min: 0.81 ms; Max: 519.61 ms Avg: 5.02 ms Summary: 5355 out of 10000 rounds (53.55%) Delay distribution: Min: 0.81 ms; Max: 526.95 ms Avg: 5.06 ms

Worst-case scenario:

Over 50% inconsistency
Outliers up to 500ms+

Load on Unused Node (DPE3)

Summary: 833 out of 10000 rounds (8.33%) Delay distribution: Min: 0.66 ms; Max: 353.61 ms Avg: 2.76 ms

1	Summary: 833 out of 10000 rounds (8.33%) Delay distribution: Min: 0.66 ms; Max: 353.61 ms Avg: 2.76 ms

Minimal impact as expected.

Synchronous Reads Option

SET GLOBAL wsrep_causal_reads=1;

1	SET GLOBAL wsrep_causal_reads=1;

This ensures reads wait for replication, providing full consistency.

Large Transaction Impact

UPDATE sbtest2 SET k=k+1;

1	UPDATE sbtest2 SET k=k+1;

Summary: 452 out of 10000 rounds (4.52%) Delay distribution: Min: 0.66 ms; Max: 46526.7 ms Avg: 104.28 ms

1	Summary: 452 out of 10000 rounds (4.52%) Delay distribution: Min: 0.66 ms; Max: 46526.7 ms Avg: 104.28 ms

Key issue:

Large transactions cause massive stalls (up to ~45 seconds)
Due to certification + replication overhead

Observed Stall Behavior

per-request statistics:
min: 0.69ms
avg: 1.12ms
max: 52334.76ms

per-request statistics:

min: 0.69ms

avg: 1.12ms

max: 52334.76ms

Certification can stall unrelated operations across nodes.

Summary

Excellent performance for small transactions
Low latency under normal conditions
Synchronous reads available when needed
Large transactions can cause severe stalls

Recommendation: Understand transaction size and tolerance for latency when designing applications on Percona XtraDB Cluster.

Appendix: Test Script

$writer_host="dpe01";
$reader_host="dpe02";
$user="test";
$password="test";
$table="test.sbtest";

$increment=2;
$offset=1;
$max_id=1000;
$rounds=10000;

$writer=new mysqli($writer_host,$user,$password);
$reader=new mysqli($reader_host,$user,$password);

$total_delay=0;
$min_delay=100000000;
$max_delay=0;
$delays=0;
$sum_delay=0;

for($val=0; $val<$rounds;$val++) {
  $id=rand(1,$max_id);
  $id=floor($id/$increment)*$increment+$offset;
  $writer->query("UPDATE $table set k=$val where id=$id");

  $tw=microtime(true);
  $retries=0;

  while(true) {
    $result=$reader->query("SELECT k from $table where id=$id");
    $row=$result->fetch_row();

    if ($row[0]!=$val)
      $retries++;
    else {
      $tr=microtime(true);
      break;
    }
    $result->close();
  }

  if ($retries!=0) {
    $delay=round(($tr-$tw)*1000,2);
    $delays++;
    $sum_delay+=$delay;
    $min_delay=min($min_delay,$delay);
    $max_delay=max($max_delay,$delay);
  }
}

$writer_host="dpe01";

$reader_host="dpe02";

$user="test";

$password="test";

$table="test.sbtest";

$increment=2;

$offset=1;

$max_id=1000;

$rounds=10000;

$writer=new mysqli($writer_host,$user,$password);

$reader=new mysqli($reader_host,$user,$password);

$total_delay=0;

$min_delay=100000000;

$max_delay=0;

$delays=0;

$sum_delay=0;

for($val=0; $val<$rounds;$val++) {

$id=rand(1,$max_id);

$id=floor($id/$increment)*$increment+$offset;

$writer->query("UPDATE $table set k=$val where id=$id");

$tw=microtime(true);

$retries=0;

while(true) {

$result=$reader->query("SELECT k from $table where id=$id");

$row=$result->fetch_row();

if ($row[0]!=$val)

$retries++;

else {

$tr=microtime(true);

break;

}

$result->close();

}

if ($retries!=0) {

$delay=round(($tr-$tw)*1000,2);

$delays++;

$sum_delay+=$delay;

$min_delay=min($min_delay,$delay);

$max_delay=max($max_delay,$delay);

}

Appendix: PXC Configuration

wsrep_node_address=10.9.9.1
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_address=gcomm://10.9.9.1,10.9.9.2,10.9.9.3
wsrep_slave_threads=8
wsrep_sst_method=xtrabackup
wsrep_cluster_name=DPE
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1

wsrep_node_address=10.9.9.1

wsrep_provider=/usr/lib/libgalera_smm.so

wsrep_cluster_address=gcomm://10.9.9.1,10.9.9.2,10.9.9.3

wsrep_slave_threads=8

wsrep_sst_method=xtrabackup

wsrep_cluster_name=DPE

binlog_format=ROW

default_storage_engine=InnoDB

innodb_autoinc_lock_mode=2

innodb_locks_unsafe_for_binlog=1

0 0 votes

Article Rating

16 Comments

Oldest

Newest Most Voted

Tobias

13 years ago

Are there any benchmarks showing the difference on TPS or commit time for setting wsrep_causal_reads to 0 or 1? I did not see any benchmark stating which values for wsrep_causal_reads are used, including the benchmarks done by percona.

Author

Peter Zaitsev

13 years ago

Tobias,

I have not seen much. This is one thing I was going to take a look at 🙂

Robert Hodges

13 years ago

Hi Peter, thanks so much for this article. I was planning to do exactly the same sort of test as part of preparing for my multi-master talk at the Percona Live conference. This behavior, especially for long transactions, is very much expected so it’s great to see some proof from such a simple benchmark. Another interesting thing to try is generating distributed deadlock through a table hotspot that creates a rollback when you update simultaneously across multiple nodes. I would expect this to become a bigger problem with workloads that (a) have a lot of updates on a small set of rows and (b) as the overall load increases, since this makes the deadlock window bigger when update propagation is delayed in any way. I’ll post my results unless you beat me to it.

mike morse

13 years ago

Hi Peter,

Great you’re doing tests on the intricacies of Galera, we use it in production, and overall very satisfied with it, but more use cases and advanced testing is needed.

Could you clarify your meaning in your statement,

“First Replication by default in Percona XtraDB Cluster is Asynchronous from Data Propagation Standpoint – it takes time (though short one in this case) for changes committed on the one node to become visible to the other.”

I want to confirm you are you saying your test is showing a delay simply between the local commit (not visibility) of the update on one node vs. the visibility on the second node (all nodes however would read exactly the same lag of commit -> visibility when the certification and then cluster wide commit is issued).

Or are you saying it shows data inconsistency between the two nodes in terms of the actual visibility (inconsistency in the cluster wide commit)?

Robert Hodges

13 years ago

‘@Mike, what do you mean by a cluster-wide commit? Galera just ensures the transactions get to the DBMS nodes, which certify and commit independently. In this case you can get some differences between when transactions show up on different nodes. In fact, it should be possible for transactions to commit and become visible on replicas before they are visible on the originating master if the master commits more slowly. This could happen if there are differences in the speed of file systems across hosts. (Another interesting case to test.)

Author

Peter Zaitsev

13 years ago

Mike – what I mean is what there is the delay between data is committed at DPE1 and it become visible at DPE2. I have not tested it but I assume from architecture design there is no guaranty what it will be visible at DPE2 and DPE3 at the same time.

If you need to ensure you never read stale data you should be using wsrep_causal_reads=1

mike morse

13 years ago

Thanks Peter, so indeed you were referring to actual visibility differences on the nodes, when you said “..Asynchronous from Data Propagation Standpoint…” I thought perhaps you meant the time from when the update was first issued (including the time to replicate over and certify).

Robert – when I said cluster-wide commit, I meant at the point at which a global transaction ID is issued, then certification takes place, when I said local commit, I meant when the statement was initiated (meant to write ‘local commit issuance’) just before replication and certification. No argument the commits happen independently (good clarification to bring up) and it’s possible for them to happen at different times, just trying to figure out what was being measured here.

Robert Hodges

13 years ago

‘@Mike, thanks for the clarification. One of the issues with Galera overall is that the terminology is sometimes not very clear from the user docs, though of course it’s pretty clear in the heads of the Codership developers. Cheers, Robert

Admin

Vadim Tkachenko

13 years ago

Peter,

The script got totally broken,
so there is no good way to copy-paste it.

Author

Peter Zaitsev

13 years ago

Vadim,

Ah… Still the highlighting plugin problems. I have changed it to blockquote hope it is easy now

Jason

13 years ago

If you’re interested in using Percona xtraDB Cluster for production, I would highly recommend you use something else instead. Trust me on this, it will cost you dearly.

Manish

13 years ago

Hi Peter,

Nice article. Recently we shifted on following new dedicated server. From first day onwards, we were facing the performance issue. Still MYSQL load was very high but site was not going down.
Then we did some R&D on Percona MYSQL server and found that Percona need some configuration setting regarding to memory allocation. So we did some memory allocation setting in my.cnf file. On first day after doing memory allocation setting in my.cnf file, site performance was good.

But from second day again performance degradation started and now it is slow again. MySQL load is not going very high. We have also noticed that many process stay in sleeping condition for long time like 200-300 processes.

Current Server Details (Dedicated Server):

Software:
Apache Version – 2.2.15 (CentOS)
PHP Version – 5.3.3
MySQL – 5.1.66 [Percona XtraDB Cluster (GPL) (5.5.30)] (InnoDB Engine)
PHP My Admin – 5.3.3

Hardware:
Node:2
CPU = Quard Core
RAM = 12 GB
HDD = 1 TB
IP Addresses = 2

Please help me if there is something we missed.

Author

Peter Zaitsev

13 years ago

Manish,
Blog post comment is not the best place for asking questions specific to your case. I’d encourage you to ask the question on our forums instead https://www.percona.com/forums/questions-discussions/percona-xtradb-cluster