
Investigating MySQL Replication Latency in Percona XtraDB Cluster
I was curious to check how Percona XtraDB Cluster behaves in terms of replication latency (or data propagation latency). Specifically, I wanted to see if stale reads could occur on other nodes immediately after a write.
To test this, I wrote a simple script (included at the end) that:
The setup included 3 cluster nodes (DPE1, DPE2, DPE3) connected via a 1Gbit network, with tests run from a separate client server.
|
1 |
Summary: 94 out of 10000 rounds (0.94%) Delay distribution: Min: 0.71 ms; Max: 2.16 ms Avg: 0.89 ms |
Key observations:
|
1 |
sysbench --test=oltp --mysql-user=root --mysql-password="" --oltp-table-size=1000000 --num-threads=32 --init-rng=on --max-requests=0 --oltp-auto-inc=off --max-time=3000 run |
|
1 2 |
Summary: 3901 out of 10000 rounds (39.01%) Delay distribution: Min: 0.66 ms; Max: 201.36 ms Avg: 3.81 ms Summary: 3893 out of 10000 rounds (38.93%) Delay distribution: Min: 0.66 ms; Max: 42.9 ms Avg: 3.76 ms |
Results:
|
1 2 |
Summary: 3747 out of 10000 rounds (37.47%) Delay distribution: Min: 0.86 ms; Max: 108.15 ms Avg: 8.62 ms Summary: 3721 out of 10000 rounds (37.21%) Delay distribution: Min: 0.81 ms; Max: 291.81 ms Avg: 8.54 ms |
Observation:
|
1 |
sysbench --test=oltp --oltp-test-mode=nontrx --oltp-nontrx-mode=update_key --mysql-user=root --mysql-password="" --oltp-table-size=1000000 --num-threads=32 --init-rng=on --max-requests=0 --max-time=3000 run |
|
1 2 |
Summary: 1062 out of 10000 rounds (10.62%) Delay distribution: Min: 0.71 ms; Max: 285.07 ms Avg: 3.21 ms Summary: 1113 out of 10000 rounds (11.13%) Delay distribution: Min: 0.81 ms; Max: 275.94 ms Avg: 5.06 ms |
Surprisingly better than mixed workload (~11% inconsistency).
|
1 2 |
Summary: 5349 out of 10000 rounds (53.49%) Delay distribution: Min: 0.81 ms; Max: 519.61 ms Avg: 5.02 ms Summary: 5355 out of 10000 rounds (53.55%) Delay distribution: Min: 0.81 ms; Max: 526.95 ms Avg: 5.06 ms |
Worst-case scenario:
|
1 |
Summary: 833 out of 10000 rounds (8.33%) Delay distribution: Min: 0.66 ms; Max: 353.61 ms Avg: 2.76 ms |
Minimal impact as expected.
|
1 |
SET GLOBAL wsrep_causal_reads=1; |
This ensures reads wait for replication, providing full consistency.
|
1 |
UPDATE sbtest2 SET k=k+1; |
|
1 |
Summary: 452 out of 10000 rounds (4.52%) Delay distribution: Min: 0.66 ms; Max: 46526.7 ms Avg: 104.28 ms |
Key issue:
|
1 2 3 4 |
per-request statistics: min: 0.69ms avg: 1.12ms max: 52334.76ms |
Certification can stall unrelated operations across nodes.
Recommendation: Understand transaction size and tolerance for latency when designing applications on Percona XtraDB Cluster.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
$writer_host="dpe01"; $reader_host="dpe02"; $user="test"; $password="test"; $table="test.sbtest"; $increment=2; $offset=1; $max_id=1000; $rounds=10000; $writer=new mysqli($writer_host,$user,$password); $reader=new mysqli($reader_host,$user,$password); $total_delay=0; $min_delay=100000000; $max_delay=0; $delays=0; $sum_delay=0; for($val=0; $val<$rounds;$val++) { $id=rand(1,$max_id); $id=floor($id/$increment)*$increment+$offset; $writer->query("UPDATE $table set k=$val where id=$id"); $tw=microtime(true); $retries=0; while(true) { $result=$reader->query("SELECT k from $table where id=$id"); $row=$result->fetch_row(); if ($row[0]!=$val) $retries++; else { $tr=microtime(true); break; } $result->close(); } if ($retries!=0) { $delay=round(($tr-$tw)*1000,2); $delays++; $sum_delay+=$delay; $min_delay=min($min_delay,$delay); $max_delay=max($max_delay,$delay); } } |
|
1 2 3 4 5 6 7 8 9 10 |
wsrep_node_address=10.9.9.1 wsrep_provider=/usr/lib/libgalera_smm.so wsrep_cluster_address=gcomm://10.9.9.1,10.9.9.2,10.9.9.3 wsrep_slave_threads=8 wsrep_sst_method=xtrabackup wsrep_cluster_name=DPE binlog_format=ROW default_storage_engine=InnoDB innodb_autoinc_lock_mode=2 innodb_locks_unsafe_for_binlog=1 |
Resources
RELATED POSTS
Are there any benchmarks showing the difference on TPS or commit time for setting wsrep_causal_reads to 0 or 1? I did not see any benchmark stating which values for wsrep_causal_reads are used, including the benchmarks done by percona.
Tobias,
I have not seen much. This is one thing I was going to take a look at 🙂
Hi Peter, thanks so much for this article. I was planning to do exactly the same sort of test as part of preparing for my multi-master talk at the Percona Live conference. This behavior, especially for long transactions, is very much expected so it’s great to see some proof from such a simple benchmark. Another interesting thing to try is generating distributed deadlock through a table hotspot that creates a rollback when you update simultaneously across multiple nodes. I would expect this to become a bigger problem with workloads that (a) have a lot of updates on a small set of rows and (b) as the overall load increases, since this makes the deadlock window bigger when update propagation is delayed in any way. I’ll post my results unless you beat me to it.
Hi Peter,
Great you’re doing tests on the intricacies of Galera, we use it in production, and overall very satisfied with it, but more use cases and advanced testing is needed.
Could you clarify your meaning in your statement,
“First Replication by default in Percona XtraDB Cluster is Asynchronous from Data Propagation Standpoint – it takes time (though short one in this case) for changes committed on the one node to become visible to the other.”
I want to confirm you are you saying your test is showing a delay simply between the local commit (not visibility) of the update on one node vs. the visibility on the second node (all nodes however would read exactly the same lag of commit -> visibility when the certification and then cluster wide commit is issued).
Or are you saying it shows data inconsistency between the two nodes in terms of the actual visibility (inconsistency in the cluster wide commit)?
‘@Mike, what do you mean by a cluster-wide commit? Galera just ensures the transactions get to the DBMS nodes, which certify and commit independently. In this case you can get some differences between when transactions show up on different nodes. In fact, it should be possible for transactions to commit and become visible on replicas before they are visible on the originating master if the master commits more slowly. This could happen if there are differences in the speed of file systems across hosts. (Another interesting case to test.)
Mike – what I mean is what there is the delay between data is committed at DPE1 and it become visible at DPE2. I have not tested it but I assume from architecture design there is no guaranty what it will be visible at DPE2 and DPE3 at the same time.
If you need to ensure you never read stale data you should be using wsrep_causal_reads=1
Thanks Peter, so indeed you were referring to actual visibility differences on the nodes, when you said “..Asynchronous from Data Propagation Standpoint…” I thought perhaps you meant the time from when the update was first issued (including the time to replicate over and certify).
Robert – when I said cluster-wide commit, I meant at the point at which a global transaction ID is issued, then certification takes place, when I said local commit, I meant when the statement was initiated (meant to write ‘local commit issuance’) just before replication and certification. No argument the commits happen independently (good clarification to bring up) and it’s possible for them to happen at different times, just trying to figure out what was being measured here.
‘@Mike, thanks for the clarification. One of the issues with Galera overall is that the terminology is sometimes not very clear from the user docs, though of course it’s pretty clear in the heads of the Codership developers. Cheers, Robert
Peter,
The script got totally broken,
so there is no good way to copy-paste it.
Vadim,
Ah… Still the highlighting plugin problems. I have changed it to blockquote hope it is easy now
If you’re interested in using Percona xtraDB Cluster for production, I would highly recommend you use something else instead. Trust me on this, it will cost you dearly.
Hi Peter,
Nice article. Recently we shifted on following new dedicated server. From first day onwards, we were facing the performance issue. Still MYSQL load was very high but site was not going down.
Then we did some R&D on Percona MYSQL server and found that Percona need some configuration setting regarding to memory allocation. So we did some memory allocation setting in my.cnf file. On first day after doing memory allocation setting in my.cnf file, site performance was good.
But from second day again performance degradation started and now it is slow again. MySQL load is not going very high. We have also noticed that many process stay in sleeping condition for long time like 200-300 processes.
Current Server Details (Dedicated Server):
Software:
Apache Version – 2.2.15 (CentOS)
PHP Version – 5.3.3
MySQL – 5.1.66 [Percona XtraDB Cluster (GPL) (5.5.30)] (InnoDB Engine)
PHP My Admin – 5.3.3
Hardware:
Node:2
CPU = Quard Core
RAM = 12 GB
HDD = 1 TB
IP Addresses = 2
Please help me if there is something we missed.
Manish,
Blog post comment is not the best place for asking questions specific to your case. I’d encourage you to ask the question on our forums instead https://www.percona.com/forums/questions-discussions/percona-xtradb-cluster
Jason – Could you clarify what you mean? Why will it cost dearly and what would you propose instead? Percona Server?
In my experience all Percona products are quite stable and are pretty well supported (even if you don’t have the premium consulting services)
We are providing services echo-investigations for business or wedding, associate people qualifications or quality.
Doing Sysbench test with Standalone Mysql Server we got trasactions/sec 3000 & with percona cluster we got 400/sec. Is it expected?