September 19, 2014

Measuring Max Replication Throughput on Percona XtraDB Cluster with wsrep_desync

Checking throughput with async MySQL replication

Replication throughput is the measure of just how fast the slaves can apply replication (at least by my definition).  In MySQL async replication this is important to know because the single-threaded apply nature of async replication can be a write performance bottleneck.  In a production system, we can tell how fast the slave is currently running (applying writes), and we might have historical data to check for the most throughput ever seen, but that doesn’t give us a solid way of determining where we stand right NOW().

An old consulting trick to answer this question is to simply stop replicating on your slave for a minute, (usually just the SQL_THREAD), restart it and watch how long it takes to catch up.  We can also watch the slave thread apply rate during this interval to get a sense of just how many writes per second we can do and compare that with the normal rate (during peak hours, for example).  This can be a handy way of quickly assessing how close you are to our maximum theoretical throughput.

But what about with PXC and Galera?  This is easy on async because the master doesn’t care, but to be able to do this on PXC we need a way to intentionally lag a node without hanging or causing flow control on the rest of the cluster.  And as it turns out, as of version 5.5.33, there’s a pretty easy way.

Measuring an average apply rate on PXC

First we need to pick a node that is not taking reads or writes (or shift some traffic away from one that is).  We’re assuming reads and writes are happening on the rest of the cluster normally, and probably also that the node we chose has pretty similar hardware to every other node.  Once we have this, we can use myq_status to see replication coming into the node and being applied:

If we check the rate of growth of wsrep_last_committed over a full minute we can see:

So we’re averaging 5.1k TPS applying on this node (and across the whole cluster). But how much can we handle at peak?

Measuring Max Replication throughput on PXC

In another window on that same node, we execute this SQL (all at once):

We’ve desynced the node, locked writes to all tables, and checked the last seqno we committed on this node (665368). The wsrep_desync state tells this node to enter the Donor/Desynced state, which means it will not send Flow Control to the rest of the cluster if its queue gets backlogged.

Then we proceed to take a read lock on all tables, pausing the Galera applier.  Then sleep 60 seconds, unlock the tables and wait to see how long it recovers.

Once the initial FTWRL happens, we can immediately see the node drop to the Donor/Desynced state and watch replication start to queue up on the node:

The rest of our cluster is operating normally here.

A minute later the queue is backlogged to almost 350k transactions. Then the lock is released, and Galera starts to apply that queue as quickly as possible:

We can see right away that our ‘Ops Dn’ is much higher: peaking at 13k, but how can we get a good average? Let’s watch it catch all the way up:

So, it took this node 50 seconds to catch up again. Right at 20:20:05 when the queue zeroed out, I checked wsrep_last_committed again:

Be sure to turn off wsrep_desync when we are done and caught up!  Note you can turn off wsrep_desync right away, but that puts the node into the JOINED state which does limited flow control to help the node catch up.  We want our sample to be unbiased by flow control (at least from this node).

So the node drops back into the ‘Synced’ state and FC applies again:

Conclusion

So in 50 seconds, the node was able to apply 667183 transactions (difference between the two wsrep_last_seqno) which comes out to 13.3k tps apply capacity (at least sustained for 1 minute). This tells us we’re around 38% capacity for write throughput. Is that a perfect number? Maybe not, but it at least gives you a rough idea.

However, the point is that thanks to wsrep_desync we can measure this safely within a synchronous replication environment that may normally not allow this type of operation.

About Jay Janssen

Jay joined Percona in 2011 after 7 years at Yahoo working in a variety of fields including High Availability architectures, MySQL training, tool building, global server load balancing, multi-datacenter environments, operationalization, and monitoring. He holds a B.S. of Computer Science from Rochester Institute of Technology.

Comments

  1. Alex says:

    Jay, another amazing insight into Galera internals and application of wsrep_desync, which, I bet, none of us would have thought of! (y)

Speak Your Mind

*