Making Sense of MySQL Group Replication Consistency Levels

MySQL Group Replication ConsistencyFrom the initial release, one of the biggest complaints I had about Group Replication is that it allowed “stale” reads and there was no way to prevent them or to even know that you read “stale” data. That was a huge limitation. Thankfully, Oracle released features to control the consistency levels, and it was exactly a year ago! I don’t know about you, but I personally was confused by naming it group_replication_consistency=’AFTER’ or ‘BEFORE’.

So now I want to try to make sense of it and share my understanding (even if it is one year later).

Setup:

We will start with the default group_replication_consistency=’EVENTUAL’ and work from there. So let’s consider a very simple table:

With over 10mln rows:

And we will do a very simple action. On Node 1 we will update the table and on Node 2 we will read data.

Node 1:

And immediately after that on the Node 2:

There are a few points to note:

  • On Node 1 it took 1min 20 sec for the transaction to execute
  • On Node 2 we got the result immediately, but we essentially got a “stale” read in that data was already updated, but we got an old version of it.

How can we get a better outcome:

group_replication_consistency=’BEFORE’;

Let’s look into the consistency level ‘BEFORE’. It says that a transaction on Node 2 will wait until the previously committed transaction on the Node 1 transaction is committed also on Node 2. So let’s change the set session to group_replication_consistency=’BEFORE’; on Node 2 and execute a similar update again.

Node 1 (still group_replication_consistency=’EVENTUAL’;):

And after this, on Node 2 (with set session group_replication_consistency=’BEFORE’;):

So there are very notable changes:

  • The transaction on Node 2 returned the correct result now.
  • But now it took 1 min 11 sec to return it (instead of 0sec as in the previous case). Basically, the transaction waited on the transaction from Node 1 to be applied on Node 2
  • The execution time on Node 1 did not change.

This mode allowed us to do exactly what we wanted – prevent stale reads. Great outcome! So what about group_replication_consistency=’AFTER’ ?

group_replication_consistency=’AFTER’;

This mode says that a transaction on Node 1 will not return OK until it makes sure that other nodes applied the transaction. To see it in action, we will put Node 1 into group_replication_consistency=’AFTER’ and Node 2 into group_replication_consistency=’EVENTUAL’;.

Node 1 (in group_replication_consistency=’AFTER’):

After that, Node 2 (in group_replication_consistency=’EVENTUAL’):

Here the situation is different. Now the execution time on Node 1 doubled, as the transaction waits to be committed on all nodes and after that on Node 2 the execution is immediate.

This mode still avoids “stale” reads, but in this case, we shifted the wait time from Node 2 to Node 1, and this is how we can view the difference between ‘BEFORE’ and ‘AFTER’ consistency modes in Group Replication.

Both modes provide a consistent view, but:

  • In ‘BEFORE’ mode: the readers on secondary nodes will be blocked, waiting on the moment when the consistent view is available, and
  • In ‘AFTER’ mode the writers are blocked till other nodes get a consistent view.

So which mode to choose? Actually I think it is good to have an option here. You can choose, if you want, to put wait time on your readers or on your writers; the decision is up to how your application is designed.

Share this post

Comments (4)

  • lefred Reply

    Thank you for the post Vadim. And indeed it depends of the application. If you do a majority of reads and a smaller amount of writes AND you want to avoid stale reads, then you accept to pay the price at write time and use AFTER. We also have 2 others modes : BEFORE_AND_AFTER (and you pay each time, we are very close to synchronous replication, even it’s not yet) and also BEFORE_ON_PRIMARY_FAILOVER, which works as EVENTUAL but in case of failover it insure that the new PRIMARY applied everything before acting as so.
    Something very nice is that you can accept to pay the price only when it’s really needed and switch the consistency level you need per session.

    January 23, 2020 at 1:24 pm
  • Peter Zaitsev Reply

    Very interesting,

    It also shows what MySQL Group Replication does not have pipelining (or whatever right term would be) – the statement which updates lots of data is not started on other nodes until it is completed on the first node hence you have more than double latency for data propagation.

    Also if query itself should take 1m 20 sec on the node to execute and we have 3min for replication this means we also have added 20sec overhead of transferring the data

    January 23, 2020 at 6:34 pm

Leave a Reply