Group Replication and Percona XtraDB Cluster: Overview of Common Operations

In this blog post I would like to give an overview of the most common failover scenarios and operations when using MySQL Group Replication 8.0.19 (aka GR) and Percona XtraDB Cluster 8 (PXC) (which is based on Galera), and explain how each technology handles each situation. I have created a three-node cluster with Group Replication using a single Primary and a three-node PXC, both with default settings. I am also going to use ProxySQL to interface with both clusters.

In both clusters, the nodes are named mysql1, mysql2, and mysql3. In Group Replication with a single-primary configuration, the Primary node is where the writes go. I will use the same term for PXC and call the node that receives the writes the Primary. Note, however, that PXC has no concept of a primary node: all nodes are equal.

This is a rough representation of what the setup looks like for both solutions.

Primary Server Crashes

Group Replication – Writing

In this test, I only send write queries to the cluster. When I killed the Primary server on GR, it took between 5 and 15 seconds for the cluster to reorganize the topology and for ProxySQL to send the writes to the new Primary. Starting the old Primary and adding it back to the cluster did not cause any outage.
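To watch a failover like this from the database side, queries along the following lines can help. This is only a sketch, not the exact commands used in the test: the first query runs on any surviving GR member, and the second runs on the ProxySQL admin interface and assumes the three servers are already configured there.

```sql
-- On any surviving GR member: list the members, their state,
-- and which one currently holds the PRIMARY role.
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
  FROM performance_schema.replication_group_members;

-- On the ProxySQL admin interface: show which hostgroup each
-- backend is assigned to at the moment.
SELECT hostgroup_id, hostname, port, status
  FROM runtime_mysql_servers
 ORDER BY hostgroup_id, hostname;
```

Running both repeatedly during the test shows when the new Primary is elected and when ProxySQL moves it into the writer hostgroup.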

Group Replication – Reading

What if I only send read queries to the cluster? Will crashing the Primary cause any outage in reads? No: ProxySQL will simply redirect the traffic to the other nodes, and the cluster is not going to be blocked during the reorganization.

Percona XtraDB Cluster – Writing/Reading

In PXC there is no difference between reading and writing. Once a node crashes, goes away, or gets separated, the cluster has to re-create the cluster view and check the quorum. While doing that, it does not accept any reads or writes. Usually this takes 3-10s, and the application is impacted during this time frame.
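The cluster view and quorum state can be inspected through the wsrep status variables. The query below is a small sketch of what one might run on any PXC node while the view is being re-created; it is not taken from the original test.

```sql
-- Run on any PXC node.
-- wsrep_cluster_status : 'Primary' when the node is part of the component that has quorum
-- wsrep_cluster_size   : number of nodes in the current cluster view
-- wsrep_ready          : ON when the node accepts queries
SHOW GLOBAL STATUS
 WHERE Variable_name IN ('wsrep_cluster_status',
                         'wsrep_cluster_size',
                         'wsrep_local_state_comment',
                         'wsrep_ready');
```

Note that "Primary" in wsrep_cluster_status refers to the Galera primary component (the part of the cluster that holds the quorum), not to the node I call Primary in this post.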

Removing/Adding Node

How do the clusters behave if we remove or add a node?

Group Replication

In GR, adding or removing a node does not impact the application or cause any outage. If we add a new node with the clone plugin, the cluster will propagate the data to the new node.
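For reference, enabling the clone plugin looks roughly like the sketch below. It assumes a Linux build (hence the .so file name) and that the plugin is installed both on the joining node and on the members that may act as donors.

```sql
-- Install the clone plugin (Linux library name assumed).
INSTALL PLUGIN clone SONAME 'mysql_clone.so';

-- Confirm that the plugin is active.
SELECT PLUGIN_NAME, PLUGIN_STATUS
  FROM INFORMATION_SCHEMA.PLUGINS
 WHERE PLUGIN_NAME = 'clone';

-- Transaction gap above which GR prefers cloning over
-- catching up from a donor's binary log.
SHOW GLOBAL VARIABLES LIKE 'group_replication_clone_threshold';
```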

Percona XtraDB Cluster

Removing or adding a node will not cause any outages. Similar to GR, a newly added node gets all the data from another node; in PXC this is done with an SST (State Snapshot Transfer).
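One way to follow an SST is to check the wsrep variables on the existing nodes while the new node joins. This is a hedged sketch, not output from the test cluster.

```sql
-- Run on an existing node while the new node is joining.
-- Method used for the snapshot transfer.
SHOW GLOBAL VARIABLES LIKE 'wsrep_sst_method';

-- The node serving the snapshot reports 'Donor/Desynced' here
-- and goes back to 'Synced' once the transfer is done.
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';

-- The cluster size grows by one when the new node has joined.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
```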

Partial Network Failure

What happens with the cluster if a reader node gets separated from the Primary but it is still able to see other nodes?

In this case, there is a network outage between mysql2 (Primary) and mysql3.

Group Replication

In my previous blog post, MySQL Group Replication – Partial Network Failure Performance Impact, I explained this special case in more detail. Basically a partial network outage can seriously impact the write performance in the cluster which can lead to application issues and/or downtime.

Percona XtraDB Cluster

In PXC there is going to be a 3-5s outage while the cluster re-creates the cluster view and begins relaying the traffic to a node that sees that server. After that, it will continue working just like before without any serious performance impact.

Total Network Isolation

Now, mysql3 is totally separated from all the other nodes.

Group Replication

The cluster can accept reads and writes without any outage. ProxySQL is going to redirect the reads to the other nodes.

Percona XtraDB Cluster

On PXC there is going to be a 3-5s outage while the cluster realizes that a node is not available and re-creates the cluster view, as above. After that, it is able to process reads and writes again.

Local Applications

What happens if a node, or a group of nodes, gets separated and does not have the quorum, but an application server in the same network segment can still connect to it?

Group Replication

The separated nodes are still going to accept read traffic, so the application could make decisions based on outdated data. That is the default behavior, but you can change it with the group_replication_exit_state_action variable.
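A possible configuration is sketched below. The values are illustrative assumptions, not the settings used in this test. The second variable, group_replication_unreachable_majority_timeout, is not mentioned above: as far as I know, with its default of 0 a member that has lost the majority waits indefinitely and never reaches the exit action, so it is usually set together with it.

```sql
-- Run on every member. ABORT_SERVER shuts the instance down when the
-- member leaves the group involuntarily; other accepted values are
-- READ_ONLY and OFFLINE_MODE.
SET PERSIST group_replication_exit_state_action = 'ABORT_SERVER';

-- Assumption: give up after 30 seconds without a majority
-- (0 means wait forever). Pick a value that suits your network.
SET PERSIST group_replication_unreachable_majority_timeout = 30;
```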

Percona XtraDB Cluster

In PXC, if a node gets separated, it is not going to accept any reads or writes. The priority is the data consistency and only the segment which has the quorum will accept any reads and writes.

Changing Primary

Group Replication

If you would like to use a new Primary node, you have to promote a reader to be the new Primary. In MySQL 8.0 this can be done with the group_replication_set_as_primary() function.
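A sketch of the promotion: look up the server UUID of the member you want to promote, then pass it to the function. The UUID below is a placeholder, not a value from the test cluster.

```sql
-- Find the MEMBER_ID (server UUID) of the node to promote.
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_ROLE
  FROM performance_schema.replication_group_members;

-- Promote that member; replace the placeholder with the real UUID.
SELECT group_replication_set_as_primary('<member_uuid>');
```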

ProxySQL will follow the changes, but it is going to cause a few seconds of outage while the cluster reorganizes itself.

Percona XtraDB Cluster

There is no concept of Primary in PXC and any node can take writes at any time, so we only have to redirect the traffic to another node in our load balancer (e.g. ProxySQL). There is also a pxc_maint_mode variable in PXC. Changing it to MAINTENANCE softly removes the connections from the node, even if that node is the Primary, but this is poorly supported by ProxySQL's native Galera support. I would recommend using the 1.4 scheduler, which respects this variable.
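Toggling maintenance mode on a PXC node looks roughly like this. It is a minimal sketch, and whether the load balancer reacts to it depends on the ProxySQL setup, as described above.

```sql
-- On the node to be taken out of rotation.
SET GLOBAL pxc_maint_mode = 'MAINTENANCE';

-- ... perform the maintenance work ...

-- Put the node back into normal operation.
SET GLOBAL pxc_maint_mode = 'DISABLED';

-- Check the current value at any time.
SHOW GLOBAL VARIABLES LIKE 'pxc_maint_mode';
```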

Summary

| Scenario | Group Replication | Percona XtraDB Cluster |
| --- | --- | --- |
| Primary Crashes | 5-15s outage | 5-10s outage |
| Reader Crashes | No impact | 3-5s outage |
| Adding a Node | No impact | No impact |
| Removing a Node | No impact | No impact |
| Partial Network Failure | Performance impact | 3-5s outage, then normal performance |
| Total Network Isolation | No impact | 3-5s outage |
| Changing Primary | 1-3s outage | No impact on the cluster |

Group Replication has less impact if a reader node goes down or gets separated. In PXC, because all of the nodes are the same, there is no dedicated Primary; if anything happens with any of the nodes, the cluster has to vote and re-create the cluster view which can have some impact on the applications. However, PXC handles primary promotions and network failures better.

As we can see, both cluster solutions have their own pros and cons. I hope this summary will help you to understand them a bit more and make the decisions on which technology to use a bit easier.
