Group Replication in Percona Server for MySQL

January 17, 2020
Author
Vadim Tkachenko

Percona Server for MySQL 8.0.18 ships all the functionality needed to run Group Replication and InnoDB Cluster setups, so I decided to evaluate how it works and how it compares with Percona XtraDB Cluster in some situations.

For this I planned to use three bare metal nodes with SSD drives and a 10Gb network for inter-node communication, but later I also added tests on three bare metal nodes with NVMe drives and 2x10Gb network cards.

To simplify deployment, I created simple Ansible scripts.

Load Data

The first logical step is to load data into an empty cluster, so let’s do this with our sysbench-tpcc script.

The resulting dataset is about 100GB.
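For reference, Percona’s sysbench-tpcc script drives the load; a sketch of the prepare step (the connection settings, thread count, and scale factors here are illustrative, since the post does not list the exact parameters used):

```shell
# Fetch the TPC-C scripts for sysbench
git clone https://github.com/Percona-Lab/sysbench-tpcc.git
cd sysbench-tpcc

# Load the dataset; 10 tables x 100 warehouses comes to roughly 100GB
./tpcc.lua --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
  --mysql-db=tpcc --tables=10 --scale=100 --threads=16 --db-driver=mysql prepare
```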

Group Replication, Load Time

The time to finish the script is 61 minutes, 19 seconds.

Let’s review how the network was loaded on a secondary node in Group Replication during the execution:

Average network traffic: 19.02 MiB/sec

PXC 5.7.28, Load Time

The time to finish the script is 39 minutes, 27 seconds.

Average network traffic: 29.81 MiB/sec

PXC 8.0.15 Experimental, Load Time

The time to finish the script is 43 minutes, 22 seconds.

Average network traffic: 27.35 MiB/sec

One Node PXC 5.7.28 Load Time

To see how PXC would perform without network interactions, I loaded data into a one-node PXC cluster; it took 36 minutes, 34 seconds. So the network overhead for PXC 5.7 is minimal (36 minutes for one node vs. 39 minutes for three nodes).

Node Joining

The next experiment I wanted to perform is to see how long it would take for a new node to join the cluster with the data loaded in the previous part. Group Replication supports two methods of catching up: incremental (applying data from binary logs) and the clone plugin (a physical copy of the data).
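As a side note, which of the two methods Group Replication picks during distributed recovery is governed by group_replication_clone_threshold (MySQL 8.0.17+): if the joiner is further behind than this many transactions, clone is used instead of binary logs. A sketch, with an illustrative value:

```sql
-- Prefer clone-based recovery once the joiner is more than
-- 10000 transactions behind the group (illustrative value; the
-- default is very large, so binary-log recovery wins by default)
SET GLOBAL group_replication_clone_threshold = 10000;
```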

Let’s measure the time for a new node to join and catch up with each method:

Incremental

It took 1 hour and 52 minutes for a node to join and apply binary logs.

Incremental State Transfer in Percona XtraDB Cluster

It might not be obvious, but it is actually possible to have an incremental state transfer for big dataset changes in Percona XtraDB Cluster too; we just need to use a big enough gcache.

For testing purposes, I will set wsrep_provider_options="gcache.size=150G" to check how long it will take to ship and apply IST in PXC.
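For reference, this goes in my.cnf (gcache.size cannot be changed at runtime in PXC 5.7; sizing the cache to cover the writesets generated while a node is away is what makes IST possible instead of a full SST):

```ini
[mysqld]
# Keep enough recent writesets cached so a rejoining node can use IST
wsrep_provider_options="gcache.size=150G"
```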

Log extract:

In total, it took 29 minutes, 30 seconds to transfer and apply IST, so applying IST was four times faster than applying binary logs.

Clone

I show the full log during the clone process, as it contains interesting information:

In total it took about 5 minutes for a new node to catch up, with Group Replication handling the clone process automatically.

Network load during the clone transfer:

Network traffic was 525 MiB/sec, which is about half of the 10Gb network bandwidth.

Notes about Clone: it required the new node to restart, and because I used a custom systemd unit file, MySQL could not perform the restart automatically; I had to restart it manually.

Actually, in this case we are limited by SATA SSD read performance, which is why we see only 525 MiB/sec.

This brings up an interesting topic: although the clone plugin is fast, it can exhaust available resources, and if there is live client load, that load will likely be affected.
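For context, a sketch of the clone plugin setup, including capping its transfer rate to protect live traffic; the donor address, credentials, and bandwidth value below are placeholders, not the ones used in this test:

```sql
-- Load the plugin on both donor and joiner (if not already loaded)
INSTALL PLUGIN clone SONAME 'mysql_clone.so';

-- On the joiner: list the donors we are allowed to clone from
SET GLOBAL clone_valid_donor_list = '10.0.0.1:3306';

-- Throttle the clone transfer (in MiB/sec); 0, the default, means unlimited
SET GLOBAL clone_max_data_bandwidth = 500;

-- Manual clone (Group Replication triggers this automatically
-- during distributed recovery)
CLONE INSTANCE FROM 'clone_user'@'10.0.0.1':3306 IDENTIFIED BY 'clone_pass';
```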

SST in PXC 5.7.28

Let’s compare how SST in Percona XtraDB Cluster performs for joining a new node.
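For reference, SST behavior in PXC is configured in my.cnf; a minimal sketch (the credentials are placeholders):

```ini
[mysqld]
# Use Percona XtraBackup for state snapshot transfer
wsrep_sst_method=xtrabackup-v2
# User the SST script authenticates with (placeholder credentials)
wsrep_sst_auth=sstuser:sstpassword
```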

So it took 9 minutes for SST to complete and a new node to join.

Network load during SST:

Network traffic during SST was 229 MiB/sec, which is only about 25% of the 10Gb network bandwidth. The reason SST uses less network bandwidth is a bug in the SST script: the xbstream binary is not invoked with multi-threading.

If we apply the fix to use multiple threads:

We are back to 496 MiB/sec of transfer, and the total time for SST is 5 minutes, 23 seconds.
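For context, the fix boils down to letting xbstream extract with multiple threads on the joiner side of the SST script; a sketch of the relevant invocation (the exact line in wsrep_sst_xtrabackup-v2 varies by PXC version, and the thread count here is illustrative):

```shell
# Single-threaded extraction (roughly the buggy behavior):
#   xbstream -x -C /var/lib/mysql
# Multi-threaded extraction keeps up with the network:
xbstream -x --parallel=8 -C /var/lib/mysql
```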

Clone and SST on NVMe Storage with 2x10Gb Network

In previous cases our transfer rate was limited by SATA SSD read performance, so let’s see if we are able to achieve a faster transfer time when more resources are available. The data is stored on NVMe devices, and for the network I used a 2x10Gb connection.

The raw network throughput I am able to achieve with iPerf3 is 17.8 Gbits/sec (about 2.2 GB/sec).
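For reference, the measurement itself is simple (the hostname is a placeholder):

```shell
iperf3 -s                          # on one node: run the server
iperf3 -c donor.example.com -P 4   # on the other: test with 4 parallel streams
```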

Clone plugin:

So the clone plugin used essentially all of the available network bandwidth at 2.22 GiB/sec, finishing the transfer four times faster than with a SATA SSD and a 10Gb network.

As for PXC SST, without a fix for xbstream:

There I see only 909 MiB/sec.

And with the fix for xbstream:

In this case, we get 1.26 GiB/sec, which is noticeably slower than the clone plugin. I do not yet know why SST cannot reach 2 GiB/sec of throughput.

Conclusions

The clone plugin is a really fast way to transfer data, and in my experiments it was faster than SST in Percona XtraDB Cluster. The only downside is that it could not join a new node without a manual restart.

The incremental update was really slow and not the best way to perform a node join, yet Group Replication chose it by default even though it was not the optimal decision.

Both technologies are sensitive to the available hardware, and a hardware upgrade (storage, network) can be a viable option to improve performance.

As for loading data, the three-node Group Replication cluster was slower than Percona XtraDB Cluster: 61 minutes for Group Replication vs. 39 minutes for PXC.
