In this blog post, I want to evaluate how Group Replication scales as we increase the number of nodes and the number of user connections.
For testing, I deploy a multi-node setup on bare-metal servers: each Group Replication node and the client run on their own dedicated server, and all servers are connected by a 10Gb network.
I test both 3-node and 5-node Group Replication setups.
Hardware specifications:
```
      System | Supermicro; SYS-F619P2-RTN; v0123456789 (Other)
 Service Tag | S292592X0110239C
    Platform | Linux
     Release | Ubuntu 18.04.4 LTS (bionic)
      Kernel | 5.3.0-42-generic
Architecture | CPU = 64-bit, OS = 64-bit
   Threading | NPTL 2.27
     SELinux | No SELinux detected
 Virtualized | No virtualization detected
# Processor ##################################################
  Processors | physical = 2, cores = 40, virtual = 80, hyperthreading = yes
      Models | 80xIntel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
      Caches | 80x28160 KB
# Memory #####################################################
       Total | 187.6G
```
For the benchmark, I use a sysbench-tpcc database with 1000 warehouses (10 tables × scale 100), prepared as:
```
./tpcc.lua --mysql-host=172.16.0.11 --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest --time=300 --threads=64 --report-interval=1 --tables=10 --scale=100 --db-driver=mysql --use_fk=0 --force_pk=1 --trx_level=RC prepare
```
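The run phase is not shown above, and the scripts actually used are on GitHub. As a hypothetical sketch only, a thread sweep could reuse the prepare options with sysbench's `run` command. The thread list (powers of two from 1 to 256) and the helper name `tpcc_run_cmd` are assumptions; the sketch only prints the commands so they can be inspected before anything is executed.

```shell
#!/bin/sh
# Hypothetical dry-run sketch: print one sysbench-tpcc "run" command per
# thread count, reusing the options from the prepare step.
# ASSUMPTION: thread counts are powers of two from 1 to 256.
TPCC_OPTS="--mysql-host=172.16.0.11 --mysql-user=sbtest --mysql-password=sbtest \
--mysql-db=sbtest --time=300 --report-interval=1 --tables=10 --scale=100 \
--db-driver=mysql --use_fk=0 --force_pk=1 --trx_level=RC"

# tpcc_run_cmd THREADS: print the run command for the given number of threads.
tpcc_run_cmd() {
  echo "./tpcc.lua $TPCC_OPTS --threads=$1 run"
}

for t in 1 2 4 8 16 32 64 128 256; do
  tpcc_run_cmd "$t"
done
```

Each printed line can be executed as-is, ideally saving its output to a separate log file per thread count (for example `tpcc_64threads.log`, a hypothetical name) so the per-second report can be analyzed later.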
The configs, scripts, and raw results are available on our GitHub.
The workload is “in-memory”: the data (about 100GB) should fit into the InnoDB buffer pool (also 100GB).
I use MySQL 8.0.19.
Results
Let’s review the results. First, let’s look at how performance changes as we increase the number of user threads from 1 to 256 on 3 nodes.

Interestingly, the results become unstable as the number of threads increases. To see this in more detail, let’s draw the chart with an individual scale for each thread count:

As we can see, there is a lot of variation starting at 64 threads. Let’s check 64 and 128 threads with 1-second resolution.
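One way to put a number on this variation is the coefficient of variation (standard deviation divided by mean) of the per-second throughput. A minimal sketch, assuming the default sysbench 1.0 interim report format produced by `--report-interval=1` (lines such as `[ 1s ] thds: 64 tps: ... qps: ...`); the helper name `tps_stats` is hypothetical.

```shell
#!/bin/sh
# tps_stats: read sysbench per-second report lines from stdin and print the
# sample count, mean, population standard deviation, and coefficient of
# variation (stddev / mean) of the "tps:" values.
# ASSUMPTION: default sysbench 1.0 format "[ Ns ] thds: T tps: X qps: ...".
tps_stats() {
  awk '
    $1 == "[" && $2 ~ /^[0-9]+s$/ {
      for (i = 1; i < NF; i++)
        if ($i == "tps:") { x = $(i + 1) + 0; n++; sum += x; sumsq += x * x }
    }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      mean = sum / n
      var = sumsq / n - mean * mean
      if (var < 0) var = 0   # guard against floating-point rounding
      sd = sqrt(var)
      cv = (mean > 0) ? sd / mean : 0
      printf "samples=%d mean=%.2f stddev=%.2f cv=%.3f\n", n, mean, sd, cv
    }'
}
```

Usage would look like `tps_stats < tpcc_64threads.log` (hypothetical file name). A value of `cv` close to 0 means steady throughput; the larger it is, the more the per-second throughput fluctuates around the mean.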


There appear to be cyclical processes going on, with throughput periodically dropping to 0. This seems to be related to this bug.
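The same per-second output can be scanned for the drops to 0. The sketch below lists each second with zero throughput and the gap since the previous one; roughly constant gaps would be consistent with a cyclical pattern, while consecutive stalled seconds show up as gaps of 1s. It assumes the same sysbench 1.0 report format, and `stall_report` is a hypothetical helper name.

```shell
#!/bin/sh
# stall_report: print every second in which sysbench reported 0 tps, the gap
# (in seconds) since the previous such second, and a total at the end.
# ASSUMPTION: default sysbench 1.0 format "[ Ns ] thds: T tps: X qps: ...".
stall_report() {
  awk '
    $1 == "[" && $2 ~ /^[0-9]+s$/ {
      t = $2; sub(/s$/, "", t); t += 0          # "12s" -> 12
      for (i = 1; i < NF; i++)
        if ($i == "tps:" && $(i + 1) + 0 == 0) {
          if (prev != "") printf "stall at %ds (gap %ds)\n", t, t - prev
          else            printf "stall at %ds\n", t
          prev = t; stalls++
        }
    }
    END { printf "total stalled seconds: %d\n", stalls + 0 }'
}
```

For example, `stall_report < tpcc_128threads.log` (hypothetical file name) would list the stall times for the 128-thread run.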
3 nodes vs. 5 nodes
Now let’s check the performance with 5 nodes, compared to 3 nodes:

There does not seem to be a huge difference. Only in the 8-16 thread range, where the results are stable, can we see a decline with 5 nodes. From 64 to 256 threads the variance dominates, so it is hard to notice any difference.
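To put a figure on the 5-node decline in the stable 8-16 thread range, the mean throughput of two runs can be compared as a relative change. A sketch under the same sysbench 1.0 report-format assumption; `mean_tps`, `pct_change`, and the log file names are hypothetical.

```shell
#!/bin/sh
# mean_tps FILE: average of the "tps:" values in a sysbench per-second report.
# ASSUMPTION: default sysbench 1.0 format "[ Ns ] thds: T tps: X qps: ...".
mean_tps() {
  awk '$1 == "[" && $2 ~ /^[0-9]+s$/ {
         for (i = 1; i < NF; i++) if ($i == "tps:") { s += $(i + 1); n++ }
       }
       END { if (n) { printf "%.2f\n", s / n } else { exit 1 } }' "$1"
}

# pct_change BASE NEW: relative change of NEW versus BASE, in percent.
pct_change() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f%%\n", (b - a) / a * 100 }'
}

# Hypothetical usage, 3 nodes as the baseline and 5 nodes as the comparison:
#   pct_change "$(mean_tps gr3_16threads.log)" "$(mean_tps gr5_16threads.log)"
```

A negative result means the 5-node run delivered lower average throughput than the 3-node run at the same thread count.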
Conclusions
From my findings, Group Replication seems to handle extra nodes quite well in this workload, but a high number of user threads is problematic.
I am open to suggestions on how to improve performance with a high number of threads.