Percona Server for MySQL 8.0.18 ships with all the functionality needed to run Group Replication and InnoDB Cluster setups, so I decided to evaluate how it works and how it compares with Percona XtraDB Cluster in some situations.
For this I planned to use three bare metal nodes, SSD drives, and a 10Gb network for inter-node communication, but later I also added tests on three bare metal nodes with NVMe drives and 2x10Gb network cards.
To simplify deployment, I created simple ansible scripts.
Load Data
The first logical step is to load data into an empty cluster, so let’s do this with our sysbench-tpcc script.
./tpcc.lua --mysql-host=10.30.2.5 --mysql-user=sbtest --mysql-password=sbtest --mysql-db=sbtest --time=300 --threads=64 --report-interval=1 --tables=100 --scale=10 --db-driver=mysql --use_fk=0 --force_pk=1 --trx_level=RC prepare
The resulting dataset is about 100GB.
Group Replication, Load Time
The time to finish the script is 61 minutes, 19 seconds.
Let’s review the network traffic on a secondary node in Group Replication during the load:
Average network traffic: 19.02 MiB/sec
PXC 5.7.28, Load Time
The time to finish the script is 39 minutes, 27 seconds.
Average network traffic: 29.81 MiB/sec
PXC 8.0.15 Experimental, Load Time
The time to finish the script is 43 minutes, 22 seconds.
Average network traffic: 27.35 MiB/sec
One Node PXC 5.7.28 Load Time
To see how PXC would perform without network interactions, I loaded data into a one-node PXC cluster, and it took 36 minutes, 34 seconds. So there is minimal network overhead for PXC 5.7 (about 36.5 minutes for one node vs. about 39.5 minutes for three nodes).
Node Joining
For the next experiment, I wanted to see how long it takes for a new node to join a cluster that already contains the data loaded in the previous part. Group Replication supports two catch-up methods: incremental recovery (applying transactions from binary logs) and the clone plugin (a physical copy of the data).
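As a side note, which method distributed recovery picks depends on the clone plugin being available and on the group_replication_clone_threshold variable. A minimal sketch, assuming MySQL/Percona Server 8.0.17+ (the threshold value below is purely illustrative):

-- Sketch only: make clone-based recovery available (run on donors and the joining node).
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
-- Recovery reads binary logs unless the joiner is further behind than this many
-- transactions; lowering the threshold makes a far-behind joiner prefer clone.
SET GLOBAL group_replication_clone_threshold = 100000;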
Let’s measure how long it takes for a new node to join and catch up with each method:
Incremental
Start:
2020-01-08T17:05:04.934618Z 60 [System] [MY-010562] [Repl] Slave I/O thread for channel 'group_replication_recovery': connected to master 'mysql_innodb_cluster_2@172.16.0.1:3306',replication started in log 'FIRST' at position 4

End:
2020-01-08T18:57:02.350199Z 59 [Note] [MY-011585] [Repl] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.
It took 1 hour and 52 minutes for a node to join and apply binary logs.
Incremental State Transfer in Percona XtraDB Cluster
It might not be obvious, but it is actually possible to have an incremental state transfer for big dataset changes in Percona XtraDB Cluster too; we just need a big enough gcache.
For testing purposes, I will set wsrep_provider_options="gcache.size=150G" to check how long it will take to ship and apply IST in PXC.
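For completeness, this is the kind of my.cnf fragment I mean; it goes on the node(s) that may act as IST donors, and since gcache.size is not a dynamic option, the node has to be restarted (a sketch; other wsrep settings are omitted):

# my.cnf on a potential donor node (only the relevant line shown)
[mysqld]
wsrep_provider_options="gcache.size=150G"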
Log extract:
2020-01-16T20:12:03.261978Z 2 [Note] WSREP: Receiving IST: 155275 writesets, seqnos 155273-310548
2020-01-16T20:12:03.262165Z 0 [Note] WSREP: Receiving IST... 0.0% ( 0/155275 events) complete.
….
2020-01-16T20:41:23.777620Z 0 [Note] WSREP: Receiving IST... 99.7% (154768/155275 events) complete.
2020-01-16T20:41:32.391378Z 0 [Note] WSREP: Receiving IST...100.0% (155275/155275 events) complete.
2020-01-16T20:41:32.723172Z 2 [Note] WSREP: IST received: fd361be5-3889-11ea-8450-c3f366733bd0:310548
2020-01-16T20:41:32.723546Z 0 [Note] WSREP: 0.0 (node3): State transfer from 2.0 (node4) complete.
2020-01-16T20:41:32.723576Z 0 [Note] WSREP: SST leaving flow control
2020-01-16T20:41:32.723585Z 0 [Note] WSREP: Shifting JOINER -> JOINED (TO: 310548)
2020-01-16T20:41:32.723838Z 0 [Note] WSREP: Member 0.0 (node3) synced with group.
2020-01-16T20:41:32.723850Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 310548)
In total it took 29 minutes, 30 seconds to transfer and apply IST. So applying IST was roughly four times faster than applying binary logs (about 29.5 minutes vs. 112 minutes).
Clone
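In Group Replication, the clone transfer is driven automatically by distributed recovery, but for illustration this is roughly what the same operation looks like when triggered manually, assuming the clone plugin is already installed on both sides (host, user, and password below are placeholders, not the exact commands recovery runs):

-- Sketch of a manual clone.
-- On the recipient, whitelist the donor:
SET GLOBAL clone_valid_donor_list = '172.16.0.1:3306';
-- Pull a physical copy of the donor's data; the recipient drops its own data first
-- and restarts (or shuts down) when the transfer finishes.
CLONE INSTANCE FROM 'clone_user'@'172.16.0.1':3306 IDENTIFIED BY 'clone_password';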
Below is the full log of the clone process, as it contains interesting information:
2020-01-08T20:00:58.102913Z 113 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Task Connect.'
2020-01-08T20:00:58.108244Z 113 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Master ACK Connect.'
2020-01-08T20:00:58.108311Z 113 [Note] [MY-013457] [InnoDB] Clone Apply Begin Master Version Check
2020-01-08T20:00:58.117149Z 113 [Note] [MY-013457] [InnoDB] Clone Apply Version End Master Task ID: 0 Passed, code: 0:
2020-01-08T20:00:58.117188Z 113 [Note] [MY-013457] [InnoDB] Clone Apply Begin Master Task
2020-01-08T20:00:58.117478Z 113 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Started
2020-01-08T20:00:58.117498Z 113 [Note] [MY-011977] [InnoDB] Clone Drop all user data
2020-01-08T20:00:58.194686Z 113 [Note] [MY-011977] [InnoDB] Clone: Fix Object count: 178 task: 0
2020-01-08T20:00:58.231760Z 113 [Note] [MY-011977] [InnoDB] Clone Drop User schemas
2020-01-08T20:00:58.231861Z 113 [Note] [MY-011977] [InnoDB] Clone: Fix Object count: 5 task: 0
2020-01-08T20:00:58.234517Z 113 [Note] [MY-011977] [InnoDB] Clone Drop User tablespaces
2020-01-08T20:00:58.234829Z 113 [Note] [MY-011977] [InnoDB] Clone: Fix Object count: 6 task: 0
2020-01-08T20:00:58.238720Z 113 [Note] [MY-011977] [InnoDB] Clone Drop: finished successfully
2020-01-08T20:00:58.238777Z 113 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Finished
2020-01-08T20:00:58.433336Z 113 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Command COM_INIT.'
2020-01-08T20:00:58.539294Z 113 [Note] [MY-013458] [InnoDB] Clone Apply State Change : Number of tasks =