Schema changes are the big challenges in Galera replication. So, it is recommended to understand the schema changes operation for everyone who uses the Percona XtraDB Cluster (PXB)/Galera clusters. In this blog, I am going to explain the operation and impact of the various schema changes methods used in the PXB/Galera cluster.
For testing:
|
1 |
mysql> select @@wsrep_cluster_addressG<br>*************************** 1. row ***************************<br>@@wsrep_cluster_address: gcomm://pxc81,pxc82,pxc83<br>1 row in set (0.00 sec)<br><br>mysql> select @@version, @@version_commentG<br>*************************** 1. row ***************************<br> @@version: 8.0.19-10<br>@@version_comment: Percona XtraDB Cluster (GPL), Release rel10, Revision 727f180, WSREP version 26.4.3<br>1 row in set (0.00 sec) |
|
1 |
mysql> pager grep alter<br>PAGER set to 'grep alter'<br><br>mysql> show processlist;<br>| 19 | root | localhost | schema_changes | Query | 18 | altering table | alter table sbtest1 add index idx_c(c) | 0 | 0 |<br>7 rows in set (0.00 sec)<br><br>mysql> kill 19;<br>ERROR 1095 (HY000): You are not owner of thread 19 |
TOI: Total Order Isolation
|
1 |
mysql> select @@wsrep_OSU_methodG<br>*************************** 1. row ***************************<br>@@wsrep_OSU_method: TOI<br>1 row in set (0.00 sec) |
Example:
I am going to run the below ALTER on “pxc81”.
|
1 |
alter table sbtest1 add index idx_c(c) |
After initiating the ALTER on pxc81, My processlist states the COMMIT and UPDATE ( from sysbench ) statements are paused. Only ALTER is in progress. The COMMIT and UPDATE will be resumed once the ALTER is completed.
|
1 |
| 17 | root | localhost | schema_changes | Execute | 15 | closing tables | COMMIT | 0 | 0 |<br>| 17 | root | localhost | schema_changes | Execute | 15 | updating | UPDATE sbtest1 SET c='91668836759-30934071579-18064439108-53267873872-79461377960-32104006456-143369 | 0 | 1 |<br>| 24 | root | localhost | schema_changes | Query | 15 | altering table | alter table sbtest1 add index idx_c(c) | 0 | 0 | |
But, still, the SELECT statement can be run with “wsrep_sync_wait != 1” because “wsrep_sync_wait = 1” needs the casualty checks from other nodes. So, it will fail.
SELECT with “wsrep_sync_wait=1”
|
1 |
| 1 | system user | | schema_changes | Query | 0 | altering table | alter table sbtest1 add index idx_c(c) | 0 | 0 |<br>| 15 | root | localhost | schema_changes | Query | 40 | starting | select * from sbtest1 where id=1 | 0 | 0 |<br><br>mysql> select * from sbtest1 where id=1;<br>ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction |
TOI can be the right choice for quick operations.
RSU – Rolling Schema Upgrade
Example:
At pxc82, I am going to execute the ALTER.
Session 1: (setting up the value to RSU – session-level)
|
1 |
mysql> set wsrep_OSU_method=RSU;<br>Query OK, 0 rows affected (0.09 sec) |
Session 2: (checking the node status)
|
1 |
mysql> show global status where Variable_name like 'wsrep_local_recv_queue' or Variable_name like 'wsrep_local_state_comment';<br>+---------------------------+--------+<br>| Variable_name | Value |<br>+---------------------------+--------+<br>| wsrep_local_recv_queue | 0 |<br>| wsrep_local_state_comment | Synced |<br>+---------------------------+--------+<br>2 rows in set (0.00 sec) |
Session 1: (executing the ALTER )
|
1 |
mysql> alter table sbtest1 add index idx_c(c); |
Session 2: (checking again the node status )
Here the node went to Donor/Desynced state once the ALTER started. You can see the queue also keeps increasing.
|
1 |
mysql> nopager; show global status where Variable_name like 'wsrep_local_recv_queue' or Variable_name like 'wsrep_local_state_comment';<br>PAGER set to stdout<br>+---------------------------+----------------+<br>| Variable_name | Value |<br>+---------------------------+----------------+<br>| wsrep_local_recv_queue | 2053 |<br>| wsrep_local_state_comment | Donor/Desynced |<br>+---------------------------+----------------+<br>2 rows in set (0.21 sec) |
Session 1: (ALTER completed)
|
1 |
mysql> alter table sbtest1 add index idx_c(c);<br>Query OK, 0 rows affected (2 min 6.52 sec)<br>Records: 0 Duplicates: 0 Warnings: 0 |
Session 2: (Node synced to cluster)
|
1 |
mysql> show global status where Variable_name like 'wsrep_local_recv_queue' or Variable_name like 'wsrep_local_state_comment';<br>+---------------------------+--------+<br>| Variable_name | Value |<br>+---------------------------+--------+<br>| wsrep_local_recv_queue | 0 |<br>| wsrep_local_state_comment | Synced |<br>+---------------------------+--------+<br>2 rows in set (0.00 sec) |
This step needs to be executed in pxc81 and pxc83 as well. After completing on all nodes, we are good to allow the WRITEs for that table.
The RSU method is not truly disruption-free, as there are few bugs reported regarding RSU. Users should be very clear and careful about executing the RSU for schema updates:
https://jira.percona.com/browse/PXC-2620
https://jira.percona.com/browse/PXC-2293
https://jira.percona.com/browse/PXC-1980
So far, we have 3 algorithms,
With TOI:
“ALGORITHM = INPLACE / COPY” still pauses the cluster during the operation. Galera doesn’t allow transactions when an ALTER TABLE statement is run. So if you are using TOI, any ALTER TABLE will block all transactions on all nodes.
|
1 |
| 17 | root | localhost | schema_changes | Execute | 12 | closing tables | COMMIT | 0 | 0 |<br>| 18 | root | localhost | schema_changes | Execute | 12 | closing tables | COMMIT | 0 | 0 |<br>| 32 | root | localhost | schema_changes | Query | 13 | altering table | alter table sbtest1 add index idx_c(c), algorithm=inplace, |
“ALGORITHM=INSTANT” is supported and faster in TOI.
|
1 |
mysql> alter table sbtest1 add column test_Ins int , algorithm=instant;<br>Query OK, 0 rows affected (0.24 sec)<br>Records: 0 Duplicates: 0 Warnings: 0 lock=none |
With RSU:
“ALGORITHM = INPLACE/COPY” is still not beneficial on RSU. It pauses the Galera replication and takes the node to Desync.
|
1 |
mysql> show processlist;<br>| 62 | root | localhost | schema_changes | Query | 51 | altering table | alter table sbtest1 add index idx_c(c), algorithm=inplace, lock=none | 0 | 0 |<br>5 rows in set (0.06 sec)<br><br>mysql> nopager; show global status where Variable_name like 'wsrep_local_recv_queue' or Variable_name like 'wsrep_local_state_comment';<br>PAGER set to stdout<br>+---------------------------+----------------+<br>| Variable_name | Value |<br>+---------------------------+----------------+<br>| wsrep_local_recv_queue | 7335 |<br>| wsrep_local_state_comment | Donor/Desynced |<br>+---------------------------+----------------+<br>2 rows in set (0.03 sec) |
“ALGORITHM=INSTANT” is supported and faster in RSU. But, still, you can use TOI to avoid the additional work.
|
1 |
mysql> alter table sbtest1 add column test_Inss int , algorithm=instant;<br>Query OK, 0 rows affected (0.19 sec)<br>Records: 0 Duplicates: 0 Warnings: 0<br><br>mysql> select @@wsrep_OSU_method;<br>+--------------------+<br>| @@wsrep_OSU_method |<br>+--------------------+<br>| RSU |<br>+--------------------+<br>1 row in set (0.02 sec) |
I would suggest using the “ALGORITHM = INSTANT ” with TOI wherever you can. But, make sure you have the MySQL 8.x + version. Unfortunately, “ALGORITHM=INSTANT” currently only supports adding new columns.
pt-osc : Percona-online-schema-change
Personally, I like this approach very much and use this mostly in production environments. Pt-osc provides non-blocking schema upgrades on all nodes in one shot. This should be used with the TOI method. The action flow will be like this:
For the below ALTER,
alter table schema_changes.sbtest1 add index idx_test_Ins(test_Ins);
Pt-osc flow in SQL syntax:
|
1 |
Creating new table...<br>CREATE TABLE `schema_changes`.`_sbtest1_new` (<br> `id` int NOT NULL AUTO_INCREMENT,<br> `k` int NOT NULL DEFAULT '0',<br> `c` char(120) NOT NULL DEFAULT '',<br> `pad` char(60) NOT NULL DEFAULT '',<br> `test_Ins` int DEFAULT NULL,<br> PRIMARY KEY (`id`),<br> KEY `k_1` (`k`)<br>) ENGINE=InnoDB AUTO_INCREMENT=20400335 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci<br>Created new table schema_changes._sbtest1_new OK.<br>Altering new table...<br>ALTER TABLE `schema_changes`.`_sbtest1_new` add index idx_test_Ins(test_Ins)<br>Altered `schema_changes`.`_sbtest1_new` OK.<br>Not creating triggers because this is a dry run.<br>Not copying rows because this is a dry run.<br>INSERT LOW_PRIORITY IGNORE INTO `schema_changes`.`_sbtest1_new` (`id`, `k`, `c`, `pad`, `test_ins`) SELECT `id`, `k`, `c`, `pad`, `test_ins` FROM `schema_changes`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) AND ((`id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 9253 copy nibble*/<br>SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `schema_changes`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) ORDER BY `id` LIMIT ?, 2 /*next chunk boundary*/<br>Not swapping tables because this is a dry run.<br>Not dropping old table because this is a dry run.<br>Not dropping triggers because this is a dry run.<br>DROP TRIGGER IF EXISTS `schema_changes`.`pt_osc_schema_changes_sbtest1_del`<br>DROP TRIGGER IF EXISTS `schema_changes`.`pt_osc_schema_changes_sbtest1_upd`<br>DROP TRIGGER IF EXISTS `schema_changes`.`pt_osc_schema_changes_sbtest1_ins`<br>2020-09-30T08:31:17 Dropping new table...<br>DROP TABLE IF EXISTS `schema_changes`.`_sbtest1_new`;<br>2020-09-30T08:31:17 Dropped new table OK. |
Pt-osc provides several options to perform the effective operations. You can control the connections, active threads, load, chunk size etc ..
For Galera, we have the option “–max-flow-ctrl”. The option will check the average time cluster spent pausing for FC and make the tool pause if it goes over the percentage indicated in the option. By default, the tool will not check the FC.
|
1 |
[root@pxc81 log]# less /bin/pt-online-schema-change | grep -i pausing<br> print STDERR "Pausing because PXC Flow Control is activen";<br> print STDERR "Pausing because " |
To make the schema changes on FOREIGN KEY tables, I would suggest using the “alter-foreign-keys-method = rebuild_constraints”. This helps to maintain the consistency of the schema and its relations. In this approach, before dropping the old table, it runs ALTER on all the child tables to drop existing FK and re-add new FK constraints that points to the columns from the new table. Again, adding and dropping the FOREIGN KEY will be the direct ALTER using TOI.
Gh-ost is doing a similar approach like “pt-osc”. It also helps to do the non-blocking ALTERs on all cluster nodes in one shot. The main difference is gh-ost is triggerless. Gh-ost uses the binary log to track the changes. So you need the following variables and thresholds to perform the gh-ost operation.
|
1 |
log-bin=sakthi-bin<br>binlog-format=ROW<br>log-slave-updates=ON |
The flow will be like,
Example:
|
1 |
[root@pxc81 schema_changes]# gh-ost --alter="add index idx_test_Inss(test_Ins)" --database=schema_changes --table=sbtest1 --user=root --password=Jesus@7sakthI --allow-on-master --execute<br>[2020/09/30 09:40:56] [info] binlogsyncer.go:133 create BinlogSyncer with config {99999 mysql 127.0.0.1 3306 root false false <nil> false UTC true 0 0s 0s 0 false}<br>[2020/09/30 09:40:56] [info] binlogsyncer.go:354 begin to sync binlog from position (binlog.000027, 196850993)<br>[2020/09/30 09:40:56] [info] binlogsyncer.go:203 register slave for master server 127.0.0.1:3306<br>[2020/09/30 09:40:56] [info] binlogsyncer.go:723 rotate to (binlog.000027, 196850993)<br># Migrating `schema_changes`.`sbtest1`; Ghost table is `schema_changes`.`_sbtest1_gho`<br># Migrating pxc81:3306; inspecting pxc81:3306; executing on pxc81<br># Migration started at Wed Sep 30 09:40:56 +0000 2020<br># chunk-size: 1000; max-lag-millis: 1500ms; dml-batch-size: 10; max-load: ; critical-load: ; nice-ratio: 0.000000<br># throttle-additional-flag-file: /tmp/gh-ost.throttle <br># Serving on unix socket: /tmp/gh-ost.schema_changes.sbtest1.sock<br>Copy: 0/6563240 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: binlog.000027:196853401; Lag: 0.02s, State: migrating; ETA: N/A<br>Copy: 0/6563240 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: binlog.000027:196858195; Lag: 0.01s, State: migrating; ETA: N/A<br>Copy: 22000/6563240 0.3%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 2s(copy); streamer: binlog.000027:201067135; Lag: 0.01s, State: migrating; ETA: 9m58s<br><br>.......<br><br>Copy: 5682000/6563240 86.6%; Applied: 0; Backlog: 0/1000; Time: 16m10s(total), 16m10s(copy); streamer: binlog.000028:213168607; Lag: 0.01s, State: migrating; ETA: 2m30s<br>Copy: 6563000/6563240 100.0%; Applied: 0; Backlog: 0/1000; Time: 20m20s(total), 20m20s(copy); streamer: binlog.000028:382677405; Lag: 0.01s, State: migrating; ETA: 0s |
Gh-ost also provides several options to perform effective operations. You can control the connections, active threads, load, chunk size, etc.
But unfortunately, “–max-flow-ctl” option is not available in gh-ost.
So, finally, I would say,
Percona XtraDB Cluster is a cost-effective and robust clustering solution created to support your business-critical data. It gives you the benefits and features of MySQL along with the added enterprise features of Percona Server for MySQL.