Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

How to Perform Compatible Schema Changes in Percona XtraDB Cluster (Advanced Alternative)?

March 25, 2019

Author

Share this Post:

PXC schema changes options If you are using Galera replication, you know that schema changes may be a serious problem. With its current implementation, there is no way even a simple ALTER will be unobtrusive for live production traffic. It is a fact that with the default TOI alter method, Percona XtraDB Cluster (PXC) cluster suspends writes in order to execute the ALTER in the same order on all nodes.

For factual data structure changes, we have to adapt to the limitations, and either plan for a maintenance window, or use pt-online-schema-change, where interruptions should be very short. I suggest you be extra careful here, as normally you cannot kill an ongoing ALTER query in Galera cluster.

For schema compatible changes, that is, ones that cannot break ROW replication when the writer node and applier nodes have different metadata, we can consider using the Rolling Schema Update (RSU) method. An example of 100% replication-safe DDL is OPTIMIZE TABLE (aka noop-ALTER). However, the following are safe to consider too:

- adding and removing secondary index,

- renaming an index,

- changing the ROW_FORMAT (for example enabling/disabling table compression),

- changing the KEY_BLOCK_SIZE(compression property).

However, a lesser known fact is that even using the RSU method or pt-online-schema-change for the above may not save us from some unwanted disruptions.

RSU and Concurrent Queries

Let’s take a closer look at a very simple scenario with noop ALTER. We will set wsrep_OSU_method to RSU to avoid a cluster-wide stall. In fact, this mode turns off replication for the following DDL (and only for DDL), so you have to remember to repeat the same ALTER on every cluster member later.

For simplicity, let’s assume there is only one node used for writes. In the first client session, we change the method accordingly to prepare for DDL:

node1 > set wsrep_OSU_method=RSU;
Query OK, 0 rows affected (0.00 sec)

node1 > select @@wsrep_OSU_method,@@wsrep_on,@@wsrep_desync;
+--------------------+------------+----------------+
| @@wsrep_OSU_method | @@wsrep_on | @@wsrep_desync |
+--------------------+------------+----------------+
| RSU                |          1 |              0 |
+--------------------+------------+----------------+
1 row in set (0.00 sec)

node1 > set wsrep_OSU_method=RSU;

Query OK, 0 rows affected (0.00 sec)

node1 > select @@wsrep_OSU_method,@@wsrep_on,@@wsrep_desync;

+--------------------+------------+----------------+

| @@wsrep_OSU_method | @@wsrep_on | @@wsrep_desync |

+--------------------+------------+----------------+

| RSU | 1 | 0 |

+--------------------+------------+----------------+

1 row in set (0.00 sec)

(By the way, as seen above, the desync mode is not enabled yet, as it will be automatically enabled around the DDL query only, and disabled right after it finishes).

In a second client session, we start a long enough SELECT query:

node1 > select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000;
...

1 2	node1 > select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000; ...

And while it’s ongoing, let’s rebuild the table:

node1 > alter table db1.sbtest1 engine=innodb;
Query OK, 0 rows affected (0.98 sec)
Records: 0 Duplicates: 0 Warnings: 0

node1 > alter table db1.sbtest1 engine=innodb;

Query OK, 0 rows affected (0.98 sec)

Records: 0 Duplicates: 0 Warnings: 0

Surprisingly, immediately the client in the second session receives its SELECT failure:

ERROR 1213 (40001): WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction

1	ERROR 1213 (40001): WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction

So, even a simple SELECT is aborted if it conflicts with the local, concurrent ALTER (RSU)… We can see more details in the error log:

2018-12-04T21:39:17.285108Z 0 [Note] WSREP: Member 0.0 (node1) desyncs itself from group
2018-12-04T21:39:17.285124Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 471796)
2018-12-04T21:39:17.305018Z 12 [Note] WSREP: Provider paused at 7bf59bb4-996d-11e8-b3b6-8ed02cd38513:471796 (30)
2018-12-04T21:39:17.324509Z 12 [Note] WSREP: --------- CONFLICT DETECTED --------
2018-12-04T21:39:17.324532Z 12 [Note] WSREP: cluster conflict due to high priority abort for threads:

2018-12-04T21:39:17.324535Z 12 [Note] WSREP: Winning thread:
THD: 12, mode: total order, state: executing, conflict: no conflict, seqno: -1
SQL: alter table db1.sbtest1 engine=innodb

2018-12-04T21:39:17.324537Z 12 [Note] WSREP: Victim thread:
THD: 11, mode: local, state: executing, conflict: no conflict, seqno: -1
SQL: select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000

2018-12-04T21:39:17.324542Z 12 [Note] WSREP: MDL conflict db=db1 table=sbtest1 ticket=MDL_SHARED_READ solved by abort
2018-12-04T21:39:17.324544Z 12 [Note] WSREP: --------- CONFLICT DETECTED --------
2018-12-04T21:39:17.324545Z 12 [Note] WSREP: cluster conflict due to high priority abort for threads:

2018-12-04T21:39:17.324547Z 12 [Note] WSREP: Winning thread:
THD: 12, mode: total order, state: executing, conflict: no conflict, seqno: -1
SQL: alter table db1.sbtest1 engine=innodb

2018-12-04T21:39:17.324548Z 12 [Note] WSREP: Victim thread:
THD: 11, mode: local, state: executing, conflict: must abort, seqno: -1
SQL: select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000

2018-12-04T21:39:18.517457Z 12 [Note] WSREP: resuming provider at 30
2018-12-04T21:39:18.517482Z 12 [Note] WSREP: Provider resumed.
2018-12-04T21:39:18.518310Z 0 [Note] WSREP: Member 0.0 (node1) resyncs itself to group
2018-12-04T21:39:18.518342Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 471796)
2018-12-04T21:39:18.519077Z 0 [Note] WSREP: Member 0.0 (node1) synced with group.
2018-12-04T21:39:18.519099Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 471796)
2018-12-04T21:39:18.519119Z 2 [Note] WSREP: Synchronized with group, ready for connections
2018-12-04T21:39:18.519126Z 2 [Note] WSREP: Setting wsrep_ready to true

2018-12-04T21:39:17.285108Z 0 [Note] WSREP: Member 0.0 (node1) desyncs itself from group

2018-12-04T21:39:17.285124Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 471796)

2018-12-04T21:39:17.305018Z 12 [Note] WSREP: Provider paused at 7bf59bb4-996d-11e8-b3b6-8ed02cd38513:471796 (30)

2018-12-04T21:39:17.324509Z 12 [Note] WSREP: --------- CONFLICT DETECTED --------

2018-12-04T21:39:17.324532Z 12 [Note] WSREP: cluster conflict due to high priority abort for threads:

2018-12-04T21:39:17.324535Z 12 [Note] WSREP: Winning thread:

THD: 12, mode: total order, state: executing, conflict: no conflict, seqno: -1

SQL: alter table db1.sbtest1 engine=innodb

2018-12-04T21:39:17.324537Z 12 [Note] WSREP: Victim thread:

THD: 11, mode: local, state: executing, conflict: no conflict, seqno: -1

SQL: select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000

2018-12-04T21:39:17.324542Z 12 [Note] WSREP: MDL conflict db=db1 table=sbtest1 ticket=MDL_SHARED_READ solved by abort

2018-12-04T21:39:17.324544Z 12 [Note] WSREP: --------- CONFLICT DETECTED --------

2018-12-04T21:39:17.324545Z 12 [Note] WSREP: cluster conflict due to high priority abort for threads:

2018-12-04T21:39:17.324547Z 12 [Note] WSREP: Winning thread:

THD: 12, mode: total order, state: executing, conflict: no conflict, seqno: -1

SQL: alter table db1.sbtest1 engine=innodb

2018-12-04T21:39:17.324548Z 12 [Note] WSREP: Victim thread:

THD: 11, mode: local, state: executing, conflict: must abort, seqno: -1

SQL: select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000

2018-12-04T21:39:18.517457Z 12 [Note] WSREP: resuming provider at 30

2018-12-04T21:39:18.517482Z 12 [Note] WSREP: Provider resumed.

2018-12-04T21:39:18.518310Z 0 [Note] WSREP: Member 0.0 (node1) resyncs itself to group

2018-12-04T21:39:18.518342Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 471796)

2018-12-04T21:39:18.519077Z 0 [Note] WSREP: Member 0.0 (node1) synced with group.

2018-12-04T21:39:18.519099Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 471796)

2018-12-04T21:39:18.519119Z 2 [Note] WSREP: Synchronized with group, ready for connections

2018-12-04T21:39:18.519126Z 2 [Note] WSREP: Setting wsrep_ready to true

Another example – a simple sysbench test, during which I did noop ALTER in RSU mode:

# sysbench /usr/share/sysbench/oltp_read_only.lua --table-size=1000 --tables=8 --mysql-db=db1 --mysql-user=root --threads=8 --time=200 --report-interval=1 --events=0 --db-driver=mysql run
sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 8
Report intermediate results every 1 second(s)
Initializing random number generator from current time

Initializing worker threads...

Threads started!

[ 1s ] thds: 8 tps: 558.37 qps: 9004.30 (r/w/o: 7880.62/0.00/1123.68) lat (ms,95%): 18.28 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 8 tps: 579.01 qps: 9290.22 (r/w/o: 8130.20/0.00/1160.02) lat (ms,95%): 17.01 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 8 tps: 597.36 qps: 9528.89 (r/w/o: 8335.17/0.00/1193.72) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00
FATAL: mysql_stmt_store_result() returned error 1317 (Query execution was interrupted)
FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:432: SQL error, errno = 1317, state = '70100': Query execution was interrupted

# sysbench /usr/share/sysbench/oltp_read_only.lua --table-size=1000 --tables=8 --mysql-db=db1 --mysql-user=root --threads=8 --time=200 --report-interval=1 --events=0 --db-driver=mysql run

sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:

Number of threads: 8

Report intermediate results every 1 second(s)

Initializing random number generator from current time

Initializing worker threads...

Threads started!

[ 1s ] thds: 8 tps: 558.37 qps: 9004.30 (r/w/o: 7880.62/0.00/1123.68) lat (ms,95%): 18.28 err/s: 0.00 reconn/s: 0.00

[ 2s ] thds: 8 tps: 579.01 qps: 9290.22 (r/w/o: 8130.20/0.00/1160.02) lat (ms,95%): 17.01 err/s: 0.00 reconn/s: 0.00

[ 3s ] thds: 8 tps: 597.36 qps: 9528.89 (r/w/o: 8335.17/0.00/1193.72) lat (ms,95%): 15.83 err/s: 0.00 reconn/s: 0.00

FATAL: mysql_stmt_store_result() returned error 1317 (Query execution was interrupted)

FATAL: `thread_run' function failed: /usr/share/sysbench/oltp_common.lua:432: SQL error, errno = 1317, state = '70100': Query execution was interrupted

So, SELECT queries are aborted to resolve MDL lock request that a DDL in RSU needs immediately. This of course applies to INSERT, UPDATE and DELETE as well. That’s quite an intrusive way to accomplish the goal…

“Manual RSU”

Let’s try a “manual RSU” workaround instead. In fact, we can achieve the same isolated DDL execution as in RSU, by putting a node in desync mode (to avoid flow control) and disabling replication for our session. That way, the ALTER will only be executed in that particular node.

Session 1:

node1 > set wsrep_OSU_method=TOI; set global wsrep_desync=1; set wsrep_on=0;
Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

node1 > select @@wsrep_OSU_method,@@wsrep_on,@@wsrep_desync;
+--------------------+------------+----------------+
| @@wsrep_OSU_method | @@wsrep_on | @@wsrep_desync |
+--------------------+------------+----------------+
| TOI                |          0 |              1 |
+--------------------+------------+----------------+
1 row in set (0.00 sec)

node1 > set wsrep_OSU_method=TOI; set global wsrep_desync=1; set wsrep_on=0;

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

node1 > select @@wsrep_OSU_method,@@wsrep_on,@@wsrep_desync;

+--------------------+------------+----------------+

| @@wsrep_OSU_method | @@wsrep_on | @@wsrep_desync |

+--------------------+------------+----------------+

| TOI | 0 | 1 |

+--------------------+------------+----------------+

1 row in set (0.00 sec)

Session 2:

node1 > select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000;
+-----------+
| count(*)  |
+-----------+
| 423680000 |
+-----------+
1 row in set (14.07 sec)

node1 > select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000;

+-----------+

| count(*) |

+-----------+

| 423680000 |

+-----------+

1 row in set (14.07 sec)

Session 1:

node1 > alter table db1.sbtest1 engine=innodb;
Query OK, 0 rows affected (13.52 sec)
Records: 0 Duplicates: 0 Warnings: 0

node1 > alter table db1.sbtest1 engine=innodb;

Query OK, 0 rows affected (13.52 sec)

Records: 0 Duplicates: 0 Warnings: 0

Session 3:

node1 > select id,command,time,state,info from information_schema.processlist where user="root";
+----+---------+------+---------------------------------+-----------------------------------------------------------------------------------------+
| id | command | time | state                           | info |
+----+---------+------+---------------------------------+-----------------------------------------------------------------------------------------+
| 11 | Query   | 9    | Sending data                    | select count(*) from db1.sbtest1 a join db1.sbtest1 b where a.id<10000 |
| 12 | Query   | 7    | Waiting for table metadata lock | alter table db1.sbtest1 engine=innodb |
| 17 | Query   | 0    | executing                       | select id,command,time,state,info from information_schema.processlist where user="root" |
+----+---------+------+---------------------------------+-----------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

node1 > select id,command,time,state,info from information_schema.processlist where user="root";
+----+---------+------+----------------+-----------------------------------------------------------------------------------------+
| id | command | time | state          | info |
+----+---------+------+----------------+-----------------------------------------------------------------------------------------+
| 11 | Sleep   | 14   |                | NULL |
| 12 | Query   | 13   | altering table | alter table db1.sbtest1 engine=innodb |
| 17 | Query   | 0    | executing      | select id,command,time,state,info from information_schema.processlist where user="root" |
+----+---------+------+----------------+-----------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

node1 > select id,command,time,state,info from information_schema.processlist where user="root";

+----+---------+------+---------------------------------+-----------------------------------------------------------------------------------------+

+----+---------+------+---------------------------------+-----------------------------------------------------------------------------------------+

+----+---------+------+---------------------------------+-----------------------------------------------------------------------------------------+

3 rows in set (0.00 sec)

node1 > select id,command,time,state,info from information_schema.processlist where user="root";

+----+---------+------+----------------+-----------------------------------------------------------------------------------------+

+----+---------+------+----------------+-----------------------------------------------------------------------------------------+

| 11 | Sleep | 14 | | NULL |

+----+---------+------+----------------+-----------------------------------------------------------------------------------------+

3 rows in set (0.00 sec)

In this case, there was no interruption, the ALTER waited for it’s MDL lock request to succeed gracefully, and did it’s job when it became possible.

Remember, you have to execute the same commands on the rest of the nodes to make them consistent – even for noop-alter, it’s important to make the nodes consistent in terms of table size on disk.

Kill Problem

Another fact is that you cannot cancel or kill a DDL query executed in RSU or in TOI method:

node1 > kill query 12;
ERROR 1095 (HY000): You are not owner of thread 12

1 2	node1 > kill query 12; ERROR 1095 (HY000): You are not owner of thread 12

This may be an annoying problem when you need to unblock a node urgently. Fortunately, the workaround with wsrep_on=0 also allows to kill an ALTER without that restriction:

Session 1:

node1 > kill query 22;
Query OK, 0 rows affected (0.00 sec)

1 2	node1 > kill query 22; Query OK, 0 rows affected (0.00 sec)

Session 2:

node1 > alter table db1.sbtest1 engine=innodb;
ERROR 1317 (70100): Query execution was interrupted

1 2	node1 > alter table db1.sbtest1 engine=innodb; ERROR 1317 (70100): Query execution was interrupted

Summary

The RSU method may be more intrusive then you’d expect. For schema compatible changes, it is worth considering “manual RSU” with set global wsrep_desync=1; set wsrep_on=0;
When using it though, please remember that wsrep_on applies to all types of writes, both DDL and DML, so be extra careful to set it back to 1 after the ALTER is done. So the procedure will look like this:

SET GLOBAL wsrep_desync=1;
SET wsrep_on=0;
ALTER ...  /* compatible schema change only! */
SET wsrep_on=1;
SET GLOBAL wsrep_desync=0;

SET GLOBAL wsrep_desync=1;

SET wsrep_on=0;

ALTER ... /* compatible schema change only! */

SET wsrep_on=1;

SET GLOBAL wsrep_desync=0;

Incidentally, as in my opinion the current RSU behavior is unnecessarily intrusive, I have filed this change suggestion: https://jira.percona.com/browse/PXC-2293

—
Photo by Pierre Bamin on Unsplash

0 0 votes

Article Rating

2 Comments

Oldest

Newest Most Voted

zarere

6 years ago

Nice article Przemysław,

I think we can achieve the same in PXC with:

select count(*) from sysbench.sbtest1 a join sysbench.sbtest1 b where a.id<10000 on 1-session
optimize local table sbtest1 on the 2-nd session

optimize local will be run only on the local node and will not cause issues on the other nodes in PXC.

Stofa

6 years ago

Hi Malkowski,

Great help for DDL issues on Galeda cluster
Concerning the “Manual RSU” solution, what happens if in Session 2, we have a transaction running ( INSERT, UPDATE , DELETE ) on the table modified ( by alter ) in Session 1.
Should we stop all write on db1.sbtest1 table in the Node where “Manual RSU” is performed as required by RSU method ( SET SESSION wsrep OSU method = ’RSU’ ) ?
From what i tested, NO DML done on db1.sbtest1 in other sessions different from which the alter command is running will be aborted but all are fully replicated to other nodes. Do you confirm ?

Thanks.