Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

pt-online-schema-change (if misused) can’t save the day

May 24, 2016

Author

Share this Post:

In this blog post we’ll discuss pt-online-schema-change, and how to correctly use it.

Always use pt-osc?

Altering large tables can be still a problematic DBA task, even now after we’ve improved Online DDL features in MySQL 5.6 and 5.7. Some ALTER types are still not online, or sometimes just too expensive to execute on busy production master.

So in some cases, we may want to apply an ALTER first on slaves, taking them out of traffic pool one by one and bringing them back after the ALTER is done. In the end, we can promote one of the already altered slaves to be new master, so that the downtime/maintenance time is greatly minimized. The ex-master can be altered later, without affecting production. Of course, this method works best when the schema change is backwards-compatible.

So far so good, but there is another problem. Let’s say the table is huge, and ALTER takes a lot of time on the slave. When it is a DML-blocking type ALTER (perhaps when using MySQL 5.5.x or older, etc.), there will be a long slave lag (if the table is being written by replication SQL thread at the same time, for example). So what do we to speed up the process and avoid the altered slave lag? One temptation that could tempt you is why not use pt-online-schema-change on the slave, which can do the ALTER in a non-blocking fashion?

Let’s see how it that would work. I need to rebuild big table on slave using MySQL version 5.6.16 (“null alter” was made online since 5.6.17) to reclaim disk space after some rows are deleted.

This example demonstrates the process (db1 is the master, db2 is the slave):

[root@db2 ~]# pt-online-schema-change --execute --alter "engine=innodb" D=db1,t=sbtest1
 No slaves found.  See --recursion-method if host db2 has slaves.
 Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
 Operation, tries, wait:
 analyze_table, 10, 1
 copy_rows, 10, 0.25
 create_triggers, 10, 1
 drop_triggers, 10, 1
 swap_tables, 10, 1
 update_foreign_keys, 10, 1
 Altering `db1`.`sbtest1`...
 Creating new table...
 Created new table db1._sbtest1_new OK.
 Altering new table...
 Altered `db1`.`_sbtest1_new` OK.
 2016-05-16T10:50:50 Creating triggers...
 2016-05-16T10:50:50 Created triggers OK.
 2016-05-16T10:50:50 Copying approximately 591840 rows...
 Copying `db1`.`sbtest1`:  51% 00:28 remain

(...)

[root@db2 ~]# pt-online-schema-change --execute --alter "engine=innodb" D=db1,t=sbtest1

No slaves found. See --recursion-method if host db2 has slaves.

Not checking slave lag because no slaves were found and --check-slave-lag was not specified.

Operation, tries, wait:

analyze_table, 10, 1

copy_rows, 10, 0.25

create_triggers, 10, 1

drop_triggers, 10, 1

swap_tables, 10, 1

update_foreign_keys, 10, 1

Altering `db1`.`sbtest1`...

Creating new table...

Created new table db1._sbtest1_new OK.

Altering new table...

Altered `db1`.`_sbtest1_new` OK.

2016-05-16T10:50:50 Creating triggers...

2016-05-16T10:50:50 Created triggers OK.

2016-05-16T10:50:50 Copying approximately 591840 rows...

Copying `db1`.`sbtest1`: 51% 00:28 remain

(...)

The tool is still working during the operation, and the table receives some writes on master:

db1 {root} (db1) > update db1.sbtest1 set k=k+2 where id<100;
 Query OK, 99 rows affected (0.06 sec)
 Rows matched: 99  Changed: 99  Warnings: 0

db1 {root} (db1) > update db1.sbtest1 set k=k+2 where id<100;
 Query OK, 99 rows affected (0.05 sec)
 Rows matched: 99  Changed: 99  Warnings: 0

db1 {root} (db1) > update db1.sbtest1 set k=k+2 where id<100;

Query OK, 99 rows affected (0.06 sec)

Rows matched: 99 Changed: 99 Warnings: 0

db1 {root} (db1) > update db1.sbtest1 set k=k+2 where id<100;

Query OK, 99 rows affected (0.05 sec)

Rows matched: 99 Changed: 99 Warnings: 0

which are applied on slave right away, as the table allows writes all the time.

(...)
 Copying `db1`.`sbtest1`:  97% 00:01 remain
 2016-05-16T10:51:53 Copied rows OK.
 2016-05-16T10:51:53 Analyzing new table...
 2016-05-16T10:51:53 Swapping tables...
 2016-05-16T10:51:53 Swapped original and new tables OK.
 2016-05-16T10:51:53 Dropping old table...
 2016-05-16T10:51:53 Dropped old table `db1`.`_sbtest1_old` OK.
 2016-05-16T10:51:53 Dropping triggers...
 2016-05-16T10:51:53 Dropped triggers OK.
 Successfully altered `db1`.`sbtest1`.

(...)

Copying `db1`.`sbtest1`: 97% 00:01 remain

2016-05-16T10:51:53 Copied rows OK.

2016-05-16T10:51:53 Analyzing new table...

2016-05-16T10:51:53 Swapping tables...

2016-05-16T10:51:53 Swapped original and new tables OK.

2016-05-16T10:51:53 Dropping old table...

2016-05-16T10:51:53 Dropped old table `db1`.`_sbtest1_old` OK.

2016-05-16T10:51:53 Dropping triggers...

2016-05-16T10:51:53 Dropped triggers OK.

Successfully altered `db1`.`sbtest1`.

Done! No slave lag, and the table is rebuilt. But . . . let’s just make sure data is consistent between the master and slave (you can use pt-table-checksum):

db1 {root} (db1) > select max(k) from db1.sbtest1 where id<100;
 +--------+
 | max(k) |
 +--------+
 | 392590 |
 +--------+
 1 row in set (0.00 sec)

db2 {root} (test) > select max(k) from db1.sbtest1 where id<100;
 +--------+
 | max(k) |
 +--------+
 | 392586 |
 +--------+
 1 row in set (0.00 sec)

db1 {root} (db1) > select max(k) from db1.sbtest1 where id<100;

+--------+

| max(k) |

+--------+

| 392590 |

+--------+

1 row in set (0.00 sec)

db2 {root} (test) > select max(k) from db1.sbtest1 where id<100;

+--------+

| max(k) |

+--------+

| 392586 |

+--------+

1 row in set (0.00 sec)

No, it is not! The slave is clearly missing the updates that happened during a pt-osc run. Why?

The explanation is simple. The pt-online-schema-change relies on triggers. The triggers are used to make the writes happening to the original table also populate to the temporary table copy, so that both tables are consistent when the final table switch happens at the end of the process. So what is the problem here? It’s the binary log format: in ROW based replication, the triggers are not fired on the slave! And my master is running in ROW mode:

db1 {root} (db1) > show variables like 'binlog_format';
 +---------------+-------+
 | Variable_name | Value |
 +---------------+-------+
 | binlog_format | ROW |
 +---------------+-------+
 1 row in set (0.01 sec)

db1 {root} (db1) > show variables like 'binlog_format';

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| binlog_format | ROW |

+---------------+-------+

1 row in set (0.01 sec)

So, if I used pt-online-schema-change on the master, the data inconsistency problem doesn’t happen. But using it on the slave is just dangerous!

Conclusion

Whenever you use pt-online-schema-change, make sure you are not executing it on a slave instance. For that reason, I escalated this bug report: https://bugs.launchpad.net/percona-toolkit/+bug/1221372. Also in many cases, using a normal ALTER will work well enough. As in my example, to rebuild the table separately on each slave in lockless mode, I would just need to upgrade to the more recent 5.6 version.

BTW, if you’re wondering about Galera replication (used in Percona XtraDB Cluster, etc.) since it also uses a ROW-based format, it’s not a problem. The pt-osc triggers are created in all nodes thanks to synchronous write-anywhere replication nature. It does not matter which node you start pt-online-schema-change on, and which other nodes your applications writes on at the same time. No slaves, no problem! 🙂

0 0 votes

Article Rating

4 Comments

Oldest

Newest Most Voted

Piotr Mazurkiewicz

10 years ago

MariaDB has ability to run triggers on slave for row-based replication starting from 10.1.1. Which I have ported to PS 5.6 successfully.

Todd Lomaro

9 years ago

What if you have Master-Master enabled?

zhuomin chen

9 years ago

This should NOT be the issue, even though trigger on Original table on Slave is NOT firing,
Rows on New table on Master should be replicated to New table on Slave over via binlog.
I do not see the need for slave firing trigger at all.

Jimmy

Igor

8 years ago

Was using on pt-online-schema-change with –set-vars=”SQL_LOG_BIN=OFF” to prevent propagation to the connected slave. I was expecting the change will take place on 3 nodes Galera cluster but will be missed on connected Slave. Well, the change indeed happen on Galera cluster, but the table on 2 Galera nodes is EMPTY! BEWARE!!!