Starting from Percona Toolkit 3.6.0, pt-online-schema-change supports the option –where, so you can not only alter your table definition live but copy only rows that satisfy certain criteria.
It may happen that you may not need to copy all the data when changing table definition. For example, if the table is too big and you need only recent data.
To achieve this task, run pt-online-schema-change with options –where=’YOUR WHERE CONDITION’ –no-drop-new-table –no-swap-tables.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
$ ./bin/pt-online-schema-change h=127.0.0.1,P=12345,u=msandbox,p=msandbox,D=test,t=comments > --alter="key_block_size=4 row_format=compressed" --where="created_at < '2024-01-01 00:00:00'" > --execute --no-swap-tables --no-drop-new-table --new-table-name=comments_2023 Found 2 slaves: s76 -> 127.0.0.1:12346 s76 -> 127.0.0.1:12347 Will check slave lag on: s76 -> 127.0.0.1:12346 s76 -> 127.0.0.1:12347 Operation, tries, wait: analyze_table, 10, 1 copy_rows, 10, 0.25 create_triggers, 10, 1 drop_triggers, 10, 1 swap_tables, 10, 1 update_foreign_keys, 10, 1 Altering `test`.`comments`... Creating new table... Created new table test.comments_2023 OK. Altering new table... Altered `test`.`comments_2023` OK. 2024-06-06T15:51:32 Creating triggers... 2024-06-06T15:51:32 Created triggers OK. 2024-06-06T15:51:32 Copying approximately 9964 rows... 2024-06-06T15:51:32 Copied rows OK. Not dropping old table because --no-swap-tables was specified. 2024-06-06T15:51:32 Dropping triggers... 2024-06-06T15:51:32 Dropped triggers OK. Not dropping the new table `test`.`comments_2023` because --no-drop-new-table was specified. To drop the new table, execute: DROP TABLE IF EXISTS `test`.`comments_2023`; Successfully altered `test`.`comments`. |
The code snippet above will create a new table, comments_2023, and will leave the original table untouched.
Options –no-drop-new-table –no-swap-tables are needed to protect rest of the data in the original table. pt-online-schema-change will refuse to execute if you start it with option –where but omit these safety options.
1 2 |
$ ./bin/pt-online-schema-change h=127.0.0.1,P=12345,u=msandbox,p=msandbox,D=test,t=comments > --alter="add fulltext index comment(comment)" --where="created_at >= '2024-01-01 00:00:00'" --execute |
Using option –where together with –drop-new-table and –swap-tables may lead to data loss, therefore this operation is only allowed if option –force also specified. Aborting.
If you are sure you do not need to keep data, not matching WHERE clause, you can override this safety check with option –force:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
$ ./bin/pt-online-schema-change h=127.0.0.1,P=12345,u=msandbox,p=msandbox,D=test,t=comments > --alter="add fulltext index comment(comment)" --where="created_at >= '2024-01-01 00:00:00'" --execute --force Found 2 slaves: s76 -> 127.0.0.1:12346 s76 -> 127.0.0.1:12347 Will check slave lag on: s76 -> 127.0.0.1:12346 s76 -> 127.0.0.1:12347 Operation, tries, wait: analyze_table, 10, 1 copy_rows, 10, 0.25 create_triggers, 10, 1 drop_triggers, 10, 1 swap_tables, 10, 1 update_foreign_keys, 10, 1 Altering `test`.`comments`... Creating new table... Created new table test._comments_new OK. Altering new table... Altered `test`.`_comments_new` OK. 2024-06-06T15:54:54 Creating triggers... 2024-06-06T15:54:54 Created triggers OK. 2024-06-06T15:54:54 Copying approximately 9964 rows... 2024-06-06T15:54:54 Copied rows OK. 2024-06-06T15:54:54 Analyzing new table... 2024-06-06T15:54:54 Swapping tables... 2024-06-06T15:54:54 Swapped original and new tables OK. 2024-06-06T15:54:54 Dropping old table... 2024-06-06T15:54:54 Dropped old table `test`.`_comments_old` OK. 2024-06-06T15:54:54 Dropping triggers... 2024-06-06T15:54:54 Dropped triggers OK. Successfully altered `test`.`comments`. |
As a result of the two operations shown, the database would have two tables: archived, with comments from the year 2023, and another, actual, with comments from the year 2024:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
mysql> select count(*), min(created_at), max(created_at) from comments_2023; +----------+---------------------+---------------------+ | count(*) | min(created_at) | max(created_at) | +----------+---------------------+---------------------+ | 5685 | 2023-06-07 19:04:27 | 2023-12-31 23:08:28 | +----------+---------------------+---------------------+ 1 row in set (0,00 sec) mysql> select count(*), min(created_at), max(created_at) from comments; +----------+---------------------+---------------------+ | count(*) | min(created_at) | max(created_at) | +----------+---------------------+---------------------+ | 4315 | 2024-01-01 01:10:10 | 2024-06-06 18:03:20 | +----------+---------------------+---------------------+ 1 row in set (0,00 sec) |
Note, that option –where does not have any effect on triggers that pt-online-schema-change creates to copy newly inserted or updated rows, and we do not recommend using it with conditions that may be affected by the ongoing queries.
Summary
Use option –where to copy only part of rows while modifying the table.
Use options –no-drop-new-table –no-swap-tables to prevent data loss.
Use option –new-table-name if you do not want to perform drop and swap tables operation manually and want to keep two tables instead.