Starting from Percona Toolkit 3.6.0, you can resume pt-online-schema-change if it was interrupted. This blog describes the prerequisites and usage of the new –resume option.
To restart the job, you need to know where it failed. This is why the first option you must use is –history. It instructs pt-online-schema-change to store its progress in the history table. The history table name is percona.pt_osc_history by default. You can overwrite this name with the option –history-table. The history table structure is as follows:
|
1 |
CREATE TABLE pt_osc_history (<br> job_id INT NOT NULL AUTO_INCREMENT,<br> db CHAR(64) NOT NULL,<br> tbl CHAR(64) NOT NULL,<br> new_table_name CHAR(64) DEFAULT NULL,<br> altr TEXT NOT NULL,<br> args TEXT NOT NULL,<br> lower_boundary TEXT,<br> upper_boundary TEXT,<br> done ENUM('yes','no') NOT NULL DEFAULT 'no',<br> ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,<br> PRIMARY KEY (job_id)<br>) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; |
Where
If your table index includes binary columns, such as BLOB or BINARY, create a history table with the option –binary-index. In this case, lower_boundary and upper_boundary will be of the BLOB datatype instead of TEXT. The same idea is used for the –replicate table in pt-table-checksum.
Once you run pt-online-schema-change with the –history option, it will create a record with the job details. If the job fails, you can easily find its unique identifier either in the tool output or in the history table and restart it.
Here is an example of the percona.pt_osc_history table content after a few pt-online-schema-change runs:
|
1 |
mysql> select * from percona.pt_osc_historyG<br>*************************** 1. row ***************************<br> job_id: 1<br> db: test<br> tbl: pt1717<br>new_table_name: __pt1717_new<br> altr: engine=INNODB<br> args: {"history":1,"execute":1,"progress":["time","5"],"drop-new-table":0,"max-lag":"10","chunk-size":"10","alter":"engine=INNODB","drop-triggers":0}<br>lower_boundary: 9991<br>upper_boundary: 10000<br> done: yes<br> ts: 2024-06-05 13:32:09<br>*************************** 2. row ***************************<br> job_id: 2<br> db: test<br> tbl: pt1717<br>new_table_name: __pt1717_new<br> altr: engine=INNODB<br> args: {"alter":"engine=INNODB","max-lag":"10","drop-new-table":0,"drop-triggers":0,"progress":["time","5"],"history":1,"chunk-size":"10","execute":1}<br>lower_boundary: NULL<br>upper_boundary: NULL<br> done: no<br> ts: 2024-06-05 13:32:09<br>*************************** 3. row ***************************<br> job_id: 3<br> db: test<br> tbl: pt1717<br>new_table_name: __pt1717_new<br> altr: engine=INNODB<br> args: {"progress":["time","5"],"execute":1,"chunk-size":"10","history":1,"drop-new-table":0,"alter":"engine=INNODB","max-lag":"10","drop-triggers":0}<br>lower_boundary: NULL<br>upper_boundary: NULL<br> done: no<br> ts: 2024-06-05 13:32:09<br>*************************** 4. row ***************************<br> job_id: 4<br> db: test<br> tbl: pt1717<br>new_table_name: __pt1717_new<br> altr: engine=INNODB<br> args: {"drop-triggers":0,"alter":"engine=INNODB","progress":["time","5"],"history":1,"chunk-size":"10","drop-new-table":0,"max-lag":"10","execute":1}<br>lower_boundary: 4901<br>upper_boundary: 4910<br> done: no<br> ts: 2024-06-05 13:32:12<br>4 rows in set (0,00 sec) |
In the output above,
|
1 |
done: yes |
|
1 |
lower_boundary: NULL<br>upper_boundary: NULL<br> done: no |
|
1 |
lower_boundary: 4901<br>upper_boundary: 4910<br> done: no |
To restart the job, run pt-online-schema-change with option –resume=ID_OF_YOUR_PREVIOUSLY_FAILED_JOB. The tool will check if the previously created table copy and its triggers are still in place and will start the job from the chunk where it failed before.
|
1 |
$ ./bin/pt-online-schema-change h=127.0.0.1,P=12345,u=msandbox,p=msandbox,D=test,t=pt1717 --execute --chunk-size 10 --max-lag 10 --alter 'engine=INNODB' --progress time,5 --no-drop-new-table --no-drop-triggers --history --resume=4<br>2024-06-05T13:54:11 Job 5 started.<br>Found 2 slaves:<br>s76 -> 127.0.0.1:12346<br>s76 -> 127.0.0.1:12347<br>Will check slave lag on:<br>s76 -> 127.0.0.1:12346<br>s76 -> 127.0.0.1:12347<br>Operation, tries, wait:<br> analyze_table, 10, 1<br> copy_rows, 10, 0.25<br> create_triggers, 10, 1<br> drop_triggers, 10, 1<br> swap_tables, 10, 1<br> update_foreign_keys, 10, 1<br>Altering `test`.`pt1717`...<br>Altering new table...<br>Altered `test`.`__pt1717_new` OK.<br>2024-06-05T13:54:11 Creating triggers...<br>2024-06-05T13:54:11 Created triggers OK.<br>2024-06-05T13:54:11 Copying approximately 9956 rows...<br>2024-06-05T13:54:14 Job 5 finished successfully.<br>2024-06-05T13:54:14 Copied rows OK.<br>2024-06-05T13:54:14 Analyzing new table...<br>2024-06-05T13:54:14 Swapping tables...<br>2024-06-05T13:54:14 Swapped original and new tables OK.<br>Not dropping old table because --no-drop-triggers was specified.<br>Not dropping triggers because --no-drop-triggers was specified. To drop the triggers, execute:<br>DROP TRIGGER IF EXISTS `test`.`pt_osc_test_pt1717_del`<br>DROP TRIGGER IF EXISTS `test`.`pt_osc_test_pt1717_upd`<br>DROP TRIGGER IF EXISTS `test`.`pt_osc_test_pt1717_ins`<br>Successfully altered `test`.`pt1717`. |
You can use option –history together with –resume or you can skip it. Note, that if you do not use option –history, you won’t be able to restart the job if it fails one more time.
Another prerequisite for option –resume is –no-drop-new-table and –no-drop-triggers. If a new table and triggers are dropped, pt-online-schema-change cannot restart the job because no previously copied data is present.
|
1 |
$ ./bin/pt-online-schema-change h=127.0.0.1,P=12345,u=msandbox,p=msandbox,D=test,t=pt1717 --execute --chunk-size 10 --max-lag 10 --alter 'engine=INNODB' --progress time,5 --history --resume=9<br>2024-06-05T13:58:10 Job 10 started.<br>Found 1 slaves:<br>s76 -> 127.0.0.1:12346<br>Will check slave lag on:<br>s76 -> 127.0.0.1:12346<br>Operation, tries, wait:<br> analyze_table, 10, 1<br> copy_rows, 10, 0.25<br> create_triggers, 10, 1<br> drop_triggers, 10, 1<br> swap_tables, 10, 1<br> update_foreign_keys, 10, 1<br>Altering `test`.`pt1717`...<br>New table `test`.`___pt1717_new` not found, restart operation from scratch<br>`test`.`pt1717` was not altered.<br>History saved. Job id: 10 |
If you want to resume pt-online-schema-change after the failure, use it with options –history –no-drop-new-table –no-drop-triggers. If the job fails, you can resume it by providing the job ID as an argument for the option –resume. You will find the ID of the failed job in the pt-online-schema-change output and the history table.
Resources
RELATED POSTS