Multithreaded ALTER TABLE with pt-online-schema-change and myloader

multithreaded alter tablept-online-schema-change emulates the way that MySQL alters tables internally, but it works on a copy of the table you wish to alter. It executes INSERT statements to import the data, that runs in a single connection to fill the new table. In this repository, there is a script called myloader_pt-osc.sh that uses myloader to execute parallel INSERTs, instead of using pt-online-schema-change internal tooling, but it also uses a patched version of pt-online-schema-change.

Patch pt-online-schema-change

There are 2 changes that we needed to perform on pt-online-schema-change:

1- Be able to stop the execution after the copy of the table is created
2- Tell pt-online-schema-change that no data needs to be imported

We are adding the –no-data to tell pt-online-schema-change to do not import the data to the new table. pt-online-schema-change is able to stop the execution with –pause-file option. The patch file includes the code to stop the execution when –no-data and –pause-file is used.

Procedure

The script is performing several tasks, the main ones are:

  • Creating a backup directory in a format that myloader is able to use.
  • Starting the pt-online-schema-change which is going to stop after creating the new table and triggers.
  • Start the myloader execution.
  • Resume pt-online-schema-change, after myloader finishes, to swap the tables.

Take into consideration that the script is not designed to support multi-column nor non-integer primary keys. Do not use it in production environments without testing it thoroughly.

Timings

A sysbench table with 10M rows was created to perform an alter table with “engine innodb” which is like an optimize table, I used these two executions to compare:

The chunk-size and the number of threads were variables to create these graphs:

 

pt-online-schema-change MySQL

 

In the next graph is the improvement in percentage:

 

myloader mysql

 

Use Cases

I consider that this method or strategy should be used on specific scenarios, as there are considerations when not to use it, for instance:

  • The pt-online-schema-change throttle mechanism will not slow down the insert, slaves might get behind.
  • As the intention is to use multiple threads, the server will receive more load and no mechanism in place to stop it.

Conclusion and Expectation

Basically, it is a pt-online-schema-change wrapper that uses myloader to perform the INSERTs in parallel which makes a faster alter table. It might be used when times need to be reduced and the load is not an issue.

I would like to see this implemented inside pt-online-schema-change as a possible strategy but until this happens, this might be an alternative.

Share this post

Comment (1)

  • laimisnd Reply

    Hi

    A very interesting approach of using the parallel myloader.

    One question.
    How is the shared lock/lost delete handled if myloader is used?
    Forget if I am a bit rusty but this is pt-osc sql:
    INSERT LOW_PRIORITY IGNORE INTO … SELECT … FORCE INDEX ( PRIMARY ) WHERE ( ( id >= ? ) ) AND ( ( id <= ? ) ) LOCK IN SHARE MODE

    You get the idea, lock + triggers + insert with ignore handles the concurency, lost delete mostly (but could be other phenomenas also).

    May 14, 2020 at 7:56 am

Leave a Reply