Redesign pt-online-schema-change

See also the launchpad blueprint. Some or all of this work is billable to customer issue 22109.

Most features are done but _not_ tested:

  1. (DONE) Don't disable the binary log, and don't provide a feature that will allow it. This should make the tool work for row-based replication too. It will also alter the table on all servers at once. This should also make bug 928961 obsolete (pt-online-schema-change truncates recipient tables in XtraDB Cluster with Galera)
  2. (DONE) Make the output less verbose. By default it should probably just print something like this:
       Creating temp table.... OK
       Creating triggers... OK 
       Copying rows... this can take a while
       Copying rows complete (1:19:29) OK
       Swapping old and new table... OK
       Dropping old table... OK
  3. (DONE) Auto-choose a temp table name based on the existing table name, adding leading underscores until there is no such table. In case multiple tools are running at the same time, do this by trying to create the table, catching “table exists” errors, and retrying. (Don't fall into a race condition.) Similarly, when getting ready to rename the original and new tables, choose an “old table” name with more and more leading underscores. But warn if an old table exists from a previous run, so the user is aware that something needs to be cleaned up. Currently it looks like the tool will just fail in this situation, leaving the DBA to figure out how to complete the schema change from wherever it stopped.
  4. (DONE) Use “nibbling” instead of chunking, so any table that has an index can be altered. This should also solve the problem reported in bug 898695 (pt-online-schema-change add useless ORDER BY).
  5. (DONE) Check that the table has a primary key or unique index. The DELETE trigger will need this. It should specify a WHERE clause that contains all of the columns of the primary key (preferred) or a unique index.
  6. (DONE) Auto-throttle the rate of copying rows, to target a specific --chunk-time like pt-table-checksum does.
  7. (DONE) Drop the old table and cleanup by default at exit, unless there is an error.
  8. (DONE) Don't support “cleanup only” functionality. Let a DBA do that manually in case there is an error; this will simplify the code.
  9. (DONE) Set the innodb lock wait timeout as pt-table-checksum does.
  10. (DONE) Recurse to slaves as pt-table-checksum does. Add --recurse and --recursion-method.
  11. (DONE) Check for replication filters as pt-table-checksum does.
  12. (DONE) Check for oversized chunks as pt-table-checksum does.
  13. (DONE) Add a --max-load option similar to pt-table-checksum's.
  14. Reconnect and retry on errors like pt-table-checksum does. If retries are exceeded, stop with an error message indicating which chunk boundaries were completed successfully. Store completed chunks in an auxiliary table named similarly to the temp table, but with a “_chunks” suffix or something like that. In the future we can use this for resuming. At present a DBA can “resume” manually.
  15. (DONE) Throttle when replicas lag, like pt-table-checksum. Requires --check-slave-lag and --max-lag.
  16. (DONE) I am not comfortable with --alter having multiple statements in it and splitting on semicolons. A decent DBA can write a single ALTER that does everything needed in one pass, and I think we are prone to parsing errors here. Simplify this: just run “ALTER TABLE $db.$tbl --alter”.
  17. Investigate how this tool behaves on MySQL 5.5 with the new locking and metadata locking features.
  18. (DONE) Foreign keys:
    1. Automatically detect when a table has child tables (foreign keys that refer to it)
    2. If so, then if the user explicitly specified a method, use that.
    3. Otherwise automatically choose the best method for updating them.
      1. Use ALTER (the rebuild_constraints method) when every child table's estimated number of rows (according to EXPLAIN) is small enough to be done in one nibble. The number of rows for a nibble should be based on the parent table's number of rows and the total time it took to alter the parent table. Then multiply --chunk-time by --chunk-size-limit, which should default to 4. (This is a magical number. I have observed that pt-online-schema-change works about 4 times as slowly as ALTER TABLE.) If we estimate that the child tables can all be ALTER'ed in the same amount of time as a nibble of the parent table, then ALTER them.
      2. Otherwise, use the drop_swap method.
  19. Ensure that AUTO_INCREMENT is preserved for the new table. This will solve bug 898517.
  20. (DONE) Simplify tool further by removing options related to the new table. Make the tool always auto-choose a new table.
  21. As part of all of the above, change these options:
    1. Remove --bin-log (DONE)
    2. Remove --cleanup-and-exit (DONE)
    3. Remove --create-tmp-table (DONE)
    4. Make --drop-old-table negatable and TRUE by default (DONE)
    5. Rename --rename-tables to --swap-tables (DONE)
    6. Make --update-foreign-keys-method automatic unless specified (DONE)
    7. Remove --sleep (DONE)
    8. Add --chunk-size (DONE)
    9. Add --chunk-size-limit (DONE)
    10. Add --chunk-time (DONE)
  22. (DONE) Add a --critical-load option similar to --max-load, but when it is reached, the tool will abort the job, remove the triggers, and drop the temporary table. The default threshold will be Threads_running=50.
  23. (DONE) Make the tool use unique trigger names so that multiple tools can work on several tables at once. Choose trigger names based on the table name. Don't do leading underscores strategy; that is unnecessary because it would only happen if two alters were happening on the same table at once.
  24. (DONE) If the tool cancels or exits without cleaning up triggers and temp tables (for example a CTRL-C, which we would hopefully want to resume in a future version), print out the DROP syntax for the triggers and temp table it left behind. (In other words: always clean up unless the tool was interrupted, in which case, leave stuff so it can be resumed, but print the SQL to clean up.)
  25. After the tool finishes, use SHOW TABLES and SHOW TRIGGERS to verify that all of the _new and _old tables and triggers are really gone. For example, if the tool creates sakila._actor_new and then renames to actor and _actor_old, verify that actor exists, and that _actor_new and _actor_old don't. Verify that actor doesn't have triggers. Notify the user of anything that's unexpected.
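
The leading-underscore naming strategy from item 3 can be sketched as follows. This is a simplified model: `existing_tables` stands in for the server's table list, whereas the real tool must attempt CREATE TABLE and catch “table exists” errors to stay race-free.

```python
def choose_free_name(existing_tables, base):
    """Prepend leading underscores to `base` until no table by that
    name exists. Checking a name set, as here, is only illustrative;
    the race-free approach is CREATE TABLE plus error handling."""
    name = base
    while name in existing_tables:
        name = "_" + name
    return name
```

For example, if `_actor_new` already exists from an earlier run, the tool would move on to `__actor_new`, and so forth.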
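
The auto-throttling in item 6 could work roughly like this: track a smoothed rows-per-second rate and size the next nibble so it takes about --chunk-time. The exponential-weighting constant is an assumption; pt-table-checksum uses a similar scheme but this is only a sketch.

```python
class ChunkSizer:
    """Adjust nibble size to target --chunk-time seconds per nibble."""

    def __init__(self, chunk_size, chunk_time, weight=0.75):
        self.size = chunk_size        # starting --chunk-size
        self.chunk_time = chunk_time  # target seconds per nibble
        self.weight = weight          # smoothing factor (assumed value)
        self.avg_rate = None          # smoothed rows/sec

    def update(self, rows_copied, seconds):
        """Record the last nibble's timing and return the next size."""
        rate = rows_copied / seconds
        if self.avg_rate is None:
            self.avg_rate = rate
        else:
            self.avg_rate = self.weight * self.avg_rate + (1 - self.weight) * rate
        self.size = max(1, int(self.avg_rate * self.chunk_time))
        return self.size
```

So a nibble of 1,000 rows that took a full second, against a 0.5s target, would shrink the next nibble to about 500 rows.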
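
The foreign-key decision in item 18.3 can be expressed as a small heuristic. This is one reading of the arithmetic above, not the tool's actual code: estimate the parent's copy rate, derive how many rows fit in one nibble's worth of time scaled by --chunk-size-limit, and pick rebuild_constraints only if every child table fits under that limit.

```python
def choose_fk_method(child_row_counts, parent_rows, parent_copy_seconds,
                     chunk_time=0.5, chunk_size_limit=4):
    """Pick the foreign-key update method automatically.

    child_row_counts: estimated rows (per EXPLAIN) for each child table.
    The 4x default for chunk_size_limit is the "magical number" noted
    above; chunk_time's 0.5s default is an assumption."""
    rows_per_sec = parent_rows / parent_copy_seconds
    limit = rows_per_sec * chunk_time * chunk_size_limit
    if all(rows <= limit for rows in child_row_counts):
        return "rebuild_constraints"
    return "drop_swap"
```

E.g. a parent copied at 1,000 rows/sec gives a limit of 2,000 rows per child; small lookup-style children get ALTERed, a multi-million-row child forces drop_swap.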
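
The --critical-load check from item 22 amounts to comparing SHOW GLOBAL STATUS counters against thresholds. A sketch, where `status` stands in for the fetched status variables and the >= comparison is an assumption:

```python
def critical_load_exceeded(status, thresholds=None):
    """Return the first (variable, value, limit) at or over a
    --critical-load threshold, or None if load is acceptable.
    `status` is a dict of SHOW GLOBAL STATUS name -> value strings."""
    thresholds = thresholds or {"Threads_running": 50}  # documented default
    for var, limit in sorted(thresholds.items()):
        value = int(status.get(var, 0))
        if value >= limit:
            return (var, value, limit)
    return None
```

On a hit, the tool would abort, remove the triggers, and drop the temporary table rather than merely pausing as --max-load does.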
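
The post-run verification in item 25 can be modeled like this, with `tables` and `triggers` standing in for SHOW TABLES and SHOW TRIGGERS output, and the _new/_old naming following the sakila.actor example above:

```python
def verify_cleanup(tables, triggers, tbl):
    """Return a list of human-readable problems found after the swap:
    the renamed table missing, leftover _new/_old tables, or triggers
    still attached to the table. Empty list means all clear."""
    problems = []
    if tbl not in tables:
        problems.append("table %s is missing" % tbl)
    for leftover in ("_%s_new" % tbl, "_%s_old" % tbl):
        if leftover in tables:
            problems.append("leftover table %s" % leftover)
    if triggers:
        problems.append("unexpected triggers on %s: %s" % (tbl, ", ".join(triggers)))
    return problems
```

Anything returned here would be reported to the user as unexpected.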

Daniel's notes

  1. “tmp table” was poorly named: it's now “new table” because that's what it is; it was never temporary
  2. “rename tables” was also poorly named: they're not just renamed, they're swapped, so --rename-tables is now --swap-tables
  3. --child-tables had a magical value “auto_detect”, which was a bad idea: now child tables are auto-found unless --child-tables is specified
  4. Added --dry-run and tweaked how --print and --execute work to be more like the same options in pt-table-sync; explanation:
    • --dry-run: Do all the work but do not touch the original table. This means that the new table is created and altered, and other stuff happens internally, but the SQL simply isn't executed. This allows the user to really try the tool before --execute. Presuming nothing weird happens, if --dry-run passes, then --execute should work, unless something breaks during the copy process.
    • --print: Print alter-related SQL statements. By “alter-related” we mean the queries involved in the alter process, not all the queries that check for triggers, check slave lag, etc.; in other words, print the important queries. With --dry-run, this really shows what would be done.
    • --execute: Execute the queries that --print prints; do the alter: copy rows, swap the tables, etc. This is mutually exclusive with --dry-run.
  5. “drop_old_table” (the --update-foreign-keys-method) was poorly named since it doesn't really drop an old table: it drops the original table, then renames the new one into its place, so there's never an old table to fall back on, which is why this method is risky; it's now called “drop_swap”

Copying rows

Any error during the copy phase causes the tool to abort because, unlike pt-table-checksum, if a single chunk is missed, the new table will be missing those rows, which is a big problem. So it's all or nothing until a resume solution is implemented (not currently done). Thus I raised --retries to 3; maybe this needs to be higher.

If copying fails, the original table shouldn't be affected because the triggers we put on it are just applying changes to the new table (while we're copying rows to it) and not doing anything on the original table.
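
To illustrate why the original table is untouched, here is a sketch of the DELETE trigger from item 5, whose WHERE clause lists every primary-key (or unique-index) column. The trigger name, quoting, and DELETE IGNORE are assumptions, not necessarily what the tool emits; the point is that the trigger's statement targets only the new table.

```python
def delete_trigger_sql(db, tbl, new_tbl, pk_cols):
    """Build an AFTER DELETE trigger on the original table that removes
    the matching row from the new table, matched on all PK columns."""
    where = " AND ".join("`{0}` = OLD.`{0}`".format(c) for c in pk_cols)
    return (
        "CREATE TRIGGER `pt_osc_{db}_{tbl}_del` AFTER DELETE ON `{db}`.`{tbl}` "
        "FOR EACH ROW DELETE IGNORE FROM `{db}`.`{new}` WHERE {w}"
    ).format(db=db, tbl=tbl, new=new_tbl, w=where)
```

For sakila.actor this yields a trigger whose only action is a DELETE against `_actor_new`, keyed on `actor_id`.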

ptdev/blueprints/redesign-pt-online-schema-change.txt · Last modified: 2015/08/05 15:37 (external edit)