You probably missed the news, but…

PT-1751: Adds –where param to pt-online-schema-change

This brings the possibility to perform what I would call an “inverted purge” because you are not actually purging rows from your multi-terabyte table, but rather, you copy the small percentage of rows you want to keep to a new table and then drop the original table. This not only saves you from doing far more DELETE operations but also saves you from blocking the OPTIMIZE TABLE or from running pt-online-schema-change anyway to rebuild the table.

We used to perform this type of operation in two steps: first run pt-archiver to purge the unwanted rows, and then run pt-online-schema-change to rebuild the table. This certainly worked, but the purge operation could take days and tends to trash the buffer pool, as it will read into memory all the pages that contain rows to be deleted (which for the majority of cases that we see in Support, is 70% or more of some very-large table).

The new –where option will work like regular pt-online-schema-change but will only copy part of rows that interest you into a new table, and then you will have to “manually” perform the tables swap. As mentioned before, we assume the number of rows to be retained is smaller than the number of rows to be purged, and thus, the time to copy is shorter than the time to DELETE and should also incur less buffer-pool trashing.

To test the process, we did this:

Create the schema:

Fill up with one row per day for 18 months.

Then run a simple loop that continues inserting one row every five seconds with the current time stamp so we can see if the effect of locks on the table and verify the triggers work as expected to keep the _new table updated:

 

And now comes what you all have been waiting for: run pt-online-schema-change to copy ONLY the rows we want to preserve. For our example, we’ll pretend we want to keep all data from 2024:

The output of pt-osc will look something like this:

Once the pt-osc process completes the chunks copy, the triggers will remain attached to the current (full) table, so the new table will continue to get updates as they arrive.

To complete the process, you should swap the tables. This is atomic and will wait for transactions to finish, get MDL lock, and perform both renames under the scope of the same MDL lock:

At this point, you could do LEFT JOIN between the old and new tables to verify there are no missing IDs.

After successfully swapping the new and the old table, the output looks like this:

Perfect! No missing rows!

The delayed value in “latest” tells us that the old table is no longer in sync! This is the expected status, so it’s no problem.

Notice that “earliest” is now 2024, confirming that the purge finished correctly, and “latest” shows fresh values confirming that writes arrive at the new table.

The new table has the latest data, plus we checked it had all the rows we wanted from the old table, so we can now drop the old table to instantly reclaim the space:

As a reminder: be mindful when dropping tables as there are still bugs that cause InnoDB to scan the Adaptive Hash Index as part of the process to purge pages from the table being removed, and it will basically block traffic while holding doing this, so if your AHI is very large it can lead to noticeable stalls, thus is best-done off-hours. See https://bugs.mysql.com/bug.php?id=91977 for the gory details.

So, using the new –where option you can reduce the total time and effort necessary to purge the rows and reclaim space, and it will be done with the least possible impact and locking.


Percona offers secure, tested, open source software complete with advanced features like backup, monitoring, and encryption only otherwise found in MySQL Enterprise Edition.

Learn Why Customers Choose Percona for MySQL

Subscribe
Notify of
guest

3 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
hcymysql

Your PT-OSC command line above is incorrect,the correct one should be: 

####=====================================================
shell> pt-online-schema-change –execute –alter=”ENGINE=INNODB” \
–where=”some_date_column > ‘2023-12-31′” \
–host=127.0.0.1 –port=3306 –user=admin –password=123456 \
–no-drop-old-table –no-drop-new-table –swap-tables –force \
D=mydb,t=mytable
####=====================================================

—–
The mytable table retains the data for the year 2024.
__mytable_new retains the original table data.

lee

This test is probably an example of changing the table after pt-osc, so maybe this example added the –no-swap-tables option,

1. Create New Table & Data Sink
2. Later inspection or Rename processing when no access to the table is made

* Swap-tables are right if you’re going to change it right away