MySQL Data Archival

We all know that data is important, and some businesses need historical data to be available at all times. The problem is that queries on large tables perform poorly if they are not properly optimized. In Managed Services, we receive many customer requests to purge or archive large tables, and we use pt-archiver to do it.

Recently, we received a request to archive a large table, and the customer was worried about the downtime and performance issues during the archival.

We proposed archiving the table with pt-archiver. The idea is to move old data to another table while keeping the latest data in the current table, with minimal performance impact. All of the data remains available and can be queried at any time.

Percona Toolkit

In this blog, I will not explain how to use pt-archiver; instead, we will discuss a specific use case for it.

If you are unfamiliar with pt-archiver, please refer to “Want to archive tables? Use Percona Toolkit’s pt-archiver”; it explains how pt-archiver works and covers its various arguments.

For demonstration, we will walk through a few simple steps to archive data while keeping 100 days’ worth of data in the active table.

Note: This method uses a TIMESTAMP column to filter the data.

  1. Create two dummy tables.
  2. Insert records into the source table.
  3. Archive records from the source to the destination table using a `--where` condition that matches your business requirements.
  4. Rename the tables.
  5. Add pt-archiver as a cron job.

The following diagram illustrates the process.

Remember, the date and the `--where` condition in this example are only a reference; adjust the archiving condition in pt-archiver to match your business requirements.

Let’s create a source table and insert records using mysql_random_data_load:
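A minimal sketch of this step (the blogpost column layout and the row count here are my own assumptions for the demo; the post uses a test.blogpost table):

```sql
-- Hypothetical demo schema; adjust columns to match your real table.
CREATE TABLE test.blogpost (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
  title      VARCHAR(100) NOT NULL,
  created_at TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  KEY idx_created_at (created_at)  -- lets the --where filter use an index
) ENGINE=InnoDB;
```

```shell
# Load random rows into test.blogpost (row count illustrative;
# connection options omitted for brevity)
mysql_random_data_load test blogpost 100000
```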

Let’s create the destination table:

(Using the following table for demonstration purposes only)
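A sketch, assuming the source table is test.blogpost as above:

```sql
-- Destination table with the same structure as the source
CREATE TABLE test.blogpost_new LIKE test.blogpost;
```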

Create the following triggers using pt-online-schema-change.

Why triggers?

Any modifications to the data in the original table during the copy will be reflected in the new table, because pt-online-schema-change creates triggers on the original table to update the corresponding rows in the new table.

(For demonstration purposes, I have added the triggers created by a pt-online-schema-change test run.)
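Simplified versions of such triggers might look like this (trigger and column names are my assumptions; the triggers pt-online-schema-change actually generates also handle primary-key changes):

```sql
DELIMITER //

-- Replicate inserts into the new table
CREATE TRIGGER pt_osc_test_blogpost_ins AFTER INSERT ON test.blogpost
FOR EACH ROW
  REPLACE INTO test.blogpost_new (id, title, created_at)
  VALUES (NEW.id, NEW.title, NEW.created_at)//

-- Replicate updates (REPLACE rewrites the changed row)
CREATE TRIGGER pt_osc_test_blogpost_upd AFTER UPDATE ON test.blogpost
FOR EACH ROW
  REPLACE INTO test.blogpost_new (id, title, created_at)
  VALUES (NEW.id, NEW.title, NEW.created_at)//

-- Replicate deletes
CREATE TRIGGER pt_osc_test_blogpost_del AFTER DELETE ON test.blogpost
FOR EACH ROW
  DELETE IGNORE FROM test.blogpost_new WHERE id = OLD.id//

DELIMITER ;
```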

Verify that all triggers have been created.
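One way to verify (schema and table names as assumed above):

```sql
SELECT TRIGGER_NAME, EVENT_MANIPULATION, ACTION_TIMING
FROM information_schema.TRIGGERS
WHERE EVENT_OBJECT_SCHEMA = 'test'
  AND EVENT_OBJECT_TABLE  = 'blogpost';
```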

The output should list the three triggers created on the source table.

Copy the last 100 days of data using pt-archiver. Verify first with `--dry-run`.

(A screen session can be used to run pt-archiver if the table is large.)
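A dry-run sketch (the host, DSN details, and the created_at column are assumptions; `--no-delete` keeps the rows in the source table):

```shell
pt-archiver \
  --source h=localhost,D=test,t=blogpost \
  --dest   h=localhost,D=test,t=blogpost_new \
  --where "created_at > NOW() - INTERVAL 100 DAY" \
  --no-delete --limit 1000 --progress 10000 \
  --dry-run
```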

The dry-run output shows the statements pt-archiver would execute, without changing any data.

Let’s execute the pt-archiver:

(The following pt-archiver run will copy 100 days’ worth of data to the `_new` table, and the triggers will keep the `_new` table up to date.)
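The same command without `--dry-run`, plus `--statistics` for a summary at the end (still a sketch under the assumptions above):

```shell
pt-archiver \
  --source h=localhost,D=test,t=blogpost \
  --dest   h=localhost,D=test,t=blogpost_new \
  --where "created_at > NOW() - INTERVAL 100 DAY" \
  --no-delete --limit 1000 --statistics
```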

With `--statistics`, pt-archiver prints a summary of the rows copied and the time spent in each operation.

Once pt-archiver finishes, check its exit code:

(It should be 0.)
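In the shell:

```shell
# In the same shell session, immediately after pt-archiver finishes:
echo $?   # 0 means success; any other value means pt-archiver hit an error
```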

The next step is to check that the rows have been inserted into the new table and compare the counts with the original table.

NOTE: The results below are just test examples; use the correct date or WHERE condition:
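A sketch of the comparison, assuming the created_at column from the demo schema:

```sql
-- Row counts for the archived window should match between the two tables
SELECT COUNT(*) FROM test.blogpost
 WHERE created_at > NOW() - INTERVAL 100 DAY;

SELECT COUNT(*) FROM test.blogpost_new
 WHERE created_at > NOW() - INTERVAL 100 DAY;
```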

Why rename tables? 

The idea is to keep 100 days’ worth of data in the new table and rename it to the source table name; the original table, renamed with the `_archive` suffix, will hold all the data, including the last 100 days.
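Both renames can be done in one statement:

```sql
-- A single RENAME TABLE statement is atomic:
-- there is no window where the blogpost table is missing
RENAME TABLE test.blogpost     TO test.blogpost_archive,
             test.blogpost_new TO test.blogpost;
```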

After the rename, SHOW TABLES should list the trimmed blogpost table and the full blogpost_archive table.

Drop the triggers created by pt-online-schema-change.
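A sketch, using the hypothetical pt-osc-style trigger names from earlier (the triggers followed the table through the rename):

```sql
DROP TRIGGER IF EXISTS test.pt_osc_test_blogpost_ins;
DROP TRIGGER IF EXISTS test.pt_osc_test_blogpost_upd;
DROP TRIGGER IF EXISTS test.pt_osc_test_blogpost_del;
```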

Once the table rename is complete, copy the data from the blogpost table to the _archive table.
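One way to do this (a sketch, not the post’s exact command) is pt-archiver with `--no-delete` and `--ignore`; `--ignore` uses INSERT IGNORE, so rows the archive already holds are skipped on the primary key:

```shell
pt-archiver \
  --source h=localhost,D=test,t=blogpost \
  --dest   h=localhost,D=test,t=blogpost_archive \
  --where "1=1" --no-delete --ignore --limit 1000
```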

Let’s add the pt-archiver command to cron to automate the process. (It is advisable to wrap the archiver command in a script and test it before scheduling.)
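A sketch of the wrapper script and crontab entry (script path, schedule, and log location are all assumptions; note there is no `--no-delete` here, so pt-archiver deletes the archived rows from the source, which is what we want for the ongoing purge):

```shell
#!/bin/bash
# /usr/local/bin/archive_blogpost.sh -- hypothetical wrapper script.
# Copies rows older than 100 days to the archive table,
# then deletes them from the source.
pt-archiver \
  --source h=localhost,D=test,t=blogpost \
  --dest   h=localhost,D=test,t=blogpost_archive \
  --where "created_at < NOW() - INTERVAL 100 DAY" \
  --limit 1000 --statistics
```

```shell
# crontab entry: run the wrapper daily at 01:00 (schedule is illustrative)
0 1 * * * /usr/local/bin/archive_blogpost.sh >> /var/log/pt-archiver-blogpost.log 2>&1
```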

Now it is time to check if the cron was successful.

Check that the deleted rows have been copied to the test.blogpost_archive table and removed from test.blogpost:
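For example (assuming the created_at column from the demo schema):

```sql
-- All rows older than 100 days should now live only in the archive table
SELECT COUNT(*) FROM test.blogpost_archive
 WHERE created_at < NOW() - INTERVAL 100 DAY;
```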

Verify that the blogpost table holds only the last 100 days of data; the following query should return 0 rows:
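A sketch of that check, under the same column assumption:

```sql
SELECT * FROM test.blogpost
 WHERE created_at < NOW() - INTERVAL 100 DAY;
-- Empty set expected: no rows older than 100 days remain
```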

I hope you find this pt-archiver use case helpful the next time you need to purge or archive large tables!
