Run an ALTER TABLE for a huge table in Aurora

May 1, 2026
By Eduardo Krieg

Recently, we received an alert for one of our Managed Services customers indicating that the auto_increment value for one of their tables had reached 80% of its maximum capacity. The column was INT UNSIGNED, whose maximum value is 4,294,967,295.

At 80%, we have enough time to change it to BIGINT, right? Let’s see.

So we used pt-online-schema-change to perform the alter.

It started running at a good pace but slowed over time.

 

Why?

Well, let’s look at the definition of the table:

NOTE1: The index on long_column covers a VARCHAR(1000) column; it may not be required, and an index prefix may be more useful here.

NOTE2: The index idx_short_col2 is redundant, as it is covered by the index idx_short_column.

Those changes require testing and are out of scope for this emergency, but they are worth mentioning.
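The original definition is not reproduced here, but a hypothetical table exhibiting the two issues flagged in the notes (all table, column, and index names are placeholders) might look like this:

```sql
CREATE TABLE mytable (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  short_col1 VARCHAR(50) NOT NULL,
  short_col2 VARCHAR(50) NOT NULL,
  long_column VARCHAR(1000) NOT NULL,
  PRIMARY KEY (id),
  -- NOTE1: full index on a VARCHAR(1000); a prefix such as long_column(50) may suffice
  KEY idx_long_column (long_column),
  KEY idx_short_column (short_col1, short_col2),
  -- NOTE2: a left prefix of idx_short_column, hence redundant
  KEY idx_short_col2 (short_col1)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```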

 

Table size:

Note that the indexes are considerably larger than the data itself.
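The size breakdown from the screenshot is not reproduced here; as a sketch, data and index sizes can be pulled from information_schema (schema and table names are placeholders):

```sql
SELECT table_name,
       ROUND(data_length  / POW(1024, 3), 2) AS data_gb,
       ROUND(index_length / POW(1024, 3), 2) AS index_gb
FROM information_schema.tables
WHERE table_schema = 'mydb'
  AND table_name   = 'mytable';
```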

While the pt-online-schema-change runs, it copies the data to a new table. As the data is being copied, the secondary indexes must be maintained.

NOTE the huge index on the VARCHAR(1000) column, ~1.5 TB in size. Maintaining such an index becomes increasingly expensive as the data grows.

The pt-online-schema-change had been running for ~8 days, and its latest estimate was 53 more days, which we couldn’t afford, since the maximum auto_increment value would be reached in ~15 days.

 

So what do we do now?

We suggested canceling the pt-online-schema-change and creating an Aurora blue-green deployment.

Then perform the direct ALTER on the green cluster. And finally, when ready, do the failover.
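The blue/green deployment can be created from the console or the CLI; a sketch with the AWS CLI (the deployment name, source cluster ARN, and parameter group name are placeholders) might look like:

```shell
aws rds create-blue-green-deployment \
  --blue-green-deployment-name huge-alter-bg \
  --source arn:aws:rds:us-east-1:123456789012:cluster:prod-cluster \
  --target-db-cluster-parameter-group-name green-cluster-params
```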

 

Sounds good, doesn’t it?

 

First, we need to ensure that the new (green) cluster’s parameter group has replica_type_conversions set to “ALL_NON_LOSSY,ALL_UNSIGNED”, so that it can replicate from an INT UNSIGNED column into a BIGINT UNSIGNED column.
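As a sketch, the parameter can be set on the green cluster’s parameter group with the AWS CLI (the group name is a placeholder):

```shell
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name green-cluster-params \
  --parameters '[{"ParameterName":"replica_type_conversions","ParameterValue":"ALL_NON_LOSSY,ALL_UNSIGNED","ApplyMethod":"immediate"}]'
```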

So we tried it. The ALTER started fast, at ~0.036% per minute, which works out to roughly 2 days. That’s great!

We left the process running over the weekend, but it started to slow down again… By Monday, it was advancing at ~0.01% every 5 minutes, which gives an ETA of ~34 days.

Why? 

Again: with a direct ALTER, MySQL copies the data to a temporary table, and the bigger the data, the more expensive it becomes to maintain the secondary indexes during the copy.

Again, unacceptable.

Note that with the above two approaches we lost ~12 days of precious time, and the auto_increment exhaustion deadline was approaching.

Then we thought: What if we drop the secondary indexes, do the alter, and then add the indexes back?

In theory, it should be faster, as:

  • Dropping the indexes is a metadata-only operation with online DDL.
  • Altering the column datatype from INT to BIGINT is not an online operation, but since the row copy to the new temporary table no longer has to update secondary indexes, it avoids the slowdown.
  • Adding the secondary indexes back is an online DDL operation:

 

“Online DDL support for adding secondary indexes means that you can generally speed the overall process of creating and loading a table and associated indexes by creating the table without secondary indexes, then adding secondary indexes after the data is loaded.”

https://dev.mysql.com/doc/refman/8.4/en/innodb-online-ddl-operations.html

So let’s do this:

The deletion of the indexes was really quick, as expected (metadata-only operation):
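The actual statement isn’t shown in this writeup; a sketch of the drop (table and index names are the placeholders used earlier) would be:

```sql
ALTER TABLE mytable
  DROP INDEX idx_long_column,
  DROP INDEX idx_short_column,
  DROP INDEX idx_short_col2,
  ALGORITHM=INPLACE, LOCK=NONE;
```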

 

Then the change of the datatype:
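Again, the original statement isn’t reproduced; a sketch of the datatype change (table and column names are placeholders) would be:

```sql
-- Changing INT -> BIGINT requires a full table rebuild (not an in-place operation)
ALTER TABLE mytable
  MODIFY COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;
```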

 

Looks very promising!!!

 

The final step, adding the indexes back with online DDL (ALGORITHM=INPLACE), failed.

Why?

Well, the INPLACE operation uses the temporary directory to write its sort files, and in Aurora the temporary space is limited based on the instance type.

On a regular MySQL instance, we could point innodb_tmpdir to another location with enough disk space; in Aurora, however, that parameter is not modifiable. If it were, the whole process would have been much easier.

Even with a larger instance type, it’s hard to build a ~1.5 TB index without breaking open the piggy bank.

 

Last resort, add the indexes back with the COPY algorithm:
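A sketch of the COPY rebuild (names are placeholders; note that ALGORITHM=COPY does not permit LOCK=NONE, which was acceptable here since it ran on the green cluster rather than on production):

```sql
ALTER TABLE mytable
  ADD INDEX idx_long_column (long_column),
  ADD INDEX idx_short_column (short_col1, short_col2),
  ADD INDEX idx_short_col2 (short_col1),
  ALGORITHM=COPY, LOCK=SHARED;
```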

 

Why does it work? Because ALTER TABLE with the COPY algorithm builds the new table in the data directory (datadir) and copies the rows there, so it isn’t subject to the temporary-directory limitation mentioned above.

We were able to make it in time, about 4 days before auto_increment exhaustion, preventing downtime.

 

In retrospect, we could have used the following approach to avoid the blue/green deployment altogether:

  1. Perform a pt-online-schema-change on the main table, dropping the indexes and changing the column type to BIGINT (with --no-swap-tables --no-drop-old-table --no-drop-new-table --no-drop-triggers).
  2. Add the secondary indexes using the direct alter with the COPY algorithm in the _new table.
  3. Once the alter finishes, swap the tables and drop the triggers.
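A sketch of step 1 with pt-online-schema-change (the database, table, column, and index names are placeholders):

```shell
pt-online-schema-change \
  --alter "MODIFY COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, DROP INDEX idx_long_column, DROP INDEX idx_short_column, DROP INDEX idx_short_col2" \
  --no-swap-tables --no-drop-old-table --no-drop-new-table --no-drop-triggers \
  --execute D=mydb,t=mytable
```

With those flags, the rebuilt copy is left in place as the _new table, where the indexes can then be added back with ALGORITHM=COPY before swapping.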

 

Conclusion:

What initially looked like an easy task with pt-online-schema-change ended up being far more complex.

You need to review the table definition, the index sizes, the Aurora limits, and how the different ALTER algorithms work before deciding on the best approach, especially in situations like this, where auto_increment exhaustion is looming and there is a risk of downtime if the change is not completed in time.

And of course, monitor auto_increment exhaustion for your tables, and use a threshold that gives you enough time to plan and change the table definition. You can use Percona Monitoring and Management for this, specifically the MySQL > MySQL Table Details dashboard.
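As a sketch, exhaustion can also be checked directly from information_schema; this hypothetical query hardcodes the INT UNSIGNED maximum, so it only applies to columns of that type:

```sql
SELECT t.table_schema, t.table_name, c.column_name, c.column_type,
       t.auto_increment,
       ROUND(t.auto_increment / 4294967295 * 100, 2) AS pct_used  -- max for INT UNSIGNED
FROM information_schema.tables t
JOIN information_schema.columns c
  ON c.table_schema = t.table_schema
 AND c.table_name   = t.table_name
WHERE c.extra LIKE '%auto_increment%'
  AND t.auto_increment IS NOT NULL;
```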
