Run an ALTER TABLE for a huge table in Aurora

May 1, 2026
By Eduardo Krieg

Recently, we received an alert for one of our Managed Services customers indicating that the auto_increment value for one of their tables had reached 80% of its maximum capacity. The column was INT UNSIGNED, whose maximum value is 4,294,967,295.

At 80%, we have enough time to change it to BIGINT, right? Let’s see.

So we used pt-online-schema-change to perform the alter.

It started running at a good pace but slowed over time.

 

Why?

Well, let’s look at the definition of the table:

NOTE1: The index on long_column covers a VARCHAR(1000) column; it may not be required, and an index prefix may be more useful here.

NOTE2: The index idx_short_col2 is redundant, as it is covered by the index idx_short_column.

Those changes require testing and are out of scope for this emergency, but they are worth mentioning.
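The original definition is not reproduced here, but a hypothetical table exhibiting the two issues flagged in the notes (all table, column, and index names are placeholders) might look like this:

```sql
CREATE TABLE mytable (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  short_col1 VARCHAR(50) NOT NULL,
  short_col2 VARCHAR(50) NOT NULL,
  long_column VARCHAR(1000) NOT NULL,
  PRIMARY KEY (id),
  -- NOTE1: full index on a VARCHAR(1000); a prefix such as long_column(50) may suffice
  KEY idx_long_column (long_column),
  KEY idx_short_column (short_col1, short_col2),
  -- NOTE2: a left prefix of idx_short_column, hence redundant
  KEY idx_short_col2 (short_col1)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```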

 

Table size:

Note that the indexes are considerably larger than the data itself.
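The size breakdown from the screenshot is not reproduced here; as a sketch, data and index sizes can be pulled from information_schema (schema and table names are placeholders):

```sql
SELECT table_name,
       ROUND(data_length  / POW(1024, 3), 2) AS data_gb,
       ROUND(index_length / POW(1024, 3), 2) AS index_gb
FROM information_schema.tables
WHERE table_schema = 'mydb'
  AND table_name   = 'mytable';
```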

While the pt-online-schema-change runs, it copies the data to a new table. As the data is being copied, the secondary indexes must be maintained.

NOTE the huge index on the VARCHAR(1000) column, ~1.5 TB in size. Maintaining such an index becomes increasingly expensive as the data grows.

The pt-online-schema-change had been running for ~8 days, and its latest estimate was 53 more days, which we couldn’t afford, since the maximum auto_increment value would be reached in ~15 days.

 

So what do we do now?

We suggested canceling the pt-online-schema-change and creating an Aurora blue-green deployment.

Then perform the direct ALTER on the green cluster. And finally, when ready, do the failover.
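The blue/green deployment can be created from the console or the CLI; a sketch with the AWS CLI (the deployment name, source cluster ARN, and parameter group name are placeholders) might look like:

```shell
aws rds create-blue-green-deployment \
  --blue-green-deployment-name huge-alter-bg \
  --source arn:aws:rds:us-east-1:123456789012:cluster:prod-cluster \
  --target-db-cluster-parameter-group-name green-cluster-params
```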

 

Sounds good, doesn’t it?

 

First, we need to ensure that the new (green) cluster’s parameter group has replica_type_conversions set to “ALL_NON_LOSSY,ALL_UNSIGNED”, so that it can replicate from an INT UNSIGNED column into a BIGINT UNSIGNED column.
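As a sketch, the parameter can be set on the green cluster’s parameter group with the AWS CLI (the group name is a placeholder):

```shell
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name green-cluster-params \
  --parameters '[{"ParameterName":"replica_type_conversions","ParameterValue":"ALL_NON_LOSSY,ALL_UNSIGNED","ApplyMethod":"immediate"}]'
```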

So we tried it. The ALTER started fast, at ~0.036% per minute, which works out to roughly 2 days. That’s great!

We left the process running over the weekend, but it started to slow down again… By Monday, it was advancing at ~0.01% every 5 minutes, which gives an ETA of ~34 days.

Why? 

Again: with a direct ALTER, MySQL copies the data to a temporary table, and the bigger the data, the more expensive it becomes to maintain the secondary indexes during the copy.

Again, unacceptable.

Note that with the above two approaches we lost ~12 days of precious time, and the auto_increment exhaustion deadline was approaching.

Then we thought: What if we drop the secondary indexes, do the alter, and then add the indexes back?

In theory, it should be faster, as:

  • Dropping the indexes is a metadata-only operation with online DDL.
  • Altering the column datatype from INT to BIGINT is not an online operation, but since the row copy to the new temporary table no longer has to update secondary indexes, it avoids the slowdown.
  • Adding the secondary indexes back is an online DDL operation:

 

“Online DDL support for adding secondary indexes means that you can generally speed the overall process of creating and loading a table and associated indexes by creating the table without secondary indexes, then adding secondary indexes after the data is loaded.”

https://dev.mysql.com/doc/refman/8.4/en/innodb-online-ddl-operations.html

So let’s do this:

The deletion of the indexes was really quick, as expected (metadata-only operation):
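The actual statement isn’t shown in this writeup; a sketch of the drop (table and index names are the placeholders used earlier) would be:

```sql
ALTER TABLE mytable
  DROP INDEX idx_long_column,
  DROP INDEX idx_short_column,
  DROP INDEX idx_short_col2,
  ALGORITHM=INPLACE, LOCK=NONE;
```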

 

Then the change of the datatype:
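Again, the original statement isn’t reproduced; a sketch of the datatype change (table and column names are placeholders) would be:

```sql
-- Changing INT -> BIGINT requires a full table rebuild (not an in-place operation)
ALTER TABLE mytable
  MODIFY COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;
```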

 

Looks very promising!!!

 

The final step, adding the indexes back with online DDL (ALGORITHM=INPLACE), failed.

Why?

Well, the INPLACE operation uses the temporary directory to write its sort files, and in Aurora the temporary space is limited based on the instance type.

On a regular MySQL instance, we could point innodb_tmpdir to another location with enough disk space; in Aurora, however, that parameter is not modifiable. If it were, the whole process would have been much easier.

Even with a larger instance type, it’s hard to build a ~1.5 TB index without breaking open the piggy bank.

 

Last resort, add the indexes back with the COPY algorithm:
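A sketch of the COPY rebuild (names are placeholders; note that ALGORITHM=COPY does not permit LOCK=NONE, which was acceptable here since it ran on the green cluster rather than on production):

```sql
ALTER TABLE mytable
  ADD INDEX idx_long_column (long_column),
  ADD INDEX idx_short_column (short_col1, short_col2),
  ADD INDEX idx_short_col2 (short_col1),
  ALGORITHM=COPY, LOCK=SHARED;
```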

 

Why does it work? Because ALTER TABLE with the COPY algorithm builds the new table in the data directory (datadir) and copies the rows there, so it isn’t subject to the temporary-directory limitation mentioned above.

We were able to make it in time, about 4 days before auto_increment exhaustion, preventing downtime.

 

In retrospect, we could have used the following approach to avoid the blue/green deployment altogether:

  1. Perform a pt-online-schema-change on the main table, dropping the indexes and changing the column type to BIGINT (with --no-swap-tables --no-drop-old-table --no-drop-new-table --no-drop-triggers).
  2. Add the secondary indexes using the direct alter with the COPY algorithm in the _new table.
  3. Once the alter finishes, swap the tables and drop the triggers.
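A sketch of step 1 with pt-online-schema-change (the database, table, column, and index names are placeholders):

```shell
pt-online-schema-change \
  --alter "MODIFY COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, DROP INDEX idx_long_column, DROP INDEX idx_short_column, DROP INDEX idx_short_col2" \
  --no-swap-tables --no-drop-old-table --no-drop-new-table --no-drop-triggers \
  --execute D=mydb,t=mytable
```

With those flags, the rebuilt copy is left in place as the _new table, where the indexes can then be added back with ALGORITHM=COPY before swapping.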

 

Conclusion:

What initially looked like an easy task with pt-online-schema-change ended up being far more complex.

You need to review the table definition, the index sizes, the Aurora limits, and how the different ALTER algorithms work before deciding on the best approach, especially in situations like this, where auto_increment exhaustion is looming and there is a risk of downtime if the change is not completed in time.

And of course, monitor auto_increment exhaustion for your tables, and use a threshold that gives you enough time to plan and change the table definition. You can use Percona Monitoring and Management for this, specifically the MySQL > MySQL Table Details dashboard.
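As a sketch, exhaustion can also be checked directly from information_schema; this hypothetical query hardcodes the INT UNSIGNED maximum, so it only applies to columns of that type:

```sql
SELECT t.table_schema, t.table_name, c.column_name, c.column_type,
       t.auto_increment,
       ROUND(t.auto_increment / 4294967295 * 100, 2) AS pct_used  -- max for INT UNSIGNED
FROM information_schema.tables t
JOIN information_schema.columns c
  ON c.table_schema = t.table_schema
 AND c.table_name   = t.table_name
WHERE c.extra LIKE '%auto_increment%'
  AND t.auto_increment IS NOT NULL;
```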
