Improved InnoDB fast index creation

PREVIOUS POST
NEXT POST

One of the serious limitations in the fast index creation feature introduced in the InnoDB plugin is that it only works when indexes are explicitly created using ALTER TABLE or CREATE INDEX. Peter has already blogged about it before, here I’ll just briefly reiterate other cases that might benefit from that feature:

  • when ALTER TABLE does require copying the data into a temporary table, secondary indexes are updated by inserting individual rows rather than sorting;
  • OPTIMIZE TABLE could be faster if secondary indexes were temporarily dropped and then recreated using fast index creation;
  • dumps produced by mysqldump first create tables with all secondary indexes and then load the data, which is also inefficient.

Percona Server as of versions 5.1.56 and 5.5.11 allows utilizing fast index creation for all of the above cases, which can potentially speed them up greatly. This feature is controlled by the expand_fast_index_creation system variable which is OFF by default.

Let’s look at each of the above cases in more detail.

 

ALTER TABLE

By temporarily dropping secondary indexes from the new table before copying the data, and then recreating them later, ALTER TABLE can take advantage of the fast index creation feature even when it has to copy the entire table.

To illustrate this, I have performed a number of simple benchmarks. Let’s start with a table containing 4 million rows and one secondary key:

Let’s trigger a table rebuild by adding a new column and see what execution time is like when the default method is used:

Now let’s see how performance is affected when turning expand_fast_index_creation on. Here and in later examples I’m extending the VARCHAR column to trigger table rebuilds without affecting the table size.

As seen from the SHOW PROFILE output, copying the data to a temporary table without updating indexes took 18 seconds, and about the same time was spent on rebuilding the index using fast index creation. So we have 36 seconds in total which is about 1.7 times faster than updating indexes by insertion.

Let’s see if having more secondary indexes in the table makes any difference:

So with 3 secondary indexes expand_fast_index_creation gave us a 2.6x speedup.

Also note that unlike the default method, where the execution time is proportional to the number of indexes, with fast index creation the time required to copy the data to a temporary table is constant. The reason is that when using merge sort, InnoDB has to scan the clustered index only once, even though the actual sorting is done separately for each index.

The above has a couple of important implications:

  • when the data does not fit in the buffer pool, fast index creation provides even better performance as compared to the default method, because it does not have to do random disk seeks to fetch secondary index pages to the buffer pool. A benchmark is worth a thousand words, so let’s repeat the last test with innodb_buffer_pool_sizeset to approximately 1/10th of the dataset:

    So, as expected, a small buffer pool had a huge impact on ALTER TABLE with the optimization disabled, and absolutely no effect on the optimized case, which resulted in an almost 8x speedup.
  • having tmpdir on a fast storage is essential for expand_fast_index_creation, because temporary files for merge-sorting are created in tmpdir. The constant “copying to tmp table” part will not be affected by a slow tmpdir, but rebuilding the indexes will obviously take longer.

Another important thing that is worth mentioning is fragmentation. Fast index creation results in much less fragmented indexes because records are inserted in the correct order into sequentially allocated pages after merge-sorting. So besides optimizing DDL directly, expand_fast_index_creation may also optimize index access for subsequent DML statements. In my test setup I got about 178 MB index size after fast index creation as reported by SHOW TABLE STATUS versus 265 MB index size with the optimization disabled.

OPTIMIZE TABLE

OPTIMIZE TABLE is mapped to ALTER TABLE ... ENGINE=InnoDB for InnoDB tables and thus, is just a special case of the previous one:

mysqldump

Quoting the fast index creation chapter in the MySQL manual:

“… you can generally speed the overall process of creating and loading
an indexed table by creating the table with only the clustered index,
and adding the secondary indexes after the data is loaded.”

mysqldump in Percona Server supports the new option --innodb-optimize-keys which does just that, i.e. it tries to optimize dumps of InnoDB tables by first creating the table with only the clustered index and adding the secondary indexes after the data dump when possible (see Caveats below).

Let’s compare the restore time for a regular dump with a dump created with --innodb-optimize-keys (the test database contained only the table I used in my previous examples):

Caveats:

As I mentioned previously, InnoDB fast index creation uses temporary files in tmpdir for all indexes being created. So make sure you have enough tmpdir space when using expand_fast_index_creation. It is a session variable, so you can temporarily switch it off if you are short on tmpdir space and/or don’t want this optimization to be used for a specific table.

There’s also a number of cases when this optimization is not applicable:

  • UNIQUE indexes in ALTER TABLE are ignored to enforce uniqueness where necessary when copying the data to a temporary table;
  • ALTER TABLE and OPTIMIZE TABLE always process tables containing foreign keys as if expand_fast_index_creation is OFF to avoid dropping keys that are part of a FOREIGN KEY constraint;
  • mysqldump --innodb-optimize-keys ignores foreign keys because InnoDB requires a full table rebuild on foreign key changes. So adding them back with a separate ALTER TABLE after restoring the data from a dump would actually make the restore slower;
  • mysqldump --innodb-optimize-keys ignores indexes on AUTO_INCREMENT columns, because they must be indexed, so it is impossible to temporarily drop the corresponding index;
  • mysqldump --innodb-optimize-keys ignores the first UNIQUE index on non-nullable columns when the table has no PRIMARY KEY defined, because in this case InnoDB picks such an index as the clustered one.
PREVIOUS POST
NEXT POST

Comments

  1. says

    Alexey,

    Thank you for your work on this feature and now blog post. Great to notice not only this feature allows to optimize/load tables faster but also tables end up to have smaller more efficient indexes.

  2. Steve Jackson says

    Hi Alexey

    expand_fast_index_creation sys var doesnt exist on 5.5.12…

    your blog post says >=5.5.11

    am I missing something?

  3. Alexey Kopytov says

    Hi Steve,

    You are right. Even though the functionality was introduced in 5.1.56 and 5.5.11, it became optional only in 5.1.59 and 5.5.16. That is, in 5.5.12 it is always enabled.

Leave a Reply

Your email address will not be published. Required fields are marked *