ALTER TABLE: Creating Index by Sort and Buffer Pool Size

ALTER TABLE: Creating Index by Sort and Buffer Pool Size


Today I was looking at the ALTER TABLE performance with fast index creation and without it with different buffer pool sizes. Results are pretty interesting. I used modified Sysbench table for these tests because original table as initially created only has index on column K which initially contains only zeros, which means index is very small and also very fast to build by insertion as insertions happen in the “end” of index tree. I’ve updated column to have bunch of long random strings update sbtest set c=concat(sha1(rand()),’-‘,sha1(rand()),’-‘,sha1(rand()),’-‘,sha1(rand()),’-‘,sha1(rand())); and added key on column C: alter table sbtest add key c(c); The box I’m using for test is rather old box with 8GB of RAM and 4*SATA disks in RAID10. I used 10mil row table which would look as following in terms of data and index size:

So we have about 3.7GB total size out of which 1.4GB is the Index.

First Lets look at results of Adding and Dropping column for 128M buffer pool, sized to illustrate the case of index being significantly larger than amount of memory.

Blue Columns in this case is the default configuration, second is with expand_fast_index_creation=1 which builds table with primary key only when creating a table and when builds all indexes by sort. The speedup we can observe is over 20 times. Note this benefit only applies to the case when you’re adding/dropping/changing columns as if you just add index it will be done by sort even without this additional Percona Server feature.

Lets now see with 6GB buffer pool which is large enough to fit complete table with index (but is a bit too small to keep 2 copies) In this case we’ll see the following picture:

Comparing these two graphs we can see few different observations:

Buffer Pool Size Matters There is 10x difference between the case when index fits in memory to when it does not. The difference can be even larger in the real world. Working with indexing, such as inserting data in random order can become very slow when index does not fit to memory any more.

Creating Index by sort is great Creating index by sort can provide over 20x performance improvement to the speed of ALTER TABLE, as this is not only step which ALTER TABLE does the improvement to the index creation speed itself has to be even larger. It is also offers better performance for in memory case, even though in this case the gain is just about 2x for this workload. Still substantial.

Sorting needs OS Cache It is not clearly seen from the graphs but with index built by sort enabled the performance of ALTER TABLE with 128MB was better than with 6GB. It looks like this does not make sense but in fact it does if you think about building index stage. On this box with 8GB of memory allocating 6GB to buffer pool made very little memory available for OS cache, so index build process required more IO than if only 128MB were used for buffer pool. The difference was rather small – less than 10% though, which is because disks are not that bad with sequential IO, which is what file merge process uses. This is important illustration as in many cases when people see performance reduction with large buffer pool they forget what bottleneck might be somewhere else where memory resources might be just needed more.

Take Away: If you’re having large tables and need to run ALTER TABLE which rebuilds the table or OPTIMIZE TABLE do not forget to enable expand_fast_index_creation it can make a whole process a lot faster.

P.S The tests were done with Percona Server 5.5.24-26.0


Share this post

Comments (5)

  • Bill Karwin Reply

    expand_fast_index_creation is a Percona Server feature, introduced in 5.1.56 and 5.5.11.

    June 27, 2012 at 3:13 pm
  • Sheeri Cabral Reply

    If this makes everything faster, what’s the reason for not having it on by default?

    September 18, 2012 at 12:00 pm
  • Peter Zaitsev Reply


    Creating index by sort had multiple bugs over the years and we want move to Percona Server to be as risk free as possible. You will not get all performance gains until you mess with options but risks are low. This is also one of those things which are not always better – I’ve seen cases when it actually slower – if data size is small enough you may have the case when data will fit in memory while sort-merge file will not because it uses fixed length which can take a lot more space for varchars.

    September 18, 2012 at 12:21 pm
  • Sheeri Cabral Reply

    That’s helpful, thanx!

    September 18, 2012 at 2:34 pm
  • Sravan Reply

    I am trying to load the data to the table which has no indexes, in first step i am populating the temporary table then i am trying to load the data by selecting from temporary table then insert into the main is very slow for 12 million records its taking almost 8 hrs of period 5 years, i have tried with divide into 5 chunks one for each year, even its taking around 2 hrs per chunk. Its taking too much time in the state of “sending” and “sort index creating”. please suggest me to sort it out and increase the performance.

    May 15, 2014 at 4:17 am

Leave a Reply