ALTER TABLE: Creating Index by Sort and Buffer Pool SizePeter Zaitsev
Today I was looking at the ALTER TABLE performance with fast index creation and without it with different buffer pool sizes. Results are pretty interesting. I used modified Sysbench table for these tests because original table as initially created only has index on column K which initially contains only zeros, which means index is very small and also very fast to build by insertion as insertions happen in the “end” of index tree. I’ve updated column to have bunch of long random strings update sbtest set c=concat(sha1(rand()),’-‘,sha1(rand()),’-‘,sha1(rand()),’-‘,sha1(rand()),’-‘,sha1(rand())); and added key on column C: alter table sbtest add key c(c); The box I’m using for test is rather old box with 8GB of RAM and 4*SATA disks in RAID10. I used 10mil row table which would look as following in terms of data and index size:
mysql> show table status like "sbtest" \G
*************************** 1. row ***************************
Create_time: 2012-06-27 13:04:56
1 row in set (0.00 sec)
So we have about 3.7GB total size out of which 1.4GB is the Index.
First Lets look at results of Adding and Dropping column for 128M buffer pool, sized to illustrate the case of index being significantly larger than amount of memory.
Blue Columns in this case is the default configuration, second is with expand_fast_index_creation=1 which builds table with primary key only when creating a table and when builds all indexes by sort. The speedup we can observe is over 20 times. Note this benefit only applies to the case when you’re adding/dropping/changing columns as if you just add index it will be done by sort even without this additional Percona Server feature.
Lets now see with 6GB buffer pool which is large enough to fit complete table with index (but is a bit too small to keep 2 copies) In this case we’ll see the following picture:
Comparing these two graphs we can see few different observations:
Buffer Pool Size Matters There is 10x difference between the case when index fits in memory to when it does not. The difference can be even larger in the real world. Working with indexing, such as inserting data in random order can become very slow when index does not fit to memory any more.
Creating Index by sort is great Creating index by sort can provide over 20x performance improvement to the speed of ALTER TABLE, as this is not only step which ALTER TABLE does the improvement to the index creation speed itself has to be even larger. It is also offers better performance for in memory case, even though in this case the gain is just about 2x for this workload. Still substantial.
Sorting needs OS Cache It is not clearly seen from the graphs but with index built by sort enabled the performance of ALTER TABLE with 128MB was better than with 6GB. It looks like this does not make sense but in fact it does if you think about building index stage. On this box with 8GB of memory allocating 6GB to buffer pool made very little memory available for OS cache, so index build process required more IO than if only 128MB were used for buffer pool. The difference was rather small – less than 10% though, which is because disks are not that bad with sequential IO, which is what file merge process uses. This is important illustration as in many cases when people see performance reduction with large buffer pool they forget what bottleneck might be somewhere else where memory resources might be just needed more.
Take Away: If you’re having large tables and need to run ALTER TABLE which rebuilds the table or OPTIMIZE TABLE do not forget to enable expand_fast_index_creation it can make a whole process a lot faster.
P.S The tests were done with Percona Server 5.5.24-26.0