One schema optimization we often do is extending index when there are queries which can use more key part. Typically this is safe operation, unless index length increases dramatically queries which can use index can also use prefix of the new index are they ? It turns there are special cases when this is not the case.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
CREATE TABLE `idxitest` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `a` int(11) NOT NULL, `b` int(11) NOT NULL, PRIMARY KEY (`id`), KEY `a` (`a`) ) ENGINE=InnoDB AUTO_INCREMENT=6029313 DEFAULT CHARSET=latin1 mysql> select count(*) from idxitest where a=5 and b=5; +----------+ | count(*) | +----------+ | 60434 | +----------+ 1 row in set (0.69 sec) mysql> explain select count(*) from idxitest where a=5 and b=5; +----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+ | 1 | SIMPLE | idxitest | ref | a | a | 4 | const | 707820 | Using where | +----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+ 1 row in set (0.00 sec) |
The obvious optimization is to extend index from column (a) to column (a,b) right which will make it faster and should not hurt any other queries a lot, right ?
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
mysql> alter table idxitest drop key a,add key(a,b); Query OK, 0 rows affected (24.84 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> select count(*) from idxitest where a=5 and b=5; +----------+ | count(*) | +----------+ | 60434 | +----------+ 1 row in set (0.02 sec) mysql> explain select count(*) from idxitest where a=5 and b=5; +----+-------------+----------+------+---------------+------+---------+-------------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------+---------------+------+---------+-------------+--------+-------------+ | 1 | SIMPLE | idxitest | ref | a | a | 8 | const,const | 120640 | Using index | +----+-------------+----------+------+---------------+------+---------+-------------+--------+-------------+ 1 row in set (0.00 sec) |
Wow. The query runs 30 times faster – not only because it has to scan less rows but also because it is index covering query now – it does not need to access the data.
It turns out it is too early to celebrate. Application also had another query which previously ran so fast it hardly could be noticed. It however became a lot slower:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 |
# Old Schema mysql> select * from idxitest where a=100 order by id desc limit 1; +---------+-----+---+ | id | a | b | +---------+-----+---+ | 3000000 | 100 | 7 | +---------+-----+---+ 1 row in set (0.00 sec) mysql> explain select * from idxitest where a=100 order by id desc limit 1; +----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+ | 1 | SIMPLE | idxitest | ref | a | a | 4 | const | 126074 | Using where | +----+-------------+----------+------+---------------+------+---------+-------+--------+-------------+ 1 row in set (0.00 sec) # new Schema mysql> select * from idxitest where a=100 order by id desc limit 1; +---------+-----+---+ | id | a | b | +---------+-----+---+ | 3000000 | 100 | 7 | +---------+-----+---+ 1 row in set (1.01 sec) mysql> explain select * from idxitest where a=100 order by id desc limit 1; +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+ | 1 | SIMPLE | idxitest | index | a | PRIMARY | 4 | NULL | 36 | Using where | +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+ 1 row in set (0.00 sec) # The plan also can look something like this: mysql> explain select * from idxitest where a=100 order by id desc limit 1; +----+-------------+----------+------+---------------+------+---------+-------+------+------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+------+---------------+------+---------+-------+------+------------------------------------------+ | 1 | SIMPLE | idxitest | ref | a | a | 4 | const | 1 | Using where; Using index; Using filesort | +----+-------------+----------+------+---------------+------+---------+-------+------+------------------------------------------+ 1 row in set (0.01 sec) |
So why did this query become so much slower ? The reason is its plan benefits from Innodb specific feature – index entries being sorted by primary key for each complete key value. So when you have index (a) and id is a primary key the real index is (a,id) when we extend index to (a,b) it really becomes (a,b,id). So if there is a query which used both a and id key part from original index it will quite likely be unable to use new index to full extent.
What is solution ? You can have “redundant” indexes on (a) and (a,b) at the same time. This is something what suppose to work but it well often does not:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
CREATE TABLE `idxitest` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `a` int(11) NOT NULL, `b` int(11) NOT NULL, PRIMARY KEY (`id`), KEY `a` (`a`), KEY `a_2` (`a`,`b`) ) ENGINE=InnoDB AUTO_INCREMENT=6029313 DEFAULT CHARSET=latin1 mysql> select * from idxitest where a=100 order by id desc limit 1; +---------+-----+---+ | id | a | b | +---------+-----+---+ | 3000000 | 100 | 7 | +---------+-----+---+ 1 row in set (1.03 sec) mysql> explain select * from idxitest where a=100 order by id desc limit 1; +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+ | 1 | SIMPLE | idxitest | index | a,a_2 | PRIMARY | 4 | NULL | 2247 | Using where | +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+ 1 row in set (0.00 sec) |
MySQL Optimizer considers using both (a) and (a,b) indexes and in the end decides to use neither rather doing full index scan until it finds a=100. This looks like an optimizer glitch in this case because it estimates it will scan 2247 rows in the selected plan, while using (a) index you can get result scanning only 1 row guaranteed.
To really get things going you will need to use FORCE INDEX(a) to force MySQL optimizer using right plan.
These results mean you should be very careful applying index changes from mk-duplicate-key-checker key checker when it comes to redundant indexes. If you have query plans depending on Innodb ordering of data by primary key inside indexes they can become significantly affected.
Optimizer behavior may be different in different MySQL versions. These tests were done with 5.1.45 though I’ve seen same behavior with MySQL 5.0 too.