November 22, 2014

Extending Index for Innodb tables can hurt performance in a surprising way

One schema optimization we often do is extending index when there are queries which can use more key part. Typically this is safe operation, unless index length increases dramatically queries which can use index can also use prefix of the new index are they ? It turns there are special cases when this is not the case.

The obvious optimization is to extend index from column (a) to column (a,b) right which will make it faster and should not hurt any other queries a lot, right ?

Wow. The query runs 30 times faster – not only because it has to scan less rows but also because it is index covering query now – it does not need to access the data.

It turns out it is too early to celebrate. Application also had another query which previously ran so fast it hardly could be noticed. It however became a lot slower:

So why did this query become so much slower ? The reason is its plan benefits from Innodb specific feature – index entries being sorted by primary key for each complete key value. So when you have index (a) and id is a primary key the real index is (a,id) when we extend index to (a,b) it really becomes (a,b,id). So if there is a query which used both a and id key part from original index it will quite likely be unable to use new index to full extent.

What is solution ? You can have “redundant” indexes on (a) and (a,b) at the same time. This is something what suppose to work but it well often does not:

MySQL Optimizer considers using both (a) and (a,b) indexes and in the end decides to use neither rather doing full index scan until it finds a=100. This looks like an optimizer glitch in this case because it estimates it will scan 2247 rows in the selected plan, while using (a) index you can get result scanning only 1 row guaranteed.

To really get things going you will need to use FORCE INDEX(a) to force MySQL optimizer using right plan.

These results mean you should be very careful applying index changes from mk-duplicate-key-checker key checker when it comes to redundant indexes. If you have query plans depending on Innodb ordering of data by primary key inside indexes they can become significantly affected.

Optimizer behavior may be different in different MySQL versions. These tests were done with 5.1.45 though I’ve seen same behavior with MySQL 5.0 too.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Hi Peter,
    I’ve seen inconsistent behavior in MySQL/InnoDB with regard to PK being suffix of all indexes. That is, many times execution plans did not go as plan, just as you presented.
    “Index entries being sorted by primary key for each complete key value.” — although tests have (almost) always supported this, and despite the fact I sometimes rely on this for query optimizations, I was unsuccessful in finding documentation for that. Is this feature documented? Is it declared to be so? Or is it just current implementation? I’ll be grateful if you could provide with the docs.

    I may write about this in the future: I found temporary tables to behave differently with InnoDB plugin compressed format — but, again, inconsistently.

    Regards

  2. peter says:

    Shlomi,

    I’m not sure if this is documented. Though recent versions of MySQL optimizer relay on this behavior in multiple places. There also were number of bugs and support for this relatively recent – I think even in early MySQL 5.0 you would need to add id explicitly in the end of the index if you would like to sort by it.

    It is part of Innodb design to store rows by PK for each key value, I believe it would be a major change to change it.

  3. Peter, thanks.

    PS, instead of using FORCE INDEX, what I do is actually drop the “ORDER BY id”, when the desired order is ASC, and just rely on the fact that rows are retrieved by that order.

  4. peter says:

    Shlomi,

    Yes… This is why I used DESC in example. It is actually what was in the real query too :)

  5. No offence :D

  6. Rob Wultsch says:

    It seems like a more ideal way for Innodb to deal with secondary indexes is to explicitly require that the PK be set. I guess I am becoming an enemy of all things implicit.

  7. simon says:

    Not quite understand one thing here:

    SELECT * FROM idxitest WHERE a=100 ORDER BY id DESC LIMIT 1;

    As we know, the primary key is also the clustered index in InnoDB. If MySQL want to use the primary key to query the above statement, it can’t use the id column in the clustered index, is this right? IMO, Mysql has to scan all the index leaves to find all records with a = 100. At the same time, all the leaves are just the data pages. So in this situation, is there any difference between the full index scan and full table scan?

    Thanks.

  8. peter says:

    Simon,

    If there is only key on column (a) for each a value rows will be sorted by primary key in the data pages so they can be retrieved in the sorted order.

    Also you’re right in Innodb scanning PRIMARY KEY completely and full table scan is the same thing.

  9. simon says:

    Thanks peter, that’s more clear.

  10. Raghu Sastry says:

    To support what has been said above, i found this behavior and strengthened my understanding of the index usage. This was tested on 5.5/windows.

    CREATE TABLE cust2(
    loginname VARCHAR(14) NOT NULL,
    custname VARCHAR(255) NOT NULL,
    salary FLOAT NOT NULL,
    PRIMARY KEY uidx_name (custname,loginname),
    KEY idx_sal(salary),
    KEY idx_namesal (custname,salary)
    ) ENGINE=INNODB;

    INSERT INTO cust2 (custname,salary,loginname) VALUES (‘raghu’,1000,’raghu’),(‘imran’,2000,’imran’), (‘doofus’,500,’doofus’),(‘joey’,400,’joey’),(‘phoebe’,200,’phoebe’),(‘monica’,200,’monica’)

    EXPLAIN SELECT * FROM cust2 WHERE loginname=’raghu’ OR custname=’joey’

    initially i thought the primary key would be used (deliberately reversed the order in the where clause), but the index that was used was idx_sal. Needless to say, this index actually expanded to (sal,pk ) in this case sal,custname,loginname.

    “id”,”select_type”,”table”,”type”,”possible_keys”,”key”,”key_len”,”ref”,”rows”,”Extra”
    “1”,”SIMPLE”,”cust2″,”index”,”PRIMARY,idx_namesal”,”idx_sal”,”4″,\N,”6″,”Using where; Using index”

    reversing the order still used the same index
    EXPLAIN SELECT * FROM cust2 WHERE custname=’raghu’ OR loginname =’raghu’
    “id”,”select_type”,”table”,”type”,”possible_keys”,”key”,”key_len”,”ref”,”rows”,”Extra”
    “1”,”SIMPLE”,”cust2″,”index”,”PRIMARY,idx_namesal”,”idx_sal”,”4″,\N,”6″,”Using where; Using index”

    but changing the operator to “AND” made it use the primary key.
    EXPLAIN SELECT * FROM cust2 WHERE custname=’raghu’ AND loginname =’raghu’

    “id”,”select_type”,”table”,”type”,”possible_keys”,”key”,”key_len”,”ref”,”rows”,”Extra”
    “1”,”SIMPLE”,”cust2″,”const”,”PRIMARY,idx_namesal”,”PRIMARY”,”811″,”const,const”,”1″,””

    This indeed proves that we not only need to verify your queries but index behavior as well.

    Thanks!

Speak Your Mind

*