Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

The Optimization That (Often) Isn’t: Index Merge Intersection

December 14, 2012

Author

Ernie Souhrada

Insight for DBAs

Insight for Developers

MySQL

Share this Post:

Prior to version 5.0, MySQL could only use one index per table in a given query without any exceptions; folks that didn’t understand this limitation would often have tables with lots of single-column indexes on columns which commonly appeared in their WHERE clauses, and they’d wonder why the EXPLAIN plan for a given SELECT would show N possible index choices but only one index actually used.

Index Merge Intersection

To some extent, MySQL 5.0 and later changed this situation with the introduction of the “index merge” optimizer plan. The basic idea behind index merge is that for certain types of queries which contain WHERE clauses with columns that had single-column indexes on them, MySQL could sometimes make use of the multiple indexes. For instance, “SELECT foo FROM bar WHERE indexed_colA = X OR indexed_colB = Y” might use the index merge union algorithm, which would *simultaneously* (this is important, as we will see below) scan the indexes on indexed_colA and indexed_colB and then do a set-theoretic union of the two result sets. Replace that “OR” with an “AND” and you might see the set-theoretic intersection of the result sets. Sounds like this could be a significant performance win, perhaps. Sometimes it is. Other times, it is a major performance killer.

It’s fairly straightforward to tell when MySQL is doing this; run an EXPLAIN on your SELECT and you’ll see “index_merge” as the type of query and the type of index merge algorithm in the “Extra” field. Here’s an example of a table and a query which I ran into recently that did indeed attempt to use index_merge, but with disappointing results.

CREATE TABLE users (
  user_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  parent_id INT NOT NULL DEFAULT 0,
  status TINYINT UNSIGNED NOT NULL DEFAULT 0,
  user_type TINYINT UNSIGNED NOT NULL DEFAULT 0,
  username VARCHAR(20),
  ... other columns here ...
  PRIMARY KEY(user_id),
  INDEX `parent_id` (parent_id),
  INDEX `status` (status),
  INDEX `user_type` (user_type)
) ENGINE=InnoDB;

SELECT user_id FROM user WHERE user_type=2 AND status=1 AND parent_id=0 AND user_id > 2938575 ORDER BY user_id LIMIT 1;

CREATE TABLE users (

user_id INT UNSIGNED NOT NULL AUTO_INCREMENT,

parent_id INT NOT NULL DEFAULT 0,

status TINYINT UNSIGNED NOT NULL DEFAULT 0,

user_type TINYINT UNSIGNED NOT NULL DEFAULT 0,

username VARCHAR(20),

... other columns here ...

PRIMARY KEY(user_id),

INDEX `parent_id` (parent_id),

INDEX `status` (status),

INDEX `user_type` (user_type)

) ENGINE=InnoDB;

SELECT user_id FROM user WHERE user_type=2 AND status=1 AND parent_id=0 AND user_id > 2938575 ORDER BY user_id LIMIT 1;

The users table in question here had approximately 4.5M rows, and an EXPLAIN produced the following execution plan:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: users
         type: index_merge
possible_keys: PRIMARY,parent_id,status,user_type
          key: user_type,status,parent_id
      key_len: 2,2,4
          ref: NULL
         rows: 8100
        Extra: Using intersect(user_type,status,parent_id); Using where; Using index; Using filesort

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: users

type: index_merge

possible_keys: PRIMARY,parent_id,status,user_type

key: user_type,status,parent_id

key_len: 2,2,4

ref: NULL

rows: 8100

Extra: Using intersect(user_type,status,parent_id); Using where; Using index; Using filesort

At first glance, this might not look too bad. MySQL is using three different indexes to search approximately 8100 out of 4.5M rows to return the data we want. Since we’re only requesting the user_id column, which is the PK for this table and implicitly appended to each secondary index, our merged indexes are covering for the query, so that’s also good. When we ran it, however, it took 3 seconds. Not only that, but it came up at the top of a pt-query-digest report which showed that it was the most commonly executed query on the server by two orders of magnitude. Other queries were running maybe 1000 or 2000 times, but this one?

# Rank Query ID           Response time      Calls  R/Call   Apdx V/M   It
# ==== ================== ================== ====== ======== ==== ===== ==
#    1 0xCF90A421D9FB8717 1022239.0748 66.9% 311599   3.2806 0.50  0.02 SELECT user

# Rank Query ID Response time Calls R/Call Apdx V/M It

# ==== ================== ================== ====== ======== ==== ===== ==

# 1 0xCF90A421D9FB8717 1022239.0748 66.9% 311599 3.2806 0.50 0.02 SELECT user

Note the low V/M here. This query *always* takes about 3.3 seconds to execute. Yet it appears so benign. What gives?

One word: selectivity. Out of 4.5M rows in the table, 4M of them had status = 1, and 4.4M of them had parent_id = 0. Only 17K, however, matched user_type = 2. It’s worth noting here that the (parent_id, status, user_type) = (0, 1, 2) tuple chosen by pt-query-digest as the sample query wasn’t a fluke or unfortunately-chosen representative; in talking to the customer, I discovered that it’s this particular set of conditions which was *always* used by this query. We wouldn’t see, for instance, this query looking for a different status or a different user type. When we think about what this means in the context of index merge, it means that MySQL is really looking at a search space closer to 8.5M rows, not 8100, because it has to read the index entries for each of the columns in the WHERE clause that are included in this merge operation and then perform a set intersection on the results. Here’s one situation where trying to do multiple things at the same time ends up taking longer – if we stopped after retrieving the 17k rows matching user_type=2, we’d save ourselves a lot of work. Ooops.

The good news is that once identified, there are multiple ways to fix this issue, and they are all straightforward to implement. The bad (sort of) news is that each situation is going to be different, and there isn’t likely to be a one-size-fits-all silver bullet solution.

For example, one option would be to simply adjust the optimizer_switch configuration setting and disable index_merge_intersection. It’s a dynamically-adjustable setting and can be modified globally or per-session:

SET [GLOBAL|SESSION] optimizer_switch="index_merge_intersection=off";

1	SET [GLOBAL\|SESSION] optimizer_switch="index_merge_intersection=off";

This is the database equivalent of kicking the door in when you’ve forgotten your keys. Yes, it’s quick, and it will get you back in the house, but now you have no door, and your cats might get out. Or your neighbors will see you walking around in your underwear. Or any of a panoply of other unintended and potentially undesirable consequences, many of which you might not see right away. Disabling this setting might speed up this query (it did), but it might also negatively impact other queries which were benefitting from this optimization (it did that, too). Effective for quick testing (e.g., to find out if index merge is actually a problem) but probably not the best way to go for a permanent solution.

Another option is, of course, to change the indexing on the table. For instance, if 4.4M out of 4.5M rows have parent_id = 0, we might be tempted to remove the index on parent_id if queries for parent_id != 0 could be satisfied in some other way. After all, if an index isn’t there, MySQL can’t attempt to use it, but this has the same challenges that are present with the previous solution. What we do in order to help one query execute more quickly may end up creating unintended negative consequences elsewhere.

We could rewrite the query as a sub-select. This will work properly, and it won’t interfere with other queries, but IMHO it’s not very elegant and it’s a bit messy:

SELECT user_id FROM 
  (SELECT user_id,parent_id,status FROM users WHERE user_type=2 
   AND user_id > 2938575 ORDER BY user_id) t 
 WHERE t.parent_id=0 AND t.status = 1 LIMIT 1

SELECT user_id FROM

(SELECT user_id,parent_id,status FROM users WHERE user_type=2

AND user_id > 2938575 ORDER BY user_id) t

WHERE t.parent_id=0 AND t.status = 1 LIMIT 1

Finally, we can use index hints. Index hints are exactly what they sound like – “help” for the optimizer to favor or disfavor a particular index or set of indexes so that the query executes in a specific fashion. Index hints are fairly unintrusive and can be applied on a query-by-query basis, so there’s no need to go mucking about with server configuration, altering the table structure, or creating messy sub-selects. In cases where the optimizer just can’t get things right, and you, as the developer or DBA, know exactly how the server should be finding and manipulating data, they can be quite effective.

In the case of this query, since we know that it’s the index on (user_type) that is the one that MySQL should be looking at, we can rewrite the query like so, with the following result:

SELECT user_id FROM users USE INDEX(user_type) WHERE user_type=2 
     AND user_id > 2938575 AND parent_id=0 AND status=1 ORDER BY user_id LIMIT 1;

mysql> EXPLAIN SELECT user_id FROM users USE INDEX(user_type) 
       WHERE user_type=2 AND user_id > 2938575 AND parent_id=0 AND status=1 
       ORDER BY user_id LIMIT 1G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: users
         type: ref
possible_keys: user_type
          key: user_type
      key_len: 2
          ref: const
         rows: 32404
        Extra: Using where
1 row in set (0.00 sec)

SELECT user_id FROM users USE INDEX(user_type) WHERE user_type=2

AND user_id > 2938575 AND parent_id=0 AND status=1 ORDER BY user_id LIMIT 1;

mysql> EXPLAIN SELECT user_id FROM users USE INDEX(user_type)

WHERE user_type=2 AND user_id > 2938575 AND parent_id=0 AND status=1

ORDER BY user_id LIMIT 1G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: users

type: ref

possible_keys: user_type

key: user_type

key_len: 2

ref: const

rows: 32404

Extra: Using where

1 row in set (0.00 sec)

Making this change turns what was a 3 second query into a millisecond query. It’s hard to argue with success, although in many cases I recommend against using index hints for one very simple reason: stuff happens. Just a few possibilities:

1. A new version of MySQL is released that has some new optimizer functionality that you end up missing out on.

1. Junior developers might see USE INDEX() and suddenly start believing that they can always outsmart the optimizer by manually manipulating indexes in every query they write (not that I have ever seen this happen or anything).

1. The person that originally added the USE INDEX() leaves the company and take the knowledge and rationale behind it out the door.

1. The data change so much such that the index hint is now doing more harm than good.

Caveats aside, there are some very good reasons to use index hints for this particular query, and it was ultimately the approach I suggested as a short/medium-term solution while they did some re-work on the process as a whole. When I came back the next day and we looked at the Cacti reports, the “InnoDB rows examined” graph had dropped several orders of magnitude and CPU utilization on the machine had dropped by about 2/3. Not a bad outcome at all.

And so, what do we learn here?

- Just because the output from EXPLAIN “appears to look good” doesn’t mean that it actually is good.

- Whenever you see index_merge_intersection, it’s a good time to make sure that the indexes involved have good selectivity (a good practice in general, to be sure).

- As they say in the Perl community, there’s more than one way to do it. Otherwise known as beware the law of unintended consequences.

- Sometimes we have to sacrifice “elegance” (IMHO, index hints are inelegant and hackish) to produce the best possible outcome. C’est la vie.

0 0 votes

Article Rating

12 Comments

Oldest

Newest Most Voted

Regina

13 years ago

Haven’t really done too much heavy work with MySQL, but I would expect MySQL to utilize table stats to arrive at if there is enough selectivity of an index for a particular query to make it worth using.

PostgreSQL has similar functionality to the index merge strategy it calls bitmap scan that it’s had for a while. In these cases even if there are multiple indexes it can merge, it usually does a stats analysis to determine if an index is selective enough for a filter condition to warrant the cost of using it especially in a bitmap scan. Does MySQL have similar checks?

James Pearson

13 years ago

How do you get data points of InnoDB rows examined? I can’t find anything in MySQL’s status outputs that provides that on a global level; do you fetch it from the queries individually?

Phil Jensen

13 years ago

James Pearson; try:
show global status like ‘innodb_rows_read’;

Justin Swanhart

13 years ago

James,

Author

Ernie Souhrada

13 years ago

‘@Regina–

MySQL does maintain statistics, but that’s not really the issue. The problem is that MySQL doesn’t correlate any of the statistics from one index with those from another, and this ends up leading to a value for “rows_examined” which is grossly underestimated. If we go back to the original query and the conditions on those columns, we can easily see that the indexes on status and parent_id are effectively useless, particularly when compared to the third index on user_type, but MySQL isn’t able to arrive at that same conclusion.

As I understand it, the optimizer simply sees “hey, we have these three indexes and they fit the conditions for an index_merge, so let’s do it” – and when that comes back with an estimated search space of 8100 rows, that looks a lot cheaper (remember, the optimizer is cost-based) than my index hint query which has an estimated search space of about 32K. AFAIK, index hints (or multi-column indexes, when it makes sense to use them) are really the only decent way around this issue.

Alexander Rubin

13 years ago

We can also simply create a “covered” index on (user_type, status, parent_id, user_id), right? The index will be larger but MySQL will be able to satisfy the query out of the index itself.
One more downside of “use index” hint is this – if, in the future, someone will remove or rename the index the query will start failing.

Petr Burian

11 years ago

Our projects are from time to time very badly affected by ‘index_merge_intersection’. In fact I haven’t seen any benefit from it. Each time a query started to perform 1000x slower it was due to a wrong key was picked with intersection optimizer rule. I agree this can’t be a general recommendation to disable it completely but in our case it will do the job.

In our case the InnoDB Row Operations drop significantly when the ‘index_merge_intersection’ has been disabled completely as can be seen here: http://goo.gl/4eMk15

An example what a wrong key with ‘index_merge_intersection’ can do is visible here: http://goo.gl/LUk2x2

Jarrett

11 years ago

Very useful information, thanks! We were experiencing this exact issue, and your post helped me/us to debug it and find a solution 🙂

superdav42

10 years ago

Seems we can get the best of both worlds by just setting the optimizer_switch in the session just for the one query that performs poorly with index intersection. So it’d be like:
SET SESSION optimizer_switch=”index_merge_intersection=off”;
SELECT … something that would use the wrong key…
SET SESSION optimizer_switch=”index_merge_intersection=on”;

Any downsides to this approach? Doesn’t have the problems with index hints and the optimizer switch only affects this single query which we can be fairly confident shouldn’t ever use an index intersection.