MySQL UNION and Double IN for Query Optimization

Loose Index Scan with Double IN

A few days ago I wrote an article about using UNION to implement a loose index scan.

First I should mention that a double IN also works the same way, so you do not have to use UNION. So, changing the query to:

mysql> SELECT sql_no_cache name FROM people WHERE age in(18,19,20) AND zip IN (12345,12346, 12347);
+----------------------------------+
| name                             |
+----------------------------------+
| ed4481336eb9adca222fd404fa15658e |
| 888ba838661aff00bbbce114a2a22423 |
+----------------------------------+
2 rows in set (0.00 sec)

mysql> explain SELECT sql_no_cache name FROM people WHERE age in(18,19,20) AND zip IN (12345,12346, 12347);
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | people | range | age           | age  |       4 | NULL |    9 | Using where |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

So as you see, there are really different types of ranges in MySQL. An IN range allows MySQL to optimize lookups on the second key part, while BETWEEN and other ranges do not. Using the same access type in EXPLAIN for both makes it very confusing.

I was also wrong about a bug in the key length in the 5.0 EXPLAIN output. I actually used TINYINT for age and MEDIUMINT for zip, which take 1 and 3 bytes, so 4 is the right answer for using the full key.
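The key length arithmetic can be checked with a small sketch. It assumes both columns are declared NOT NULL, which the article does not state but which is consistent with the key_len values of 4 and 1 shown in the EXPLAIN output; the helper name `key_len` is purely illustrative.

```python
# Storage sizes in bytes for MySQL integer types.
INT_BYTES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def key_len(columns):
    """Sum key-part sizes; each column is a (type_name, nullable) pair."""
    # A nullable key part adds one extra byte to key_len.
    return sum(
        INT_BYTES[type_name] + (1 if nullable else 0)
        for type_name, nullable in columns
    )


# Assumption: age TINYINT NOT NULL, zip MEDIUMINT NOT NULL.
full_key = key_len([("tinyint", False), ("mediumint", False)])
age_only = key_len([("tinyint", False)])
print(full_key, age_only)
```

A key_len equal to the full sum suggests both key parts are used, while a key_len of 1 suggests only the age part is.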

Be careful with these nested IN clauses, however. MySQL has to internally build all possible combinations for row retrieval, which may become very slow if the IN lists are large. Take 3 IN lists of 1000 values each on the appropriate 3 key parts, and you may finish your lunch before the query completes, even if the table has just a couple of rows.

Let me show, however, how you can profile queries to see exactly what happens during query execution, which is very helpful for MySQL performance optimization:
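To get a feel for how quickly the combinations grow, here is a rough sketch that simply multiplies the IN list sizes. It only counts combinations and says nothing about how MySQL builds them internally; `count_combinations` is an illustrative helper name.

```python
from itertools import product
from math import prod


def count_combinations(*in_lists):
    """Number of value combinations obtained by crossing all IN lists."""
    return prod(len(values) for values in in_lists)


ages = [18, 19, 20]
zips = [12345, 12346, 12347]

# Small case: enumerate every (age, zip) pair explicitly.
combos = list(product(ages, zips))
print(len(combos))

# Three IN lists of 1000 values each: count only, do not enumerate.
big = count_combinations(range(1000), range(1000), range(1000))
print(big)
```

The two small lists give 9 combinations, while three lists of 1000 values give a billion, regardless of how many rows the table holds.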

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT sql_no_cache name FROM people WHERE age BETWEEN 18 and 20 AND zip IN (12345,12346, 12347);
+----------------------------------+
| name                             |
+----------------------------------+
| ed4481336eb9adca222fd404fa15658e |
| 888ba838661aff00bbbce114a2a22423 |
+----------------------------------+
2 rows in set (0.39 sec)

mysql> show status like "Handler%";
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 0     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_next          | 42250 |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 14    |
+----------------------------+-------+
15 rows in set (0.00 sec)

So you can run FLUSH STATUS to reset the counters, run the query (assuming your system is not doing anything else), and run SHOW STATUS to see how the counters have changed. It was quite inconvenient that you could only do this on an idle box, so as of MySQL 5.0 you no longer have to. SHOW STATUS now shows per-session counters, and to get global counters you need to use SHOW GLOBAL STATUS.
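When only global counters are available, one option is to take a snapshot before and after the query and compare them. The sketch below assumes the two snapshots have already been loaded into dictionaries; the absolute values are made up for illustration, and only the differences match the output shown above.

```python
def counter_deltas(before, after):
    """Return the counters whose value changed between two snapshots."""
    deltas = {}
    for name, value in after.items():
        change = value - before.get(name, 0)
        if change != 0:
            deltas[name] = change
    return deltas


# Hypothetical snapshots of global Handler counters (made-up absolute values).
before = {
    "Handler_delete": 7,
    "Handler_read_key": 500,
    "Handler_read_next": 100000,
    "Handler_write": 30,
}
after = {
    "Handler_delete": 7,
    "Handler_read_key": 501,
    "Handler_read_next": 142250,
    "Handler_write": 44,
}

deltas = counter_deltas(before, after)
print(deltas)
```

On a busy server the differences also include work done by other connections, which is why the per-session counters are the more convenient tool.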

Let us look at these handler statistics. We can see Handler_read_key=1, which means one index range scan was initiated. Handler_read_next=42250 means 42250 rows were analyzed during this scan. Basically, MySQL started scanning the index at age>=18 and stopped scanning as soon as it met something larger than 20, checking the zip condition on every row along the way.
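The difference between the two access patterns can be illustrated with a toy model of a sorted (age, zip) index. This is only a sketch: the data is synthetic, the function names are invented for the example, and the counters only loosely mirror Handler_read_key and Handler_read_next rather than reproducing the storage engine's exact accounting.

```python
from bisect import bisect_left
from itertools import product


def scan_between(index, age_lo, age_hi, zips):
    """One range scan over age_lo..age_hi; zip is checked on every entry."""
    wanted = set(zips)
    read_key, read_next, matches = 1, 0, []
    pos = bisect_left(index, (age_lo,))  # first entry with age >= age_lo
    while pos < len(index) and index[pos][0] <= age_hi:
        read_next += 1
        if index[pos][1] in wanted:
            matches.append(index[pos])
        pos += 1
    return read_key, read_next, matches


def scan_in_lists(index, ages, zips):
    """One point lookup per (age, zip) combination."""
    read_key, read_next, matches = 0, 0, []
    for key in product(ages, zips):
        read_key += 1
        pos = bisect_left(index, key)
        while pos < len(index) and index[pos] == key:
            read_next += 1
            matches.append(index[pos])
            pos += 1
    return read_key, read_next, matches


# Synthetic index: 7 ages with 1000 zip codes each, sorted like a B-tree.
index = sorted(
    (age, zip_code) for age in range(16, 23) for zip_code in range(12000, 13000)
)
zips = [12345, 12346, 12347]

between_stats = scan_between(index, 18, 20, zips)
in_stats = scan_in_lists(index, [18, 19, 20], zips)
print(between_stats[:2], in_stats[:2])
```

Both functions return the same matching entries, but the BETWEEN variant walks every entry in the age range while the double IN variant touches only the entries it looks up.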

The Difference Between Using UNION vs. Double IN

Now let's see what UNION can handle that IN can't:

Let's say we want to show people in the appropriate age group, sorted by the time they were last online. If the age is fixed this works great and is efficient. However, if we have multiple ages to deal with, either as a BETWEEN range or as an IN list, a filesort appears and the query becomes very slow:

mysql> explain select * from people where age=18 order by last_online desc limit 10;
+----+-------------+--------+------+---------------+------+---------+-------+-------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref   | rows  | Extra       |
+----+-------------+--------+------+---------------+------+---------+-------+-------+-------------+
|  1 | SIMPLE      | people | ref  | age           | age  | 1       | const | 12543 | Using where |
+----+-------------+--------+------+---------------+------+---------+-------+-------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from people where age in(18,19,20) order by last_online desc limit 10;
+----+-------------+--------+-------+---------------+------+---------+------+-------+-----------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows  | Extra                       |
+----+-------------+--------+-------+---------------+------+---------+------+-------+-----------------------------+
|  1 | SIMPLE      | people | range | age           | age  | 1       | NULL | 37915 | Using where; Using filesort |
+----+-------------+--------+-------+---------------+------+---------+------+-------+-----------------------------+
1 row in set (0.00 sec)

We can, however, use UNION to avoid a filesort over all the matching rows:

mysql> explain (select * from people where age=18 order by last_online desc limit 10) UNION ALL (select * from people where age=19 order by last_online desc limit 10) UNION ALL (select * from people where age=20 order by last_online desc limit 10) ORDER BY last_online desc limit 10;
+----+--------------+--------------+------+---------------+------+---------+-------+-------+----------------+
| id | select_type  | table        | type | possible_keys | key  | key_len | ref   | rows  | Extra          |
+----+--------------+--------------+------+---------------+------+---------+-------+-------+----------------+
|  1 | PRIMARY      | people       | ref  | age           | age  | 1       | const | 12543 | Using where    |
|  2 | UNION        | people       | ref  | age           | age  | 1       | const | 12741 | Using where    |
|  3 | UNION        | people       | ref  | age           | age  | 1       | const | 12631 | Using where    |
|NULL | UNION RESULT | <union1,2,3> | ALL  | NULL          | NULL | NULL    | NULL  |  NULL | Using filesort |
+----+--------------+--------------+------+---------------+------+---------+-------+-------+----------------+
4 rows in set (0.01 sec)

In this case there is also a filesort, but it is applied only to a very small table, the result of the union (at most 30 rows here), so it is rather fast.
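Writing one branch per age by hand gets tedious, so an application might generate the statement. The following is a minimal sketch that reuses the table and column names from the queries above; the function name `build_union_query` is hypothetical, and the `int()` casts are only a simple guard, not a substitute for proper parameter binding in real code.

```python
def build_union_query(ages, limit=10):
    """Build one ordered, limited SELECT per age and merge with UNION ALL."""
    if not ages:
        raise ValueError("at least one age is required")
    branches = [
        f"(SELECT * FROM people WHERE age={int(age)} "
        f"ORDER BY last_online DESC LIMIT {int(limit)})"
        for age in ages
    ]
    # The outer ORDER BY sorts at most len(ages) * limit rows.
    return (
        " UNION ALL ".join(branches)
        + f" ORDER BY last_online DESC LIMIT {int(limit)}"
    )


query = build_union_query([18, 19, 20])
print(query)
```

Each branch returns at most `limit` rows, so the final sort stays small no matter how many rows match each age.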