I believe we wrote about this before, but this topic popups again and again.
Today I’ve read opinion that if we have clause WHERE has_something=1 we should have index on column has_something (the column has two values 0 and 1).
In reality the right answer is not so simple.
Let’s look next table
|
1 2 3 4 5 6 7 |
CREATE TABLE `testr` ( `id` int(10) unsigned NOT NULL auto_increment, `name` varchar(32) NOT NULL, `has_something` tinyint(3) unsigned NOT NULL, PRIMARY KEY (`id`), KEY `has_something` (`has_something`) ) ENGINE=MyISAM |
with 20.000.000 records.
And in first case has_something=0 for 90% of rows (with random distribution)
|
1 2 3 4 5 6 7 |
mysql> select cnt0/cnt from (select count(*) cnt0 from testr where has_something=0) t, (select count(*) cnt from testr) t1; +----------+ | cnt0/cnt | +----------+ | 0.9001 | +----------+ 1 row in set (7.56 sec) |
Let’s check execution time with and without index
|
1 2 3 4 5 6 7 |
mysql> select count(name) from testr force key (has_something) where has_something=0; +-------------+ | count(name) | +-------------+ | 18001245 | +-------------+ 1 row in set (35.96 sec) |
|
1 2 3 4 5 6 7 |
mysql> select count(name) from testr ignore key (has_something) where has_something=0; +-------------+ | count(name) | +-------------+ | 18001245 | +-------------+ 1 row in set (10.46 sec) |
As you see with index the time is by 3.5 times slower.
Good that mysql in this case choose do not use index
|
1 2 3 4 5 6 7 |
mysql> explain select count(name) from testr where has_something=0; +----+-------------+-------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | testr | ALL | has_something | NULL | NULL | NULL | 15000000 | Using where | +----+-------------+-------+------+---------------+------+---------+------+----------+-------------+ 1 row in set (0.00 sec) |
Let look the case when has_something = 0 for 50% of rows.
|
1 2 |
mysql> select count(name) from testr force key (has_something) where has_something=0; 1 row in set (20.27 sec) |
|
1 2 |
mysql> select count(name) from testr ignore key (has_something) where has_something=0; 1 row in set (10.62 sec) |
query with index is still 2 times slower.
and this time mysql is going to use index in execution plan:
|
1 2 3 4 5 6 |
mysql> explain select count(name) from testr where has_something=0; +----+-------------+-------+------+---------------+---------------+---------+-------+---------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+---------------+---------+-------+---------+-------+ | 1 | SIMPLE | testr | ref | has_something | has_something | 1 | const | 8890716 | | +----+-------------+-------+------+---------------+---------------+---------+-------+---------+-------+ 1 row in set (0.00 sec) |
What about 30% rows with has_something=0 ?
|
1 2 |
mysql> select count(name) from testr force key (has_something) where has_something=0; 1 row in set (12.36 sec) |
|
1 2 |
mysql> select count(name) from testr ignore key (has_something) where has_something=0; 1 row in set (10.51 sec) |
Still query without index is faster.
And finally for case with 20% rows with has_someting=0
|
1 2 |
mysql> select count(name) from testr force key (has_something) where has_something=0; 1 row in set (8.39 sec) |
|
1 2 |
mysql> select count(name) from testr ignore key (has_something) where has_something=0; 1 row in set (10.43 sec) |
So only in the last case we really need the index on column has_something