About year ago Peter wrote about redundant indexes and mentioned sometimes it is good to leave two indexes, even one is first part of another. I’m speaking about BTREE indexes, for example, KEY (A), and KEY (A,B). From SQL point of view KEY(A) is not needed, as for queries like WHERE A=5 the index (A,B) also can be used.
But there is case when for performance it would be good to have both
Let we have the table
CREATE TABLE `userinfo` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(64) NOT NULL DEFAULT '',
`email` varchar(64) NOT NULL DEFAULT '',
`password` varchar(64) NOT NULL DEFAULT '',
`dob` date DEFAULT NULL,
`address` varchar(255) NOT NULL DEFAULT '',
`city` varchar(64) NOT NULL DEFAULT '',
`state_id` tinyint(3) unsigned NOT NULL DEFAULT '0',
`zip` varchar(8) NOT NULL DEFAULT '',
`country_id` smallint(5) unsigned NOT NULL DEFAULT '0',
`gender` enum('M','F') NOT NULL DEFAULT 'M',
`account_type` varchar(32) NOT NULL DEFAULT '',
`verified` tinyint(4) NOT NULL DEFAULT '0',
`allow_mail` tinyint(3) unsigned NOT NULL DEFAULT '0',
`parrent_account` int(10) unsigned NOT NULL DEFAULT '0',
`closest_airport` varchar(3) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`),
KEY `country_id` (`country_id`),
KEY `state_id` (`state_id`),
with 1,000,000 rows, and for each state_id there are about 20,000 records.
In the benchmark the query (Q1)
SELECT count(*) FROM userinfo WHERE state_id=5
which uses KEY (
state_id), shows the result 115 queries/sec
And now we have the task to execute the query (Q2)
SELECT state_id,city,address FROM userinfo WHERE state_id=5
which extracts additional info – city and address, both long strings columns
In the benchmark for this query we have 10 queries per sec.
Usuall technique we use for such queries is covering index, which store all needed columns in the index, and there is no need to read row data from the table.
So – we can extend index
state_id) by two columns:
ALTER TABLE userinfo DROP KEY state_id, ADD KEY `state_id_2` (state_id, city, address)
And now for Q2 we have 16.34 q/s, that is almost +100% comparing with non-convering index.
But if run the test for Q1 we have: 25.40 q/s, that in 4 times slower than in first run.
(This is for MyISAM, for InnoDB difference will be less. MyISAM uses key-compression for VARCHAR columns, so makes things worse).
So to have good performance for both case we should leave both indexes:
state_id (state_id) and
state_id_2 (state_id, city, address)
The drawback is worse INSERT performance, here is time to INSERT 1million records:
only with state_id KEY – 72 sec
with state_id and state_id_2 KEYs – 470sec
(Here I limited available memory to fit only one index, that shows how can degrade performance if we balance on memory limits)
If you are interested in, the numbers for InnoDB in the same tests:
only state_id KEY:
Q1 – 108 q/s
Q2 – 12.12 q/s
only state_id_2 KEY:
Q1 – 100 q/s
Q2 – 24.08 q/s
And INSERT of 1mil records:
only with state_id KEY – 80 sec
with state_id and state_id_2 KEYs – 136sec
(but in this case the memory was enough for both indexes)
Percona’s widely read Percona Data Performance blog highlights our expertise in enterprise-class software, support, consulting and managed services solutions for both MySQL® and MongoDB® across traditional and cloud-based platforms. The decades of experience represented by our consultants is found daily in numerous and relevant blog posts.
Besides specific database help, the blog also provides notices on upcoming events and webinars.
Want to get weekly updates listing the latest blog posts? Subscribe to our blog now! Submit your email address below.