Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Multiple column index vs multiple indexes

August 23, 2008

Author

Peter Zaitsev

Benchmarks

Insight for DBAs

Share this Post:

(There is an updated version of the content in this post by Percona’s Stephane Combaudon available here.)
After my previous post there were questions raised about Index Merge on Multiple Indexes vs Two Column Index efficiency. I mentioned in most cases when query can use both of the ways using multiple column index would be faster but I also went ahead to do some benchmarks today.

I’m using couple of simple tables:

CREATE TABLE `t1000idxmerge` (
  `i` int(11) NOT NULL,
  `j` int(11) NOT NULL,
  `val` char(10) NOT NULL,
  KEY `i` (`i`),
  KEY `j` (`j`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `t1000idx2` (
  `i` int(11) NOT NULL,
  `j` int(11) NOT NULL,
  `val` char(10) NOT NULL,
  KEY `i` (`i`,`j`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `t1000idxmerge` (

`i` int(11) NOT NULL,

`j` int(11) NOT NULL,

`val` char(10) NOT NULL,

KEY `i` (`i`),

KEY `j` (`j`)

) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `t1000idx2` (

`i` int(11) NOT NULL,

`j` int(11) NOT NULL,

`val` char(10) NOT NULL,

KEY `i` (`i`,`j`)

) ENGINE=MyISAM DEFAULT CHARSET=latin1

I have populated this table with random data for i and j having both of them having 1000 of distinct values, independent on each other. I also created couple of other tables with same data just with really low cardinality with i and j having just 3 values each. The table contained about 18M rows though was small enough to fit in the systems memory.

I’ve benchmarked simple queries using where clause which covers multiple columns:

Q1 SELECT sum(length(val)) FROM  T  WHERE i=2 AND j=1
Q2 SELECT sum(length(val)) FROM  T  WHERE i=2 AND j BETWEEN 1 and 2
Q3 SELECT sum(length(val)) FROM  T  WHERE j=2 AND i BETWEEN 1 and 2
Q4 SELECT sum(length(val)) FROM  T  WHERE i=2 OR j=1
Q5 SELECT sum(length(val)) FROM  T  WHERE j=2 AND i BETWEEN 100 and 200

Q1 SELECT sum(length(val)) FROM T WHERE i=2 AND j=1

Q2 SELECT sum(length(val)) FROM T WHERE i=2 AND j BETWEEN 1 and 2

Q3 SELECT sum(length(val)) FROM T WHERE j=2 AND i BETWEEN 1 and 2

Q4 SELECT sum(length(val)) FROM T WHERE i=2 OR j=1

Q5 SELECT sum(length(val)) FROM T WHERE j=2 AND i BETWEEN 100 and 200

As some of them there way too fast if run once I ran them multiple times and measured time appropriately. To remove the overhead of starting MySQL etc from equation I also measured execution of “SELECT 1” query using same script and subtracted this time from result in the table.

time for ((i=0;i<100;i+=1)); do mysql test -e “SELECT sum(length(val)) FROM t1000idxmerge WHERE i=2 AND j=1”; done > /dev/null

In the result table I compute per query results and present results in milliseconds.

Query	1000 – 2 indexes	1000 – 2 columns	3 – 2 indexes	3 – 2 columns
Q1	20	0.2	6940	2530
Q2	25	0.3	7400	7500
Q3	25	30	7200	3830
Q4	70	3800	4700	4700
Q5	25	2980	–	–

Note1: Q1 will not use Index Merge technique for low cardinality table but instead pick to do single index scan. I’m not aware of the optimizer hint which would allow to force index merge as you can do with index accesses in general.

Note2 Q2/Q3 can’t use Index Merge however as it is currently implemented so they would use single index range scan. The 2 indexes however benefits to Q3 because it can only use first keypart of index (j,i) to resolve BETWEEN part of the clause which may not be very selective.

Note3 You may be surprised why 2 column index is faster for Q3 in case of low cardinality even though MySQL can’t use index well. You’re right MySQL can’t and MySQL does not – Full table scan is performed and in this case turns to be faster than scanning 1/5th of the table using index. Also Full Table Scan is preferred for Q4 in all cases but in case of high cardinality multiple index configuration.

Note4 Q5 was just run on high cardinality tables to show what difference large BETWEEN can make.

Conclusion: For benchmarked queries we can see Multiple Column index beats Index Merge in all cases when such index can be used. It is also worth to watchout a MySQL may decide not to do Index merge (either intersection or union) but instead do full table scan or access table picking only one index on the pair.

0 0 votes

Article Rating

14 Comments

Oldest

Newest Most Voted

Cameron Eure

17 years ago

Multi column indexes may be faster in some cases, however, you also need to balance the performance of write operations, and disk consumption.

As a rule of thumb, the more indexes you have, the slower write operations are. DELETEs can be especially costly, under the right circumstances. Also, I’ve seen situations where indexes use more disk space than the actual table.

tgabi

17 years ago

Peter,
very intersting. Please post a histogram of query time versus range of BETWEEN of queries Q2 and Q3/Q5. I did my tests on a table crafted on the specifications you posted but I’d like to see your data before I post my comments. I’m not interested in the case where cardinality is 3 since it’s too low to have a reliable index usage.

Author

Peter Zaitsev

17 years ago

Tgabi,

I did not do the timing for histogram. You have the table information and so can do the run yourself and share results 🙂

tgabi

17 years ago

Very well. Here’s my data on the table with cardinality 1000 and 18M records – the query is SELECT SQL_NO_CACHE sum(length(val)) FROM t1000c WHERE j=2 AND i BETWEEN 1 AND x:

x time
————–
2 48.57
3 97.82
4 147.25
5 197.1
10 443.31
15 691.06

If goes downhill from there. So, very limited range of aplicability.
On the other hand, I’ve got consistent results from 2 index table, surprisingly, much faster than your results. This is why I wanted to see your data. I’m not sure why my results are so much faster:

x time
————–
2 46.08
10 46.14
20 46.24

Anyway, the 2 column index is only useful for a limited range of values. It can only be used as performance boost in same specific cases in addition to single column indexes.
I’ve used a 8 GB RAM computer dual core CPU. Tables are less than 1 GB in size. The index cache was set at 4 GB. Mysql version 5.0.67 ( i was doing testing for bug 29857).

mysql> show index from t1000;
+——-+————+———-+————–+————-+———–+————-+———-+——–+——+————+———+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+——-+————+———-+————–+————-+———–+————-+———-+——–+——+————+———+
| t1000 | 1 | i | 1 | i | A | 1001 | NULL | NULL | | BTREE | |
| t1000 | 1 | j | 1 | j | A | 1001 | NULL | NULL | | BTREE | |
+——-+————+———-+————–+————-+———–+————-+———-+——–+——+————+———+
2 rows in set (0.00 sec)

mysql> select count(*) from t1000;
+———-+
| count(*) |
+———-+
| 18000000 |
+———-+
1 row in set (0.00 sec)

(the other table, t1000c, has same data only combined index instead of simple index).

tgabi

17 years ago

Update: for whatever reason the multi-column index is buggered by BETWEEN. See below:

mysql> explain SELECT SQL_NO_CACHE sum(length(val)) FROM t1000c WHERE j=2 AND i BETWEEN 1 AND 2;
+—-+————-+——–+——-+—————+——+———+——+——-+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——–+——-+—————+——+———+——+——-+————-+
| 1 | SIMPLE | t1000c | range | i | i | 8 | NULL | 14729 | Using where |
+—-+————-+——–+——-+—————+——+———+——+——-+————-+
1 row in set (0.00 sec)

mysql> explain SELECT SQL_NO_CACHE sum(length(val)) FROM t1000c WHERE j=2 AND i IN (1,2);
+—-+————-+——–+——-+—————+——+———+——+——+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——–+——-+—————+——+———+——+——+————-+
| 1 | SIMPLE | t1000c | range | i | i | 8 | NULL | 33 | Using where |
+—-+————-+——–+——-+—————+——+———+——+——+————-+
1 row in set (0.00 sec)

Peter, what version of mysql are you using ?

Author

Peter Zaitsev

17 years ago

Tgabi,

The query you’re running is exactly query which can’t use multi column index completely – BETWEEN makes it to scan only first key part this is why with it the performance is bad. If you have query as you described the index should be (j,i) not (i,j)

Now you also see two more things with your explain – first Index Merge is not really used here – it is just range scan by one or other index. Second there is a huge difference between IN and BETWEEN – IN allows scanning both keyparts while BETWEEN does not.

tgabi

17 years ago

Yet, in your data shows as quick. Look at Q3:

Q3 25 0.3

was this on a “switched” index that doesn’t show on the description ?

Regarding BETWEEN: shouldn’t the optimized transform this ?

Author

Peter Zaitsev

17 years ago

tgabi,

Ah… This is typo it should be 30ms not 0.3ms So pretty much it works as good as BETWEEN 1 and 2 query would 🙂
BETWEEN is never optimized to IN even though it is often possible – you can do it manually though

jocelyn

17 years ago

What about the impact of multiple column index on ORDER BY speed ?
e.g. :

SELECT * FROM a WHERE b=1 AND c=2 ORDER BY d

Here, an index on (b,c,d) will avoid MySQL using filesorting.
If three indexes are put on b, c and d, will MySQL be able to use index merge on d to avoid filesorting ?

nguyen

17 years ago

hi all,
i have a table:
CREATE TABLE yellow_pages (
bid int(11) NOT NULL auto_increment,
bname varchar(100) character set utf8 NOT NULL,
address varchar(50) character set utf8 default NULL,
city varchar(30) character set utf8 default NULL,
state varchar(2) default NULL,
zip int(5) unsigned NOT NULL,
country varchar(20) default NULL,
website varchar(50) default NULL,
phone varchar(10) default NULL,
fax varchar(10) default NULL,
latitude decimal(10,6) default NULL,
longitude decimal(10,6) default NULL,
contact varchar(50) default NULL,
title varchar(50) default NULL,
sic int(10) unsigned default NULL,
sic_des varchar(100) default NULL,
uid int(10) default NULL,
PRIMARY KEY (bid),
KEY latitude (latitude),
KEY longitude (longitude),
KEY zip (zip),
FULLTEXT KEY bname (bname),
FULLTEXT KEY sic_des (sic_des)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
and the query:

select (IFNULL(ACOS(0.796641758191*COS(RADIANS(l.latitude))*(-0.525545896347*COS(RADIANS(l.longitude)) + -0.850765250132*SIN(RADIANS(l.longitude))) + 0.604451742578*SIN(RADIANS(l.latitude))), 0.00000)*6370298.85223) as dis, l.* from yellow_pages l WHERE match(sic_des) against(‘”hotel*” ‘ in boolean mode) and ( longitude BETWEEN -122.159246013 AND -121.250753987 and latitude BETWEEN 36.827530043 AND 37.551269957 or zip in(95050,95051,95052,95053,95054,95055,95056)) order by dis LIMIT 0, 10

i use php(v5.2) to connect to mysql(v5.02) but it run sooooo long
any idea??

thanks

praveen

17 years ago

Hi, I am pasting my query. It takes some 1 min 22 sec to execute.
TBL_RES_USER_SKILLS has some 5 lacks data. If I remove “AND SKILL_NAME=’JAVA'” condition from the below query then it gets result in 5 seconds. I am not able to figure out the problem.Please guide me.

select count(*) from (select ID from ((SELECT ID FROM TBL_RES_USR) ORDER BY SCORE DESC) a where a.ID=(SELECT DISTINCT(USER_ID) FROM TBL_RES_SCORE_DATA WHERE COMPLETENESS>=80 and USER_ID=a.ID ) AND NOT((select COUNT(*) from TBL_USR_ORGANIZATION_INFO b where TO_DATE =(SELECT max(TO_DATE) from TBL_USR_ORGANIZATION_INFO where USER_ID=a.ID AND DEL_IND =’n’) AND USER_ID=a.ID AND DEL_IND =’n’ and ORGAN_NAME In(‘SAP’))) and a.ID=(select distinct(USER_ID) from TBL_RES_USER_SKILLS where USER_ID=a.ID and SKILL_NAME=’JAVA’ )and a.ID=(select distinct(USER_ID) from TBL_RES_SUMMARY where USER_ID = a.ID and TOTAL_EXPERIENCE<=100)and (a.ID=(select distinct(USER_ID) from TBL_RES_SUMMARY where (timediff(date_add((select curdate()),INTERVAL -’22:12′ HOUR_MINUTE),date_add(LAST_LOGIN,INTERVAL -’22:12′ HOUR_MINUTE))/24)<=365 and USER_ID=a.ID)))as Counts;

+———-+
| count(*) |
+———-+
| 4333 |
+———-+
1 row in set, 8666 warnings (1 min 22.93 sec)

http://rocky-says.blogspot.com/

17 years ago

‘@praveen:
Do an “Explain ” in both the cases and check the difference

Artur Ejsmont

15 years ago

excellent!

thanks for the post.

I am wondering what is the best practice from performance point of view to index enumerations/chars

If you have a lot of flags like: has_garden, has_pool, has_whatever. Its just an example from top of my head i am not actually writing property selling site 😉

Mysql does not have bitmap indexes as far as i know and creating index with all of them does not seem like a good idea or does it? Columns order matters so if you search for house without considering first flag you are doomed.

what is your take on that situation with millions of rows?

Thanks : )

Harish Kalagi

12 years ago

we are working on a product.it has 93 tables,so how to optimize the performance..i am a complete fresher and i know nothing apart from the basic concepts of dbms.