Multiple column index vs multiple indexes

(There is an updated version of the content in this post by Percona’s Stephane Combaudon available here.)
After my previous post there were questions raised about Index Merge on Multiple Indexes vs Two Column Index efficiency. I mentioned in most cases when query can use both of the ways using multiple column index would be faster but I also went ahead to do some benchmarks today.

I’m using couple of simple tables:

I have populated this table with random data for i and j having both of them having 1000 of distinct values, independent on each other. I also created couple of other tables with same data just with really low cardinality with i and j having just 3 values each. The table contained about 18M rows though was small enough to fit in the systems memory.

I’ve benchmarked simple queries using where clause which covers multiple columns:

As some of them there way too fast if run once I ran them multiple times and measured time appropriately. To remove the overhead of starting MySQL etc from equation I also measured execution of “SELECT 1” query using same script and subtracted this time from result in the table.

time for ((i=0;i<100;i+=1)); do mysql test -e “SELECT sum(length(val)) FROM t1000idxmerge WHERE i=2 AND j=1”; done > /dev/null

In the result table I compute per query results and present results in milliseconds.

Query 1000 – 2 indexes 1000 – 2 columns 3 – 2 indexes 3 – 2 columns
Q1 20 0.2 6940 2530
Q2 25 0.3 7400 7500
Q3 25 30 7200 3830
Q4 70 3800 4700 4700
Q5 25 2980

Note1: Q1 will not use Index Merge technique for low cardinality table but instead pick to do single index scan. I’m not aware of the optimizer hint which would allow to force index merge as you can do with index accesses in general.

Note2 Q2/Q3 can’t use Index Merge however as it is currently implemented so they would use single index range scan. The 2 indexes however benefits to Q3 because it can only use first keypart of index (j,i) to resolve BETWEEN part of the clause which may not be very selective.

Note3 You may be surprised why 2 column index is faster for Q3 in case of low cardinality even though MySQL can’t use index well. You’re right MySQL can’t and MySQL does not – Full table scan is performed and in this case turns to be faster than scanning 1/5th of the table using index. Also Full Table Scan is preferred for Q4 in all cases but in case of high cardinality multiple index configuration.

Note4 Q5 was just run on high cardinality tables to show what difference large BETWEEN can make.

Conclusion: For benchmarked queries we can see Multiple Column index beats Index Merge in all cases when such index can be used. It is also worth to watchout a MySQL may decide not to do Index merge (either intersection or union) but instead do full table scan or access table picking only one index on the pair.

Share this post

Comments (14)

  • Cameron Eure

    Multi column indexes may be faster in some cases, however, you also need to balance the performance of write operations, and disk consumption.

    As a rule of thumb, the more indexes you have, the slower write operations are. DELETEs can be especially costly, under the right circumstances. Also, I’ve seen situations where indexes use more disk space than the actual table.

    August 23, 2008 at 11:01 am
  • tgabi

    very intersting. Please post a histogram of query time versus range of BETWEEN of queries Q2 and Q3/Q5. I did my tests on a table crafted on the specifications you posted but I’d like to see your data before I post my comments. I’m not interested in the case where cardinality is 3 since it’s too low to have a reliable index usage.

    August 24, 2008 at 6:13 pm
  • peter


    I did not do the timing for histogram. You have the table information and so can do the run yourself and share results 🙂

    August 24, 2008 at 6:20 pm
  • tgabi

    Very well. Here’s my data on the table with cardinality 1000 and 18M records – the query is SELECT SQL_NO_CACHE sum(length(val)) FROM t1000c WHERE j=2 AND i BETWEEN 1 AND x:

    x time
    2 48.57
    3 97.82
    4 147.25
    5 197.1
    10 443.31
    15 691.06

    If goes downhill from there. So, very limited range of aplicability.
    On the other hand, I’ve got consistent results from 2 index table, surprisingly, much faster than your results. This is why I wanted to see your data. I’m not sure why my results are so much faster:

    x time
    2 46.08
    10 46.14
    20 46.24

    Anyway, the 2 column index is only useful for a limited range of values. It can only be used as performance boost in same specific cases in addition to single column indexes.
    I’ve used a 8 GB RAM computer dual core CPU. Tables are less than 1 GB in size. The index cache was set at 4 GB. Mysql version 5.0.67 ( i was doing testing for bug 29857).

    mysql> show index from t1000;
    | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    | t1000 | 1 | i | 1 | i | A | 1001 | NULL | NULL | | BTREE | |
    | t1000 | 1 | j | 1 | j | A | 1001 | NULL | NULL | | BTREE | |
    2 rows in set (0.00 sec)

    mysql> select count(*) from t1000;
    | count(*) |
    | 18000000 |
    1 row in set (0.00 sec)

    (the other table, t1000c, has same data only combined index instead of simple index).

    August 25, 2008 at 4:19 am
  • tgabi

    Update: for whatever reason the multi-column index is buggered by BETWEEN. See below:

    mysql> explain SELECT SQL_NO_CACHE sum(length(val)) FROM t1000c WHERE j=2 AND i BETWEEN 1 AND 2;
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    | 1 | SIMPLE | t1000c | range | i | i | 8 | NULL | 14729 | Using where |
    1 row in set (0.00 sec)

    mysql> explain SELECT SQL_NO_CACHE sum(length(val)) FROM t1000c WHERE j=2 AND i IN (1,2);
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    | 1 | SIMPLE | t1000c | range | i | i | 8 | NULL | 33 | Using where |
    1 row in set (0.00 sec)

    Peter, what version of mysql are you using ?

    August 25, 2008 at 6:47 am
  • peter


    The query you’re running is exactly query which can’t use multi column index completely – BETWEEN makes it to scan only first key part this is why with it the performance is bad. If you have query as you described the index should be (j,i) not (i,j)

    Now you also see two more things with your explain – first Index Merge is not really used here – it is just range scan by one or other index. Second there is a huge difference between IN and BETWEEN – IN allows scanning both keyparts while BETWEEN does not.

    August 25, 2008 at 9:01 am
  • tgabi

    Yet, in your data shows as quick. Look at Q3:

    Q3 25 0.3

    was this on a “switched” index that doesn’t show on the description ?

    Regarding BETWEEN: shouldn’t the optimized transform this ?

    August 25, 2008 at 9:33 am
  • peter


    Ah… This is typo it should be 30ms not 0.3ms So pretty much it works as good as BETWEEN 1 and 2 query would 🙂
    BETWEEN is never optimized to IN even though it is often possible – you can do it manually though

    August 25, 2008 at 6:54 pm
  • jocelyn

    What about the impact of multiple column index on ORDER BY speed ?
    e.g. :


    Here, an index on (b,c,d) will avoid MySQL using filesorting.
    If three indexes are put on b, c and d, will MySQL be able to use index merge on d to avoid filesorting ?

    August 26, 2008 at 6:19 am
  • nguyen

    hi all,
    i have a table:
    CREATE TABLE yellow_pages (
    bid int(11) NOT NULL auto_increment,
    bname varchar(100) character set utf8 NOT NULL,
    address varchar(50) character set utf8 default NULL,
    city varchar(30) character set utf8 default NULL,
    state varchar(2) default NULL,
    zip int(5) unsigned NOT NULL,
    country varchar(20) default NULL,
    website varchar(50) default NULL,
    phone varchar(10) default NULL,
    fax varchar(10) default NULL,
    latitude decimal(10,6) default NULL,
    longitude decimal(10,6) default NULL,
    contact varchar(50) default NULL,
    title varchar(50) default NULL,
    sic int(10) unsigned default NULL,
    sic_des varchar(100) default NULL,
    uid int(10) default NULL,
    PRIMARY KEY (bid),
    KEY latitude (latitude),
    KEY longitude (longitude),
    KEY zip (zip),
    FULLTEXT KEY bname (bname),
    FULLTEXT KEY sic_des (sic_des)
    and the query:

    select (IFNULL(ACOS(0.796641758191*COS(RADIANS(l.latitude))*(-0.525545896347*COS(RADIANS(l.longitude)) + -0.850765250132*SIN(RADIANS(l.longitude))) + 0.604451742578*SIN(RADIANS(l.latitude))), 0.00000)*6370298.85223) as dis, l.* from yellow_pages l WHERE match(sic_des) against(‘”hotel*” ‘ in boolean mode) and ( longitude BETWEEN -122.159246013 AND -121.250753987 and latitude BETWEEN 36.827530043 AND 37.551269957 or zip in(95050,95051,95052,95053,95054,95055,95056)) order by dis LIMIT 0, 10

    i use php(v5.2) to connect to mysql(v5.02) but it run sooooo long
    any idea??


    September 8, 2008 at 4:12 pm
  • praveen

    Hi, I am pasting my query. It takes some 1 min 22 sec to execute.
    TBL_RES_USER_SKILLS has some 5 lacks data. If I remove “AND SKILL_NAME=’JAVA'” condition from the below query then it gets result in 5 seconds. I am not able to figure out the problem.Please guide me.

    select count(*) from (select ID from ((SELECT ID FROM TBL_RES_USR) ORDER BY SCORE DESC) a where a.ID=(SELECT DISTINCT(USER_ID) FROM TBL_RES_SCORE_DATA WHERE COMPLETENESS>=80 and USER_ID=a.ID ) AND NOT((select COUNT(*) from TBL_USR_ORGANIZATION_INFO b where TO_DATE =(SELECT max(TO_DATE) from TBL_USR_ORGANIZATION_INFO where USER_ID=a.ID AND DEL_IND =’n’) AND USER_ID=a.ID AND DEL_IND =’n’ and ORGAN_NAME In(‘SAP’))) and a.ID=(select distinct(USER_ID) from TBL_RES_USER_SKILLS where USER_ID=a.ID and SKILL_NAME=’JAVA’ )and a.ID=(select distinct(USER_ID) from TBL_RES_SUMMARY where USER_ID = a.ID and TOTAL_EXPERIENCE<=100)and (a.ID=(select distinct(USER_ID) from TBL_RES_SUMMARY where (timediff(date_add((select curdate()),INTERVAL -’22:12′ HOUR_MINUTE),date_add(LAST_LOGIN,INTERVAL -’22:12′ HOUR_MINUTE))/24)<=365 and USER_ID=a.ID)))as Counts;

    | count(*) |
    | 4333 |
    1 row in set, 8666 warnings (1 min 22.93 sec)

    December 18, 2008 at 1:32 am

    Do an “Explain ” in both the cases and check the difference

    March 3, 2009 at 3:32 am
  • Artur Ejsmont


    thanks for the post.

    I am wondering what is the best practice from performance point of view to index enumerations/chars

    If you have a lot of flags like: has_garden, has_pool, has_whatever. Its just an example from top of my head i am not actually writing property selling site 😉

    Mysql does not have bitmap indexes as far as i know and creating index with all of them does not seem like a good idea or does it? Columns order matters so if you search for house without considering first flag you are doomed.

    what is your take on that situation with millions of rows?

    Thanks : )

    November 18, 2010 at 5:23 am
  • Harish Kalagi

    we are working on a has 93 tables,so how to optimize the performance..i am a complete fresher and i know nothing apart from the basic concepts of dbms.

    January 22, 2014 at 5:07 am

Comments are closed.

Use Percona's Technical Forum to ask any follow-up questions on this blog topic.