EmergencyEMERGENCY? Get 24/7 Help Now!

MySQL 6.0 vs 5.1 in TPC-H queries

 | March 25, 2008 |  Posted In: Benchmarks


Last week I played with queries from TPC-H benchmarks, particularly comparing MySQL 6.0.4-alpha with 5.1. MySQL 6.0 is interesting here, as there is a lot of new changes in optimizer, which should affect execution plan of TPC-H queries. In reality only two queries (from 22) have significantly better execution time (about them in next posts), but I want to write about is queries that execute slower in new MySQL 6.0 version.

Query is pretty simple

with execution plain (in 5.1)

against table

So this query takes 32.27 sec to execute in 5.1, and 39.39 sec in 6.0 (after warpm-ups and fully cached in key_cache)

What is difference in 6.0? Here is execution plain for 6.0:

See there are new Extra notes

It would be interesting to know what is ‘Using Index condition’, what I was able to find is Baron’s bug with no mention in MySQL 6.0 manual. No mentions about Using MRR also, but I guess it is Multi-Range-Read feature which allows to read data not row-by-row but in batches reads.

So let us see how perform 5.1 and 6.0 in cold mode (I dropped OS caches by

and restarting mysqld)

5.1 – time 1 hour 14 min 36.90 sec
6.0 – time 1 hour 6 min 2.21 sec

So difference is not so big we could speak about big improvements with using MRR here.

What is interesting is iostat -dx 5 output during query execution.

For 5.1 I had:

almost all time. As you see 1285 rkB/s is pretty small for RAID 10 system we have on this server. This is because of Random I/O we had during execution of query, as MySQL used index to scan rows and then did access to data to read other columns.

But for 6 picture is a bit different. At start I had:

and then rkB/s was continuously dropped to

at the end of query execution. I guess we had 4300 rkB/s because of MRR, but I have no idea why this number was dropped to 1200 finally. Probably we need to check how really MRR works.

What else worth to mention here is that MySQL is wrong choosing index scan here. If we try to execute query with IGNORE KEY (li_shp_dt_idx) (full table scan will be performed in this case) then we got:
1 min 39.14 sec to execute query (compare with 1 hour 14 min 36.90 sec).

and iostat here is:

As you see we have sequential I/O with 65362 rkB/s (compare 1200 rkB/s with random I/O), that allows to execute query ~40 times faster.

Vadim Tkachenko

Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming help optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products. He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication 3rd Edition.


  • The fact optimizer is unaware of caching and fragmentation is one of the most serious practical deficiencies.

    Another challenge around the corner for the optimizer is completely different costs for SSD storage compared to standard hard drives.

  • Jay,

    Actually it should not help, MySQL optimizer will not use indexes with range conditions
    Here is explain with index you suggested (in MySQL 6.0)
    mysql> explain select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date ‘1995-01-01’ and l_shipdate < date '1995-01-01' + interval '1' year and l_discount between 0.09 - 0.01 and 0.09 + 0.01 and l_quantity < 24; +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+-----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+-----------------------------------------------+ | 1 | SIMPLE | lineitem | range | li_shp_dt_idx,l_quantity | li_shp_dt_idx | 3 | NULL | 8272067 | Using index condition; Using where; Using MRR | +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+-----------------------------------------------+ 1 row in set (0.12 sec) And worse thing - TPC-H does not allow additional indexes.

  • Documentation for Index Condition Pushdown and MRR are on our todo.

    You can turn index condition pushdown off with this statement:

    SET engine_condition_pushdown = off;

    and MRR with this statement:

    SET optimizer_use_mrr=’disable’;

    It would be interesting to see how the first query’s speed is affected by disabling index condition pushdown. Peter, could you do a re-run with after “SET engine_condition_pushdown = off;” ?

  • Sergey,

    With SET engine_condition_pushdown = off; I get next plan:

    explain select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date ‘1995-01-01’ and l_shipdate < date '1995-01-01' + interval '1' year and l_discount between 0.09 - 0.01 and 0.09 + 0.01 and l_quantity < 24; +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+------------------------+ | 1 | SIMPLE | lineitem | range | li_shp_dt_idx,l_quantity | li_shp_dt_idx | 3 | NULL | 8272067 | Using where; Using MRR | +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+------------------------+ 1 row in set (0.00 sec) But execution time is the same - 39.19 sec with hot caches.

  • Vadim,

    thanks for the answer. So it’s not Index Condition Pushdown (which was our first guess). Would it be too much to ask to re-run with

    SET engine_condition_pushdown=off; SET optimizer_use_mrr=’disable’;

    so we know if the slowdown is caused by MRR or something else?

  • Sergey,

    No problem at all while I have things setuped on my box.

    mysql> SET engine_condition_pushdown=off; SET optimizer_use_mrr=’disable’;
    Query OK, 0 rows affected (0.00 sec)

    Query OK, 0 rows affected (0.00 sec)

    mysql> explain select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date ‘1995-01-01’ and l_shipdate < date '1995-01-01' + interval '1' year and l_discount between 0.09 - 0.01 and 0.09 + 0.01 and l_quantity < 24; +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+-------------+ | 1 | SIMPLE | lineitem | range | li_shp_dt_idx,l_quantity | li_shp_dt_idx | 3 | NULL | 8272067 | Using where | +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+-------------+ 1 row in set (0.00 sec) mysql> select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date ‘1995-01-01’ and l_shipdate < date '1995-01-01' + interval '1' year and l_discount between 0.09 - 0.01 and 0.09 + 0.01 and l_quantity < 24; +-----------------+ | revenue | +-----------------+ | 1847603277.9361 | +-----------------+ 1 row in set (33.96 sec) As you see disabling "using MRR" makes good difference. May I ask you to explain in short words how MRR works ?

  • Sure. The idea behind MRR (which stands for Multi Range Read) is to perform range scan as follows:
    * Scan the index, collect the rowids in a buffer (of @@read_rnd_buff_size bytes)
    * Sort the rowids
    * Go and retrieve full table records in one sequential ‘sweep’.

    the expected improvement is that the full records are read sequentially which is faster than doing random disk probing. MRR doesn’t provide any speedup when all record data is in disk cache, but it was not expected to cause so much slowdown either.

  • Sergey,

    Thank you for explain.
    Also it is interesting why it does not give a lot of performance gain in case of cold caches.

  • Vadim,

    Could it be possible that data on disk is already in the right physical order for this query? If the rows in lineitem are already ordered physically roughly in the same order as we would get by ordering the rowids we got via li_shp_dt_idx, then the overhead of MRR could be due to the unnecessary (in this case) rowid sorting.

    Could you have a look at the script that populates lineitem and see whether that is the case?

  • Timour,

    I use dbgen utility from tpc-h benchmark to generate data. It produce data for lineitem as (first N rows from generated file):

    1996-03-13|1|1551894|76910|1|17|33078.94|0.04|0.02|N|O|1996-02-12|1996-03-22|DELIVER IN PERSON|egular courts above the|TRUCK|
    1996-04-12|1|673091|73092|2|36|38306.16|0.09|0.06|N|O|1996-02-28|1996-04-20|TAKE BACK RETURN|ly final dependencies: slyly bold |
    1996-01-29|1|636998|36999|3|8|15479.68|0.10|0.02|N|O|1996-03-05|1996-01-31|TAKE BACK RETURN|riously. regular, express dep|REG AI
    1996-04-21|1|21315|46316|4|28|34616.68|0.09|0.06|N|O|1996-03-30|1996-05-16|NONE|lites. fluffily even de|AIR|
    1996-03-30|1|240267|15274|5|24|28974.00|0.10|0.04|N|O|1996-03-14|1996-04-01|NONE| pending foxes. slyly re|FOB|
    1996-01-30|1|156345|6348|6|32|44842.88|0.07|0.02|N|O|1996-02-07|1996-02-03|DELIVER IN PERSON|arefully slyly ex|MAIL|
    1997-01-28|2|1061698|11719|1|38|63066.32|0.00|0.05|N|O|1997-01-14|1997-02-02|TAKE BACK RETURN|ven requests. deposits breach a|RA
    1994-02-02|3|42970|17971|1|45|86083.65|0.06|0.00|R|F|1994-01-04|1994-02-23|NONE|ongside of the furiously brave acco|AIR|
    1993-11-09|3|190355|65359|2|49|70822.15|0.10|0.00|R|F|1993-12-20|1993-11-24|TAKE BACK RETURN| unusual accounts. eve|RAIL|
    1994-01-16|3|1284483|34508|3|27|39620.34|0.06|0.07|A|F|1993-11-22|1994-01-23|DELIVER IN PERSON|nal foxes wake. |SHIP|
    1993-12-04|3|293797|18800|4|2|3581.56|0.01|0.06|A|F|1994-01-07|1994-01-01|NONE|y. fluffily pending d|TRUCK|
    1993-12-14|3|1830941|5996|5|28|52411.80|0.04|0.00|R|F|1994-01-10|1994-01-01|TAKE BACK RETURN|ages nag slyly pending|FOB|
    1993-10-29|3|621426|96445|6|26|35032.14|0.10|0.02|A|F|1993-12-18|1993-11-04|TAKE BACK RETURN|ges sleep after the caref|RAIL|
    1996-01-10|4|880347|55372|1|30|39819.00|0.03|0.08|N|O|1995-12-14|1996-01-18|DELIVER IN PERSON|- quickly regular packages sleep.
    1994-10-31|5|1085693|85694|1|15|25179.60|0.02|0.04|R|F|1994-08-31|1994-11-20|NONE|ts wake furiously |AIR|
    1994-10-16|5|1239268|39269|2|26|31387.20|0.07|0.08|R|F|1994-09-25|1994-10-19|NONE|sts use slyly quickly special instruc|FOB|
    1994-08-08|5|375302|306|3|50|68864.50|0.08|0.03|A|F|1994-10-13|1994-08-26|DELIVER IN PERSON|eodolites. fluffily unusual|AIR|
    1992-04-27|6|1396355|21369|1|37|53697.73|0.08|0.03|A|F|1992-05-15|1992-05-02|TAKE BACK RETURN|p furiously special foxes|TRUCK|
    1996-05-07|7|1820519|95574|1|12|17273.04|0.07|0.03|N|O|1996-03-13|1996-06-03|TAKE BACK RETURN|ss pinto beans wake against th|FOB
    1996-02-01|7|1452428|77443|2|9|12423.15|0.08|0.08|N|O|1996-03-02|1996-02-19|TAKE BACK RETURN|es. instructions|SHIP|
    1996-01-15|7|947798|97817|3|46|84904.50|0.10|0.07|N|O|1996-03-27|1996-02-03|COLLECT COD| unusual reques|MAIL|
    1996-03-21|7|1630721|30722|4|28|46245.92|0.03|0.04|N|O|1996-04-08|1996-04-20|NONE|. slyly special requests haggl|FOB|
    1996-02-11|7|1518939|93985|5|38|74398.68|0.08|0.01|N|O|1996-02-24|1996-02-18|DELIVER IN PERSON|ns haggle carefully ironic deposi
    1996-01-16|7|792502|17510|6|35|55806.45|0.06|0.03|N|O|1996-02-23|1996-01-22|TAKE BACK RETURN|jole. excuses wake carefully alongs
    1996-02-10|7|1572371|22402|7|5|7216.50|0.04|0.02|N|O|1996-03-26|1996-02-13|NONE|ithely regula|FOB|

    and then I load it with LOAD DATA INFILE command.
    As you see data it ordered by columns (l_orderkey, l_lineitem).

    I can share generated files and LOAD DATA INFILE command, it takes about 8GB

  • Hi Vadim,

    – Which scale factor are you using?
    – How MySQL 5.1/6.0 scales with TPC-H (e.g.: scale factors 1,3,5,10,30,100)?



  • Noah,

    I tried different scale factors 1, 10, 100.

    I can’t say how MySQL scales because I have not done experiments.
    Some queries even with scale factor 1 take hours to complete.

  • […] As you can see on 100G data set MySQL 5.1 could only complete 10 out of 22 queries within 3hours run time allowed for each query and MySQL 6.0 has similar number of queries it can execute in reasonable time frame. There 2 queries (Query8 and Query20) which MySQL 6.0 does better but there is also Query11 in which significant regression is observed. Vadim has already Wrote about it in his MySQL 5.1 vs 6.0 in TPC-H Queries post […]

  • The main cost of executing wide index range scans is NOT the order of the blocks in the table. The main cost is that the blocks are in different places in the table, so even reading them in sequential order still requires to move the disk drive’s heads from place to place. As I first read about MRR and expectations this technique should provide the same speed advantage as clustered indexes, I laughed 😉 Actually MRR would help a little only because MySQL doesn’t take advantage from asynchronous I/O support (or am I wrong?). Having async I/O makes MRR useless, as the OS / disk drive’s firmware does block ordering itself.

  • Is there a comprehensive list with the TPC-H performance figures for SF-1 until SF-100 for out of the box MySQL, or with all trickery of a DBA applied.

Leave a Reply


Percona’s widely read Percona Database Performance blog highlights our expertise in enterprise-class software, support, consulting and managed services solutions for both MySQL® and MongoDB® across traditional and cloud-based platforms. The decades of experience represented by our consultants is found daily in numerous and relevant blog posts.

Besides specific database help, the blog also provides notices on upcoming events and webinars.
Want to get weekly updates listing the latest blog posts? Subscribe to our blog now! Submit your email address below and we’ll send you an update every Friday at 1pm ET.

No, thank you. Please do not ask me again.