MySQL 6.0 vs 5.1 in TPC-H queries

Last week I played with queries from TPC-H benchmarks, particularly comparing MySQL 6.0.4-alpha with 5.1. MySQL 6.0 is interesting here, as there is a lot of new changes in optimizer, which should affect execution plan of TPC-H queries. In reality only two queries (from 22) have significantly better execution time (about them in next posts), but I want to write about is queries that execute slower in new MySQL 6.0 version.

Query is pretty simple

SELECT sum(l_extendedprice * l_discount) as revenue 
FROM lineitem WHERE l_shipdate >= date '1995-01-01' 
AND l_shipdate < date '1995-01-01' + interval '1' year
AND l_discount BETWEEN 0.09 - 0.01 AND 0.09 + 0.01
AND l_quantity < 24;

SELECT sum(l_extendedprice * l_discount) as revenue

FROM lineitem WHERE l_shipdate >= date '1995-01-01'

AND l_shipdate < date '1995-01-01' + interval '1' year

AND l_discount BETWEEN 0.09 - 0.01 AND 0.09 + 0.01

AND l_quantity < 24;

with execution plain (in 5.1)

+----+-------------+----------+-------+---------------+---------------+---------+------+---------+-------------+
| id | select_type | table    | type  | possible_keys | key           | key_len | ref  | rows    | Extra       |
+----+-------------+----------+-------+---------------+---------------+---------+------+---------+-------------+
|  1 | SIMPLE      | lineitem | range | li_shp_dt_idx | li_shp_dt_idx | 3       | NULL | 8272067 | Using where | 
+----+-------------+----------+-------+---------------+---------------+---------+------+---------+-------------+

+----+-------------+----------+-------+---------------+---------------+---------+------+---------+-------------+

+----+-------------+----------+-------+---------------+---------------+---------+------+---------+-------------+

+----+-------------+----------+-------+---------------+---------------+---------+------+---------+-------------+

against table

CREATE TABLE `lineitem` (
  `l_shipdate` date NOT NULL,
  `l_orderkey` int(11) NOT NULL,
  `l_partkey` int(11) NOT NULL,
  `l_suppkey` int(11) NOT NULL,
  `l_linenumber` int(11) NOT NULL,
  `l_quantity` decimal(15,2) NOT NULL,
  `l_extendedprice` decimal(15,2) NOT NULL,
  `l_discount` decimal(15,2) NOT NULL,
  `l_tax` decimal(15,2) NOT NULL,
  `l_returnflag` char(1) NOT NULL,
  `l_linestatus` char(1) NOT NULL,
  `l_commitdate` date NOT NULL,
  `l_receiptdate` date NOT NULL,
  `l_shipinstruct` char(25) NOT NULL,
  `l_comment` varchar(44) NOT NULL,
  `l_shipmode` char(10) NOT NULL,
  PRIMARY KEY (`l_orderkey`,`l_linenumber`),
  KEY `lineitem_fk2` (`l_suppkey`),
  KEY `lineitem_fk3` (`l_partkey`,`l_suppkey`),
  KEY `li_shp_dt_idx` (`l_shipdate`),
  KEY `li_com_dt_idx` (`l_commitdate`),
  KEY `li_rcpt_dt_idx` (`l_receiptdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

with 59986052 rows

CREATE TABLE `lineitem` (

`l_shipdate` date NOT NULL,

`l_orderkey` int(11) NOT NULL,

`l_partkey` int(11) NOT NULL,

`l_suppkey` int(11) NOT NULL,

`l_linenumber` int(11) NOT NULL,

`l_quantity` decimal(15,2) NOT NULL,

`l_extendedprice` decimal(15,2) NOT NULL,

`l_discount` decimal(15,2) NOT NULL,

`l_tax` decimal(15,2) NOT NULL,

`l_returnflag` char(1) NOT NULL,

`l_linestatus` char(1) NOT NULL,

`l_commitdate` date NOT NULL,

`l_receiptdate` date NOT NULL,

`l_shipinstruct` char(25) NOT NULL,

`l_comment` varchar(44) NOT NULL,

`l_shipmode` char(10) NOT NULL,

PRIMARY KEY (`l_orderkey`,`l_linenumber`),

KEY `lineitem_fk2` (`l_suppkey`),

KEY `lineitem_fk3` (`l_partkey`,`l_suppkey`),

KEY `li_shp_dt_idx` (`l_shipdate`),

KEY `li_com_dt_idx` (`l_commitdate`),

KEY `li_rcpt_dt_idx` (`l_receiptdate`)

) ENGINE=MyISAM DEFAULT CHARSET=latin1

with 59986052 rows

So this query takes 32.27 sec to execute in 5.1, and 39.39 sec in 6.0 (after warpm-ups and fully cached in key_cache)

What is difference in 6.0? Here is execution plain for 6.0:

+----+-------------+----------+-------+---------------+---------------+---------+------+---------+-----------------------------------------------+
| id | select_type | table    | type  | possible_keys | key           | key_len | ref  | rows    | Extra                                         |
+----+-------------+----------+-------+---------------+---------------+---------+------+---------+-----------------------------------------------+
|  1 | SIMPLE      | lineitem | range | li_shp_dt_idx | li_shp_dt_idx | 3       | NULL | 8272067 | Using index condition; Using where; Using MRR | 
+----+-------------+----------+-------+---------------+---------------+---------+------+---------+-----------------------------------------------+

+----+-------------+----------+-------+---------------+---------------+---------+------+---------+-----------------------------------------------+

+----+-------------+----------+-------+---------------+---------------+---------+------+---------+-----------------------------------------------+

+----+-------------+----------+-------+---------------+---------------+---------+------+---------+-----------------------------------------------+

See there are new Extra notes
USING INDEX condition; USING MRR.

It would be interesting to know what is ‘Using Index condition’, what I was able to find is Baron’s bug with no mention in MySQL 6.0 manual. No mentions about Using MRR also, but I guess it is Multi-Range-Read feature which allows to read data not row-by-row but in batches reads.

So let us see how perform 5.1 and 6.0 in cold mode (I dropped OS caches by



		
		
			
			

			
			echo 1 > /proc/sys/vm/drop_caches
			
				
					
				
					1
				
						echo 1 > /proc/sys/vm/drop_caches

and restarting mysqld)

5.1 – time 1 hour 14 min 36.90 sec
6.0 – time 1 hour 6 min 2.21 sec

So difference is not so big we could speak about big improvements with using MRR here.

What is interesting is iostat -dx 5 output during query execution.

For 5.1 I had:

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   2.06 310.70  0.82 2493.83   23.05  1246.91    11.52     8.08     1.02    3.28   3.28 102.04
dm-0         0.00   0.00 310.70  0.62 2493.83    4.94  1246.91     2.47     8.03     1.02    3.29   3.28 102.18

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   0.21 301.23  0.41 2441.07    4.93  1220.53     2.46     8.11     1.02    3.39   3.39 102.14
dm-0         0.00   0.00 301.23  0.62 2441.07    4.93  1220.53     2.46     8.10     1.02    3.39   3.39 102.18

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   0.41 317.83  0.82 2570.49    9.84  1285.25     4.92     8.10     1.02    3.20   3.20 102.01
dm-0         0.00   0.00 317.83  0.61 2570.49    4.92  1285.25     2.46     8.09     1.02    3.21   3.21 102.11

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util

sda 0.00 2.06 310.70 0.82 2493.83 23.05 1246.91 11.52 8.08 1.02 3.28 3.28 102.04

dm-0 0.00 0.00 310.70 0.62 2493.83 4.94 1246.91 2.47 8.03 1.02 3.29 3.28 102.18

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util

sda 0.00 0.21 301.23 0.41 2441.07 4.93 1220.53 2.46 8.11 1.02 3.39 3.39 102.14

dm-0 0.00 0.00 301.23 0.62 2441.07 4.93 1220.53 2.46 8.10 1.02 3.39 3.39 102.18

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util

sda 0.00 0.41 317.83 0.82 2570.49 9.84 1285.25 4.92 8.10 1.02 3.20 3.20 102.01

dm-0 0.00 0.00 317.83 0.61 2570.49 4.92 1285.25 2.46 8.09 1.02 3.21 3.21 102.11

almost all time. As you see 1285 rkB/s is pretty small for RAID 10 system we have on this server. This is because of Random I/O we had during execution of query, as MySQL used index to scan rows and then did access to data to read other columns.

But for 6 picture is a bit different. At start I had:

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   0.21 1052.67  0.41 8455.97    4.94  4227.98     2.47     8.03     1.01    0.96   0.96 101.44
dm-0         0.00   0.00 1052.67  0.62 8455.97    4.94  4227.98     2.47     8.03     1.02    0.96   0.96 101.50

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   0.21 1091.94  3.10 8763.64   26.45  4381.82    13.22     8.03     1.02    0.94   0.93 102.13
dm-0         0.00   0.00 1091.53  0.62 8760.33    4.96  4380.17     2.48     8.03     1.02    0.94   0.94 102.25

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   0.62 944.97  0.82 7582.75   11.50  3791.38     5.75     8.03     1.01    1.07   1.07 101.40
dm-0         0.00   0.00 944.97  0.62 7582.75    4.93  3791.38     2.46     8.02     1.01    1.07   1.07 101.40

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util

sda 0.00 0.21 1052.67 0.41 8455.97 4.94 4227.98 2.47 8.03 1.01 0.96 0.96 101.44

dm-0 0.00 0.00 1052.67 0.62 8455.97 4.94 4227.98 2.47 8.03 1.02 0.96 0.96 101.50

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util

sda 0.00 0.21 1091.94 3.10 8763.64 26.45 4381.82 13.22 8.03 1.02 0.94 0.93 102.13

dm-0 0.00 0.00 1091.53 0.62 8760.33 4.96 4380.17 2.48 8.03 1.02 0.94 0.94 102.25

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util

sda 0.00 0.62 944.97 0.82 7582.75 11.50 3791.38 5.75 8.03 1.01 1.07 1.07 101.40

dm-0 0.00 0.00 944.97 0.62 7582.75 4.93 3791.38 2.46 8.02 1.01 1.07 1.07 101.40

and then rkB/s was continuously dropped to

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   0.00 341.80  0.00 2739.34    0.00  1369.67     0.00     8.01     1.02    2.97   2.97 101.56
dm-0         0.00   0.00 341.80  0.00 2739.34    0.00  1369.67     0.00     8.01     1.02    2.97   2.97 101.56

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   0.61 309.63  0.82 2488.52   11.48  1244.26     5.74     8.05     1.02    3.28   3.28 101.84
dm-0         0.00   0.00 309.63  0.61 2488.52    4.92  1244.26     2.46     8.04     1.02    3.28   3.28 101.78

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00   0.21 318.76  0.41 2558.35    4.95  1279.18     2.47     8.03     1.02    3.21   3.21 102.43
dm-0         0.00   0.00 318.76  0.62 2558.35    4.95  1279.18     2.47     8.03     1.02    3.21   3.21 102.49

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util

sda 0.00 0.00 341.80 0.00 2739.34 0.00 1369.67 0.00 8.01 1.02 2.97 2.97 101.56

dm-0 0.00 0.00 341.80 0.00 2739.34 0.00 1369.67 0.00 8.01 1.02 2.97 2.97 101.56

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util

sda 0.00 0.61 309.63 0.82 2488.52 11.48 1244.26 5.74 8.05 1.02 3.28 3.28 101.84

dm-0 0.00 0.00 309.63 0.61 2488.52 4.92 1244.26 2.46 8.04 1.02 3.28 3.28 101.78

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util

sda 0.00 0.21 318.76 0.41 2558.35 4.95 1279.18 2.47 8.03 1.02 3.21 3.21 102.43

dm-0 0.00 0.00 318.76 0.62 2558.35 4.95 1279.18 2.47 8.03 1.02 3.21 3.21 102.49

at the end of query execution. I guess we had 4300 rkB/s because of MRR, but I have no idea why this number was dropped to 1200 finally. Probably we need to check how really MRR works.

What else worth to mention here is that MySQL is wrong choosing index scan here. If we try to execute query with IGNORE KEY (li_shp_dt_idx) (full table scan will be performed in this case) then we got:
1 min 39.14 sec to execute query (compare with 1 hour 14 min 36.90 sec).

and iostat here is:

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda         16.22   1.23 534.29  1.85 132547.02   24.64 66273.51    12.32   247.27     0.30    0.55   0.55  29.67
dm-0         0.00   0.00 550.10  0.62 132545.38    4.93 66272.69     2.46   240.69     0.30    0.54   0.53  29.30

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda         15.78   0.20 524.39  0.41 130224.59    4.92 65112.30     2.46   248.15     0.30    0.56   0.56  29.61
dm-0         0.00   0.00 540.37  0.61 130224.59    4.92 65112.30     2.46   240.73     0.30    0.56   0.55  29.49

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda         16.02   0.20 526.77  1.01 130724.54    9.74 65362.27     4.87   247.70     0.27    0.52   0.52  27.22
dm-0         0.00   0.00 542.60  0.61 130724.54    4.87 65362.27     2.43   240.66     0.28    0.51   0.50  27.08

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util

sda 16.22 1.23 534.29 1.85 132547.02 24.64 66273.51 12.32 247.27 0.30 0.55 0.55 29.67

dm-0 0.00 0.00 550.10 0.62 132545.38 4.93 66272.69 2.46 240.69 0.30 0.54 0.53 29.30

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util

sda 15.78 0.20 524.39 0.41 130224.59 4.92 65112.30 2.46 248.15 0.30 0.56 0.56 29.61

dm-0 0.00 0.00 540.37 0.61 130224.59 4.92 65112.30 2.46 240.73 0.30 0.56 0.55 29.49

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util

sda 16.02 0.20 526.77 1.01 130724.54 9.74 65362.27 4.87 247.70 0.27 0.52 0.52 27.22

dm-0 0.00 0.00 542.60 0.61 130724.54 4.87 65362.27 2.43 240.66 0.28 0.51 0.50 27.08

As you see we have sequential I/O with 65362 rkB/s (compare 1200 rkB/s with random I/O), that allows to execute query ~40 times faster.

15 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Peter Zaitsev

Admin

17 years ago

The fact optimizer is unaware of caching and fragmentation is one of the most serious practical deficiencies.

Another challenge around the corner for the optimizer is completely different costs for SSD storage compared to standard hard drives.

Jay Pipes

17 years ago

What about putting an index on (quantity, discount, ship_date)?

Vadim Tkachenko

Author

17 years ago

Jay,

Actually it should not help, MySQL optimizer will not use indexes with range conditions
Here is explain with index you suggested (in MySQL 6.0)
mysql> explain select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date ‘1995-01-01’ and l_shipdate < date '1995-01-01' + interval '1' year and l_discount between 0.09 - 0.01 and 0.09 + 0.01 and l_quantity < 24; +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+-----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+-----------------------------------------------+ | 1 | SIMPLE | lineitem | range | li_shp_dt_idx,l_quantity | li_shp_dt_idx | 3 | NULL | 8272067 | Using index condition; Using where; Using MRR | +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+-----------------------------------------------+ 1 row in set (0.12 sec) And worse thing - TPC-H does not allow additional indexes.

Sergey Petrunia

17 years ago

Documentation for Index Condition Pushdown and MRR are on our todo.

You can turn index condition pushdown off with this statement:

SET engine_condition_pushdown = off;

and MRR with this statement:

SET optimizer_use_mrr=’disable’;

It would be interesting to see how the first query’s speed is affected by disabling index condition pushdown. Peter, could you do a re-run with after “SET engine_condition_pushdown = off;” ?

Vadim Tkachenko

Author

17 years ago

Sergey,

With SET engine_condition_pushdown = off; I get next plan:

Sergey Petrunia

17 years ago

Vadim,

thanks for the answer. So it’s not Index Condition Pushdown (which was our first guess). Would it be too much to ask to re-run with

SET engine_condition_pushdown=off; SET optimizer_use_mrr=’disable’;

so we know if the slowdown is caused by MRR or something else?

Vadim Tkachenko

Author

17 years ago

Sergey,

No problem at all while I have things setuped on my box.

mysql> SET engine_condition_pushdown=off; SET optimizer_use_mrr=’disable’;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> explain select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date ‘1995-01-01’ and l_shipdate < date '1995-01-01' + interval '1' year and l_discount between 0.09 - 0.01 and 0.09 + 0.01 and l_quantity < 24; +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+-------------+ | 1 | SIMPLE | lineitem | range | li_shp_dt_idx,l_quantity | li_shp_dt_idx | 3 | NULL | 8272067 | Using where | +----+-------------+----------+-------+--------------------------+---------------+---------+------+---------+-------------+ 1 row in set (0.00 sec) mysql> select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date ‘1995-01-01’ and l_shipdate < date '1995-01-01' + interval '1' year and l_discount between 0.09 - 0.01 and 0.09 + 0.01 and l_quantity < 24; +-----------------+ | revenue | +-----------------+ | 1847603277.9361 | +-----------------+ 1 row in set (33.96 sec) As you see disabling "using MRR" makes good difference. May I ask you to explain in short words how MRR works ?

Sergey Petrunia

17 years ago

Sure. The idea behind MRR (which stands for Multi Range Read) is to perform range scan as follows:
* Scan the index, collect the rowids in a buffer (of @@read_rnd_buff_size bytes)
* Sort the rowids
* Go and retrieve full table records in one sequential ‘sweep’.

the expected improvement is that the full records are read sequentially which is faster than doing random disk probing. MRR doesn’t provide any speedup when all record data is in disk cache, but it was not expected to cause so much slowdown either.

Vadim Tkachenko

Author

17 years ago

Sergey,

Thank you for explain.
Also it is interesting why it does not give a lot of performance gain in case of cold caches.

Timour

17 years ago

Vadim,

Could it be possible that data on disk is already in the right physical order for this query? If the rows in lineitem are already ordered physically roughly in the same order as we would get by ordering the rowids we got via li_shp_dt_idx, then the overhead of MRR could be due to the unnecessary (in this case) rowid sorting.

Could you have a look at the script that populates lineitem and see whether that is the case?

Vadim Tkachenko

Author

17 years ago

Timour,

I use dbgen utility from tpc-h benchmark to generate data. It produce data for lineitem as (first N rows from generated file):

1996-03-13|1|1551894|76910|1|17|33078.94|0.04|0.02|N|O|1996-02-12|1996-03-22|DELIVER IN PERSON|egular courts above the|TRUCK|
1996-04-12|1|673091|73092|2|36|38306.16|0.09|0.06|N|O|1996-02-28|1996-04-20|TAKE BACK RETURN|ly final dependencies: slyly bold |
1996-01-29|1|636998|36999|3|8|15479.68|0.10|0.02|N|O|1996-03-05|1996-01-31|TAKE BACK RETURN|riously. regular, express dep|REG AI
1996-04-21|1|21315|46316|4|28|34616.68|0.09|0.06|N|O|1996-03-30|1996-05-16|NONE|lites. fluffily even de|AIR|
1996-03-30|1|240267|15274|5|24|28974.00|0.10|0.04|N|O|1996-03-14|1996-04-01|NONE| pending foxes. slyly re|FOB|
1996-01-30|1|156345|6348|6|32|44842.88|0.07|0.02|N|O|1996-02-07|1996-02-03|DELIVER IN PERSON|arefully slyly ex|MAIL|
1997-01-28|2|1061698|11719|1|38|63066.32|0.00|0.05|N|O|1997-01-14|1997-02-02|TAKE BACK RETURN|ven requests. deposits breach a|RA
1994-02-02|3|42970|17971|1|45|86083.65|0.06|0.00|R|F|1994-01-04|1994-02-23|NONE|ongside of the furiously brave acco|AIR|
1993-11-09|3|190355|65359|2|49|70822.15|0.10|0.00|R|F|1993-12-20|1993-11-24|TAKE BACK RETURN| unusual accounts. eve|RAIL|
1994-01-16|3|1284483|34508|3|27|39620.34|0.06|0.07|A|F|1993-11-22|1994-01-23|DELIVER IN PERSON|nal foxes wake. |SHIP|
1993-12-04|3|293797|18800|4|2|3581.56|0.01|0.06|A|F|1994-01-07|1994-01-01|NONE|y. fluffily pending d|TRUCK|
1993-12-14|3|1830941|5996|5|28|52411.80|0.04|0.00|R|F|1994-01-10|1994-01-01|TAKE BACK RETURN|ages nag slyly pending|FOB|
1993-10-29|3|621426|96445|6|26|35032.14|0.10|0.02|A|F|1993-12-18|1993-11-04|TAKE BACK RETURN|ges sleep after the caref|RAIL|
1996-01-10|4|880347|55372|1|30|39819.00|0.03|0.08|N|O|1995-12-14|1996-01-18|DELIVER IN PERSON|- quickly regular packages sleep.
1994-10-31|5|1085693|85694|1|15|25179.60|0.02|0.04|R|F|1994-08-31|1994-11-20|NONE|ts wake furiously |AIR|
1994-10-16|5|1239268|39269|2|26|31387.20|0.07|0.08|R|F|1994-09-25|1994-10-19|NONE|sts use slyly quickly special instruc|FOB|
1994-08-08|5|375302|306|3|50|68864.50|0.08|0.03|A|F|1994-10-13|1994-08-26|DELIVER IN PERSON|eodolites. fluffily unusual|AIR|
1992-04-27|6|1396355|21369|1|37|53697.73|0.08|0.03|A|F|1992-05-15|1992-05-02|TAKE BACK RETURN|p furiously special foxes|TRUCK|
1996-05-07|7|1820519|95574|1|12|17273.04|0.07|0.03|N|O|1996-03-13|1996-06-03|TAKE BACK RETURN|ss pinto beans wake against th|FOB
1996-02-01|7|1452428|77443|2|9|12423.15|0.08|0.08|N|O|1996-03-02|1996-02-19|TAKE BACK RETURN|es. instructions|SHIP|
1996-01-15|7|947798|97817|3|46|84904.50|0.10|0.07|N|O|1996-03-27|1996-02-03|COLLECT COD| unusual reques|MAIL|
1996-03-21|7|1630721|30722|4|28|46245.92|0.03|0.04|N|O|1996-04-08|1996-04-20|NONE|. slyly special requests haggl|FOB|
1996-02-11|7|1518939|93985|5|38|74398.68|0.08|0.01|N|O|1996-02-24|1996-02-18|DELIVER IN PERSON|ns haggle carefully ironic deposi
1996-01-16|7|792502|17510|6|35|55806.45|0.06|0.03|N|O|1996-02-23|1996-01-22|TAKE BACK RETURN|jole. excuses wake carefully alongs
1996-02-10|7|1572371|22402|7|5|7216.50|0.04|0.02|N|O|1996-03-26|1996-02-13|NONE|ithely regula|FOB|

and then I load it with LOAD DATA INFILE command.
As you see data it ordered by columns (l_orderkey, l_lineitem).

I can share generated files and LOAD DATA INFILE command, it takes about 8GB

Noah

17 years ago

Hi Vadim,

– Which scale factor are you using?
– How MySQL 5.1/6.0 scales with TPC-H (e.g.: scale factors 1,3,5,10,30,100)?

Thanks,

Noah

Vadim Tkachenko

Author

17 years ago

Noah,

I tried different scale factors 1, 10, 100.

I can’t say how MySQL scales because I have not done experiments.
Some queries even with scale factor 1 take hours to complete.

Piotr KoÅ‚aczkowski

17 years ago

The main cost of executing wide index range scans is NOT the order of the blocks in the table. The main cost is that the blocks are in different places in the table, so even reading them in sequential order still requires to move the disk drive’s heads from place to place. As I first read about MRR and expectations this technique should provide the same speed advantage as clustered indexes, I laughed 😉 Actually MRR would help a little only because MySQL doesn’t take advantage from asynchronous I/O support (or am I wrong?). Having async I/O makes MRR useless, as the OS / disk drive’s firmware does block ordering itself.

martin kersten

17 years ago

Is there a comprehensive list with the TPC-H performance figures for SF-1 until SF-100 for out of the box MySQL, or with all trickery of a DBA applied.

MySQL 5.7
Support

Compare Percona to Leading Database Solutions

Software
Downloads

Valkey Contribution

Product Documentation

Resource Hub

Why Percona for MongoDB?

Why Percona for PostgreSQL?

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

MySQL 6.0 vs 5.1 in TPC-H queries

Related Blog Articles

RECOMMENDED ARTICLES

MySQL January 2026 Performance Review

The Importance of Realistic Benchmark Workloads

Introducing Percona Load Generator for MongoDB Clusters: The Benchmark Tool That Simulates Your Actual Application

MOST POPULAR ARTICLES

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL Performance Tuning: Maximizing Database Efficiency and Speed

The Ultimate Guide to Open Source Databases

MySQL 5.7 Support

Compare Percona to Leading Database Solutions

Software Downloads

Valkey Contribution

Product Documentation

Resource Hub

Why Percona for MongoDB?

Why Percona for PostgreSQL?

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

MySQL 6.0 vs 5.1 in TPC-H queries

About the Author

Share This Post!

Stay up to date with the Percona Blog

Related Blog Articles

RECOMMENDED ARTICLES

MySQL January 2026 Performance Review

The Importance of Realistic Benchmark Workloads

Introducing Percona Load Generator for MongoDB Clusters: The Benchmark Tool That Simulates Your Actual Application

MOST POPULAR ARTICLES

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL Performance Tuning: Maximizing Database Efficiency and Speed

The Ultimate Guide to Open Source Databases

MySQL 5.7
Support

Software
Downloads