October 25, 2014

TPC-H Run on MySQL 5.1 and 6.0

We were doing MySQL Performance evaluation on TPC-H queries for the client and they kindly allowed us to publish results which are very interesting.
This is obviously not audited TPC-H run, and it can’t be because we used MyISAM tables which are not ACID complaint. Plus we only measured Power to keep things simple.

We tested 10G and 100G data sets which was CPU bound and IO bound box on the Dell 2950 box w 16G of RAM which we used for testing. Even though box had 8 cores it is little use for MySQL as only one query is ran concurrently, same can be told about 8 hard drives which this box had.
MySQL Also was very slow running some queries so we changed scripts a bit to kill extremely long running queries to get results for others this means we can’t really get a valid TPC-H result from MySQL,though at least we get to see performance of individual queries.
We also packaged the toolset we used for benchmarks so you can repeat them if you like. It can be downloaded here

So let us first take a look at MySQL 5.1.23 vs 6.0.4 results for 10GB data set which “fits in memory”. The cut-off time for this test was 1 hour so query taking over 1 hour has NA. Times are given in seconds.
Ratio is MySQL 6.0 time divided by MySQL 5.1 time so if it is less than 1 MySQL 6.0 is faster if more than 1 slower:

Query MySQL 5.0.23 MySQL 6.0.4Ratio
Query1370.50372.161.00
Query2724.63623.680.86
Query3328.25354.621.08
Query47.958.081.02
Query5154.36161.651.05
Query629.3538.511.31
Query7322.04331.051.03
Query8469.31457.180.97
Query9123.75121.330.98
Query10343.37341.230.99
Query112.322.341.01
Query1228.6838.611.35
Query1346.0047.841.04
Query148.838.230.93
Query1521.5023.611.10
Query1655.4861.421.11
Query179.009.331.04
Query18N/A1962.96N/A
Query195.035.301.05
Query202.060.260.13
Query2133.1632.980.99
Query227.848.061.03

As you can see for 10G results there is only one query which does not complete within an hour in 5.1 and MySQL 6.0 can complete all queries within an hour.
We can also see MySQL 6.0 improving query 20 speed dramatically, the rest of the queries is however close and MySQL 6.0 is even significantly slower than 5.1 for
number of queries. Honestly I expected more from MySQL 6.0 optimizer improvements effort.

100GB Results are more interesting because this is database size for which you can find results published on TPC Web site. For example These Are results recently published for Microsoft SQL Server.
Of course this is much more powerful system but still you can get an idea – This system has geomean of 7.7 sec for Power test which is good order of magnitude better compared to MySQL results on 10G database.
TPC does discourage from comparing results against different database sizes so let’s see 100G data set results.
In this case we set cut-off time to 3 hours to give MySQL more time to complete the queries:

Query MySQL 5.0.23 MySQL 6.0.4Ratio
Query13784.453737.540.99
Query2NANANA
Query3NANANA
Query4NANANA
Query5NANANA
Query6NANANA
Query74328.224533.621.05
Query88947.724122.620.46
Query9NANANA
Query10NANANA
Query112726.203395.021.25
Query12NANANA
Query13NANANA
Query142345.68?NA
Query15NANANA
Query16725.31693.560.96
Query171895.55NANA
Query18NANANA
Query194896.274682.190.96
Query203117.451450.420.47
Query21NANANA
Query22108.56100.720.93

As you can see on 100G data set MySQL 5.1 could only complete 10 out of 22 queries within 3hours run time allowed for each query and MySQL 6.0 has similar number of queries it can execute in reasonable time frame. There 2 queries (Query8 and Query20) which MySQL 6.0 does better but there is also Query11 in which significant regression is observed. Vadim has already Wrote about it in his MySQL 5.1 vs 6.0 in TPC-H Queries post

As a Summary: We can see MySQL capabilities to run complex analytics queries, in particular those presented in TPC-H benchmark are still subpar even with changes which are currently seen in MySQL 6.0 tree. There is a long way till Release and may be MySQL 6.0 performance will improve. Though considering a lot of talks about optimizer improvements in MySQL 6.0 I expected drastically better results.
Though may we should not be as surprised by these results – MySQL 6.0 still does not have hash or merge join which are frequently used for these kind of queries, and there is also no query parallelization which would allow MySQL to use CPUs and hard drives this system has more efficiently. Parallel query will be growing issue for MySQL because CPUs just continue to add more cores. Having 2 CPUs and being able to use only half of system CPU resources for query processing was not too bad, however with 16,32,64 cores it is just too bad.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Matthew Montgomery says:

    In the 3rd paragraph you probably meant this:

    “if it is less than 1 MySQL 6.0 is faster if [larger] than 1 slower:”

  2. peter says:

    Thanks. Fixed.

  3. Ivan Evtuhovich says:

    Hello, Peter.

    At this time I can’t post bug on bugs.mysql.com. When i press Add Comment on existing bug, i see “An error occured when trying to verify your login credentials. Please log out and try again.” I relogin, but problem still exists (sorry for my english).

    On site i don’t find any email address, where to post this problem.

  4. peter says:

    Ivan,

    I’m not sure who is responsible for bugs now but you can write to jay@ (Jay Pipes) and he should take care of the problem himself or ask someone to do.

  5. Igor says:

    Given that Kickfire published 100GB TPC-H results (audited) where none of the queries took more than 37 SECONDS – so for example on Q1 it’s 100 TIMES faster than result above (http://www.tpc.org/results/individual_results/Kickfire/Kickfire_2300_ES_TPCH_100G_041408.pdf) it appears MySQL/Sun have some real challenges ahead in this area (unless they’ll buy Kickfire or sue it out of existence etc).
    I’m not affiliated with Kickfire in any way, shape or form no do I know anyone working there.

  6. peter says:

    Igor,

    Kickfire is very fast for what it does, but you should consider couple things here.

    First – KickFire is positioned as Datawarehouse engine right now – it is not designed for OLTP etc.

    Second – MySQL just handled these particular queries very badly. In real MySQL world if you have such queries you either rewrite your queries or you do not use MySQL.

  7. Igor says:

    Q1 in TPC-H is one of the most basic queries there is – and it’s 100x slower using ~30K Kickfire appliance (so presumably no tuning is required – is it really the case?) vs what looks like 4-6K$ Dell server you’ve used. Time difference on other queries (except Q22) is even more dramatic.

  8. peter says:

    Sure. However you need to understand why things are the way they are.

    First this is appliance with 64G of Memory and it uses certain compression for the data which means there was quite good data fit in memory compared to 16G of memory the Dell box had.
    Furthermore Kickfire uses column store which reduces IO needed dramatically in case of large scans. It however comes at cost of accessing multiple columns becomes more expensive.

    As I mentioned Kickfire indeed handles some kind of queries just great but I would not put it as “MySQL Killer” especially until it is out of Beta and we get more hands-on experiences with it to understand Its limits.

    Infobright, NitroDB, Kickfire all promise great speed improvements for certain kind of operations and I’m quite confident things work this way for queries they have on presentation slides and in demos, however you always see best results on such marketing presentation.

    I think Kickfire has a great idea of hardware accelerator – I just do not have enough information at this point to judge how well they have implemented it in practice.

  9. Noah Freire says:

    Hi Peter,

    It would be great to know which MySQL settings you used for the 10GB test and 100GB test. What about posting your my.cnf? ;-)

    Thank you,

    -Noah

  10. martin kersten says:

    Hi Peter,

    Indeed, understanding the implications of a DBMS architecture on the application requirements is one of the hardest DBA tasks. The TPC-H and similar applications are meant to aid in this process, albeit a little. Ideally, a user does not require an extensive DBA course to handle reasonably sized database and its application.

    The development in the area of column-stores and main-memory databases is moving at great pace.
    Some throw hardware at the problem (Kickfire), others exploit different ways of software layering (Vertica, MonetDB). From an software architecture point of view understanding the pro-s and con-s are highly desirable and benchmark comparisons are a step into this direction.

    From your comments on april 22, should i conclude that you don’t consider MySQL the premiere choice for datawarehouses?
    If it is, did the performance team produce better results on the MySQL versions as presented by Peter at the beginning of this thread, including those beyond the 100GB. What are the tricks and their impact?
    I am eager to learn those from the insiders, because the performance of MySQL does not stand out against the fullscale (OLTP +DW) version of MonetDB. Any reference to independent and public available numbers, albeit non-official, are appreciated.

    regards, and keep up the good work on database kernels
    Martin

    ps http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/ for an indication. Numbers are confirmed recently, trends are the same.

  11. Teratzul says:

    Hi. I’m trying to execute the tpc-h queries on a mysql 5.0.27 system. Some of my results are somewhat similar to what you posted while others largely differ. I was wondering if I made a mistake while defining an index or something. Could I have a look at your create table scripts for the tpc-h standard ?

    Thanks in advance.

  12. Joel says:

    Peter,
    Excellent Job on your dilligent work, I love reading your blogs. I am working on a system that I beleive will put MySQL at the top of the TPC-H charts. My dilemma is that I do not see the data that is to be loaded as referrenced in the file: load_mysql.sh – PATH_DATA=/usb/rawdata/dbgen.10G

    Is that information publically available for download?

    Thanks in advance and keep up the great work!

  13. luca says:

    Hi Peter,
    i was trying to repeat some tests but dbgen won’t to complete the writing of a 10GB dataset.
    I have a “file size exceeded” when the size of lineitem table reaches 2GB. I can manage big files in my system (ext3+ related libraries)… i can’t actually understand what the problem is.
    Any ideas?

  14. peter says:

    Luca,

    It looks like you have 2G file size limit somethere. MySQL does not have 2G limit – even with small data_pointer_size for MyISAM tables you will get at last 4G

    So I’d look at OS settings, may be quota.

  15. Eddie says:

    Hi Peter,

    Re utilising multi cores, MySQL does claim to be capable of utilising multiple CPU cores. (http://dev.mysql.com/doc/refman/5.5/en/faqs-general.html#qandaitem-B-1-1-8).

    Is that a different kind of assessment or different benchmarking?
    Cheers.

  16. AfEf says:

    Hi Peter,

    Can you help me to import data from TCPH benchmark to HBase nosql Database via sqoop and Wamp Server (MySql 5.1) ?

    Thanks in advance,

Speak Your Mind

*