Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Increasing slow MySQL query performance with the parallel query execution

January 7, 2014

Author

Alexander Rubin

MySQL

Share this Post:

MySQL and Scaling-up (using more powerful hardware) was always a hot topic. Originally MySQL did not scale well with multiple CPUs; there were times when InnoDB performed poorer with more CPU cores than with fewer CPU cores. MySQL 5.6 can scale significantly better; however, there is still 1 big limitation: 1 SQL query will eventually use only 1 CPU core (no parallelism). Here is what I mean by that: let’s say we have a complex query which will need to scan millions of rows and may need to create a temporary table; in this case MySQL will not be able to scan the table in multiple threads (even with partitioning) so the single query will not be faster on the more powerful server. On the contrary, a server with more slower CPUs will show worse performance than the server with less (but faster) CPUs.

MySQL parallel query execution

To address this issue we can use a parallel query execution. Vadim wrote about the PHP asynchronous calls for MySQL. Another way to increase the parallelism will be to use “sharding” approach, for example with Shard Query. I’ve decided to test out the parallel (asynchronous) query execution with a relatively large table: I’ve used the US Flights Ontime performance database, which was originally used by Vadim in the old post Analyzing air traffic performance. Let’s see how this can help us increase the performance of the complex query reports.

Parallel Query Example

To illustrate the parallel query execution with MySQL I’ve created the following table:

CREATE TABLE `ontime` (
`YearD` year(4) NOT NULL,
`Quarter` tinyint(4) DEFAULT NULL,
`MonthD` tinyint(4) DEFAULT NULL,
`DayofMonth` tinyint(4) DEFAULT NULL,
`DayOfWeek` tinyint(4) DEFAULT NULL,
`FlightDate` date DEFAULT NULL,
`UniqueCarrier` char(7) DEFAULT NULL,
`AirlineID` int(11) DEFAULT NULL,
`Carrier` char(2) DEFAULT NULL,
`TailNum` varchar(50) DEFAULT NULL,
`FlightNum` varchar(10) DEFAULT NULL,
`OriginAirportID` int(11) DEFAULT NULL,
`OriginAirportSeqID` int(11) DEFAULT NULL,
`OriginCityMarketID` int(11) DEFAULT NULL,
`Origin` char(5) DEFAULT NULL,
`OriginCityName` varchar(100) DEFAULT NULL,
`OriginState` char(2) DEFAULT NULL,
`OriginStateFips` varchar(10) DEFAULT NULL,
`OriginStateName` varchar(100) DEFAULT NULL,
`OriginWac` int(11) DEFAULT NULL,
`DestAirportID` int(11) DEFAULT NULL,
`DestAirportSeqID` int(11) DEFAULT NULL,
`DestCityMarketID` int(11) DEFAULT NULL,
`Dest` char(5) DEFAULT NULL,
-- ... (removed number of fields)
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
KEY `YearD` (`YearD`),
KEY `Carrier` (`Carrier`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `ontime` (

`YearD` year(4) NOT NULL,

`Quarter` tinyint(4) DEFAULT NULL,

`MonthD` tinyint(4) DEFAULT NULL,

`DayofMonth` tinyint(4) DEFAULT NULL,

`DayOfWeek` tinyint(4) DEFAULT NULL,

`FlightDate` date DEFAULT NULL,

`UniqueCarrier` char(7) DEFAULT NULL,

`AirlineID` int(11) DEFAULT NULL,

`Carrier` char(2) DEFAULT NULL,

`TailNum` varchar(50) DEFAULT NULL,

`FlightNum` varchar(10) DEFAULT NULL,

`OriginAirportID` int(11) DEFAULT NULL,

`OriginAirportSeqID` int(11) DEFAULT NULL,

`OriginCityMarketID` int(11) DEFAULT NULL,

`Origin` char(5) DEFAULT NULL,

`OriginCityName` varchar(100) DEFAULT NULL,

`OriginState` char(2) DEFAULT NULL,

`OriginStateFips` varchar(10) DEFAULT NULL,

`OriginStateName` varchar(100) DEFAULT NULL,

`OriginWac` int(11) DEFAULT NULL,

`DestAirportID` int(11) DEFAULT NULL,

`DestAirportSeqID` int(11) DEFAULT NULL,

`DestCityMarketID` int(11) DEFAULT NULL,

`Dest` char(5) DEFAULT NULL,

-- ... (removed number of fields)

`id` int(11) NOT NULL AUTO_INCREMENT,

PRIMARY KEY (`id`),

KEY `YearD` (`YearD`),

KEY `Carrier` (`Carrier`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

And loaded 26 years of data into it. The table is 56G with ~152M rows.

Software: Percona 5.6.15-63.0. Hardware: Supermicro; X8DTG-D; 48G of RAM; 24xIntel(R) Xeon(R) CPU L5639 @ 2.13GHz, 1xSSD drive (250G)

So we have 24 relatively slow CPUs

Simple query

Now we can run some queries. The first query is very simple: find all flights per year (in the US):

select yeard, count(*) from ontime group by yeard

1	select yeard, count(*) from ontime group by yeard

As we have the index on YearD, the query will use the index:

mysql> explain select yeard, count(*) from ontime group by yeardG
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ontime
type: index
possible_keys: YearD,comb1
key: YearD
key_len: 1
ref: NULL
rows: 148046200
Extra: Using index
1 row in set (0.00 sec)

mysql> explain select yeard, count(*) from ontime group by yeardG

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: ontime

type: index

possible_keys: YearD,comb1

key: YearD

key_len: 1

ref: NULL

rows: 148046200

Extra: Using index

1 row in set (0.00 sec)

The query is simple, however, it will have to scan 150M rows. Here is the results of the query (cached):

mysql> select yeard, count(*) from ontime group by yeard;
+-------+----------+
| yeard | count(*) |
+-------+----------+
| 1988  | 5202096  |
| 1989  | 5041200  |
| 1990  | 5270893  |
| 1991  | 5076925  |
| 1992  | 5092157  |
| 1993  | 5070501  |
| 1994  | 5180048  |
| 1995  | 5327435  |
| 1996  | 5351983  |
| 1997  | 5411843  |
| 1998  | 5384721  |
| 1999  | 5527884  |
| 2000  | 5683047  |
| 2001  | 5967780  |
| 2002  | 5271359  |
| 2003  | 6488540  |
| 2004  | 7129270  |
| 2005  | 7140596  |
| 2006  | 7141922  |
| 2007  | 7455458  |
| 2008  | 7009726  |
| 2009  | 6450285  |
| 2010  | 6450117  |
| 2011  | 6085281  |
| 2012  | 6096762  |
| 2013  | 5349447  |
+-------+----------+
26 rows in set (54.10 sec)

mysql> select yeard, count(*) from ontime group by yeard;

+-------+----------+

| yeard | count(*) |

+-------+----------+

| 1988 | 5202096 |

| 1989 | 5041200 |

| 1990 | 5270893 |

| 1991 | 5076925 |

| 1992 | 5092157 |

| 1993 | 5070501 |

| 1994 | 5180048 |

| 1995 | 5327435 |

| 1996 | 5351983 |

| 1997 | 5411843 |

| 1998 | 5384721 |

| 1999 | 5527884 |

| 2000 | 5683047 |

| 2001 | 5967780 |

| 2002 | 5271359 |

| 2003 | 6488540 |

| 2004 | 7129270 |

| 2005 | 7140596 |

| 2006 | 7141922 |

| 2007 | 7455458 |

| 2008 | 7009726 |

| 2009 | 6450285 |

| 2010 | 6450117 |

| 2011 | 6085281 |

| 2012 | 6096762 |

| 2013 | 5349447 |

+-------+----------+

26 rows in set (54.10 sec)

The query took 54 seconds and utilized only 1 CPU core. However, this query is perfect for running in parallel. We can run 26 parallel queries, each will count its own year. I’ve used the following shell script to run the queries in the background:

#!/bin/bash
date
for y in {1988..2013}
do
  sql="select yeard, count(*) from ontime where yeard=$y"
  mysql -vvv ontime -e "$sql" &>par_sql1/$y.log & 
done
wait
date

#!/bin/bash

date

for y in {1988..2013}

sql="select yeard, count(*) from ontime where yeard=$y"

mysql -vvv ontime -e "$sql" &>par_sql1/$y.log &

done

wait

date

Here are the results:

Start: 11:41:21 EST 2014
  End: 11:41:26 EST 2014

1 2	Start: 11:41:21 EST 2014 End: 11:41:26 EST 2014

So the total execution time is ~5 (10x faster) seconds. Each individual results are here:

par_sql1/1988.log:1 row in set (3.70 sec)
par_sql1/1989.log:1 row in set (4.08 sec)
par_sql1/1990.log:1 row in set (4.59 sec)
par_sql1/1991.log:1 row in set (4.26 sec)
par_sql1/1992.log:1 row in set (4.54 sec)
par_sql1/1993.log:1 row in set (2.78 sec)
par_sql1/1994.log:1 row in set (3.41 sec)
par_sql1/1995.log:1 row in set (4.87 sec)
par_sql1/1996.log:1 row in set (4.41 sec)
par_sql1/1997.log:1 row in set (3.69 sec)
par_sql1/1998.log:1 row in set (3.56 sec)
par_sql1/1999.log:1 row in set (4.47 sec)
par_sql1/2000.log:1 row in set (4.71 sec)
par_sql1/2001.log:1 row in set (4.81 sec)
par_sql1/2002.log:1 row in set (4.19 sec)
par_sql1/2003.log:1 row in set (4.04 sec)
par_sql1/2004.log:1 row in set (5.12 sec)
par_sql1/2005.log:1 row in set (5.10 sec)
par_sql1/2006.log:1 row in set (4.93 sec)
par_sql1/2007.log:1 row in set (5.29 sec)
par_sql1/2008.log:1 row in set (5.59 sec)
par_sql1/2009.log:1 row in set (4.44 sec)
par_sql1/2010.log:1 row in set (4.91 sec)
par_sql1/2011.log:1 row in set (5.08 sec)
par_sql1/2012.log:1 row in set (4.85 sec)
par_sql1/2013.log:1 row in set (4.56 sec)

par_sql1/1988.log:1 row in set (3.70 sec)

par_sql1/1989.log:1 row in set (4.08 sec)

par_sql1/1990.log:1 row in set (4.59 sec)

par_sql1/1991.log:1 row in set (4.26 sec)

par_sql1/1992.log:1 row in set (4.54 sec)

par_sql1/1993.log:1 row in set (2.78 sec)

par_sql1/1994.log:1 row in set (3.41 sec)

par_sql1/1995.log:1 row in set (4.87 sec)

par_sql1/1996.log:1 row in set (4.41 sec)

par_sql1/1997.log:1 row in set (3.69 sec)

par_sql1/1998.log:1 row in set (3.56 sec)

par_sql1/1999.log:1 row in set (4.47 sec)

par_sql1/2000.log:1 row in set (4.71 sec)

par_sql1/2001.log:1 row in set (4.81 sec)

par_sql1/2002.log:1 row in set (4.19 sec)

par_sql1/2003.log:1 row in set (4.04 sec)

par_sql1/2004.log:1 row in set (5.12 sec)

par_sql1/2005.log:1 row in set (5.10 sec)

par_sql1/2006.log:1 row in set (4.93 sec)

par_sql1/2007.log:1 row in set (5.29 sec)

par_sql1/2008.log:1 row in set (5.59 sec)

par_sql1/2009.log:1 row in set (4.44 sec)

par_sql1/2010.log:1 row in set (4.91 sec)

par_sql1/2011.log:1 row in set (5.08 sec)

par_sql1/2012.log:1 row in set (4.85 sec)

par_sql1/2013.log:1 row in set (4.56 sec)

Complex Query

Now we can try a more complex query. Let’s imagine we want to find out which airlines have maximum delays for the flights inside the continental US during the business days from 1988 to 2009 (I was trying to come up with the complex query with multiple conditions in the where clause).

select 
   min(yeard), max(yeard), Carrier, count(*) as cnt, 
   sum(ArrDelayMinutes>30) as flights_delayed, 
   round(sum(ArrDelayMinutes>30)/count(*),2) as rate 
FROM ontime 
WHERE 
   DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI') 
   and DestState not in ('AK', 'HI', 'PR', 'VI') 
   and flightdate < '2010-01-01' 
GROUP by carrier 
HAVING cnt > 100000 and max(yeard) > 1990 
ORDER by rate DESC

select

min(yeard), max(yeard), Carrier, count(*) as cnt,

sum(ArrDelayMinutes>30) as flights_delayed,

round(sum(ArrDelayMinutes>30)/count(*),2) as rate

FROM ontime

WHERE

DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI')

and DestState not in ('AK', 'HI', 'PR', 'VI')

and flightdate < '2010-01-01'

GROUP by carrier

HAVING cnt > 100000 and max(yeard) > 1990

ORDER by rate DESC

As the query has “group by” and “order by” plus multiple ranges in the where clause it will have to create a temporary table:

id: 1
select_type: SIMPLE
table: ontime
type: index
possible_keys: comb1
key: comb1
key_len: 9
ref: NULL
rows: 148046200
Extra: Using where; Using temporary; Using filesort

id: 1

select_type: SIMPLE

table: ontime

type: index

possible_keys: comb1

key: comb1

key_len: 9

ref: NULL

rows: 148046200

Extra: Using where; Using temporary; Using filesort

(for this query I’ve created the combined index: KEY comb1 (Carrier,YearD,ArrDelayMinutes) to increase performance)

The query runs in ~15 minutes:

+------------+------------+---------+----------+-----------------+------+
| min(yeard) | max(yeard) | Carrier | cnt      | flights_delayed | rate |
+------------+------------+---------+----------+-----------------+------+
|       2003 |       2009 | EV      |  1454777 |          237698 | 0.16 |
|       2006 |       2009 | XE      |  1016010 |          152431 | 0.15 |
|       2006 |       2009 | YV      |   740608 |          110389 | 0.15 |
|       2003 |       2009 | B6      |   683874 |          103677 | 0.15 |
|       2003 |       2009 | FL      |  1082489 |          158748 | 0.15 |
|       2003 |       2005 | DH      |   501056 |           69833 | 0.14 |
|       2001 |       2009 | MQ      |  3238137 |          448037 | 0.14 |
|       2003 |       2006 | RU      |  1007248 |          126733 | 0.13 |
|       2004 |       2009 | OH      |  1195868 |          160071 | 0.13 |
|       2003 |       2006 | TZ      |   136735 |           16496 | 0.12 |
|       1988 |       2009 | UA      |  9593284 |         1197053 | 0.12 |
|       1988 |       2009 | AA      | 10600509 |         1185343 | 0.11 |
|       1988 |       2001 | TW      |  2659963 |          280741 | 0.11 |
|       1988 |       2009 | CO      |  6029149 |          673863 | 0.11 |
|       2007 |       2009 | 9E      |   577244 |           59440 | 0.10 |
|       1988 |       2009 | DL      | 11869471 |         1156267 | 0.10 |
|       1988 |       2009 | NW      |  7601727 |          725460 | 0.10 |
|       1988 |       2009 | AS      |  1506003 |          146920 | 0.10 |
|       2003 |       2009 | OO      |  2654259 |          257069 | 0.10 |
|       1988 |       2009 | US      | 10276941 |          991016 | 0.10 |
|       1988 |       1991 | PA      |   206841 |           19465 | 0.09 |
|       1988 |       2005 | HP      |  2607603 |          235675 | 0.09 |
|       1988 |       2009 | WN      | 12722174 |         1107840 | 0.09 |
|       2005 |       2009 | F9      |   307569 |           28679 | 0.09 |
+------------+------------+---------+----------+-----------------+------+
24 rows in set (15 min 56.40 sec)

+------------+------------+---------+----------+-----------------+------+

+------------+------------+---------+----------+-----------------+------+

| 2003 | 2009 | EV | 1454777 | 237698 | 0.16 |

| 2006 | 2009 | XE | 1016010 | 152431 | 0.15 |

| 2006 | 2009 | YV | 740608 | 110389 | 0.15 |

| 2003 | 2009 | B6 | 683874 | 103677 | 0.15 |

| 2003 | 2009 | FL | 1082489 | 158748 | 0.15 |

| 2003 | 2005 | DH | 501056 | 69833 | 0.14 |

| 2001 | 2009 | MQ | 3238137 | 448037 | 0.14 |

| 2003 | 2006 | RU | 1007248 | 126733 | 0.13 |

| 2004 | 2009 | OH | 1195868 | 160071 | 0.13 |

| 2003 | 2006 | TZ | 136735 | 16496 | 0.12 |

| 1988 | 2009 | UA | 9593284 | 1197053 | 0.12 |

| 1988 | 2009 | AA | 10600509 | 1185343 | 0.11 |

| 1988 | 2001 | TW | 2659963 | 280741 | 0.11 |

| 1988 | 2009 | CO | 6029149 | 673863 | 0.11 |

| 2007 | 2009 | 9E | 577244 | 59440 | 0.10 |

| 1988 | 2009 | DL | 11869471 | 1156267 | 0.10 |

| 1988 | 2009 | NW | 7601727 | 725460 | 0.10 |

| 1988 | 2009 | AS | 1506003 | 146920 | 0.10 |

| 2003 | 2009 | OO | 2654259 | 257069 | 0.10 |

| 1988 | 2009 | US | 10276941 | 991016 | 0.10 |

| 1988 | 1991 | PA | 206841 | 19465 | 0.09 |

| 1988 | 2005 | HP | 2607603 | 235675 | 0.09 |

| 1988 | 2009 | WN | 12722174 | 1107840 | 0.09 |

| 2005 | 2009 | F9 | 307569 | 28679 | 0.09 |

+------------+------------+---------+----------+-----------------+------+

24 rows in set (15 min 56.40 sec)

Now we can split this query and run the 31 queries (=31 distinct airlines in this table) in parallel. I have used the following script:

date
for c in '9E' 'AA' 'AL' 'AQ' 'AS' 'B6' 'CO' 'DH' 'DL' 'EA' 'EV' 'F9' 'FL' 'HA' 'HP' 'ML' 'MQ' 'NW' 'OH' 'OO' 'PA' 'PI' 'PS' 'RU' 'TW' 'TZ' 'UA' 'US' 'WN' 'XE' 'YV'
do
   sql=" select min(yeard), max(yeard), Carrier, count(*) as cnt, sum(ArrDelayMinutes>30) as flights_delayed, round(sum(ArrDelayMinutes>30)/count(*),2) as rate FROM ontime WHERE DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI') and DestState not in ('AK', 'HI', 'PR', 'VI') and flightdate < '2010-01-01' and carrier = '$c'"
   mysql -uroot -vvv ontime -e "$sql" &>par_sql_complex/$c.log &
done
wait
date

date

for c in '9E' 'AA' 'AL' 'AQ' 'AS' 'B6' 'CO' 'DH' 'DL' 'EA' 'EV' 'F9' 'FL' 'HA' 'HP' 'ML' 'MQ' 'NW' 'OH' 'OO' 'PA' 'PI' 'PS' 'RU' 'TW' 'TZ' 'UA' 'US' 'WN' 'XE' 'YV'

sql=" select min(yeard), max(yeard), Carrier, count(*) as cnt, sum(ArrDelayMinutes>30) as flights_delayed, round(sum(ArrDelayMinutes>30)/count(*),2) as rate FROM ontime WHERE DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI') and DestState not in ('AK', 'HI', 'PR', 'VI') and flightdate < '2010-01-01' and carrier = '$c'"

mysql -uroot -vvv ontime -e "$sql" &>par_sql_complex/$c.log &

done

wait

date

In this case, we will also avoid creating a temporary table (as we have an index which starts with carrier).

Results: total time is 5 min 47 seconds (3x faster)

Start: 15:41:02 EST 2013
  End: 15:46:49 EST 2013

1 2	Start: 15:41:02 EST 2013 End: 15:46:49 EST 2013

Per query statistics:

par_sql_complex/9E.log:1 row in set (44.47 sec)
par_sql_complex/AA.log:1 row in set (5 min 41.13 sec)
par_sql_complex/AL.log:1 row in set (15.81 sec)
par_sql_complex/AQ.log:1 row in set (14.52 sec)
par_sql_complex/AS.log:1 row in set (2 min 43.01 sec)
par_sql_complex/B6.log:1 row in set (1 min 26.06 sec)
par_sql_complex/CO.log:1 row in set (3 min 58.07 sec)
par_sql_complex/DH.log:1 row in set (31.30 sec)
par_sql_complex/DL.log:1 row in set (5 min 47.07 sec)
par_sql_complex/EA.log:1 row in set (28.58 sec)
par_sql_complex/EV.log:1 row in set (2 min 6.87 sec)
par_sql_complex/F9.log:1 row in set (46.18 sec)
par_sql_complex/FL.log:1 row in set (1 min 30.83 sec)
par_sql_complex/HA.log:1 row in set (39.42 sec)
par_sql_complex/HP.log:1 row in set (2 min 45.57 sec)
par_sql_complex/ML.log:1 row in set (4.64 sec)
par_sql_complex/MQ.log:1 row in set (2 min 22.55 sec)
par_sql_complex/NW.log:1 row in set (4 min 26.67 sec)
par_sql_complex/OH.log:1 row in set (1 min 9.67 sec)
par_sql_complex/OO.log:1 row in set (2 min 14.97 sec)
par_sql_complex/PA.log:1 row in set (17.62 sec)
par_sql_complex/PI.log:1 row in set (14.52 sec)
par_sql_complex/PS.log:1 row in set (3.46 sec)
par_sql_complex/RU.log:1 row in set (40.14 sec)
par_sql_complex/TW.log:1 row in set (2 min 32.32 sec)
par_sql_complex/TZ.log:1 row in set (14.16 sec)
par_sql_complex/UA.log:1 row in set (4 min 55.18 sec)
par_sql_complex/US.log:1 row in set (4 min 38.08 sec)
par_sql_complex/WN.log:1 row in set (4 min 56.12 sec)
par_sql_complex/XE.log:1 row in set (24.21 sec)
par_sql_complex/YV.log:1 row in set (20.82 sec)

par_sql_complex/9E.log:1 row in set (44.47 sec)

par_sql_complex/AA.log:1 row in set (5 min 41.13 sec)

par_sql_complex/AL.log:1 row in set (15.81 sec)

par_sql_complex/AQ.log:1 row in set (14.52 sec)

par_sql_complex/AS.log:1 row in set (2 min 43.01 sec)

par_sql_complex/B6.log:1 row in set (1 min 26.06 sec)

par_sql_complex/CO.log:1 row in set (3 min 58.07 sec)

par_sql_complex/DH.log:1 row in set (31.30 sec)

par_sql_complex/DL.log:1 row in set (5 min 47.07 sec)

par_sql_complex/EA.log:1 row in set (28.58 sec)

par_sql_complex/EV.log:1 row in set (2 min 6.87 sec)

par_sql_complex/F9.log:1 row in set (46.18 sec)

par_sql_complex/FL.log:1 row in set (1 min 30.83 sec)

par_sql_complex/HA.log:1 row in set (39.42 sec)

par_sql_complex/HP.log:1 row in set (2 min 45.57 sec)

par_sql_complex/ML.log:1 row in set (4.64 sec)

par_sql_complex/MQ.log:1 row in set (2 min 22.55 sec)

par_sql_complex/NW.log:1 row in set (4 min 26.67 sec)

par_sql_complex/OH.log:1 row in set (1 min 9.67 sec)

par_sql_complex/OO.log:1 row in set (2 min 14.97 sec)

par_sql_complex/PA.log:1 row in set (17.62 sec)

par_sql_complex/PI.log:1 row in set (14.52 sec)

par_sql_complex/PS.log:1 row in set (3.46 sec)

par_sql_complex/RU.log:1 row in set (40.14 sec)

par_sql_complex/TW.log:1 row in set (2 min 32.32 sec)

par_sql_complex/TZ.log:1 row in set (14.16 sec)

par_sql_complex/UA.log:1 row in set (4 min 55.18 sec)

par_sql_complex/US.log:1 row in set (4 min 38.08 sec)

par_sql_complex/WN.log:1 row in set (4 min 56.12 sec)

par_sql_complex/XE.log:1 row in set (24.21 sec)

par_sql_complex/YV.log:1 row in set (20.82 sec)

As we can see there are large airlines (like AA, UA, US, DL, etc) which took most of the time. In this case, the load will not be distributed evenly as in the previous example; however, by running the query in parallel we have got 3x times better response time on this server.

CPU utilization:

Cpu3 : 22.0%us, 1.2%sy, 0.0%ni, 74.4%id, 2.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 16.0%us, 0.0%sy, 0.0%ni, 84.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 39.0%us, 1.2%sy, 0.0%ni, 56.1%id, 3.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 33.3%us, 0.0%sy, 0.0%ni, 51.9%id, 13.6%wa, 0.0%hi, 1.2%si, 0.0%st
Cpu7 : 33.3%us, 1.2%sy, 0.0%ni, 48.8%id, 16.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 24.7%us, 0.0%sy, 0.0%ni, 60.5%id, 14.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 24.4%us, 0.0%sy, 0.0%ni, 56.1%id, 19.5%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 40.7%us, 0.0%sy, 0.0%ni, 56.8%id, 2.5%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 19.5%us, 1.2%sy, 0.0%ni, 65.9%id, 12.2%wa, 0.0%hi, 1.2%si, 0.0%st
Cpu12 : 40.2%us, 1.2%sy, 0.0%ni, 56.1%id, 2.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu13 : 82.7%us, 0.0%sy, 0.0%ni, 17.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 55.4%us, 0.0%sy, 0.0%ni, 43.4%id, 1.2%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 86.6%us, 0.0%sy, 0.0%ni, 13.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu16 : 61.0%us, 1.2%sy, 0.0%ni, 37.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu17 : 29.3%us, 1.2%sy, 0.0%ni, 69.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu18 : 18.8%us, 0.0%sy, 0.0%ni, 52.5%id, 28.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu19 : 14.3%us, 1.2%sy, 0.0%ni, 57.1%id, 27.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu20 : 12.3%us, 0.0%sy, 0.0%ni, 59.3%id, 28.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu21 : 10.7%us, 0.0%sy, 0.0%ni, 76.2%id, 11.9%wa, 0.0%hi, 1.2%si, 0.0%st
Cpu22 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu23 : 10.8%us, 2.4%sy, 0.0%ni, 71.1%id, 15.7%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu3 : 22.0%us, 1.2%sy, 0.0%ni, 74.4%id, 2.4%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu4 : 16.0%us, 0.0%sy, 0.0%ni, 84.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu5 : 39.0%us, 1.2%sy, 0.0%ni, 56.1%id, 3.7%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu6 : 33.3%us, 0.0%sy, 0.0%ni, 51.9%id, 13.6%wa, 0.0%hi, 1.2%si, 0.0%st

Cpu7 : 33.3%us, 1.2%sy, 0.0%ni, 48.8%id, 16.7%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu8 : 24.7%us, 0.0%sy, 0.0%ni, 60.5%id, 14.8%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu9 : 24.4%us, 0.0%sy, 0.0%ni, 56.1%id, 19.5%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu10 : 40.7%us, 0.0%sy, 0.0%ni, 56.8%id, 2.5%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu11 : 19.5%us, 1.2%sy, 0.0%ni, 65.9%id, 12.2%wa, 0.0%hi, 1.2%si, 0.0%st

Cpu12 : 40.2%us, 1.2%sy, 0.0%ni, 56.1%id, 2.4%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu13 : 82.7%us, 0.0%sy, 0.0%ni, 17.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu14 : 55.4%us, 0.0%sy, 0.0%ni, 43.4%id, 1.2%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu15 : 86.6%us, 0.0%sy, 0.0%ni, 13.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu16 : 61.0%us, 1.2%sy, 0.0%ni, 37.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu17 : 29.3%us, 1.2%sy, 0.0%ni, 69.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu18 : 18.8%us, 0.0%sy, 0.0%ni, 52.5%id, 28.8%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu19 : 14.3%us, 1.2%sy, 0.0%ni, 57.1%id, 27.4%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu20 : 12.3%us, 0.0%sy, 0.0%ni, 59.3%id, 28.4%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu21 : 10.7%us, 0.0%sy, 0.0%ni, 76.2%id, 11.9%wa, 0.0%hi, 1.2%si, 0.0%st

Cpu22 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu23 : 10.8%us, 2.4%sy, 0.0%ni, 71.1%id, 15.7%wa, 0.0%hi, 0.0%si, 0.0%st

Note that in case of “order by” we will need to manually sort the results, however, sorting 10-100 rows will be fast.

Conclusion

Splitting a complex report into multiple queries and running it in parallel (asynchronously) can increase performance (3x to 10x in the above example) and will better utilize modern hardware. It is also possible to split the queries between multiple MySQL servers (i.e. MySQL slave servers) to further increase scalability (will require more coding).

0 0 votes

Article Rating

23 Comments

Oldest

Newest Most Voted

aftab

12 years ago

Thanks Alex, have you tried to run your split queries sequentially? if so, what was the total time?

Author

Alexander Rubin

12 years ago

Aftab,

Yes, I’ve modified the script for the complex query example and run it without placing “mysql” cli in the background:
total time: 16m42.318s (compared to 5 min)

Per query statistics:

seq_sql_complex/9E.log:1 row in set (10.80 sec)
seq_sql_complex/AA.log:1 row in set (1 min 59.09 sec)
seq_sql_complex/AL.log:1 row in set (2.17 sec)
seq_sql_complex/AQ.log:1 row in set (0.80 sec)
seq_sql_complex/AS.log:1 row in set (23.96 sec)
seq_sql_complex/B6.log:1 row in set (10.56 sec)
seq_sql_complex/CO.log:1 row in set (1 min 1.11 sec)
seq_sql_complex/DH.log:1 row in set (3.97 sec)
seq_sql_complex/DL.log:1 row in set (2 min 10.92 sec)
seq_sql_complex/EA.log:1 row in set (4.72 sec)
seq_sql_complex/EV.log:1 row in set (23.61 sec)
seq_sql_complex/F9.log:1 row in set (4.35 sec)
seq_sql_complex/FL.log:1 row in set (13.78 sec)
seq_sql_complex/HA.log:1 row in set (3.26 sec)
seq_sql_complex/HP.log:1 row in set (25.43 sec)
seq_sql_complex/ML.log:1 row in set (0.46 sec)
seq_sql_complex/MQ.log:1 row in set (35.88 sec)
seq_sql_complex/NW.log:1 row in set (1 min 10.91 sec)
seq_sql_complex/OH.log:1 row in set (10.42 sec)
seq_sql_complex/OO.log:1 row in set (35.20 sec)
seq_sql_complex/PA.log:1 row in set (1.79 sec)
seq_sql_complex/PI.log:1 row in set (4.66 sec)
seq_sql_complex/PS.log:1 row in set (0.22 sec)
seq_sql_complex/RU.log:1 row in set (7.70 sec)
seq_sql_complex/TW.log:1 row in set (27.48 sec)
seq_sql_complex/TZ.log:1 row in set (1.13 sec)
seq_sql_complex/UA.log:1 row in set (1 min 42.88 sec)
seq_sql_complex/US.log:1 row in set (1 min 44.81 sec)
seq_sql_complex/WN.log:1 row in set (2 min 18.02 sec)
seq_sql_complex/XE.log:1 row in set (12.55 sec)
seq_sql_complex/YV.log:1 row in set (9.47 sec)

As we can see the maximum query execution times decreased (2 min vs 5 min), however, by running it in parallel we have significant faster response.

Justin Swanhart

12 years ago

Last time I checked we already had a parallel query tool called Shard-Query.

It is much more full featured than such a simple script (though it started out basically like one).

Please take the time to look into it before reinventing the wheel.

Thanks

Author

Alexander Rubin

12 years ago

Justin, yes, I’ve mention the ShardQuery in the beginning of the post. The idea is to show how MySQL 5.6 will handle the parallel query execution load, I’m not trying to re-invent the wheel of cause.

Justin Swanhart

12 years ago

My point is that you spent time writing and testing a script, etc, when you could have actually tested and blogged about shard-query. It has a SQL explain feature and you can explain exactly what it is doing just as easily as this. I can’t be the only one to blog about it.

Justin Swanhart

12 years ago

Just a minor correction to your post then : Shard-Query doesn’t need sharding. It works with a single table. You could use an IN clause to get the same results with shard-query, or just partition the table and it would “just work”.

step 1)
create partitioned table
step 2)
load data into partitioned table
step 3)
set up shard query to use one server
step 4)
access server with shard-query and voila instant parallel queries
step 5)
access shard-query with mysql-proxy and voila parallel transparent mysql server for OLAP queries

Author

Alexander Rubin

12 years ago

Justin, sure, good point, I will test it out with the same data on the same server and write a follow-up post. Thank you for pointing it out!

Justin Swanhart

12 years ago

Hi,

Let me know if you need any assistance.

Thanks!

Rick James

12 years ago

Really big queries end up being I/O bound. I suspect (without proof) that parallelizing I/O bound queries will show a smaller improvement — because the I/O subsystem is limited. RAID striping (-1/5/10) may or may not help; it depends on how well the original, non-parallelized, query could make use of read-ahead.

Do either of you have good benchmarks of such?

Justin Swanhart

12 years ago

Hi Rick,

The idea here is to get as close to sequential reads as possible over many partitions. In order for that to work best you have to either use SSD or RAID with many spindles, ideally using MySQL 5.6 to place partitions on different devices. For example, in EC2 you could use a large instance with four EBS devices, each RAID0 and place a subset of partitions on each of the EBS devices. Or you could stripe all four together – it depends on your use case. For SSD, parallel query is ideal, because a single thread can’t read more than 50MB/sec or so in MySQL. Multiple threads can sustain much higher IO rates on the SSD devices.

Eventually though, you will hit an IO bottleneck on the server.

Here you typically look at more options and they can be combined:
a) column store – this reduces IO by reducing the amount of data that has to be examined for large queries, especially on wide fact tables
b) sharding – queries are distributed over multiple nodes, where each node has a subset of the total data set. Most sharding solutions can’t distribute OLAP queries, but Shard-Query can
c) materialized views – insert-only data sets are easy to materialize, and Flexviews supports incrementally refreshing materialized views over data sets with updates. OLAP tools like Mondrian can automatically use the materialized views (summary tables) to answer queries, reducing IO dramatically.

The optimal solution is usually combining a column store with sharding or materialized views with sharding. A combination of all three would work, but MySQL column stores don’t record binlogs or support triggers, so Flexviews can’t be used with them.

I will do some benchmarks and post the results.

Sergey

12 years ago

Is it possible to hire Alexander for tuning my project ?

Author

Alexander Rubin

12 years ago

Sergey, sure I will be more than happy to help you via our Consulting (https://www.percona.com/products/mysql-consulting/overview) offering. I will send you a separate email as well.

Colin MacKenzie

10 years ago

I’ve used this technique but it seems like the threads in mysql are getting stalled. prstat reports cpu usage low. iostat reports low io usage. Sometimes the same query (same criteria) is 10 secs and other times it’s 2 minutes.

I’ve set transaction isolation globally to READ UNCOMMITTED (perfect for us, we are ok with dirty reads). Table is partitioned to match the sharding, and WHERE includes the partition hash value. Our app is primarily single-user.

I’m still on 5.5.37, perhaps I should upgrade to 5.6. Are there issues where these threads might have lock contention with each other? I am using read-uncommitted so it shouldnt lock anyway.

Thanks in advance for any suggestions you might have!
Colin

Programster

9 years ago

‘@Justin Swanhart doesn’t Shard-Query require partitioning, yet this solution does not? I am not ready to give up my foreign-keys any time soon which you cannot have on partitioned tables. Is there really no RDBMS or storage engine that will automatically parallelize queries?

Jacob

9 years ago

Reply to Programster

I ran multiple queries in parallel on the same MySQL instance in a way very similar to the one described in this article using MySQL 5.6.23 with SSD. I easily got order of magnitude performance improvement. I was able to run up to 100 queries in parallel without significant I/O bottleneck. Once I exceeded this approximate 100 threshold the queue of queries started to get significantly longer. It is transparent and reliable solution.

I tried to install Justin Swanhart’s Shard-Query solution, but got nasty installation errors and gave up after spending some time.

The case when you have to run multiple almost identical queries with difference in only one or two field values is very common.

It is essentially queries that look like
SELECT a,b,c,d, …
WHERE ID IN (1,2,3,…N);

or something like that. You can split it on N queries and run them in parallel. MySQL is great in multi-threading.

This problem is very low hanging fruit waiting to be picked up.

Jacob

Laura

8 years ago

can you tell me how to split queries between multiple MySQL servers (slave servers) ?. I have set up one master and one slave but i don´t know how to split the query in both

Jacob

8 years ago

Reply to Laura

Queries for multiple servers could be managed using application. The application issues query to each server and combines the results into one set.

Laura

8 years ago

Reply to Jacob

Thanks for your reply! .. i have one question, which application do you mean?

Jacob

8 years ago

Application is a program that uses the data acquired from database. It is the data end user.

Laura

8 years ago

So, i should write a program to split mysql queries in the master and in the slave?

i mean, something like this but openning a connection to the slave too to send queries of some years

#!/bin/bash
date
for y in {1988..2013}
do
sql=”select yeard, count(*) from ontime where yeard=$y”
mysql -vvv ontime -e “$sql” &>par_sql1/$y.log &
done
wait
date

Author

Alexander Rubin

8 years ago

Reply to Laura

Correct. For the reporting purposes you can potentially use Apache Spark (https://www.percona.com/blog/2016/08/17/apache-spark-makes-slow-mysql-queries-10x-faster/). So you will need:
1. Virtual IP across multiple servers (i.e. HAproxy or ProxySQL)
2. Apache spark + JDBC to connect to MySQL.

Lirezh

7 years ago

It would be interesting to see how your query behaves if you use a few different techniques.
JOINING it multiple times or using UNION on the same table.
Maybe you can trick mysql to make multiple threads?