Parallel query execution is my favorite, non-existent, feature in MySQL. In all versions of MySQL – at least at the time of writing – when you run a single query it will run in one thread, effectively utilizing one CPU core only. Multiple queries run at the same time will be using different threads and will utilize more than one CPU core.
On multi-core machines – which is the majority of the hardware nowadays – and in the cloud, we have multiple cores available for use. With faster disks (i.e. SSD) we can’t utilize the full potential of IOPS with just one thread.
AWS Aurora (based on MySQL 5.6) now has a version which will support parallelism for SELECT queries (utilizing the read capacity of storage nodes underneath the Aurora cluster). In this article, we will look at how this can improve the reporting/analytical query performance in MySQL. I will compare AWS Aurora with MySQL (Percona Server) 5.6 running on an EC2 instance of the same class.
Aurora Parallel Query response time (for queries which can not use indexes) can be 5x-10x better compared to the non-parallel fully cached operations. This is a significant improvement for the slow queries.
For my test, I need to choose:
According to Jeff Barr’s excellent article (https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/) the following instance classes will support parallel query (PQ):
“The instance class determines the number of parallel queries that can be active at a given time:
As I want to maximize the concurrency of parallel query sessions, I have chosen db.r4.8xlarge. For the EC2 instance I will use the same class: r4.8xlarge.
|
1 |
mysql> show global variables like '%version%';<br>+-------------------------+------------------------------+<br>| Variable_name | Value |<br>+-------------------------+------------------------------+<br>| aurora_version | 1.18.0 |<br>| innodb_version | 1.2.10 |<br>| protocol_version | 10 |<br>| version | 5.6.10 |<br>| version_comment | MySQL Community Server (GPL) |<br>| version_compile_machine | x86_64 |<br>| version_compile_os | Linux |<br>+-------------------------+------------------------------+ |
|
1 |
mysql> show global variables like '%version%';<br>+-------------------------+------------------------------------------------------+<br>| Variable_name | Value |<br>+-------------------------+------------------------------------------------------+<br>| innodb_version | 5.6.41-84.1 |<br>| protocol_version | 10 |<br>| slave_type_conversions | |<br>| tls_version | TLSv1.1,TLSv1.2 |<br>| version | 5.6.41-84.1 |<br>| version_comment | Percona Server (GPL), Release 84.1, Revision b308619 |<br>| version_compile_machine | x86_64 |<br>| version_compile_os | debian-linux-gnu |<br>| version_suffix | |<br>+-------------------------+------------------------------------------------------+<br> |
I’m using the “Airlines On-Time Performance” database from http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time (You can find the scripts I used here: https://github.com/Percona-Lab/ontime-airline-performance).
|
1 |
mysql> show table status like 'ontime'G<br>*************************** 1. row ***************************<br> Name: ontime<br> Engine: InnoDB<br> Version: 10<br> Row_format: Compact<br> Rows: 173221661<br>Avg_row_length: 409<br> Data_length: 70850183168<br>Max_data_length: 0<br> Index_length: 0<br> Data_free: 7340032<br>Auto_increment: NULL<br> Create_time: 2018-09-26 02:03:28<br> Update_time: NULL<br> Check_time: NULL<br> Collation: latin1_swedish_ci<br> Checksum: NULL<br>Create_options:<br> Comment:<br>1 row in set (0.00 sec) |
The table is very wide, 84 columns.
Documentation: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-mysql-parallel-query.html
Aurora PQ works by doing a full table scan (parallel reads are done on the storage level). The InnoDB buffer pool is not used when Parallel Query is utilized.
For the purposes of the test I turned PQ on and off (normally AWS Aurora uses its own heuristics to determine if the PQ will be helpful or not):
Turn on and force:
|
1 |
mysql> set session aurora_pq = 1;<br>Query OK, 0 rows affected (0.00 sec)<br>mysql> set aurora_pq_force = 1;<br>Query OK, 0 rows affected (0.00 sec)<br> |
Turn off:
|
1 |
mysql> set session aurora_pq = 0;<br>Query OK, 0 rows affected (0.00 sec) |
The EXPLAIN plan in MySQL will also show the details about parallel query execution statistics.
Here, I use the “reporting” queries, running only one query at a time. The queries are similar to those I’ve used in older blog posts comparing MySQL and Apache Spark performance (https://www.percona.com/blog/2016/08/17/apache-spark-makes-slow-mysql-queries-10x-faster/ )
Here is a summary of the queries:
|
1 |
select SQL_CALC_FOUND_ROWS<br>FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest<br>FROM ontime<br>WHERE<br> DestState not in ('AK', 'HI', 'PR', 'VI')<br> and OriginState not in ('AK', 'HI', 'PR', 'VI')<br> and flightdate > '2015-01-01'<br> and ArrDelay < 15<br>and cancelled = 0<br>and Diverted = 0<br>and DivAirportLandings = 0<br> ORDER by DepDelay DESC<br>LIMIT 10; |
3. Complex filter, join “reference” table
|
1 |
select SQL_CALC_FOUND_ROWS<br>FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay<br>FROM ontime_ind o<br>JOIN carriers c on o.carrier = c.carrier_code<br>WHERE<br> (carrier_name like 'United%' or carrier_name like 'Delta%')<br> and ArrDelay > 30<br> ORDER by DepDelay DESC<br>LIMIT 10G |
4. select one row only, no index
Let’s take a look at the most simple query: count(*). This variant of the “ontime” table has no secondary indexes.
|
1 |
select count(*) from ontime where flightdate > '2017-01-01'; |
I disabled the PQ first to compare:
|
1 |
mysql> select count(*) from ontime where flightdate > '2017-01-01';<br>+----------+<br>| count(*) |<br>+----------+<br>| 5660651 |<br>+----------+<br>1 row in set (8 min 25.49 sec)<br><br>mysql> select count(*) from ontime where flightdate > '2017-01-01';<br>+----------+<br>| count(*) |<br>+----------+<br>| 5660651 |<br>+----------+<br>1 row in set (2 min 48.81 sec)<br><br>mysql> mysql> select count(*) from ontime where flightdate > '2017-01-01';<br>+----------+<br>| count(*) |<br>+----------+<br>| 5660651 |<br>+----------+<br>1 row in set (2 min 48.25 sec)<br><br>Please note: the first run was "cold run"; data was read from disk. The second and third run used the cached data.<br><br>Now let's enable and force Aurora PQ:<br><br>mysql> set session aurora_pq = 1;<br>Query OK, 0 rows affected (0.00 sec)<br>mysql> set aurora_pq_force = 1; <br>Query OK, 0 rows affected (0.00 sec)<br><br>mysql> explain select count(*) from ontime where flightdate > '2017-01-01'G<br>*************************** 1. row ***************************<br> id: 1<br> select_type: SIMPLE<br> table: ontime<br> type: ALL<br>possible_keys: NULL<br> key: NULL<br> key_len: NULL<br> ref: NULL<br> rows: 173706586<br> Extra: Using where; Using parallel query (1 columns, 1 filters, 0 exprs; 0 extra)<br>1 row in set (0.00 sec)<br> |
(from the EXPLAIN plan, we can see that parallel query is used).
Results:
|
1 |
mysql> select count(*) from ontime where flightdate > '2017-01-01'; <br>+----------+<br>| count(*) |<br>+----------+<br>| 5660651 |<br>+----------+<br>1 row in set (16.53 sec)<br><br>mysql> select count(*) from ontime where flightdate > '2017-01-01';<br>+----------+<br>| count(*) |<br>+----------+<br>| 5660651 |<br>+----------+<br>1 row in set (16.56 sec)<br><br>mysql> select count(*) from ontime where flightdate > '2017-01-01';<br>+----------+<br>| count(*) |<br>+----------+<br>| 5660651 |<br>+----------+<br>1 row in set (16.36 sec)<br><br>mysql> select count(*) from ontime where flightdate > '2017-01-01';<br>+----------+<br>| count(*) |<br>+----------+<br>| 5660651 |<br>+----------+<br>1 row in set (16.56 sec)<br><br>mysql> select count(*) from ontime where flightdate > '2017-01-01';<br>+----------+<br>| count(*) |<br>+----------+<br>| 5660651 |<br>+----------+<br>1 row in set (16.36 sec) |
As we can see the results are very stable. It does not use any cache (ie: innodb buffer pool) either. The result is also interesting: utilizing multiple threads (up to 16 threads) and reading data from disk (using disk cache, probably) can be ~10x faster compared to reading from memory in a single thread.
Result: ~10x performance gain, no index used
|
1 |
set aurora_pq = 1; set aurora_pq_force=1;<br><br>select avg(DepDelay) from ontime;<br>+---------------+<br>| avg(DepDelay) |<br>+---------------+<br>| 8.2666 |<br>+---------------+<br>1 row in set (1 min 48.17 sec)<br><br><br>set aurora_pq = 0; set aurora_pq_force=0; <br><br>select avg(DepDelay) from ontime;<br>+---------------+<br>| avg(DepDelay) |<br>+---------------+<br>| 8.2666 |<br>+---------------+<br><br>1 row in set (2 min 49.95 sec)<br><br>Here we can see that PQ gives use ~2x performance increase. |
Here is what we learned comparing Aurora PQ performance to native MySQL query execution:
The following query will always be slow in MySQL. This combination of the filters in the WHERE condition makes it extremely hard to prepare a good set of indexes to make this query faster.
|
1 |
select SQL_CALC_FOUND_ROWS<br>FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest<br>FROM ontime<br>WHERE<br> DestState not in ('AK', 'HI', 'PR', 'VI')<br> and OriginState not in ('AK', 'HI', 'PR', 'VI')<br> and flightdate > '2015-01-01'<br> and ArrDelay < 15<br>and cancelled = 0<br>and Diverted = 0<br>and DivAirportLandings = '0'<br>ORDER by DepDelay DESC<br>LIMIT 10; |
Let’s compare the query performance with and without PQ.
PQ disabled:
|
1 |
mysql> set aurora_pq_force = 0;<br>Query OK, 0 rows affected (0.00 sec)<br>mysql> set aurora_pq = 0; <br>Query OK, 0 rows affected (0.00 sec)<br><br>mysql> explain select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01' and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10G<br>*************************** 1. row ***************************<br> id: 1<br> select_type: SIMPLE<br> table: ontime<br> type: ALL<br>possible_keys: NULL<br> key: NULL<br> key_len: NULL<br> ref: NULL<br> rows: 173706586<br> Extra: Using where; Using filesort<br>1 row in set (0.00 sec)<br><br>mysql> select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01' and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10;<br>+------------+---------+-----------+--------+------+<br>| FlightDate | carrier | FlightNum | Origin | Dest |<br>+------------+---------+-----------+--------+------+<br>| 2017-10-09 | OO | 5028 | SBP | SFO |<br>| 2015-11-03 | VX | 969 | SAN | SFO |<br>| 2015-05-29 | VX | 720 | TUL | AUS |<br>| 2016-03-11 | UA | 380 | SFO | BOS |<br>| 2016-06-13 | DL | 2066 | JFK | SAN |<br>| 2016-11-14 | UA | 1600 | EWR | LAX |<br>| 2016-11-09 | WN | 2318 | BDL | LAS |<br>| 2016-11-09 | UA | 1652 | IAD | LAX |<br>| 2016-11-13 | AA | 23 | JFK | LAX |<br>| 2016-11-12 | UA | 800 | EWR | SFO |<br>+------------+---------+-----------+--------+------+ |
10 rows in set (3 min 42.47 sec)
/* another run */
10 rows in set (3 min 46.90 sec)
This query is 100% cached. Here is the graph from PMM showing the number of read requests:
Buffer pool requests:
Now let’s enable and force PQ:
|
1 |
mysql> set session aurora_pq = 1;<br>Query OK, 0 rows affected (0.00 sec)<br><br>mysql> set aurora_pq_force = 1; Query OK, 0 rows affected (0.00 sec)<br><br>mysql> explain select SQL_CALC_FOUND_ROWS FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE DestState not in ('AK', 'HI', 'PR', 'VI') and OriginState not in ('AK', 'HI', 'PR', 'VI') and flightdate > '2015-01-01' and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10G<br>*************************** 1. row ***************************<br> id: 1<br> select_type: SIMPLE<br> table: ontime<br> type: ALL<br>possible_keys: NULL<br> key: NULL<br> key_len: NULL<br> ref: NULL<br> rows: 173706586<br> Extra: Using where; Using filesort; Using parallel query (12 columns, 4 filters, 3 exprs; 0 extra)<br>1 row in set (0.00 sec)<br><br><br>mysql> select SQL_CALC_FOUND_ROWS -> FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest -> FROM ontime<br> -> WHERE<br> -> DestState not in ('AK', 'HI', 'PR', 'VI')<br> -> and OriginState not in ('AK', 'HI', 'PR', 'VI')<br> -> and flightdate > '2015-01-01'<br> -> and ArrDelay < 15<br> -> and cancelled = 0<br> -> and Diverted = 0<br> -> and DivAirportLandings = 0<br> -> ORDER by DepDelay DESC<br> -> LIMIT 10;<br>+------------+---------+-----------+--------+------+<br>| FlightDate | carrier | FlightNum | Origin | Dest |<br>+------------+---------+-----------+--------+------+<br>| 2017-10-09 | OO | 5028 | SBP | SFO |<br>| 2015-11-03 | VX | 969 | SAN | SFO |<br>| 2015-05-29 | VX | 720 | TUL | AUS |<br>| 2016-03-11 | UA | 380 | SFO | BOS |<br>| 2016-06-13 | DL | 2066 | JFK | SAN |<br>| 2016-11-14 | UA | 1600 | EWR | LAX |<br>| 2016-11-09 | WN | 2318 | BDL | LAS |<br>| 2016-11-09 | UA | 1652 | IAD | LAX |<br>| 2016-11-13 | AA | 23 | JFK | LAX |<br>| 2016-11-12 | UA | 800 | EWR | SFO |<br>+------------+---------+-----------+--------+------+<br>10 rows in set (41.88 sec)<br>/* run 2 */<br>10 rows in set (28.49 sec)<br>/* run 3 */<br>10 rows in set (29.60 sec)<br> |
Now let’s compare the requests:
As we can see, Aurora PQ is almost NOT utilizing the buffer pool (there are a minor number of read requests. Compare the max of 4K requests per second with PQ to the constant 600K requests per second in the previous graph).
Result: ~8x performance gain
In this example I join two tables: the main “ontime” table and a reference table. If we have both tables without indexes it will simply be too slow in MySQL. To make it better, I have created an index for both tables and so it will use indexes for the join:
|
1 |
CREATE TABLE `carriers` (<br> `carrier_code` varchar(8) NOT NULL DEFAULT '',<br> `carrier_name` varchar(200) DEFAULT NULL,<br> PRIMARY KEY (`carrier_code`),<br> KEY `carrier_name` (`carrier_name`)<br>) ENGINE=InnoDB DEFAULT CHARSET=latin1<br><br>mysql> show create table ontime_indG<br>...<br> PRIMARY KEY (`id`),<br> KEY `comb1` (`Carrier`,`Year`,`ArrDelayMinutes`),<br> KEY `FlightDate` (`FlightDate`)<br>) ENGINE=InnoDB AUTO_INCREMENT=178116912 DEFAULT CHARSET=latin1 |
Query:
|
1 |
select SQL_CALC_FOUND_ROWS<br>FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay<br>FROM ontime_ind o<br>JOIN carriers c on o.carrier = c.carrier_code<br>WHERE<br> (carrier_name like 'United%' or carrier_name like 'Delta%')<br> and ArrDelay > 30<br> ORDER by DepDelay DESC<br>LIMIT 10G |
PQ disabled, explain plan:
|
1 |
mysql> set aurora_pq_force = 0;<br>Query OK, 0 rows affected (0.00 sec)<br><br>mysql> set aurora_pq = 0; <br>Query OK, 0 rows affected (0.00 sec)<br><br>mysql> explain<br> -> select SQL_CALC_FOUND_ROWS<br> -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay<br> -> FROM ontime_ind o<br> -> JOIN carriers c on o.carrier = c.carrier_code<br> -> WHERE<br> -> (carrier_name like 'United%' or carrier_name like 'Delta%')<br> -> and ArrDelay > 30<br> -> ORDER by DepDelay DESC<br> -> LIMIT 10G<br>*************************** 1. row ***************************<br> id: 1<br> select_type: SIMPLE<br> table: c<br> type: range<br>possible_keys: PRIMARY,carrier_name<br> key: carrier_name<br> key_len: 203<br> ref: NULL<br> rows: 3<br> Extra: Using where; Using index; Using temporary; Using filesort<br>*************************** 2. row ***************************<br> id: 1<br> select_type: SIMPLE<br> table: o<br> type: ref<br>possible_keys: comb1<br> key: comb1<br> key_len: 3<br> ref: ontime.c.carrier_code<br> rows: 2711597<br> Extra: Using index condition; Using where<br>2 rows in set (0.01 sec) |
As we can see MySQL uses indexes for the join. Response times:
/* run 1 – cold run */
10 rows in set (29 min 17.39 sec)
/* run 2 – warm run */
10 rows in set (2 min 45.16 sec)
PQ enabled, explain plan:
|
1 |
mysql> explain<br> -> select SQL_CALC_FOUND_ROWS<br> -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay<br> -> FROM ontime_ind o<br> -> JOIN carriers c on o.carrier = c.carrier_code<br> -> WHERE<br> -> (carrier_name like 'United%' or carrier_name like 'Delta%')<br> -> and ArrDelay > 30<br> -> ORDER by DepDelay DESC<br> -> LIMIT 10G<br>*************************** 1. row ***************************<br> id: 1<br> select_type: SIMPLE<br> table: c<br> type: ALL<br>possible_keys: PRIMARY,carrier_name<br> key: NULL<br> key_len: NULL<br> ref: NULL<br> rows: 1650<br> Extra: Using where; Using temporary; Using filesort; Using parallel query (2 columns, 0 filters, 1 exprs; 0 extra)<br>*************************** 2. row ***************************<br> id: 1<br> select_type: SIMPLE<br> table: o<br> type: ALL<br>possible_keys: comb1<br> key: NULL<br> key_len: NULL<br> ref: NULL<br> rows: 173542245<br> Extra: Using where; Using join buffer (Hash Join Outer table o); Using parallel query (11 columns, 1 filters, 1 exprs; 0 extra)<br>2 rows in set (0.00 sec) |
As we can see, Aurora does not use any indexes and uses a parallel scan instead.
Response time:
|
1 |
mysql> select SQL_CALC_FOUND_ROWS<br> -> FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay<br> -> FROM ontime_ind o<br> -> JOIN carriers c on o.carrier = c.carrier_code<br> -> WHERE<br> -> (carrier_name like 'United%' or carrier_name like 'Delta%')<br> -> and ArrDelay > 30<br> -> ORDER by DepDelay DESC<br> -> LIMIT 10G<br>...<br><br>*************************** 4. row ***************************<br><br> FlightDate: 2017-05-04<br>UniqueCarrier: UA<br> TailNum: N68821<br> FlightNum: 1205<br> Origin: KOA<br>OriginCityName: Kona, HI<br> Dest: LAX<br> DestCityName: Los Angeles, CA<br> DepDelay: 1457<br> ArrDelay: 1459<br>*************************** 5. row ***************************<br> FlightDate: 1991-03-12<br>UniqueCarrier: DL<br> TailNum:<br> FlightNum: 1118<br> Origin: ATL<br>OriginCityName: Atlanta, GA<br> Dest: STL<br> DestCityName: St. Louis, MO<br>...<br><br>10 rows in set (28.78 sec)<br><br><br>mysql> select found_rows();<br><br>+--------------+<br>| found_rows() |<br>+--------------+<br>| 4180974 |<br>+--------------+<br><br>1 row in set (0.00 sec) |
Result: ~5x performance gain
(this is actually comparing the index cached read to a non-index PQ execution)
Aurora PQ can significantly improve the performance of reporting queries as such queries may be extremely hard to optimize in MySQL, even when using indexes. With indexes, Aurora PQ response time can be 5x-10x better compared to the non-parallel, fully cached operations. Aurora PQ can help improve performance of complex queries by performing parallel reads.
The following table summarizes the query response times:
| Query | Time, No PQ, index | Time, PQ |
| select count(*) from ontime where flightdate > ‘2017-01-01’ | 2 min 48.81 sec | 16.53 sec |
| select avg(DepDelay) from ontime; | 2 min 49.95 sec | 1 min 48.17 sec |
| select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier as carrier, FlightNum, Origin, Dest FROM ontime WHERE DestState not in (‘AK’, ‘HI’, ‘PR’, ‘VI’) and OriginState not in (‘AK’, ‘HI’, ‘PR’, ‘VI’) and flightdate > ‘2015-01-01’ and ArrDelay < 15 and cancelled = 0 and Diverted = 0 and DivAirportLandings = 0 ORDER by DepDelay DESC LIMIT 10; |
3 min 42.47 sec | 28.49 sec |
| select SQL_CALC_FOUND_ROWS
FlightDate, UniqueCarrier, TailNum, FlightNum, Origin, OriginCityName, Dest, DestCityName, DepDelay, ArrDelay FROM ontime_ind o JOIN carriers c on o.carrier = c.carrier_code WHERE (carrier_name like ‘United%’ or carrier_name like ‘Delta%’) and ArrDelay > 30 ORDER by DepDelay DESC LIMIT 10G |
2 min 45.16 sec | 28.78 sec |
—
Photo by Thomas Lipke on Unsplash
Resources
RELATED POSTS