Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Aurora Hash Join Optimization (with a Gentle Reminder on Lab Features)

January 31, 2018

Author

Share this Post:

The Aurora hash join feature for relational databases has been around for a while now. But unlike MySQL Block Nested Loop algorithm, an Aurora hash join only caters to a specific number of use cases. When implemented with the optimizer properly, they can provide great benefits with certain workloads. Below we’ll see a brief example of a quick win.

This new feature is available in Aurora lab mode version 1.16. Because this is a lab feature, it’s important to make sure to test your queries before upgrading, especially if you are looking to scale up to the new R4 instances before the Superbowl to avoid hitting the same problem I discuss below.

When lab mode is enabled and hash_join is ON, you can verify the optimizer feature from the optimizer_switch variable:

mysql> SELECT @@aurora_version, @@aurora_lab_mode, @@optimizer_switch G
*************************** 1. row ***************************
  @@aurora_version: 1.16
 @@aurora_lab_mode: 1
@@optimizer_switch: index_merge=on,...,hash_join=on,hash_join_cost_based=on

mysql> SELECT @@aurora_version, @@aurora_lab_mode, @@optimizer_switch G

*************************** 1. row ***************************

@@aurora_version: 1.16

@@aurora_lab_mode: 1

@@optimizer_switch: index_merge=on,...,hash_join=on,hash_join_cost_based=on

Hash joins work well when joining large result sets because – unlike block nested loop in the same query – the optimizer scans the larger table and matches it against the hashed smaller table instead of the other way around. Consider the tables and query below:

+----------+----------+
| tbl      | rows     |
+----------+----------+
| branches |    55143 |
| users    |   103949 |
| history  | 27168887 |
+----------+----------+

EXPLAIN 
SELECT SQL_NO_CACHE COUNT(*) 
FROM branches b 
   INNER JOIN users u ON (b.u_id = u.u_id) 
   INNER JOIN history h ON (u.u_id = h.u_id);

+----------+----------+

| tbl | rows |

+----------+----------+

| branches | 55143 |

| users | 103949 |

| history | 27168887 |

+----------+----------+

EXPLAIN

SELECT SQL_NO_CACHE COUNT(*)

FROM branches b

INNER JOIN users u ON (b.u_id = u.u_id)

INNER JOIN history h ON (u.u_id = h.u_id);

With hash joins enabled, we can see from the Extra column in the EXPLAIN output how it builds the join conditions:

mysql> EXPLAIN
    -> SELECT SQL_NO_CACHE COUNT(*)
    -> FROM branches b
    ->    INNER JOIN users u ON (b.u_id = u.u_id)
    ->    INNER JOIN history h ON (u.u_id = h.u_id);
+----+-------------+-------+-------+---------------+---------+---------+------+----------+----------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows     | Extra                                                    |
+----+-------------+-------+-------+---------------+---------+---------+------+----------+----------------------------------------------------------+
|  1 | SIMPLE      | u     | index | PRIMARY       | PRIMARY | 4       | NULL |   103342 | Using index                                              |
|  1 | SIMPLE      | h     | ALL   | NULL          | NULL    | NULL    | NULL | 24619023 | Using join buffer (Hash Join Outer table h)              |
|  1 | SIMPLE      | b     | index | user_id       | user_id | 4       | NULL |    54129 | Using index; Using join buffer (Hash Join Inner table b) |
+----+-------------+-------+-------+---------------+---------+---------+------+----------+----------------------------------------------------------+

mysql> EXPLAIN

-> SELECT SQL_NO_CACHE COUNT(*)

-> FROM branches b

-> INNER JOIN users u ON (b.u_id = u.u_id)

-> INNER JOIN history h ON (u.u_id = h.u_id);

+----+-------------+-------+-------+---------------+---------+---------+------+----------+----------------------------------------------------------+

+----+-------------+-------+-------+---------------+---------+---------+------+----------+----------------------------------------------------------+

+----+-------------+-------+-------+---------------+---------+---------+------+----------+----------------------------------------------------------+

Without hash joins, it’s a straightforward Cartesian (almost) product of all three tables:

mysql> SET optimizer_switch='hash_join=off';
Query OK, 0 rows affected (0.02 sec)

mysql> EXPLAIN
    -> SELECT SQL_NO_CACHE COUNT(*)
    -> FROM branches b
    ->    INNER JOIN users u ON (b.u_id = u.u_id)
    ->    INNER JOIN history h ON (u.u_id = h.u_id);
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref            | rows     | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+
|  1 | SIMPLE      | h     | ALL    | NULL          | NULL    | NULL    | NULL           | 24619023 | NULL        |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY | 4       | percona.h.u_id |        1 | Using index |
|  1 | SIMPLE      | b     | ref    | user_id       | user_id | 4       | percona.h.u_id |        7 | Using index |
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+

mysql> SET optimizer_switch='hash_join=off';

Query OK, 0 rows affected (0.02 sec)

mysql> EXPLAIN

-> SELECT SQL_NO_CACHE COUNT(*)

-> FROM branches b

-> INNER JOIN users u ON (b.u_id = u.u_id)

-> INNER JOIN history h ON (u.u_id = h.u_id);

+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+

+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+

| 1 | SIMPLE | b | ref | user_id | user_id | 4 | percona.h.u_id | 7 | Using index |

+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+

Now, the execution times without hash joins enabled:

mysql> SELECT SQL_NO_CACHE COUNT(*)
    -> FROM branches b
    ->    INNER JOIN users u ON (b.u_id = u.u_id)
    ->    INNER JOIN history h ON (u.u_id = h.u_id);
+-----------+
| COUNT(*)  |
+-----------+
| 128815553 |
+-----------+
1 row in set (1 min 6.95 sec)

mysql> SET optimizer_switch='hash_join=off';
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT SQL_NO_CACHE COUNT(*)
    -> FROM branches b
    ->    INNER JOIN users u ON (b.u_id = u.u_id)
    ->    INNER JOIN history h ON (u.u_id = h.u_id);
+-----------+
| COUNT(*)  |
+-----------+
| 128815553 |
+-----------+
1 row in set (2 min 28.27 sec)

mysql> SELECT SQL_NO_CACHE COUNT(*)

-> FROM branches b

-> INNER JOIN users u ON (b.u_id = u.u_id)

-> INNER JOIN history h ON (u.u_id = h.u_id);

+-----------+

| COUNT(*) |

+-----------+

| 128815553 |

+-----------+

1 row in set (1 min 6.95 sec)

mysql> SET optimizer_switch='hash_join=off';

Query OK, 0 rows affected (0.01 sec)

mysql> SELECT SQL_NO_CACHE COUNT(*)

-> FROM branches b

-> INNER JOIN users u ON (b.u_id = u.u_id)

-> INNER JOIN history h ON (u.u_id = h.u_id);

+-----------+

| COUNT(*) |

+-----------+

| 128815553 |

+-----------+

1 row in set (2 min 28.27 sec)

Clearly with this optimization enabled, we have more than a 50% gain from the example query.

Now while this type of query might be rare, most of us know we need to avoid really large JOINs as they are not scalable. But at some point, we find some that take advantage of the feature. Here is an excerpt from an actual production query I’ve recently worked on. It shows the good execution plan versus the one using hash joins.

This particular EXPLAIN output only differs in the row where without a hash join, it uses an index, and the query executes normally. With the hash join enabled, the optimizer thought it was better to use it instead:

...
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: eq_ref
possible_keys: PRIMARY,r_type_id_ix,r_id_r_type_id_dt_ix
          key: PRIMARY
      key_len: 4
          ref: db.x.p_id
         rows: 1
        Extra: Using where
...

...
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: index
possible_keys: PRIMARY,r_type_id_ix,r_id_r_type_id_dt_ix
          key: r_id_r_type_id_dt_ix
      key_len: 18
          ref: NULL
         rows: 715568233
        Extra: Using where; Using index; Using join buffer (Hash Join Inner table t)
...

...

*************************** 3. row ***************************

id: 1

select_type: SIMPLE

table: t

type: eq_ref

possible_keys: PRIMARY,r_type_id_ix,r_id_r_type_id_dt_ix

key: PRIMARY

key_len: 4

ref: db.x.p_id

rows: 1

Extra: Using where

...

*************************** 3. row ***************************

id: 1

select_type: SIMPLE

table: t

type: index

possible_keys: PRIMARY,r_type_id_ix,r_id_r_type_id_dt_ix

key: r_id_r_type_id_dt_ix

key_len: 18

ref: NULL

rows: 715568233

Extra: Using where; Using index; Using join buffer (Hash Join Inner table t)

...

Needless to say, it caused problems. Unfortunately, a bug on Aurora 1.16 exists where hash joins cannot be turned off selectively (it is enabled by default) from the parameter group. If you try this, you get an error “Error saving: Invalid parameter value: hash_join=off for: optimizer_switch”. The only way to disable the feature is to turn off lab_mode, which requires an instance restart. An alternative is to simply add SET optimizer_switch='hash_join=off'; from the application, especially if you rely on some of the other lab mode features in Aurora.

To summarize, the new hash join feature is a great addition. But as it’s a lab feature, be careful when upgrading!

0 0 votes

Article Rating

3 Comments

Oldest

Newest Most Voted

Morgan Tocker

8 years ago

Hi Jervin,

I think it is worth pointing out that one of the best cases for hash join is when not all of the data can fit in memory. A cold buffer pool is the example that Amazon used in their slide deck to show the very dramatic performance improvement 🙂

If we take a step back, and look at Aurora’s architecture, they use a consensus read for fetch pages not in memory, so I expect the cold buffer pool case to be a scenario where it routinely performs worse than regular MySQL. So while a good feature in its own right – the Amazon team perhaps prioritized it to limit cases of regressions(?). I’m speculating of course, but I had a similar thought when I saw them introduce Async Key Prefetch (AKP) earlier.

– Morgan

Øystein Grøvlen

8 years ago

Hi Jervin,

Thanks for a nice introduction to hash-join in Aurora. Just one comment on the following:

“Without hash joins, it’s a straightforward Cartesian (almost) product of all three tables:”

The join is an equi-join, not a cartesian product. Using indexes, one will only consider the row combinations satisfying the join predicates, not all possible combinations like in a Cartesian product.

The reason hash-join is better for this query, is that since all rows of the tables needs to be accessed anyway, it is more efficient to scan the entire tables/indexes than to do random accesses using a B-tree index.

Author

Jervin Real

8 years ago

‘@Morgan Tocker, indeed these tests we run on a cold buffer pool – it was also mentioned (hinted) on their documentation it was the best use case.

@Øystein – thanks for the feedback, indeed it is not a cartesian product, note I wrote “(almost)” there simply to emphasize it would not have been a very good query and the amount of rows involved to satisfy the query is almost as bad.