A case for MariaDB’s Hash Joins

MariaDB 5.3/5.5 has introduced a new join type, “Hash Join”, which is an implementation of the Classic Block-based Hash Join Algorithm. In this post we will see what the Hash Join is, how it works and for what types of queries it is the right choice. I will show the results of benchmarks run against different queries and explain those results, so that you have a better understanding of when the Hash Join works best and when it does not. Although Hash Joins have been available since MariaDB 5.3, I will be running my benchmarks on the newer MariaDB 5.5.

Overview

Hash Join is a new algorithm introduced in MariaDB 5.3/5.5 that can be used for joining tables that have equijoin conditions of the form tbl1.col1 = tbl2.col1, etc. As I mentioned above, what is actually implemented is the Classic Hash Join, but it is known as Block Nested Loop Hash (BNLH) Join in MariaDB.
The Classic Hash Join Algorithm consists of two phases, a build phase and a probe phase. Let’s consider the case of joining two tables on an equijoin condition. The first step is to designate the smaller of the two tables as the left operand and the bigger one as the right operand. When the algorithm begins, the first phase is the build phase, in which a hash table is created over the join attributes and rows of the left operand. Next comes the probe phase, in which the matching rows from the right operand are found by scanning the right operand and, for each row scanned, performing a lookup in the hash table using the values of the columns participating in the equijoin condition. The hash table is accessed by applying a hash function to the values of the join condition, and hence lookups are quite efficient.

But what about the restriction on the size of the hash table? The size of the hash table is restricted by the value of join_buffer_size, so if the left operand is big enough that the hash table built on it would exceed join_buffer_size, then multiple hash tables are created. For example, if the left operand has “n” rows and its size is three times the value of join_buffer_size, then three hash tables would need to be created, each built over n/3 of the rows. Both the build and the probe phase would then be done three times, which means that the right operand would be scanned thrice.

Wikipedia has a nicely simplified version of the Classic Hash Join algorithm (http://en.wikipedia.org/wiki/Hash_join#Classic_hash_join) which I will quote below for better understanding:

  1. For each tuple r in the build input R
    1. Add to the in-memory hash table
    2. If the size of the hash table equals the maximum in-memory size:
      1. Scan the probe input S, and add matching join tuples to the output relation
      2. Reset the hash table
  2. Do a final scan of the probe input S and add the resulting join tuples to the output relation
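
To make this concrete, here is a minimal sketch of how the hash join can be enabled for a session in MariaDB and how you can check that it is being used. The variable values mirror the benchmark configuration shown later in this post, tbl1/tbl2/col1 are just placeholder names, and the exact wording of the EXPLAIN output may vary between versions:

-- Allow hash joins in addition to nested loop joins (join_cache_level=4,
-- the same value used in the benchmark configuration below).
SET SESSION join_cache_level = 4;
SET SESSION optimizer_switch = 'join_cache_hashed=on';

-- A simple equijoin of the form tbl1.col1 = tbl2.col1; when BNLH is chosen,
-- the Extra column of EXPLAIN for the second table typically shows something
-- like "Using join buffer (flat, BNLH join)".
EXPLAIN
SELECT *
FROM tbl1
JOIN tbl2 ON tbl1.col1 = tbl2.col1;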

Now that the hash join has been explained, let’s see how it performs for different test cases.

Benchmarks

For the purpose of the benchmarks I used the DBT3 dataset of scale factor 2, which means the total dataset size is 4.8G. Let me show the breakdown of dataset size by the tables that I have used in the benchmarks:

Table ‘lineitem’: 3.8G
Table ‘supplier’: 11M
Table ‘orders’: 468M

I have benchmarked two different kinds of workloads, IO bound and in-memory. The benchmark on the IO bound workload was performed with a buffer pool size of 1G, while the benchmark on the in-memory workload was performed with a buffer pool size of 6G. The benchmarks compare the Block Nested Loop (BNL) Join of MySQL 5.5.24, the Batched Key Access (BKA) Join of MySQL 5.6.5 and the Block Nested Loop Hash (BNLH) Join of MariaDB 5.5.20. The configurations used with the three variants of MySQL are listed below.

Configuration

Let’s first take a look at the configurations used with the different MySQL flavors.

MySQL 5.5.24 Configuration
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_log_file_size=512M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT

query_cache_size=0
query_cache_type=0

MySQL 5.6.5 Configuration
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_log_file_size=512M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT

query_cache_size=0
query_cache_type=0

optimizer_switch='index_condition_pushdown=on'
optimizer_switch='mrr=on'
optimizer_switch='mrr_cost_based=off'
read_rnd_buffer_size=32M
optimizer_switch='batched_key_access=on'
join_buffer_size=32M

MariaDB 5.5.20 Configuration
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_log_file_size=512M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT

query_cache_size=0
query_cache_type=0

optimizer_switch='index_condition_pushdown=on'

optimizer_switch='mrr=on'
optimizer_switch='mrr_sort_keys=on'
optimizer_switch='mrr_cost_based=off'
mrr_buffer_size=32M

optimizer_switch='join_cache_incremental=on'
optimizer_switch='join_cache_hashed=on'
optimizer_switch='join_cache_bka=on'
join_cache_level=4
join_buffer_size=32M
join_buffer_space_limit=32M

Note that MariaDB includes a new variable, ‘join_cache_level‘, which controls which join algorithms are allowed to be used; a value of 4 means that the Nested Loop Join and Hash Join algorithms are allowed. And while ‘join_buffer_size‘, as we know, controls the size of the join buffer allocated for each join in a query, MariaDB introduces another variable, ‘join_buffer_space_limit‘, which controls the maximum allowed size of the join buffers for the whole query. By default it has a value of 1024*128*10, which means that your effective join_buffer_size can never be bigger than this value. Hence the reason I have set join_buffer_space_limit=32M.
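
As a quick arithmetic check (a sketch, not taken from the benchmark scripts): the default of 1024*128*10 works out to 1,310,720 bytes, i.e. 1.25M, which would silently cap the 32M join buffer requested above unless both variables are raised together:

-- Default join_buffer_space_limit: 1024 * 128 * 10 = 1,310,720 bytes (1.25M),
-- far below the 32M join buffer we want to use.
SELECT @@join_buffer_space_limit;

-- Raise the per-query limit alongside the per-join buffer size so that the
-- full 32M join buffer can actually be allocated.
SET SESSION join_buffer_size = 32 * 1024 * 1024;
SET SESSION join_buffer_space_limit = 32 * 1024 * 1024;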

Benchmark Machine Specs

The machine that I used for the benchmarks is a dual core machine with the following CPU configuration: 2x Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz. The amount of memory installed is 8G, and the MySQL datadir is on a 4-disk software RAID5 volume of 5.4K RPM disks. The filesystem used is XFS, and the OS installed is CentOS 6.2.

Table Structure

Before moving on, let’s take a look at the structure of the tables involved in the benchmark tests.
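
The full table definitions are not reproduced here; as a reference point, the DBT3 tables follow the standard TPC-H schema, abbreviated below. This is only a sketch of the columns relevant to the test queries, and the exact column list and, more importantly, the index definitions used in the benchmark may differ:

-- Abbreviated DBT3/TPC-H style definitions (sketch only; the actual benchmark
-- tables carry additional columns and may carry secondary indexes).
CREATE TABLE supplier (
  s_suppkey   INT NOT NULL,
  s_name      CHAR(25),
  s_nationkey INT,
  -- ... remaining columns per the DBT3 schema ...
  PRIMARY KEY (s_suppkey)
) ENGINE=InnoDB;

CREATE TABLE lineitem (
  l_orderkey      INT NOT NULL,
  l_linenumber    INT NOT NULL,
  l_suppkey       INT,
  l_shipdate      DATE,
  l_extendedprice DECIMAL(15,2),
  -- ... remaining columns per the DBT3 schema ...
  PRIMARY KEY (l_orderkey, l_linenumber)
) ENGINE=InnoDB;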

Test Cases

Now let’s go through the test cases and see how the joins perform in each one.

Test Case A – Join a small table that fits in memory to a large table with no WHERE clause

The SQL used for this test together with its EXPLAIN output as returned by MySQL 5.5 is as follows:

And the results in seconds of time taken to complete the above query:

The first thing to note is that I have scaled down the time taken by MySQL 5.5 to finish the query on the IO bound workload so that it fits in the chart; in actuality the query took 32077 seconds to finish on the IO bound workload. In any case, we can clearly see from the above chart that the hash join comprehensively beats BKA and BNL; the hash join is perfect in cases like this, where you are joining a small table with a very large table and there are no ‘indexed where’ conditions on the big table. BNLH takes half the time to complete the query for the in-memory workload, 6.6x less time than BKA in MySQL 5.6, and 965x less time than BNL in MySQL 5.5. So the hash join gives us an improvement by a very large factor, both for the IO bound workload and the in-memory workload.
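
For illustration only (the actual benchmark query is the one shown in the EXPLAIN output above), a join of this general shape is what this test case exercises: the small supplier table joined to the large lineitem table on the supplier key, with no WHERE clause that an index on lineitem could use. The column names follow the standard DBT3 schema and are an assumption, not the literal benchmark query:

-- Sketch of a "small table joined to a large table with no WHERE clause"
-- query; with join_cache_level=4 the optimizer can build the hash table
-- over supplier and probe it while scanning lineitem.
SELECT COUNT(*)
FROM supplier
JOIN lineitem ON lineitem.l_suppkey = supplier.s_suppkey;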

Test Case B – Join a small table that fits in memory to a large table with a selective WHERE clause on an indexed column

The SQL used for this test together with its EXPLAIN output as returned by MySQL 5.5 is as follows: