EmergencyEMERGENCY? Get 24/7 Help Now!

Is Adaptive Hash Index in InnoDB right for my workload?

 | April 12, 2016 |  Posted In: InnoDB, Percona Live


adaptive hash index in InnoDBThis blog post will discuss what the Adaptive Hash Index in InnoDB is used for, and whether it is a good fit for your workload.

Adaptive Hash Index (AHI) is one of the least understood features in InnoDB. In theory, it magically determines when it is worth supplementing InnoDB B-Tree-based indexes with fast hash lookup tables and then builds them automatically without a prompt from the user.

Since AHI is supposed to work “like magic,” it has very little configuration available. In the early versions there were no configuration options available at all. Later versions added innodb_adaptive_hash_index  to disable AHI if required (by setting it to “0” or “OFF”). MySQL 5.7 added the ability to partition AHI by enabling innodb_adaptive_hash_index_parts.  (FYI, this feature existed in Percona Server as innodb_adaptive_hash_index_partitions since version 5.5.)

To understand AHI’s impact on performance, think about it as if it were a cache. If an AHI “Hit” happens, we have much better lookup performance; if it is an AHI “Miss,” then performance gets slightly worse (as checking a hash table for matches is fast, but not free).

This is not the only part of the equation though. In addition to the cost of lookup, there is also the cost of AHI maintenance. We can compare maintenance costs – which can be seen in terms of rows added to and removed from AHI – to successful lookups. A high ratio means a lot of lookups sped up at the low cost. A low ratio means the opposite: we’re probably paying too much maintenance cost for little benefit.

Finally there is also a cost for adding an extra contention. If your workload consists of lookups to a large number of indexes or tables, you can probably reduce the impact by setting  innodb_adaptive_hash_index_parts  appropriately. If there is a hot index, however, AHI could become a bottleneck at high concurrency and might need to be disabled.

To determine if AHI is likely to help my workload, we should verify that the AHI hit and successful lookups to maintenance operations ratios are as high as possible.

Let’s investigate what really happens for some simple workloads. I will use a basic Sysbench Lookup by the primary key – the most simple workload possible. We’ll find that even in this case we’ll find a number of behaviors.

For this test, I am using MySQL 5.7.11 with a 16GB buffer pool. The base command line for sysbench is:

Looking up a single row

Notice oltp-table-size=1  from above; this is a not a mistake, but tests how AHI behaves in a very basic case:

oltp table size=1

And it works perfectly: there is a 100% hit ratio with no AHI maintenance operations to speak of.

10000 rows in the table

When we change the OLTP table setting to oltp-table-size=10000 , we get the following picture:

oltp table size=10k

oltp table size=10k-2

Again, we see almost no overhead. There is a rare incident of 16 rows or so being added to AHI (probably due to an AHI hash collision). Otherwise, it’s almost perfect.

10M rows in the table

If we change the setting to oltp-table-size=10000000, we now have more data (but still much less than buffer pool size):

oltp table size=10m

oltp table size=10m-2

In this case, there is clearly a warm-up period before we get close to the 100% hit ratio – and it never quite hits 100% (even after a longer run). In this case, maintenance operations appear to keep going without showing signs of asymptotically reaching zero. My take on this is that with 10M rows there is a higher chance of hash collisions – causing more AHI rebuilding.

500M rows in the table, uniform distribution

Let’s now set the OLTP table size as follows: oltp-table-size=500000000. This will push the data size beyond the Innodb buffer pool size.

oltp table size=500m

oltp table size=500m-2

Here we see a lot of buffer pool misses, causing the a very poor AHI hit ratio (never reaching 1%).   We can also see a large overhead of tens of thousands of rows added/removed from AHI. Obviously, AHI is not adding any value in this case

500M rows, Pareto distribution

Finally, let’s use the setting oltp-table-size=500000000, and add --rand-type=pareto. The --rand-type=pareto setting enables a skewed distribution, a more typical scenario for many real life data access patterns.

oltp table size=500m-Pareto

oltp table size=500m-Pareto-2

In this case we see the AHI hit ratio gradually improving, and reaching close to 50%. The  AHI maintenance overhead is going down, but never reaches anything that suggests it is worth it.

It is important to note in both this and the previous case that AHI has not reached a “steady state” yet. A steady state condition shows the number of rows added and removed becoming close to equal.

As you can see from the math in the workloads shown above, the Adaptive Hash Index in InnoDB “magic” doesn’t always happen! There are cases when AHI is indeed helpful, and then there are others when AHI adds a lot of data structure maintenance overhead and takes memory away from buffer pool – not to mention the contention overhead. In these cases, it’s better that AHI is disabled.

Unfortunately, AHI does not seem to have the logic built-in to detect if there is too much “churn” going on to make maintaining AHI worthwhile.

I suggest using these numbers as a general guide to decide whether AHI is likely to benefit your workload. Make sure to run a test/benchmark to be sure.

Interested in learning more about other InnoDB Internals? Please join me for the Innodb Architecture and Performance Optimization Tutorial at Percona Live!

Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.


Leave a Reply