Adaptive Hash Index on AWS Aurora

Recently I had a case where queries against an Aurora Reader node were 2-3 times slower than on the Writer node. In this blog post, we are going to discuss why.

I am not going to go into the details of how Aurora works, as there are other blog posts discussing that. Here I am only going to focus on one part.

The Problem

My customer reported a huge performance difference between the Reader and the Writer node when running nothing but selects. I was a bit surprised, as the select queries run locally on the Reader node, the dataset easily fit in memory, there were no reads at the disk level, and everything looked fine.

I was trying to rule out every option when one of my colleagues suggested I have a look at innodb_adaptive_hash_index. He was right: it was disabled on the Reader nodes, and I could see it from the MySQL console.
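A minimal way to check this yourself, assuming you can connect to each node with the mysql client (the OFF value below is what the Reader showed; the counters are illustrative):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_hash_index';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| innodb_adaptive_hash_index | OFF   |
+----------------------------+-------+

mysql> SHOW ENGINE INNODB STATUS\G
...
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
...
0.00 hash searches/s, 1342.15 non-hash searches/s

If AHI is actually working, the hash searches/s counter should be non-zero while a read workload is running.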

Let’s enable the adaptive hash index

I opened the control panel and checked the parameter groups, but the adaptive hash index was already enabled there. Okay, I might have made a mistake, but I double-checked many times and it was true: adaptive hash was disabled on the MySQL console but enabled in the control panel. That means the AWS control panel is lying!

I restarted the nodes multiple times, created new test clusters, and so on, but I was not able to enable adaptive hash on the Reader node. It was enabled and working on the Writer node.

Is this causing the performance difference?

Because I was able to enable or disable the adaptive hash index on the Writer node, I continued my tests there, and I could confirm that when I disabled it the queries got slower, down to the same speed as on the Reader node. When I enabled AHI, the queries got faster.

In general, with AHI enabled on the Writer node, the customer's queries were running two times faster.

AHI can help many workloads, but not all of them, and you should test your queries/workload both with and without AHI.
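As a rough sketch of such a test, assuming a node where you are allowed to change the variable at runtime, and with a hypothetical orders table and customer_id index standing in for your own schema:

-- orders / customer_id are hypothetical stand-ins for your own tables and indexes
SET GLOBAL innodb_adaptive_hash_index = OFF;
-- run the workload query several times and note the timings
SELECT SQL_NO_CACHE * FROM orders WHERE customer_id = 4242;

SET GLOBAL innodb_adaptive_hash_index = ON;
-- run the same query again; the hash index is built up as queries execute,
-- so give it a few executions before comparing the timings
SELECT SQL_NO_CACHE * FROM orders WHERE customer_id = 4242;

On Aurora you would change the setting through the parameter group rather than SET GLOBAL, but the idea of the comparison is the same.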

Why is it disabled on the Reader?

I have to be honest: I am not an AWS engineer and I do not know Aurora's code, so I am only guessing here and I might be wrong. My guess is that it is related to how the Reader nodes work: they do not execute the writes themselves, yet the shared storage changes underneath them, and keeping an adaptive hash index valid on such a node would be complicated.

Why can I change it in the parameter group?

We can modify the adaptive hash index in the parameter groups, but it has no impact on the Reader nodes at all. Many customers could think they have AHI enabled when actually they don't. I think this is bad practice: if we cannot enable it on the Reader nodes, we should not be able to change it in the control panel.

Is this causing any performance problems for me?

If you are using the Reader node for select queries that are based on secondary keys, you are probably suffering from this, but whether it impacts your performance depends on your workload. In my customer's case, the queries were two times slower without AHI.
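If you are not sure whether a given query relies on a secondary key, a quick EXPLAIN shows it (table and index names here are hypothetical):

mysql> EXPLAIN SELECT * FROM orders WHERE customer_id = 4242\G
...
possible_keys: idx_customer_id
          key: idx_customer_id
...

If the key column shows a secondary index rather than PRIMARY, the query falls into the category discussed here.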

But I want fast queries!

If your queries heavily benefit from AHI, you should run them on the Writer node, or even on an async replica, or have a look at AWS RDS (which does not have this limitation), or use EC2 instances. You could also check out the Query Cache in Aurora.

Query Cache

In Aurora, they reworked the Query Cache so it does not have the limitations it has in the Community Edition or in Percona Server. In standard MySQL, cacheable queries take an exclusive lock on the query cache, which in the real world means only one query can use the Query Cache at a time while all the other queries wait on the mutex. In MySQL 8.0 the Query Cache was removed completely.

But in Aurora they redesigned it and removed this limitation: there is no single global mutex on the Query Cache anymore. I think one of the reasons could be that they knew the Adaptive Hash Index would not work.
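If you want to check whether the Query Cache is configured and actually being hit on your instance, the standard MySQL counters are a reasonable starting point (this is a generic check, not Aurora-specific):

mysql> SHOW GLOBAL VARIABLES LIKE 'query_cache%';
mysql> SHOW GLOBAL STATUS LIKE 'Qcache%';

If Qcache_hits keeps growing while your read workload runs, the Query Cache is serving results.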

Does AWS know about this?

I created a ticket with the AWS engineers to get some feedback on this, and they verified my findings and confirmed that the Adaptive Hash Index cannot be enabled on the Reader nodes. They are looking into why we can modify it in the control panel.

Conclusion

I would recommend checking the queries on your Reader nodes to make sure they perform well, and comparing their performance with the Writer node. At this moment we cannot enable AHI on the Reader nodes, and I am not sure that will change any time soon, but it can certainly impact performance in some cases.


Comments (2)

  • Paul C

    Adaptive hashing is a layer of caching, and the hard part about any caching methodology is having to invalidate the cache. To answer your "why" question about no AHI on Aurora readers, we have to figure out what they are doing and why they are doing it.

    Well first, what do we actually know about Aurora? Not much, as they're not particularly forthcoming about the internals of their modifications. What we do know from the marketing and training materials is that each reader and writer is a separate node of the database engine running on its own personal EC2 instance. We also know that the underlying Aurora storage is two copies per AZ across a total of three AZs, while an EC2 instance can obviously be in only one. We know that a write only really occurs when any three out of the six writes have succeeded, with eventual consistency for the other three copies, and a write does not count merely because the local two copies have succeeded. Honestly, it sounds a lot like Dynamo/Cassandra replication running with a CL of THREE, but that's another bout of conjecture by itself.

    I feel we can reasonably guess that the storage isn't directly attached to any particular node, and that any node in the Aurora instance has the same access to it. The writer node's InnoDB engine should always be in sync with what's happening on disk, since it knows what was there before and after any given write, but what about the readers? The storage can be changed out from underneath the readers, and their buffer pool would be incorrect unless some notification method informs them so. In other words, any committed writes from the writer must be streamed to the readers.

    So if you're an Aurora coder, how would you implement it? Do you push full row images? I wouldn't. The readers' buffer pools only need to know what to invalidate. Their user sessions probably require a different working set of data and might not be accessing those particular rows at that time. Any heavy read-write-read sessions will be on the writer anyway, and if a reader happens to need that row it can always just pull it from storage.

    If I were doing it, I'd just have the writer instance blast the readers with a stream of LSNs, table IDs, and PK values to invalidate from the buffer pool and be done with it. I wouldn't send anything else, because that's the minimum you need to do the job. More data means more work. Computers only run faster by doing less work, and the opposite is true.

    Now, what do you need to invalidate the adaptive hash? You can't do it with just the PK values; you need the contents of the secondary index columns, because going the other way is a bit impractical. In theory you could have the readers also do a read from storage when they get informed that a row has been invalidated, but that would be just plain nasty to deal with, never mind the eventual consistency issues of the storage. You could have the writer stream the PKs as well as the secondary indexes, but again that's still a lot more work that you don't absolutely have to do.

    Alternatively, they could just disable the adaptive hash indexes for grins and giggles just to see how long it takes for somebody to notice it’s missing.

    June 25, 2019 at 10:59 pm
  • Narendra

    Nice finding. This helped us understand why we see a lot of disk reads.

    June 26, 2019 at 12:31 pm
