Recently I had a case where queries against Aurora Reader were 2-3 times slower than on the Writer node. In this blog post, we are going to discuss why.
I am not going to go into the details of how Aurora works, as there are other blog posts discussing that. Here I am only going to focus on one part.
My customer reported there is a huge performance difference between the Reader and the Writer node just by running selects. I was a bit surprised, as the select queries should run locally on the reader node, the dataset could fit easily in memory, there were no reads on disk level, and everything looked fine.
I was trying to rule out every option when one of my colleagues mentioned I should have a look at the
InnoDB_Adaptive_Hash_Indexes. He was right – it was disabled on the Reader nodes, I could see it on the console.
Let’s enable the adaptive hash index
I opened the control panel and I was checking the parameter groups, but the adaptive hash index was already enabled. Ok, I might have made a mistake but I double checked myself many times and it was true. Adaptive hash was disabled on the console but enabled on the control panel. That means the AWS control panel is lying!
I have restarted the nodes multiple times, and I have created new test clusters, etc… but I was not able to enable adaptive hash on the Reader node. It was enabled on the Writer node, and it was working.
Is this causing the performance difference?
Because I was able to enable or disable the adaptive hash index on the Writer node, I continued my tests there and I could confirm that when I disabled it the queries got slower. Same speed as on the Reader node. When I enabled, AHI queries got faster.
In general with AHI on the Writer node, the customer’s queries were running 2 times faster.
AHI can help for many workloads but not all of them, and you should test your queries/workload both with and without AHI.
Why is it disabled on the Reader?
I have to be honest because I am not an AWS engineer and I do not know the code of Aurora, but I am only guessing here and I might be wrong.
Why can I change it in the parameter group?
We can modify the adaptive hash in the parameter groups, but there is no impact on the Reader nodes at all. Many customers could think they have AHI enabled but actually, they don’t. I think this is a bad practice because if we cannot enable it on the Reader node we should not be able to change it on the control panel.
Is this causing any performance problems for me?
If you are using the Reader node for selects queries, which are based on secondary keys, you are probably suffering from this but it depends on your workload if it is impacting your performance or not. In my customer’s case, the difference was 2 times slower without AHI.
But I want fast queries!
If your queries heavily benefit from AHI, you should run your queries on the Writer node or even on an async slave, or have a look on AWS RDS which does not have this limitation or use EC2 instances. You could also check query cache in Aurora.
In Aurora, they reworked the Query Cache which does not have the limitations like in Community Edition or in Percona Server. Cacheable queries take out an “exclusive lock” on MySQL’s query cache. In the real world, that means only one query can use the Query Cache at a time and all the other queries have to wait for the mutex. Also in MySQL 8.0 they completely removed the Query Cache.
But in Aurora they redesigned it and they removed this limitation – there is no single global mutex on the Query Cache anymore. I think one of the reasons for this is could be because they knew that Adaptive Hash won’t work.
Does AWS know about this?
I have created a ticket to AWS engineers to get some feedback on this, and they verified my findings and have confirmed Adaptive Hash Index cannot be enabled on the Reader nodes. They are looking into why we can modify it on the control panel.
I would recommend checking your queries on your Reader nodes to make sure they perform well and compare the performance with the Writer node. At this moment, we cannot enable AHI on Reader nodes, and I am not sure if that will change any time soon. But this can impact the performance in some cases, for sure.