Recently I was doing some small tests on EC2 instances on AWS, and I noticed that the execution time and performance depended heavily on the time of day I ran my scripts. I was using the t3.xlarge instance type because I didn’t need many CPUs or much memory for my tests, but from time to time I wanted to use all of the resources for a short period (a few minutes), and this is when I noticed the difference.
First, let’s see what AWS says about T3 instances:

“T3 instances start in Unlimited mode by default, giving users the ability to sustain high CPU performance over any desired time frame while keeping cost as low as possible.”
In theory, I should not have seen any issues or performance differences. I also monitored the CPU credit balance, and there was no correlation at all between the balance and the performance; because these were Unlimited instances, the balance should not have had any impact anyway.
I decided to start a longer sysbench test on 3 threads to see how the QPS changes over the course of the day.
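To turn a long sysbench run into a QPS-over-time series, the periodic report lines can be parsed. The sketch below is not the script used for this test; it assumes sysbench 1.0 was started with `--report-interval` so that it prints one line per interval in the form shown in the comment, and `parse_qps` is a hypothetical helper name.

```python
import re

# Assumed shape of one sysbench 1.0 interval report line, e.g.:
# [ 10s ] thds: 3 tps: 120.10 qps: 2402.00 (r/w/o: ...) lat (ms,95%): 31.94 ...
REPORT_RE = re.compile(r"\[\s*(\d+)s\s*\].*?\bqps:\s*([\d.]+)")


def parse_qps(lines):
    """Return a list of (elapsed_seconds, qps) tuples from sysbench output."""
    samples = []
    for line in lines:
        match = REPORT_RE.search(line)
        if match:  # headers, the final summary, etc. simply don't match
            samples.append((int(match.group(1)), float(match.group(2))))
    return samples
```

Feeding it the captured output (for example `parse_qps(open("sysbench.log"))`, where the log file name is hypothetical) gives a series that can be plotted against wall-clock time.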
As you can see, the queries per second (QPS) could drop by almost 90%, which is a lot. It’s important to highlight that the sysbench script should have generated a very steady workload. So what causes this big difference? After checking all the graphs, I found this:
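A drop of “almost 90%” here means the lowest QPS is roughly one tenth of the peak. A minimal sketch of that peak-to-trough calculation follows; the helper name and any numbers passed to it are illustrative, not the measured results.

```python
def qps_drop_percent(qps_values):
    """Percentage drop from the highest to the lowest QPS sample."""
    peak = max(qps_values)  # raises ValueError on an empty sequence
    if peak <= 0:
        raise ValueError("need at least one positive QPS sample")
    return (peak - min(qps_values)) / peak * 100.0
```

For instance, a hypothetical series peaking at 2400 QPS and bottoming out at 240 QPS gives a 90% drop.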
Stealing! A lot of stealing! Here is a good article which explains stealing very well. So I probably have a noisy neighbor. This instance was running in N. California. I stopped it and started new instances to repeat the test, but I always got very similar results. There was a lot of stealing, which hurt the performance badly, probably because that region is very popular and its resources are limited.
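Steal time is the time a virtual CPU was ready to run but the hypervisor was serving someone else; `top` reports it in the `st` column. On Linux it comes from the aggregate `cpu` line of `/proc/stat`, so it can also be measured directly. The sketch below is an assumption about how one might do that, not the tooling used for these graphs: it takes two snapshots and computes the stolen share of the elapsed CPU time. The guest columns are left out of the total because the kernel already includes them in `user`/`nice`.

```python
# Field order of the aggregate "cpu" line in Linux /proc/stat (units: jiffies).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def steal_percent(before, after):
    """Percentage of CPU time stolen by the hypervisor between two snapshots."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())  # guest/guest_nice are already inside user/nice
    return 100.0 * delta["steal"] / total if total else 0.0


def read_cpu_snapshot(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

On the instance itself, one would call `read_cpu_snapshot()`, wait a second or so, call it again, and pass both snapshots to `steal_percent`.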
Out of curiosity, I started two similar instances in the Stockholm region and repeated the same test, and I got very steady performance, as you can see here:
I guess this region is not as popular or as full yet, and it shows that there can be a huge difference depending on where you start your instance.
I also repeated the tests with the m5.xlarge instance type to see whether it shows the same behavior.
After I changed the instance type, we can see that both regions give very similar, steady performance, but if we take a closer look:
The instance in Stockholm still delivers almost 5% more QPS than the one in N. California, and it uses more CPU as well.
If you are using T3 instance types, you should monitor the CPU usage very closely, because noisy neighbors can hurt your performance a lot. If you need stable performance, T3 instances are not recommended. If you only need a short burst they might work, but you still have to monitor the steal. Other instance types can give you much more stable performance, but you could still see some difference between regions.
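One way to “monitor the steal” without alerting on every short spike is to average the last few samples and flag only sustained steal. The sketch below assumes you already collect steal percentages at a fixed interval (for example from `/proc/stat` or `top`); the 10% threshold and 5-sample window are arbitrary illustrative defaults, not AWS guidance, and `steal_alerts` is a hypothetical helper.

```python
from collections import deque


def steal_alerts(steal_samples, threshold=10.0, window=5):
    """Return the indices where the rolling mean of steal % exceeds threshold.

    threshold and window are illustrative defaults, not AWS recommendations.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` samples
    alerts = []
    for i, pct in enumerate(steal_samples):
        recent.append(pct)
        if len(recent) == window and sum(recent) / window > threshold:
            alerts.append(i)
    return alerts
```

A larger window makes the check less noisy but slower to react; for short bursts on a T3 instance, a small window is probably the more useful trade-off.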