The Impact of Swapping on MySQL Performance
Peter Zaitsev
In this blog, I’ll look at the impact of swapping on MySQL performance.
It’s common sense that when you’re running MySQL (or really any other DBMS) you don’t want to see any I/O in your swap space. Scaling the cache size (using innodb_buffer_pool_size in MySQL’s case) is standard practice to make sure there is enough free memory so swapping isn’t needed.
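As a rough illustration of that sizing step, here is a minimal sketch that picks a buffer pool size leaving headroom for the OS and other MySQL memory. The 75% fraction is an assumption chosen only because it matches the 24GB-on-32GB baseline used later in this post; it is not a rule. The 128MB chunk size is assumed to be the default innodb_buffer_pool_chunk_size, and the function name is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def suggest_buffer_pool_bytes(total_ram_bytes, fraction=0.75, chunk_bytes=128 * MIB):
    """Return a buffer pool size that leaves headroom, rounded down to a whole chunk.

    fraction and chunk_bytes are illustrative assumptions, not recommendations.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    target = int(total_ram_bytes * fraction)
    # Round down so the value is a whole number of buffer pool chunks.
    return (target // chunk_bytes) * chunk_bytes


size = suggest_buffer_pool_bytes(32 * GIB)
print(f"innodb_buffer_pool_size={size // GIB}G")
```

The right fraction depends on connection count, per-thread buffers, and whatever else runs on the host, so treat the result as a starting point to verify against actual memory usage.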
But what if you make some mistake or miscalculation, and swapping happens? How much does it really impact performance? This is exactly what I set out to investigate.
My test system has the following:
- 32GB of physical memory
- OS (and swap space) on a (pretty old) Intel 520 SSD device
- Database stored on Intel 750 NVMe storage
To simulate a worst-case scenario, I’m using a sysbench workload with a uniform access distribution:
sysbench --test=/usr/share/doc/sysbench/tests/db/select.lua --report-interval=1 --oltp-table-size=700000000 --max-time=0 --oltp-read-only=off --max-requests=0 --num-threads=64 --rand-type=uniform --db-driver=mysql --mysql-password=password --mysql-db=test_innodb run
To better visualize the metrics that matter for this test, I have created the following custom graph in our Percona Monitoring and Management (PMM) tool. It shows performance, disk IO, and swapping activity on the same graph.
Here are the baseline results for innodb_buffer_pool_size=24GB. This value is a reasonable ballpark number for a system with 32GB of memory.
As you can see in the baseline scenario, there is almost no swapping, with around 600MB/sec read from the disk. This gives us about 44K QPS. The 95% query response time (reported by sysbench) is about 3.5ms.
Next, I changed the configuration to innodb_buffer_pool_size=32GB, which is the total amount of memory available. As memory is required for other purposes, it caused swapping activity:
We can see that performance stabilizes after a bit at around 20K QPS, with some 380MB/sec disk IO and 125MB/sec swap IO. The 95% query response time has grown to around 9ms.
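If you don’t have PMM at hand, a swap IO figure like the one above can be approximated on Linux from the pswpin and pswpout counters in /proc/vmstat, which count pages swapped in and out since boot. The sketch below is not how PMM computes its graph. The function names are hypothetical, the 4096-byte page size is an assumption (typical for x86-64), and the sample counter values are made up.

```python
def parse_swap_pages(vmstat_text):
    """Return cumulative (pages swapped in, pages swapped out) from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name in ("pswpin", "pswpout"):
            counters[name] = int(value)
    return counters["pswpin"], counters["pswpout"]


def swap_mb_per_sec(before, after, seconds, page_size=4096):
    """Convert the page-count delta between two samples into MB/sec of swap IO."""
    pages = (after[0] - before[0]) + (after[1] - before[1])
    return pages * page_size / seconds / 1e6


# Two hypothetical samples taken one second apart (made-up numbers).
first = parse_swap_pages("pswpin 100\npswpout 200\n")
second = parse_swap_pages("pswpin 1100\npswpout 31200\n")
print(f"{swap_mb_per_sec(first, second, 1.0):.1f} MB/sec of swap IO")
```

On a live host you would read /proc/vmstat twice with a sleep in between and pass both readings through parse_swap_pages.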
Now let’s look at an even worse case. This time, we’ll set our configuration to innodb_buffer_pool_size=48GB (on a 32GB system).
Now we have around 6K QPS. Disk IO has dropped to 250MB/sec, and swap IO is up to 190MB/sec. The 95% query response time is around 35ms. As the graph shows, the performance becomes more variable, confirming the common assumption that intense swapping affects system stability.
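To put the three runs side by side, the approximate figures quoted above can be expressed relative to the 24GB baseline. This is plain arithmetic on the rounded numbers from the text, not additional measurements.

```python
# (innodb_buffer_pool_size in GB, approx QPS, approx 95% response time in ms),
# taken from the rounded figures quoted in the text.
runs = [
    (24, 44_000, 3.5),
    (32, 20_000, 9.0),
    (48, 6_000, 35.0),
]


def relative_to_baseline(runs):
    """Return (size, QPS ratio, latency ratio) for each run versus the first run."""
    _, base_qps, base_latency = runs[0]
    return [(size, qps / base_qps, latency / base_latency)
            for size, qps, latency in runs]


for size, qps_ratio, latency_ratio in relative_to_baseline(runs):
    print(f"{size}GB: {qps_ratio:.0%} of baseline QPS, "
          f"{latency_ratio:.1f}x baseline 95% response time")
```

With these inputs, the 32GB run keeps roughly 45% of baseline throughput at about 2.6 times the response time, and the 48GB run keeps roughly 14% at about 10 times the response time.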
Finally, let’s remember that MySQL 5.7 has the Online Buffer Pool Resize feature, which was created to solve exactly this problem (among other reasons). It lets you change the buffer pool size at runtime if you accidentally set it too large. As we have tested innodb_buffer_pool_size=24GB, and demonstrated it worked well, let’s scale it back to that value:
mysql> set global innodb_buffer_pool_size=24*1024*1024*1024;
Query OK, 0 rows affected (0.00 sec)
Now the graph shows both good and bad news. The good news is that the feature works as intended, and after the resize completes we get close to the same results as before our swapping experiment. The bad news is that everything pretty much grinds to a halt for 15 minutes or so while resizing occurs. There is almost no IO activity or intensive swapping while the buffer pool resize is in progress.
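Since the resize can take a while, it helps to script both the statement and a progress check. The sketch below only builds the SQL text. The helper name is hypothetical, and actually running the statements through your MySQL client or driver of choice is left out. Innodb_buffer_pool_resize_status is the status variable MySQL 5.7 exposes for tracking an online resize.

```python
GIB = 1024 ** 3


def resize_statement(size_gib):
    """Build the SET GLOBAL statement for an online buffer pool resize."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    # Spell out the byte count explicitly instead of relying on SQL arithmetic.
    return f"SET GLOBAL innodb_buffer_pool_size = {size_gib * GIB};"


# Status variable that reports the progress of an online resize.
RESIZE_STATUS_QUERY = "SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';"

print(resize_statement(24))
print(RESIZE_STATUS_QUERY)
```

Running the status query periodically from a second session is one way to tell when the resize has finished without relying on the graphs alone.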
I also performed other sysbench runs for selects using the Pareto random type rather than the Uniform type, which creates more realistic (skewed) data access patterns. I further performed update key benchmarks using both Uniform and Pareto access distributions.
You can see the results below:
As you can see, the results for selects are as expected. Accesses with Pareto distributions are better and are affected less – especially by minor swapping.
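To get a feel for why a skewed distribution suffers less, here is a small sketch of how concentrated Pareto accesses are compared to uniform ones. It assumes that sysbench’s Pareto generator maps a uniform draw u to the position u ** power in the key range, with power = log(h) / log(1 - h) and a default h of 0.2. That is my understanding of the generator rather than something stated in this post, so treat the numbers as illustrative. The function names are hypothetical.

```python
import math


def pareto_power(h=0.2):
    """Exponent of the assumed power transform for Pareto parameter h."""
    return math.log(h) / math.log(1.0 - h)


def share_of_accesses(key_fraction, h=0.2):
    """Share of accesses landing in the first key_fraction of the key range.

    With position = u ** power for uniform u in [0, 1),
    P(position <= key_fraction) = key_fraction ** (1 / power).
    """
    return key_fraction ** (1.0 / pareto_power(h))


for key_fraction in (0.01, 0.2, 0.5):
    # Under a uniform distribution the share simply equals key_fraction.
    print(f"hottest {key_fraction:.0%} of keys: "
          f"uniform {key_fraction:.0%}, Pareto {share_of_accesses(key_fraction):.0%}")
```

Under these assumptions about 80% of accesses go to 20% of the keys, so a working set much smaller than the table stays hot in memory, and pages pushed out to swap are touched far less often than with uniform access.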
If you look at the update key results, though, you find that minor swapping causes performance to improve for Pareto distribution. The results at 48GB of memory are pretty much the same.
Before you say that that is impossible, let me provide an explanation: I limited innodb_max_purge_lag on this system to avoid unbound InnoDB history length growth. These workloads tend to be bound by InnoDB purge performance. It looks like swapping has impacted the user threads more than it did the purge threads, causing such an unusual performance profile. This is something that might not be repeatable between systems.
When I started, I expected a severe performance drop even with very minor swapping. I surprised myself by getting swap activity to more than 100MB/sec, with performance “only” halved.
While you should continue to plan your capacity so that there is no constant swapping on the database system, these results show that a few MB/sec of swapping activity is not going to have a catastrophic impact.
This assumes your swap space is on an SSD, of course! SSDs handle random IO (which is what paging activity usually is) much better than HDDs.