In many write-intensive workloads Innodb/XtraDB storage engines you may see hidden and dangerous “debt” being accumulated – unpurged transaction “history” which if not kept in check over time will cause serve performance regression or will take all free space and cause an outage. Let’s talk about where it comes from and what can you do to avoid running into the trouble.
Technical Background: InnoDB is an MVCC engine which means it keeps multiple versions of the rows in the database, and when rows are deleted or updated they are not immediately removed from the database but kept for some time – until they can be removed. For a majority of OLTP workloads they can be removed seconds after the change actually took place. In some cases though they might need to be kept for a long period of time – if there are some old transactions running in the system that might still need to look at an old database state. As of MySQL 5.6 Innodb has one or several “purge threads” which remove the old data that can be removed, though they might not be doing it fast enough for workloads with very intensive writes.
Does it really happen? I started looking into this problem based on some customer concerns and to my surprise I could very easily get the history to grow rapidly using basic sysbench “update” workload. It is especially easy with default innodb_purge_threads=1 setting but even with innodb_purge_threads=8 it grows rather rapidly.
If we take a look at the purging speed (which comes from innodb-metrics table) we can see what purge is being very much starved by the active concurrent sysbench process and it speeds up greatly when it is finished:
Now to be frank this is not an easy situation to get in the majority of workloads with short transactions when the undo space is kept in memory purge and is able to keep up. If Undo space however happens to be gone from buffer pool the purge speed can slow down drastically and the system might not be able to keep up anymore. How it could happen? There are 2 common variants….
Long Running Transaction: If you’re having some long running transaction, for example mysqldump, on the larger table the purging has to pause while that transaction is running and a lot of history will be accumulated. If there is enough IO pressure a portion of undo space will be removed from the buffer pool.
MySQL Restart: Even with modest history length restarting MySQL will wash away from memory and will cause purge to be IO bound. This is of course if you’re not using InnoDB Buffer Pool save and reload.
How do you check if your UNDO space is well cached? In Percona Server I can use those commands:
mysql> select sum(curr_size)*16/1024 undo_space_MB from XTRADB_RSEG;
| undo_space_MB |
| 1688.4531 |
1 row in set (0.00 sec)
mysql> select count(*) cnt, count(*)*16/1024 size_MB, page_type from INNODB_BUFFER_PAGE group by page_type;
| cnt | size_MB | page_type |
| 55 | 0.8594 | EXTENT_DESCRIPTOR |
| 2 | 0.0313 | FILE_SPACE_HEADER |
| 108 | 1.6875 | IBUF_BITMAP |
| 17186 | 268.5313 | IBUF_INDEX |
| 352671 | 5510.4844 | INDEX |
| 69 | 1.0781 | INODE |
| 128 | 2.0000 | SYSTEM |
| 1 | 0.0156 | TRX_SYSTEM |
| 6029 | 94.2031 | UNDO_LOG |
| 16959 | 264.9844 | UNKNOWN |
10 rows in set (1.65 sec)
This shows what the total undo space size is now, 1.7GB, with less than 100MB cached in the buffer pool size….
Here are a few graphs from Running Heavy concurrent query during lighter workload where purging could keep up. In this case I used the “injection” benchmark in sysbench setting –trx-rate to 50% of what the system shown as peak.
mysql> select count(distinct k+ length(pad)) from sbtest1;
| count(distinct k+ length(pad)) |
| 30916851 |
1 row in set (28 min 32.38 sec)
What we can see from those graphs is that InnoDB purging initially is progressing at a speed fast enough to keep up with inflow of transactions,
however as we kick up the complicated query, purging is stopped and when the query is done the purge speed settles on the new much lower level where it is not able to keep up with the workload anymore.
Now, there is recognition of this problem and there are options with innodb_max_purge_lag and innodb_max_purge_lag_delay to set the maximum length of the history after reaching which delay will be injected for DML statements up to a specified amount of microseconds.
Unfortunately it is not designed very well to use with real applications. The problems I see with its design are two fold….
Looking at Total History: If you think about it there are 2 kinds of records within the history – there are records that can be purged and there are ones which can’t be purged because they are needed by some active transaction. It is perfectly fine to have a lot of records in history if some long transaction is running – it is not the cause of the problem or overload, while we expect what “purgable history” should be low most of the time.
Looking at the Size rather than Rate of Change: Even worse, the history blowout prevention is looking at the current value to inject a delay and not at whenever it is that’s growing or already shrinking.
These together means that cases of long running transactions concurrently with OLTP workloads is handled very poorly – as long as history reaches the specified maximum amount the system will kick into overdrive, delaying all statements to the maximum extent possible, until the history falls back below the threshold. Here is how it looks on graphs:
As you see on the last graph, we got the purge_dml_delay_usec spiking to 10000us (the max I set) even as no purging can be done (see the blue line is at zero). It only actually starts to work on the history when the heavy query completes and really releases the breaks when the purge is complete. In this case the throughput of the system reduced more than 5 times when the delay was active – which would not work for most real-world systems.
Design Thoughts: So what would I change in the purging design of the configuration? I would like to see a better default configuration that should include multiple purge threads and purge delay (improved). I would find some way to measure not only history size but purgable history size and base purge delay on it. Also make it based on the change rather than threshold – do just enough delay so the history is gradually shrinking. Also basing it on the undo space size instead of the number of transactions (which can vary in size) might be more practical and easier to auto-tune. We also can probably do better in terms of undo space caching – similar to Insert buffer, I’d like to keep it in memory say until 10% of the buffer pool size as removing from the cache something you know you will need very soon is bad business, as well as consider whether there is some form of read-ahead which can work to pre-read undo space which is needed. Right now I’ve tested and neither linear nor random read-ahead seems to help picking it up from disk with less random IO.
Practical Thoughts: Whatever improvements we’ll get from purging we have MySQL and Percona Server 5.6 systems to run for some years to come. So what are the practical steps we can do to manage purge history better?
Monitor: Make sure you are monitoring and graphing innodb_history_list_length. If you use large transactions, set alerts pretty high but do not leave it unchecked.
Configure Set innodb_purge_threads=8 or some other value if you have write intensive workload. Consider playing with innodb_max_purge_lag and innodb_max_purge_lag_delay but be careful – as currently designed it can really bring the server to its knees. You may consider using it interactively instead, changing them as run-time options if you spot history list growths unchecked, balancing current workload demands with resources allocated to purging.
Let it purge before shutdown: In many cases I find purge performance much worse after I restart MySQL Server because of caching. So the good approach might be just to remove the workload from MySQL server before shutting it down to let the purge of outstanding history complete – and only after that shut it down. If the server has crashed you might consider letting it complete purging before getting traffic routed back to it.
Use Innodb Buffer Pool Preload Use innodb_buffer_pool_dump_at_shutdown=on and innodb_buffer_pool_load_at_startup=on to ensure undo space is preloaded back to the buffer pool on startup.
P.S If you wonder where the graphs I have used came from – it is our Percona Cloud Tools – a very convenient way for analyses like these allowing access to all MySQL status variables, InnoDB metrics, tons of OS metrics and more.