Innodb transaction history often hides dangerous ‘debt’

In many write-intensive workloads Innodb/XtraDB storage engines you may see hidden and dangerous “debt” being accumulated – unpurged transaction “history” which if not kept in check over time will cause serve performance regression or will take all free space and cause an outage. Let’s talk about where it comes from and what can you do to avoid running into the trouble.

Technical Background: InnoDB is an MVCC engine which means it keeps multiple versions of the rows in the database, and when rows are deleted or updated they are not immediately removed from the database but kept for some time – until they can be removed. For a majority of OLTP workloads they can be removed seconds after the change actually took place. In some cases though they might need to be kept for a long period of time – if there are some old transactions running in the system that might still need to look at an old database state. As of MySQL 5.6 Innodb has one or several “purge threads” which remove the old data that can be removed, though they might not be doing it fast enough for workloads with very intensive writes.

Does it really happen? I started looking into this problem based on some customer concerns and to my surprise I could very easily get the history to grow rapidly using basic sysbench “update” workload. It is especially easy with default innodb_purge_threads=1 setting but even with innodb_purge_threads=8 it grows rather rapidly.

If we take a look at the purging speed (which comes from innodb-metrics table) we can see what purge is being very much starved by the active concurrent sysbench process and it speeds up greatly when it is finished:

Now to be frank this is not an easy situation to get in the majority of workloads with short transactions when the undo space is kept in memory purge and is able to keep up. If Undo space however happens to be gone from buffer pool the purge speed can slow down drastically and the system might not be able to keep up anymore. How it could happen? There are 2 common variants….

Long Running Transaction: If you’re having some long running transaction, for example mysqldump, on the larger table the purging has to pause while that transaction is running and a lot of history will be accumulated. If there is enough IO pressure a portion of undo space will be removed from the buffer pool.

MySQL Restart: Even with modest history length restarting MySQL will wash away from memory and will cause purge to be IO bound. This is of course if you’re not using InnoDB Buffer Pool save and reload.

How do you check if your UNDO space is well cached? In Percona Server I can use those commands:

This shows what the total undo space size is now, 1.7GB, with less than 100MB cached in the buffer pool size….

Here are a few graphs from Running Heavy concurrent query during lighter workload where purging could keep up. In this case I used the “injection” benchmark in sysbench setting –trx-rate to 50% of what the system shown as peak.

What we can see from those graphs is that InnoDB purging initially is progressing at a speed fast enough to keep up with inflow of transactions,
however as we kick up the complicated query, purging is stopped and when the query is done the purge speed settles on the new much lower level where it is not able to keep up with the workload anymore.

Now, there is recognition of this problem and there are options with innodb_max_purge_lag and innodb_max_purge_lag_delay to set the maximum length of the history after reaching which delay will be injected for DML statements up to a specified amount of microseconds.

Unfortunately it is not designed very well to use with real applications. The problems I see with its design are two fold….

Looking at Total History: If you think about it there are 2 kinds of records within the history – there are records that can be purged and there are ones which can’t be purged because they are needed by some active transaction. It is perfectly fine to have a lot of records in history if some long transaction is running – it is not the cause of the problem or overload, while we expect what “purgable history” should be low most of the time.

Looking at the Size rather than Rate of Change: Even worse, the history blowout prevention is looking at the current value to inject a delay and not at whenever it is that’s growing or already shrinking.

These together means that cases of long running transactions concurrently with OLTP workloads is handled very poorly – as long as history reaches the specified maximum amount the system will kick into overdrive, delaying all statements to the maximum extent possible, until the history falls back below the threshold. Here is how it looks on graphs:

As you see on the last graph, we got the purge_dml_delay_usec spiking to 10000us (the max I set) even as no purging can be done (see the blue line is at zero). It only actually starts to work on the history when the heavy query completes and really releases the breaks when the purge is complete. In this case the throughput of the system reduced more than 5 times when the delay was active – which would not work for most real-world systems.

Design Thoughts: So what would I change in the purging design of the configuration? I would like to see a better default configuration that should include multiple purge threads and purge delay (improved). I would find some way to measure not only history size but purgable history size and base purge delay on it.  Also make it based on the change rather than threshold – do just enough delay so the history is gradually shrinking. Also basing it on the undo space size instead of the number of transactions (which can vary in size) might be more practical and easier to auto-tune. We also can probably do better in terms of undo space caching – similar to Insert buffer, I’d like to keep it in memory say until 10% of the buffer pool size as removing from the cache something you know you will need very soon is bad business, as well as consider whether there is some form of read-ahead which can work to pre-read undo space which is needed. Right now I’ve tested and neither linear nor random read-ahead seems to help picking it up from disk with less random IO.

Practical Thoughts: Whatever improvements we’ll get from purging we have MySQL and Percona Server 5.6 systems to run for some years to come. So what are the practical steps we can do to manage purge history better?

Monitor: Make sure you are monitoring and graphing innodb_history_list_length. If you use large transactions, set alerts pretty high but do not leave it unchecked.

Configure Set innodb_purge_threads=8 or some other value if you have write intensive workload. Consider playing with innodb_max_purge_lag and innodb_max_purge_lag_delay but be careful – as currently designed it can really bring the server to its knees. You may consider using it interactively instead, changing them as run-time options if you spot history list growths unchecked, balancing current workload demands with resources allocated to purging.

Let it purge before shutdown: In many cases I find purge performance much worse after I restart MySQL Server because of caching. So the good approach might be just to remove the workload from MySQL server before shutting it down to let the purge of outstanding history complete – and only after that shut it down. If the server has crashed you might consider letting it complete purging before getting traffic routed back to it.

Use Innodb Buffer Pool Preload Use innodb_buffer_pool_dump_at_shutdown=on and innodb_buffer_pool_load_at_startup=on to ensure undo space is preloaded back to the buffer pool on startup.

P.S If you wonder where the graphs I have used came from – it is our Percona Cloud Tools – a very convenient way for analyses like these allowing access to all MySQL status variables, InnoDB metrics, tons of OS metrics and more.

Share this post

Comments (16)

  • James Day

    I agree that it is not a good design. Purge surges after a read view ends are an avoidable source of production disruption. For that reason it’s the standard recommendation of Oracle’s MySQL support team to set innodb_purge_threads to the lowest possible setting that can keep up. We’d normally advise against setting innodb_max_purge_lag because it’s too dangerous. One exception is if purge threads just can’t keep up. Then in those few production cases we give careful warnings about the consequences of read views and the impact on response time variation for foreground queries.

    Back during 5.6 development time the support team wanted to change the default for innodb_max_purge_lag from off to a billion or two to do its disk full prevention job. A billion or two because that’s just about high enough that we could hope that only a few production loads would hit that in normal use. But because of the very high work rate after a read view ends it was too dangerous so I asked the InnoDB team to regulate it so it would at least not jump to sudden highest possible work rate just because a read view ended. Instead of doing that and solving the problem they added the mostly useless innodb_max_purge_lag_delay option to limit the delay part of the work. Mostly useless because if you set it so low that it won’t disturb small operations, it won’t be high enough to slow down large multi-record operations enough to prevent disk filling. And it doesn’t stop the huge surge in purging that is enough to break the server. So that change was removed from the 5.6 change list as unsafe for production boxes.

    What’s interesting is that both your team and ours that work with production servers clearly know that this is a problem but the developers haven’t responded to the reports and fixed it.

    In some ways I think that is due to the inadequacy of the usual benchmarks. They tend to use short and even run time queries so they never show the variations in read view duration that happen on most production servers. That leaves the issues invisible and can make dangerous ideas look OK to people running those benchmarks. They can then influence the developers to make the unsafe changes without realising how dangerous they are because they never look at realistic loads.

    That’s something you and your team could improve. You don’t have to stick to short queries and constant load benchmarks. You know that load varies throughout the day and also has short term surges and dips. You also know that read view duration varies tremendously, with tens to hundreds of seconds routine and thousands and tens of thousands not uncommon. So why not model that in your benchmarks and use the improved benchmarks to do a better job of tuning purging and flushing to maximise peak throughput and reduce foreground thread response time?

    There are similar inverted priority issues for flushing. Redo log space exists to be used but instead of using it, adaptive flushing adds positive feedback instead of negative feedback when the foreground load increases. Which in turn means that customers have to buy more servers or bigger servers because of the poor timing of the flushing. Delay the flushing and peak throughput goes up and the required server count down. But who’s going to do that if benchmarks monitor unflushed pages and redo log space and consider reducing those as good instead of considering higher peak hour load capability and more consistent response times as the important measures?

    Views are my own, for an official Oracle view, consult a PR person.

    James Day, MySQL Senior Principal Support Engineer, Oracle

    October 17, 2014 at 4:27 pm
  • Peter Zaitsev


    I’m not sure what are you saying – did you say you saw some problems with higher values of innodb_purge_threads so you suggest 1 unless more is really needed ?

    In terms of disk filling I think the right approach really would be to focus on undo space size (bytes) not the number of unpurged transactions as transactions might be large or small.

    You’re right with benchmarks. I find it interesting to play with benchmarks where you have the normal foreground load but you also get some background batch jobs which can be read or write intensive…. It really shows very well whenever system stable in its ability to handle some sustained load while there is some side load takes place.

    October 18, 2014 at 12:13 pm

    thanks for sharing your discovery, after read it, but i got confused.
    why tune innodb_max_purge_lag and innodb_max_purge_lag_delay option can improve performance ? if there are heavy concurrent transactions running, we delay to purge it instead of purge it immediately,so there will accumulate many unpurged tasks, am I right ?
    what policy mysql used to purge undo space ? if the number of unpurged transactions exceeds innodb_max_purge_lag,mysql will delay purge in order to guarantee there are enough resources(CPU) for transactions? so if there are too many unpurge transactions to keep in memory, mysql will swap them into disk, if this situation occured, the performance will get much worse. so you wrote this post to avoid this situation, am i right?
    any reply is appreciate!

    October 19, 2014 at 11:06 am
  • Peter Zaitsev

    Good question!

    There are 2 kinds of workload which system generally handles. One is the online transactions which come at defined rate. IF you have 100 page views a second and each needs 10 updates you need to serve 1000 updates a second and if you can’t handle that (ie because of delay caused by innodb_max_purge_lag setting) users will have poor experience and some will not be serviced. Now there are other kind of workload – background. For example consider some batch job which runs on the hour – delaying those “spikes” and spreading its workload over 10 minutes instead of 1 minutes can ensure both your “real time” transactions are handled as well as dept not accumulated. So it is kind of smothering tool.

    October 19, 2014 at 11:37 am
  • James Day

    Peter, the purge thread will try to use as many threads as you allow. If you set innodb_purge_threads=4 then it will try to use that many if, looking only at the purge lag, it appears that there’s enough lag to make it worth doing. It’s traditionally been one of the more common designed in disruption sources we’ve seen, going back even before there was a dedicated purge thread. So yes, I suggest 1 unless it’s necessary to use more to keep up. It’s the standard recommendation of the Oracle MySQL Support Team because we don’t like avoidable outages caused by poor settings choices.

    I agree with you about a focus on size rather than events. It’s conceivable that we could instead get rid of the purge thread and eliminate much of the pain that way. Write to one place, delete another in the OS once all of its content is redundant. Still limited disk I/O issues there but it should be a big improvement and might get rid of one issue where the needs of those looking after production servers and doing benchmarks differ.


    October 20, 2014 at 1:37 pm
  • Sunny Bains


    Not strictly true. If –innodb-purge-threads=4 then InnoDB will try and keep the number of threads to the minimum required to keep the history length from increasing. I’m not convinced that purge can be delayed indefinitely. There are other ramifications of that. For older systems, the system tablespace will grow to a very large size and that space cannot be reclaimed.

    if (trx_sys->rseg_history_len > rseg_history_len
    || (srv_max_purge_lag > 0
    && rseg_history_len > srv_max_purge_lag)) {

    /* History length is now longer than what it was
    when we took the last snapshot. Use more threads. */

    if (n_use_threads 1) {

    /* History length same or smaller since last snapshot,
    use fewer threads. */


    old_activity_count = srv_get_activity_count();

    October 20, 2014 at 7:35 pm
  • Peter Zaitsev


    What I’m trying to understand per your experience did you see many cases when say innodb_purge_threads=8 causes significant regression compared to innodb_purge_threads=1 (default in 5.6) ? I have seen many cases when multiple threads can keep up much better.

    October 20, 2014 at 8:21 pm
  • vidyadhar

    Hi Peter,

    So while tuning the lag of mysql replication do we need to consider tuning innodb_max_purge_lag, innodb_max_purge_lag_delay along with innodb_purge_threads as well ?

    mysqldump will definitely block other transaction if we are not using single-transaction. how the behavior of the innodb purging activity here if i am using slave parallel workers on a mysql server with single database present on it ?


    December 11, 2014 at 8:16 am
  • Fernando

    Hi Peter,

    it is possible that the problem will be resolved changing isolation level from REPETABLE-READ to READ-COMMITTED?

    We experienced a huge problem with our crm app (sometimes value grew up to 150K)

    How can we alleviate the situation?

    January 29, 2015 at 9:43 am
  • Peter Zaitsev


    The answer is – sometimes. In some cases you have long multi-statement transactions and having READ-COMMITTED isolation mode allows to purge data faster and hence avoid long history being accumulated. If you however have long running statements or having purge threads unable to keep up

    January 29, 2015 at 3:17 pm
  • fernando mattera

    Peter, thank you for the answer.
    Our case is multi statements, but short running.
    We are talking about Activiti framework.
    So, you say READ COMMITED clean history faster?

    January 29, 2015 at 3:23 pm
  • Fernando


    Thank you Peter, and can you tell me why READ COMMITED cleans history faster?
    There is documentation about this?

    February 2, 2015 at 9:42 am