I just wrote a large post on reasons for innodb main tablespace excessive growth and I thought it would make sense to explain briefly of why it is so frequently you have purge not being the problem at all and when out of no where you can see purge thread being unable to keep up and undo table space explodes and performance drops down. Here is what happens.
When you have typical OLTP system with small transactions your UNDO space is small and it fits in buffer pool. In fact most of the changes do not need to go to the disk at all – the space for undo space is allocated, used and freed without ever needing to go to the disk.
Now when you have spike in writes or long running transactions which increases your undo space size it may be evicted from buffer pool and stored on disk. This is when problems often starts to happen. Now instead of purge thread simply operating in memory it has to perform IO which slows it down dramatically and makes it unable to handle amount of changes coming in.
The solution to this problem may range from pacing the load (which is helpful for batch job operations as it can be controlled), using innodb_max_purge_lag or enable separate purge thread (or threads) via innodb_use_purge_thread if you’re running Percona Server.