Today seems to be Innodb day in our Blog, but well this is the question which pops ups quite frequently in Innodb talks and during consulting engagements.
It is well known to get better performance you should normally set innodb_log_file_size large. We however usually recommend caution as it may significantly increase recovery time if Innodb needs to do crash recovery.
I’m often something like what recovery time would expect for 512MB total log file size. This is however question with no exact answer as there are many things which affect recovery time.
For example Paul mentioned YouTube has something like 4 hours recovery time even if logs are of reasonable size and wikipedia has 40 minutes or so with innodb_log_file_size=256MB. In other cases I know same log file time may have recovery time of 5-10 minutes.
To understand things affecting recovery time you better to understand how Innodb recovery works.
When Innodb recovers from logs it finds the last checkpoint time. Innodb has fuzzy checkpointing and yet there is particular log sequence number up to which all modifications are already reflected in the database. Changes which have longer log sequence number may be already in the database and may not be as buffer pool flushes are not predictable.
As Innodb is scanning log files from last checkpoint time it looks at records and checks if these are already applied to the pages (Innodb log format has page numbers stored together with operation info), if it has not it performs operation specified in the log applying changes in the buffer pool.
Such process means the following variables are important.
Checkpointing interval This is mainly controlled by size of log files as Innodb needs to flush database pages to the tablespace before it can overwrite matching log records in the log file (which is circular).
Size of log records While we speak about log file size it is number of page updates operations which is important and it can be different depending on your record size and workload. If you have tables with short rows your log records will likely be more compact and so same log length will contain more log records which will need to be replayed.
Data Access Locality This is another extremely important variable – if updates you’re performing are happening to some small amount of pages (for example you’re doing batch update for relatively small table) less IO will be required as pages already will be in the buffer pool. I think the reason why YouTube had so long recovery time is because their application was well optimized to accumulate updates in the cache and so having pretty scattered updates instead updating view counter for popular movie hundreds of times per second.
Database Size This comes back to the data locality but with same data access distribution the larger database you’ll have the more scattered records will be in the end thus larger database tend to recover longer.
Buffer Pool Size This affects recovery time from two parts aspects – first if you have small buffer pool you will have a lot of pages being flushed from buffer pool and so you may end up with flushes being more frequent. I’m not however exactly sure if checkpoints will become more intensive in this case. It also affects recovery speed during recovery time – the larger buffer pool you have the more pages it can hold and so less IOs will be needed. But even if you have enough buffer pool to cache all database it will not magically make recovery time instant because to populate buffer pool still a lot of random IO will be required.
Number of dirty buffers during the crash Even though this looks like something which has serious impact in reality it is not that significant. If checkpoints are not made more often Innodb will still need to scan same portion of log files and perform same amount of checks if page was already flushed since last checkpoint. True – smaller amount of pages will need changes to be applied to them, but this is usually minor part of the workload compared to anything else as page is already in the buffer pool at this stage. Of course it affects time needed to flush buffer pool when recovery is complete but again this does not normally take major part of recovery time.
It is worth to note Innodb optimizes log recovery by reading portions of log records and sorting them to apply them to the pages in more sequential order.
Besides log replay phase there is also phase of flushing dirty buffers to the disk and undo phase – rolling back uncommitted transactions. As MySQL 5.0 these seems to be done in the background.
In the future to increase recovery time even more I hope Heikki will enable redo and undo phases to be performed in the parallel, which should help a lot on systems with large number of hard drives.
So how do you estimate how long it takes Innodb to Recover ? Now you know there are multiple of variables which all may be hard to take into account on paper so in my experience best time is to run some benchmarks – apply the load you expect system to handle and crash Innodb few times to see how long it takes Innodb to recover. I’d recommend to crash by power cycling if possible because this makes sure all OS/RAID caches are clean when recovery happens and also may help you to catch write cache bugs.