How to estimate the time it takes Innodb to recover?

Today seems to be Innodb day on our blog, but then this is a question which pops up quite frequently in Innodb talks and during consulting engagements.

It is well known that to get better performance you should normally set innodb_log_file_size large. We usually recommend caution, however, as a large log may significantly increase recovery time if Innodb needs to do crash recovery.

I’m often asked something like what recovery time to expect for 512MB of total log file size. This is, however, a question with no exact answer, as there are many things which affect recovery time.

For example, Paul mentioned YouTube has seen something like 4 hours of recovery time even with logs of reasonable size, and Wikipedia sees 40 minutes or so with innodb_log_file_size=256MB. In other cases I know the same log file size may give a recovery time of 5-10 minutes.

To understand what affects recovery time, it helps to understand how Innodb recovery works.

When Innodb recovers from its logs it first finds the last checkpoint. Innodb uses fuzzy checkpointing, yet there is a particular log sequence number up to which all modifications are already reflected in the database. Changes with a higher log sequence number may or may not already be in the database, as buffer pool flushes are not predictable.

As Innodb scans the log files from the last checkpoint, it looks at the records and checks whether they have already been applied to the pages (the Innodb log format stores page numbers together with the operation info); if not, it performs the operation specified in the log, applying the change to the page in the buffer pool.
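As a rough illustration, the amount of log that would need scanning after a crash is the "checkpoint age": the current log sequence number minus the last checkpoint. A hedged sketch of reading it from the InnoDB status output (assuming a reachable local server; note that 5.0-era servers print the LSN as two numbers, while this sketch assumes the newer single-number format):

```shell
# Checkpoint age = current LSN minus last checkpoint LSN; this is
# roughly how much redo log recovery would scan if the server
# crashed right now.
mysql -e "SHOW ENGINE INNODB STATUS\G" |
  awk '/Log sequence number/ {lsn = $NF}
       /Last checkpoint at/  {ckpt = $NF}
       END {printf "checkpoint age: %d bytes\n", lsn - ckpt}'
```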

This process means the following variables are important:

Checkpointing interval: This is mainly controlled by the size of the log files, as Innodb needs to flush database pages to the tablespace before it can overwrite the matching log records in the log file (which is circular).
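For reference, the total circular log capacity, which bounds how far apart checkpoints can be, is innodb_log_file_size times innodb_log_files_in_group. A small sketch, assuming the mysql client can connect and both variables are reported in bytes:

```shell
# Total redo log capacity in MB; InnoDB must checkpoint (flush dirty
# pages) before the circular log wraps, so this caps the checkpoint age.
mysql -N -e "SHOW VARIABLES LIKE 'innodb_log_file%'" |
  awk '{v[$1] = $2}
       END {printf "total log capacity: %d MB\n",
            v["innodb_log_file_size"] * v["innodb_log_files_in_group"] / 1048576}'
```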

Size of log records: While we speak about log file size, it is the number of page update operations that matters, and that can differ depending on your record size and workload. If you have tables with short rows, your log records will likely be more compact, so the same log length will contain more log records that need to be replayed.

Data access locality: This is another extremely important variable. If the updates you’re performing touch only a small set of pages (for example, a batch update of a relatively small table), less IO will be required, as the pages will already be in the buffer pool. I think the reason YouTube had such a long recovery time is that their application was well optimized to accumulate updates in the cache, so instead of updating the view counter of a popular movie hundreds of times per second, the updates reaching Innodb were pretty scattered.

Database size: This comes back to data locality, but with the same data access distribution, the larger the database, the more scattered the modified pages will be, so larger databases tend to take longer to recover.

Buffer pool size: This affects recovery time in two ways. First, if you have a small buffer pool, a lot of pages will be flushed from it, so flushes may end up being more frequent; I’m not, however, exactly sure whether checkpoints become more intensive in this case. It also affects speed during recovery itself: the larger the buffer pool, the more pages it can hold, and so fewer IOs will be needed. But even a buffer pool large enough to cache the whole database will not magically make recovery instant, because populating the buffer pool still requires a lot of random IO.

Number of dirty buffers at the time of the crash: Even though this looks like something with a serious impact, in reality it is not that significant. If checkpoints are not made more often, Innodb still needs to scan the same portion of the log files and perform the same checks of whether each page has been flushed since the last checkpoint. True, a smaller number of pages will need changes applied to them, but this is usually a minor part of the work compared to everything else, as the page is already in the buffer pool at that stage. Of course it affects the time needed to flush the buffer pool once recovery is complete, but again this does not normally take a major part of recovery time.
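To get a feel for how many dirty buffers you typically carry, the buffer pool status counters can be sampled; a hedged sketch, assuming the mysql client can connect to the server:

```shell
# Fraction of buffer pool pages currently dirty (i.e. pages whose
# changes would need flushing, or replaying, after a crash).
mysql -N -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_%'" |
  awk '{v[$1] = $2}
       END {printf "dirty pages: %d of %d (%.1f%%)\n",
            v["Innodb_buffer_pool_pages_dirty"],
            v["Innodb_buffer_pool_pages_total"],
            100 * v["Innodb_buffer_pool_pages_dirty"] / v["Innodb_buffer_pool_pages_total"]}'
```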

It is worth noting that Innodb optimizes log recovery by reading portions of the log records and sorting them so they can be applied to the pages in a more sequential order.

Besides the log replay (redo) phase there is also the phase of flushing dirty buffers to disk, and the undo phase, which rolls back uncommitted transactions. As of MySQL 5.0 these seem to be done in the background.

In the future, to reduce recovery time even more, I hope Heikki will enable the redo and undo phases to be performed in parallel, which should help a lot on systems with a large number of hard drives.

So how do you estimate how long it takes Innodb to recover? Now you know there are multiple variables, all of which may be hard to account for on paper, so in my experience the best approach is to run some benchmarks: apply the load you expect the system to handle and crash Innodb a few times to see how long it takes to recover. I’d recommend crashing by power cycling if possible, because this makes sure all OS/RAID caches are clean when recovery happens, and it may also help you catch write-cache bugs.
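A minimal sketch of such a crash test, assuming mysqld_safe (or your init system) restarts mysqld after it is killed; as noted above, pulling the power is the more honest test, since kill -9 leaves OS and RAID caches intact:

```shell
# Kill mysqld without cleanup to force crash recovery, then time how
# long until the server accepts connections again (the redo phase
# completes before connections are accepted).
pkill -9 mysqld
start=$(date +%s)
until mysqladmin ping >/dev/null 2>&1; do
  sleep 1
done
echo "recovery took $(( $(date +%s) - start )) seconds"
```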


Comments (8)

  • dac

    One issue I have which I have not found an answer to is this:

How do I know when Innodb crash recovery has completed, from a programmatic standpoint?

    Typically on a system start up, mysqld is automatically started, then the application(s)
    are started. However, if the database is doing recovery, the applications may fail to
    connect (say via JDBC) for quite a while or may not start up at all.

    How can one check if the database is in recovery mode before proceeding?

    March 7, 2008 at 12:05 pm
  • peter

Well, the REDO phase of recovery is completed before MySQL starts to accept connections.
The undo phase runs in the background and is not really visible; you can find it in the error log, and in SHOW INNODB STATUS you should see the undo happening.

    March 7, 2008 at 4:42 pm
  • Ash

    Hi all,

PROBLEM: mysqld frequently restarts without apparent or visible reason.
Could this be related to long Innodb recovery time, to the point that it does a self service-restart?

In hopes of clearing up confusion about the problem I’m getting with our servers, here is a snippet from our error.log:

    080625 23:53:28 [Note] /usr/sbin/mysqld: ready for connections.
    Version: ‘5.0.51a-community-log’ socket: ‘/var/lib/mysql/mysql.sock’ port: 3306 MySQL Community Edition (GPL)
    080625 23:53:28 [Note] Slave SQL thread initialized, starting replication in log ‘bin.000025’ at position 12410321, relay log ‘/var/log/mysql/relay.000086’ position: 4837623
    080625 23:53:28 [Note] Slave I/O thread: connected to master ‘repl@’, replication started in log ‘bin.000025’ at position 12410321

    Number of processes running now: 0
    080626 00:06:16 mysqld restarted
    InnoDB: Log scan progressed past the checkpoint lsn 16 3276587130
    080626 0:06:16 InnoDB: Database was not shut down normally!
    InnoDB: Starting crash recovery.
    InnoDB: Reading tablespace information from the .ibd files…
    InnoDB: Restoring possible half-written data pages from the doublewrite
    InnoDB: buffer…
    InnoDB: Doing recovery: scanned up to log sequence number 16 3276588706
    080626 0:06:19 InnoDB: Starting an apply batch of log records to the database…
    InnoDB: Progress in percents: 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
    InnoDB: Apply batch completed
    InnoDB: In a MySQL replication slave the last master binlog file
    InnoDB: position 0 12501110, file name bin.000025
    InnoDB: Last MySQL binlog file position 0 90914, file name /var/log/mysql/bin.000027
    080626 0:06:20 InnoDB: Started; log sequence number 16 3276588706
    080626 0:06:20 [Note] Recovering after a crash using /var/log/mysql/bin
    080626 0:06:20 [Note] Starting crash recovery…
    080626 0:06:20 [Note] Crash recovery finished.
    080626 0:06:20 [Note] /usr/sbin/mysqld: ready for connections.
    Version: ‘5.0.51a-community-log’ socket: ‘/var/lib/mysql/mysql.sock’ port: 3306 MySQL Community Edition (GPL)
    080626 0:06:20 [Note] Slave SQL thread initialized, starting replication in log ‘bin.000025’ at position 12501137, relay log ‘/var/log/mysql/relay.000088’ position: 85644
    080626 0:06:20 [Note] Slave I/O thread: connected to master ‘repl@’, replication started in log ‘bin.000025’ at position 12501137


    System background:
    No of servers in replication chain: 20
    -> 1a 2a 3a 4a 5a 6a 7a 8a 9a 10a ->
    -> 1b 2b 3b 4b 5b 6b 7b 8b 9b 10b ->
    where 10b is master of 1a

    mysql> show master status;
    | File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
    | bin.000030 | 9462 | db1,db2,db3,db1,db2,db3 | |
    1 row in set (0.00 sec)

    *** All db in replicated mode are using INNODB

    OS (all): Linux 2.6.9-42.0.8.ELsmp #1 SMP Tue Jan 23 12:49:51 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

    Also currently: these config (my.cnf) were all disabled from 20 servers:
    # Uncomment the following if you are using InnoDB tables
    #innodb_data_home_dir = /var/lib/mysql/
    #innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
    #innodb_log_group_home_dir = /var/lib/mysql/
    #innodb_log_arch_dir = /var/lib/mysql/
    # You can set .._buffer_pool_size up to 50 – 80 %
    # of RAM but beware of setting memory usage too high
    #innodb_buffer_pool_size = 384M
    #innodb_additional_mem_pool_size = 20M
    # Set .._log_file_size to 25 % of buffer pool size
    #innodb_log_file_size = 100M
    #innodb_log_buffer_size = 8M
    #innodb_flush_log_at_trx_commit = 1
    #innodb_lock_wait_timeout = 50

    Thanks in adv,

    June 25, 2008 at 9:02 pm
  • Dubakoor

If we have a single DB above 500GB, what is the suggestion for innodb log file size? For us, Innodb recovery is taking nearly 6 hours. We have the log file size at 512M now, and we have 60GB of RAM.

    August 1, 2013 at 1:31 am
