The need for parallel crash recovery in MySQL

In this blog post, I will discuss how parallel crash recovery in MySQL would benefit several processes.

I recently filed an Oracle feature request to make crash recovery faster by running in multiple threads.

This might not seem very important, because MySQL does not crash that often. When it does crash, however, crash recovery can take 45 minutes, as I showed in this post:

What is a big innodb_log_file_size?

Even in that case, it still might not be a big issue, as you often fail over to a slave.

However, crash recovery plays an important part in the following processes:

  • Backups with Percona XtraBackup (and MySQL Enterprise Backup) and backups with filesystem snapshots.
    • Crash recovery is part of the backup process, and it is important to make the backup task faster.
  • State Snapshot Transfer in Percona XtraDB Cluster.
    • SST, whether XtraBackup or rsync based, also relies on the crash recovery process – so the faster it is done, the faster a new node joins the cluster.
    • It might seem that Oracle shouldn’t care about Percona XtraDB Cluster. But they are working on MySQL Group Replication. I suspect that when Group Replication copies data to a new node, it will also rely on some kind of snapshot technique (unless they aren’t serious about this feature and will recommend mysqldump/mysqlpump for data copying).
  • My recent proof of concept for automatic slave propagation in a Docker environment also uses Percona XtraBackup, and therefore relies on crash recovery for new slaves.

In general, any process that involves MySQL/InnoDB data transfer will benefit from faster crash recovery. In its current state, crash recovery uses just one thread to read and process data. This limits performance on modern hardware, which has multiple CPU cores and fast SSDs.
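To make the idea concrete, here is a minimal Python sketch of one way redo application could be parallelized. It is not how InnoDB implements recovery: the `RedoRecord` type, its fields, and the string "payload" standing in for a page change are all hypothetical. The only assumption carried over from real redo logs is that changes to the same page must be replayed in log order, while changes to different pages are independent.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple


class RedoRecord(NamedTuple):
    """Hypothetical, simplified redo record (not InnoDB's real format)."""
    lsn: int        # log sequence number; defines the order of changes
    space_id: int   # tablespace the page belongs to
    page_no: int    # page number within the tablespace
    payload: str    # toy stand-in for the actual page change


def partition_by_page(records):
    """Group records per page, keeping each group's records in LSN order."""
    groups = defaultdict(list)
    for rec in sorted(records, key=lambda r: r.lsn):
        groups[(rec.space_id, rec.page_no)].append(rec)
    return groups


def apply_group(page_id, recs):
    """Replay one page's records sequentially and return its final state."""
    page_state = []
    for rec in recs:
        page_state.append(rec.payload)  # "apply" the change to the page
    return page_id, page_state


def parallel_recover(records, workers=4):
    """Replay independent page groups on a pool of worker threads."""
    groups = partition_by_page(records)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(apply_group, page_id, recs)
                   for page_id, recs in groups.items()]
        return dict(f.result() for f in futures)
```

Each worker handles whole page groups, so per-page ordering is preserved without locking. In CPython the thread pool mainly illustrates the structure; a real implementation would need native threads doing the page I/O and applying the changes to see the benefit of multiple cores and fast storage.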

It is also important to consider that crash recovery time limits how big the log files can be. If we improve crash recovery time, we can use much bigger InnoDB log files (which generally improves performance).

Percona is working on ways to make crash recovery faster. However, if faster recovery times are important to your environment, I encourage you to let Oracle know that you want to see parallel crash recovery in MySQL.

Comments (2)

  • vidyadhar

    @Vadim Tkachenko, this is really a good feature. But I just want to understand one thing: do you mean that we should have multiple threads to perform the redo log phase of recovery, since all the other phases run in the background even after MySQL is up and running? In that case we should have a mechanism similar to the one slave parallel threads use in MySQL 5.6 (one thread per database), as redo activity might involve operations that belong to multiple databases. Please let me know your thoughts. Sorry if my question is not clear.

    July 5, 2016 at 1:00 pm
    • Vadim Tkachenko


      I think it is possible, and it will be easier to implement recovery with one thread per database.
      Still, some workloads use only a single database, so it would be beneficial to have a universal parallel recovery, something like LOGICAL_CLOCK in the MySQL 5.7 parallel slave.

      July 7, 2016 at 6:31 pm