Checkpointing — which involves periodically writing out dirty pages from memory — is central to the design of crash recovery for both TokuDB and InnoDB. A key issue in designing a checkpointing system is how often to checkpoint, and TokuDB takes a very different approach from InnoDB. How often and how much InnoDB checkpoints is complicated, but under certain workloads it can be relatively infrequent. In contrast, TokuDB runs a complete checkpoint starting one minute after the last one ended.
Frequent checkpoints make for fast recovery. Once MySQL crashes, the storage engine needs to replay the log to get back to a correct state. The length of the log is a function of the time since the last checkpoint for TokuDB and a more complicated function of the workload for InnoDB. And replaying the log is single threaded. So TokuDB recovers in minutes, and usually much faster. InnoDB can sometimes take hours or more to recover. Indeed, there is considerable lore around making InnoDB recover faster.
So what’s the downside to frequent checkpoints? Up until now, the answer was simple: when you are in a checkpoint, your performance drops. This was famously illustrated for InnoDB when Vadim Tkachenko at Percona Consulting showed that MySQL could become completely unresponsive for minutes at a time when InnoDB is flushing dirty pages. We see a similar outcome here:
In this case we see a stall in which the throughput drops to around 25%, and the stall lasts for minutes. I want to stress that fuzzy checkpointing another techniques are used in InnoDB so to avoid such swings, but the tpcc benchmark shows that it doesn’t always work.
In previous versions of TokuDB, we also had a dip in performance associated with checkpoints, but frequent checkpoints are also smaller checkpoints, so our performance would drop to around 80% of peak for a couple of seconds. A drop to 80% is better than a drop to 25%, but we knew we could do better.
And we did. As of TokuDB v6.0, we’ve eliminated the performance variability from checkpointing. We’re still checkpointing just as frequently, so you still get fast recovery. How? It was a combination of reducing the amount of work a checkpoint needs to do and fixing the locking interaction between checkpoints and other operations. Below is a sysbench benchmark. This is a case where InnoDB checkpoint behavior is as good as it gets, and I wanted to compare us with InnoDB’s best case, not its worst case.
This graph shows that TokuDB v6.0 has no checkpoint variability. It turns out that TokuDB v6.0 with standard compression has about the same average TPS as TokuDB v5.2, but with no checkpointing artifacts. Finally, if you have the CPU budget for it, turning on aggressive compression gives a big boost in transactions per second, still with no checkpointing variability.
So in a nutshell, I feel like we’ve taken care of the checkpointing issue. As of TokuDB v6.0, we have the upside of frequent checkpoints — small logs and fast recovery — without the downside of variability. The engineering team at Tokutek is pretty proud of these results.
To learn more about TokuDB: