This is a time-honored topic, and there’s no shortage of articles on the topic on this blog. I wanted to write a post trying to condense and clarify those posts, as it has taken me a while to really understand this relationship.
Ultimately this mechanism was an optimization for slow drives: if you can sequentially write all the changes into a log, it will be faster to do on the fly as transactions come in than trying to randomly write the changes across the tablespaces. Sequential IO trumps Random IO.
However, even today in our modern flash storage world where random IO is significantly less expensive (from a latency perspective, not dollars), this is still an optimization because the longer we delay updating the tablespace, the more IOPs we can potentially conserve, condense, merge, etc. This is because:
So, first of all, what can we see about Innodb log checkpointing and what does it tell us?
|
1 2 3 4 5 6 7 |
mysql> SHOW ENGINE INNODB STATUSG --- LOG --- Log sequence number 9682004056 Log flushed up to 9682004056 Last checkpoint at 9682002296 |
This shows us the virtual head of our log (Log sequence Number), the last place the log was flushed to disk (Log flushed up to), and our last Checkpoint. The LSN grows forever, while the actual locations inside the transaction logs are reused in a circular fashion. Based on these numbers, we can determine how many bytes back in the transaction log our oldest uncheckpointed transaction is by subtracting our ‘Log sequence number’ from the ‘Last checkpoint at’ value. More on what a Checkpoint is in a minute. If you use Percona server, it does the math for you by including some more output:
|
1 2 3 4 5 6 7 8 9 10 |
--- LOG --- Log sequence number 9682004056 Log flushed up to 9682004056 Last checkpoint at 9682002296 Max checkpoint age 108005254 Checkpoint age target 104630090 Modified age 1760 Checkpoint age 1760 |
Probably most interesting here is the Checkpoint age, which is the subtraction I described above. I think of the Max checkpoint age as roughly the furthest back Innodb will allow us to go in the transaction logs; our Checkpoint age cannot exceed this without blocking client operations in Innodb to flush dirty buffers. Max checkpoint age appears to be approximately 80% of the total number of bytes in all the transaction logs, but I’m unsure if that’s always the case.
Remember our transaction logs are circular, and the checkpoint age represents how far back the oldest unflushed transaction is in the log. We cannot overwrite that without potentially losing data on a crash, so Innodb does not permit such an operation and will block incoming writes until the space is available to continue (safely) writing in the log.
On the other side, we have dirty buffers. These two numbers are relevant from the BUFFER POOL AND MEMORY section of SHOW ENGINE INNODB STATUS:
|
1 2 3 |
Database pages 65530 ... Modified db pages 3 |
So we have 3 pages that have modified data in them, and that (in this case) is a very small percentage of the total buffer pool. A page in Innodb contains rows, indexes, etc., while a transaction may modify 1 or millions of rows. Also realize that a single modified page in the buffer pool may contain modified data from multiple transactions in the transaction log.
As I said before, dirty pages are flushed to disk in the background. The order in which they are flushed really has little to nothing to do with the transaction they are associated with, nor with the position associated with their modification in the transaction log. The effect of this is that as the thread managing the dirty page flushing goes about its business, it is not necessarily flushing to optimize the Checkpoint age, it is flushing to try to optimize IO and to obey the LRU in the buffer pool.
Since buffers can and will be flushed out of order, it may be the case that there are a lot of transactions in the transaction log that are fully flushed to disk (i.e., all the pages associated with said transaction are clean), but there still could be older transactions that are not flushed. This, in essence, is what fuzzy checkpointing is.
The checkpoint process is really a logical operation. It occasionally (as chunks of dirty pages get flushed) has a look through the dirty pages in the buffer pool to find the one with the oldest LSN, and that’s the Checkpoint. Everything older must be fully flushed.
The main reason this is important is if the Checkpoint Age is not a factor in dirty buffer flushing, it can get too big and cause stalls in client operations: the algorithm that decides which dirty pages to flush does not optimize for this [well] and sometimes it is not good enough on its own.
So, how can we optimize here? The short of it is: make innodb flush more dirty pages. However, I can’t help but wonder if some tweaks could be made to the page flushing algorithm to be more effective there in choosing older dirty pages. It is clear how that algorithm works without reading the source code.
There are a lot of ways to tune this, here is a list of the most signficant, roughly ordered from oldest to newest, and simultaneously listed from least effective to most effective: