How InnoDB handles REDO logging

Xaprb (Baron) recently blogged about how InnoDB performs a checkpoint , I thought it might be useful to explain another important mechanism that affects both response time and throughput – The transaction log.

The InnoDB transaction log handles REDO logging, this is the mechanism that provides the A (Atomic) C (Consistent)  and D (Durability) in ACID. The transaction log keeps a complete record of every change that occurs to the pages inside the database. Instead of logging whole pages (Value / Physical logging) where both the before and after image of a page is logged or by logging logical changes  (Logical logging) to the dataset, InnoDB uses  Physiological logging, this basically means that it combines the two techniques to provide a logging mechanism that is both optimal in terms of the amount of data logged whilst still ensuring the log is action consistent and each operation is idempotent. It is important to note that physiological logging needs to operate on consistent pages, Innodb achieves this by using the double write buffer to ensure consistent page writes.

Why log?

Since InnoDB tries to keep the working set in memory (InnoDB Buffer Pool), therefore the changes made by transactions will occur in volatile memory and later be flushed to disk. So in the event of volatile memory failure or during a system restart InnoDB can guarantee to have a (C)onsistent record of the state of the data in the database in (D)urable memory and that each transaction is (A)tomic.

What is actually logged?

<space id, page no, operation code, data>

This translates into operations such as

<0, 2600, ‘insert’, ‘after record at offset 8192’, ‘(38, Percona, Fortune…)’>

The log header stores information about the last checkpoint. The log is stored as records rather than pages and has 512 byte alignment, matching disk sectors. Percona Server with XtraDB allows you to control the transaction log record size, as better performance may be gained with 4096 byte records with newer media such as SSD cards.

Why Idempotent?

Each record in the log needs to be able to be redone multiple times since at restart and media recovery we will not know if that action has previously been executed as we will start recovery from the last checkpoint.InnoDB achieves this by reading the LSN of the page and compares that to the one of log record, if the LSN is higher that means the page was flushed after this record was processed so the change does not need to be applied.

How does this affect response time?

In order to keep ACID compliance, the transaction log must guarantee the logging action happens before the transaction is committed, this is known as write-ahead-logging. This essentially means that before an update can return it must be logged. As the time to log is added to every update it can become an important overhead to your response time. Indeed if InnoDB cannot log at all, your transaction will never complete.

Both the MySQL configuration and hardware will affect the speed at which InnoDB can log. You can request that InnoDB force its REDO records to the transaction log but not actually flush the log to disk at commit. By this means you are ensuring that in the event of a MySQL crash the data will still be consistent, however in the event of a system crash there may be some records that never make it to durable memory.

The three options available are:

innodb_flush_log_at_trx_commit = 0 /* Write log buffer to log about once a second and flush */

innodb_flush_log_at_trx_commit = 1 /* Write log buffer to log and flush to disk */

innodb_flush_log_at_trx_commit = 2 /* Write log buffer to log and flush about once per second */

(About once a second is related to the fact the flush and sync is controlled by a background thread that gives no guarantee it will be per second)

NB: Percona  Server with XtraDB actually allow you to control this at session level.

This allows you to risk durability by having the logging return before the record is actually on disk. A similar and complementary mechanism is to use a battery backed raid card. This allows updates to be written to the raid cards battery protected memory which has much faster access than the underlying disk(s).

How are the REDO records applied?

InnoDB processes the REDO records from the last checkpoint, each record’s LSN is compared against the LSN of the page and if the on disk LSN is equal or higher the record is skipped. This behavior can be seen by enabling the innodb_recovery_stats option in Percona Server, which allows you to see the progress made during recovery.

How does this affect throughput?

As the REDO log in InnoDB uses a fixed length circular transaction log, the speed at which you can process updates is tightly linked to the speed at which check-pointing can occur. In order to insert a new record in the transaction log, InnoDB must ensure that the previously written record has been flushed to disk . This means that it can be beneficial to have very large transaction logs which allow a larger window between REDO logging and the checkpoint on disk.

Vadim touched upon this subject recently in his post MySQL 5.5.8 – in search of stability, as seen in his post, Percona Server with XtraDB allows much larger logs beyond the InnoDB limit of 4G.

As a side effect of logging there is no need to apply page changes at commit, therefore what would be a random write is translated into a much faster sequential write. The background processing of dirty pages can then apply algorithms to optimize those writes during checkpoint.

What else?

There is of course one component missing to our picture, just because each change is logged it does not mean we can reconstruct a consistent set of data. A change may be logged and then later rolled back do to an error or a specific request, as part of the DO-UNDO-REDO protocol InnoDB also records UNDO information, however that is stored outside of the transaction log and as such is not covered here.

Share this post

Comments (19)

  • Ewen Fortune Reply

    @Chang,

    >what is the relationship between undo and redo log in innodb.

    In InnoDB UNDO logging is handled via the UNDO log (Also referred to as the ROLLBACK segment). UNDO logging being the inverse of REDO, used in the case of InnoDB to both support MVCC and provide a mechanism to rollback uncommitted transactions during recovery.

    >How does innodb keep the consistent in the case of innodb_flush_log_at_trx_commit != 1

    InnoDB guarantees consistency for everything up until the last flushed transaction log record. So where innodb_flush_log_at_trx_commit != 1 there is potential to loose up to a second of transactions, but the data will be consistent up until that point.

    >one possibility is don’t flush dirty pages until related redo-logs are flushed, but it looks only >allowing one transaction per page

    Dirty pages being flushed contain the LSN, this means InnoDB can automatically keep track of
    up until what point in the transaction log the related records have been processed.

    February 3, 2011 at 12:00 am
  • Viv Reply

    Ewen, thanks a lot for the explanation, I finally understand the relationship now. Thanks a lot!:)

    February 3, 2011 at 12:00 am
  • Ewen Fortune Reply

    Viv,

    Just so we have a copy of this on the blog, the dirty pages are flushed from the BP. Since both the page and the REDO log carry the LSN we know how where we are in the REDO log in terms of what has been flushed and what is still dirty. That is of course what happens under normal operations, when InnoDB goes into recovery mode is when it uses the REDO log (And UNDO) to update the on disk pages to represent the last known state of the database.

    Ewen

    February 3, 2011 at 12:00 am
  • Baron Schwartz Reply

    Thank you for posting this on our forum at http://forum.percona.com/index.php/mv/msg/1801/ where we will answer instead of in the comments.

    February 3, 2011 at 12:00 am
  • viv Reply

    Hi Ewen

    I have a question about the innodb flush the data into the redo log. When there is one value updated , the BP would mark this page as dirty, and the changes would be inserted into redo log. When it is time to flush the changed data into data file, it should be flushed from BP or from redo log? That is the part I dont understand, and I can not find any answers on the internet. As if it should flush from redo log, then the page in BP should not be dirty anymore, cause it contains the newest value, which means how can master thread flush the redo log when it reaches max. percentage of dirty page? as it becames clean as soon as it flush the changes into redo log. If the page is still dirty in BP, and the changes still in the redo log, but not data file, if there is a query need to read this value, where should it fetch from?

    Thanks if you could explain a bit to me :)

    February 3, 2011 at 12:00 am
  • Justin Swanhart Reply

    Very nice post Ewen!

    February 3, 2011 at 4:34 pm
  • Pavel Reply

    I didn’t know about the innodb_recovery_stats … thanks for the nice writeup!

    February 3, 2011 at 5:59 pm
  • Paul Keenan Reply

    Good article. Would be more helpful if you could define LSN or provide a link.

    February 4, 2011 at 8:36 am
  • Ewen Fortune Reply

    @Justin – Thanks!

    @Pavel – Your welcome!

    @Paul Keenen – LSN = Log Sequence Number

    The LSN is used to places change records into a sequence, in InnoDB its basically the byte offset of the total log space + number of rotations.

    Related blog post: http://www.mysqlperformanceblog.com/2007/12/19/mvcc-transaction-ids-log-sequence-numbers-and-snapshots/
    Quick definition: http://dev.mysql.com/doc/innodb/1.1/en/glossary.html#glos_lsn
    For more detail check out: Transaction Processing: Concepts and Techniques

    February 4, 2011 at 9:31 am
  • Paul Keenan Reply

    Thanks Ewen, makes sense.

    February 9, 2011 at 4:12 am
  • Swarn Singh Reply

    Really helpful and comprehensive article…Ewen

    February 10, 2011 at 7:50 am
  • viv Reply

    Hi Ewen

    I have a question about the innodb flush the data into the redo log. When there is one value updated , the BP would mark this page as dirty, and the changes would be inserted into redo log. When it is time to flush the changed data into data file, it should be flushed from BP or from redo log? That is the part I dont understand, and I can not find any answers on the internet. As if it should flush from redo log, then the page in BP should not be dirty anymore, cause it contains the newest value, which means how can master thread flush the redo log when it reaches max. percentage of dirty page? as it becames clean as soon as it flush the changes into redo log. If the page is still dirty in BP, and the changes still in the redo log, but not data file, if there is a query need to read this value, where should it fetch from?

    Thanks if you could explain a bit to me 🙂

    March 30, 2011 at 6:40 am
  • Baron Schwartz Reply

    Thank you for posting this on our forum at http://forum.percona.com/index.php/mv/msg/1801/ where we will answer instead of in the comments.

    March 30, 2011 at 8:22 am
  • Ewen Fortune Reply

    Viv,

    Just so we have a copy of this on the blog, the dirty pages are flushed from the BP. Since both the page and the REDO log carry the LSN we know how where we are in the REDO log in terms of what has been flushed and what is still dirty. That is of course what happens under normal operations, when InnoDB goes into recovery mode is when it uses the REDO log (And UNDO) to update the on disk pages to represent the last known state of the database.

    Ewen

    March 30, 2011 at 8:45 am
  • Viv Reply

    Ewen, thanks a lot for the explanation, I finally understand the relationship now. Thanks a lot!:)

    March 30, 2011 at 10:34 am
  • chang Reply

    Hi Ewen

    what is the relationship between undo and redo log in innodb? How does innodb keep the consistent in the case of innodb_flush_log_at_trx_commit != 1? In other words, some pages are flushed and others aren’t, and while redo log is missed.

    one possibility is don’t flush dirty pages until related redo-logs are flushed, but it looks only allowing one transaction per page.

    what is innodb’s way?

    April 11, 2011 at 1:46 am
  • Ewen Fortune Reply

    @Chang,

    >what is the relationship between undo and redo log in innodb.

    In InnoDB UNDO logging is handled via the UNDO log (Also referred to as the ROLLBACK segment). UNDO logging being the inverse of REDO, used in the case of InnoDB to both support MVCC and provide a mechanism to rollback uncommitted transactions during recovery.

    >How does innodb keep the consistent in the case of innodb_flush_log_at_trx_commit != 1

    InnoDB guarantees consistency for everything up until the last flushed transaction log record. So where innodb_flush_log_at_trx_commit != 1 there is potential to loose up to a second of transactions, but the data will be consistent up until that point.

    >one possibility is don’t flush dirty pages until related redo-logs are flushed, but it looks only >allowing one transaction per page

    Dirty pages being flushed contain the LSN, this means InnoDB can automatically keep track of
    up until what point in the transaction log the related records have been processed.

    April 20, 2011 at 1:06 am
  • feedhoo Reply

    undo in redo logfile ?

    February 22, 2012 at 8:47 pm
  • Karthik.P.R Reply

    Nice Post Ewen

    April 30, 2013 at 4:03 am

Leave a Reply