Patch: percona_innodb_doublewrite_path

The doublewrite buffer is a reserved area in the main tablespace of InnoDB/XtraDB. It helps avoid data corruption when data is flushed from the buffer pool to disk and a partial page write occurs.

This patch allows you to move the doublewrite buffer from the main tablespace to a separate location.

Warning: This option is for advanced users only. Read the discussion below to understand whether you really need it.

Version-Specific Information

Percona Server Version    Comments
5.1.47-11.0               Full functionality available.

Variables Provided

innodb_doublewrite_file

Type: System and command-line variable
Scope: Global
Dynamic: No
Default: none

Use this option to create a dedicated tablespace for the doublewrite buffer.

This option expects a filename which can be specified either with an absolute or a relative path. A relative path is relative to the data directory.
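The path rule above can be illustrated with a minimal Python sketch. It is not server code: the function name resolve_doublewrite_path and the data directory /var/lib/mysql are hypothetical, chosen only to show how an absolute value is kept as-is while a relative value is joined to the data directory.

```python
import os.path


def resolve_doublewrite_path(value: str, datadir: str) -> str:
    """Hypothetical helper mirroring the documented rule for innodb_doublewrite_file:
    an absolute path is used unchanged, a relative path is relative to the data directory.
    Illustrative only; this is not the server's implementation."""
    if os.path.isabs(value):
        return value
    return os.path.join(datadir, value)


if __name__ == "__main__":
    datadir = "/var/lib/mysql"  # hypothetical data directory
    print(resolve_doublewrite_path("/ssd/ib_doublewrite", datadir))  # /ssd/ib_doublewrite
    print(resolve_doublewrite_path("ib_doublewrite", datadir))       # /var/lib/mysql/ib_doublewrite
```

Under this rule, a setting such as innodb_doublewrite_file=ib_doublewrite in the [mysqld] section of my.cnf would place the file inside the data directory, while an absolute path can point to another disk. The file names are examples only.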

Detailed information

The following discussion will clarify the improvements made possible by this patch.

Goal of the doublewrite buffer

InnoDB and XtraDB use many structures, some on disk and others in memory, to manage data as efficiently as possible. For an overview of the different components, see this post. Here we focus on the doublewrite buffer.

InnoDB/XtraDB uses a reserved area in its main tablespace, called the doublewrite buffer, to prevent the data corruption that partial page writes could cause. When data in the buffer pool is flushed to disk, InnoDB/XtraDB flushes whole pages at a time (16KB pages by default), not just the records that have changed within a page. This means that if anything unexpected happens during the write, such as a crash or a power loss, the page can be left partially written, leading to corrupt data.

With the doublewrite buffer enabled, InnoDB/XtraDB first writes each page to the doublewrite buffer and only then writes it to its final location in the data files.

If a partial page write occurs in the data files, InnoDB/XtraDB detects it on recovery: the torn page fails its checksum check, so InnoDB/XtraDB knows the page is corrupt. The recovery process then uses the intact copy of the page stored in the doublewrite buffer to restore the correct data.

If instead a partial write occurs in the doublewrite buffer, the original page in the data file is untouched and can be used together with the redo logs to recover the data. For further information on the doublewrite buffer, see this post.
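To make the two recovery cases above concrete, here is a toy Python model. It assumes a simplified page format (a CRC32 of the payload, computed with zlib.crc32, stored in the last 4 bytes) and models a torn write as only the first half of the new page reaching disk. Real InnoDB pages use a different layout and checksum scheme, and the redo-log replay of the second case is left out. The model relies on the ordering described above: the doublewrite copy is fully written before the data-file write starts.

```python
import zlib

PAGE_SIZE = 16 * 1024  # default InnoDB page size mentioned in the text
CHECKSUM_SIZE = 4      # simplified format: CRC32 of the payload in the last 4 bytes


def make_page(fill: int) -> bytes:
    """Build a page filled with one byte value, followed by its checksum."""
    payload = bytes([fill]) * (PAGE_SIZE - CHECKSUM_SIZE)
    return payload + zlib.crc32(payload).to_bytes(CHECKSUM_SIZE, "little")


def is_valid(page: bytes) -> bool:
    """A page is intact if its stored checksum matches its payload."""
    payload, stored = page[:-CHECKSUM_SIZE], page[-CHECKSUM_SIZE:]
    return zlib.crc32(payload).to_bytes(CHECKSUM_SIZE, "little") == stored


def torn_write(old: bytes, new: bytes) -> bytes:
    """Simulate a partial write: only the first half of the new page reaches disk."""
    half = PAGE_SIZE // 2
    return new[:half] + old[half:]


def recover(data_file: dict, doublewrite: dict) -> list:
    """Restore every data-file page that is corrupt while its doublewrite copy is intact.
    Returns the page numbers that were restored."""
    restored = []
    for page_no, copy in doublewrite.items():
        if not is_valid(data_file[page_no]) and is_valid(copy):
            data_file[page_no] = copy
            restored.append(page_no)
    return restored


if __name__ == "__main__":
    old, new = make_page(0xAA), make_page(0xBB)

    # Case 1: the doublewrite copy is complete, the data-file write is torn.
    data_file = {7: torn_write(old, new)}
    print(recover(data_file, {7: new}), data_file[7] == new)  # [7] True

    # Case 2: the crash tears the doublewrite copy; the data file still holds the old page,
    # which the redo logs (not modeled here) would bring up to date.
    data_file = {7: old}
    print(recover(data_file, {7: torn_write(old, new)}), data_file[7] == old)  # [] True
```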

Performance impact of the doublewrite buffer

In typical workloads the performance impact is low, around 5%. Consequently, you should always keep the doublewrite buffer enabled: the strong guarantee against data corruption is worth the small performance drop.

However, under a heavy workload, especially when your data does not fit in the buffer pool, writes to the doublewrite buffer compete with random reads for disk access. In this case you can see a sharp performance drop compared with the same workload without the doublewrite buffer: a 30% performance degradation is not uncommon.

Another case where the performance impact can be large is when the doublewrite buffer is full: new writes must then wait until entries in the doublewrite buffer are freed.
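The waiting behavior can be pictured with a toy slot model in Python, assuming 128 slots (the 2MB area mentioned in the discussion below divided by 16KB pages). The class DoublewriteSlots is hypothetical: it holds one slot per page from the moment the page is copied to the doublewrite buffer until its data-file write completes, whereas the real server frees its doublewrite buffer a batch at a time. The sketch only shows that once every slot is in use, a new write blocks until one is released.

```python
import threading

PAGES = 128  # assumption: 2MB doublewrite area / 16KB pages


class DoublewriteSlots:
    """Hypothetical toy model of doublewrite buffer capacity (not InnoDB code)."""

    def __init__(self, capacity: int = PAGES):
        # BoundedSemaphore raises ValueError if released more often than acquired.
        self._slots = threading.BoundedSemaphore(capacity)

    def try_reserve(self) -> bool:
        """Take a slot without waiting; False means the buffer is full."""
        return self._slots.acquire(blocking=False)

    def reserve(self, timeout=None) -> bool:
        """Wait for a free slot, up to timeout seconds if one is given."""
        return self._slots.acquire(timeout=timeout)

    def release(self) -> None:
        """Free a slot once the page has reached its data file."""
        self._slots.release()


if __name__ == "__main__":
    slots = DoublewriteSlots()
    for _ in range(PAGES):
        slots.try_reserve()
    print(slots.try_reserve())  # False: the buffer is full, a new write has to wait

    waiter = threading.Thread(target=lambda: print("reserved after wait:", slots.reserve()))
    waiter.start()
    slots.release()  # one data-file write completes and frees its slot
    waiter.join()    # the waiting writer now gets the slot and prints True
```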

What's new with the patch?

In a standard InnoDB/XtraDB installation, the doublewrite buffer is located in the main tablespace (whether or not innodb_file_per_table is enabled), and you have no control over its location.

The patch adds an option (innodb_doublewrite_file) to have a dedicated location for the doublewrite buffer.

How to choose a good location for the doublewrite buffer?

If your goal is to reduce I/O contention on the data disks, put the doublewrite buffer on a different disk. But is it better on an SSD or on a more traditional HDD? First, note that pages are written to the doublewrite buffer in a circular fashion and are only read on recovery, so the doublewrite buffer performs mostly sequential writes and a few sequential reads. Second, HDDs are very good at sequential writes when a write cache is enabled, which is not the case for SSDs. Therefore you should choose a fast HDD if you want to see performance benefits from this option. For instance, you could place the redo logs (which are also written sequentially) and the doublewrite buffer on the same disk.
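The claim that the doublewrite buffer sees mostly sequential writes can be checked with a short Python sketch of the circular placement described above. It assumes a 2MB area made of 128 slots of 16KB each; the helper names slot_offsets and count_seeks are hypothetical. Successive writes land on adjacent offsets, and the only jump is the return to offset 0 each time the area wraps around.

```python
PAGE_SIZE = 16 * 1024  # default InnoDB page size
SLOTS = 128            # assumption: 2MB doublewrite area / 16KB pages


def slot_offsets(n_writes: int, slots: int = SLOTS, page_size: int = PAGE_SIZE) -> list:
    """Byte offsets inside the doublewrite area for n successive page writes,
    assuming the circular placement described in the text."""
    return [(i % slots) * page_size for i in range(n_writes)]


def count_seeks(offsets: list, page_size: int = PAGE_SIZE) -> int:
    """Count writes that do not start where the previous write ended (non-sequential jumps)."""
    return sum(1 for prev, cur in zip(offsets, offsets[1:]) if cur != prev + page_size)


if __name__ == "__main__":
    offsets = slot_offsets(1000)
    print(offsets[:3], offsets[127:130])  # [0, 16384, 32768] [2080768, 0, 16384]
    print(count_seeks(offsets))           # 7: one jump back to offset 0 per wrap-around
```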


Discussion

Yasufumi Kinoshita, 2010/07/07 23:22

Why was my description of write I/O parallelization removed? The doublewrite buffer basically semi-serializes write I/O, and this characteristic does not depend on its size. If you use a file system and storage that parallelize write I/O well, you should not use the doublewrite buffer if you want write I/O performance. Currently, the usual Linux file systems are not so good at parallelizing write I/O, so the performance impact seems not so big. But in the future, the doublewrite buffer will become ineffective with fast storage and good file systems. I think changing the size of the doublewrite buffer does not have a good effect.

Yasufumi Kinoshita, 2010/07/07 23:27

My point is that the semi-serialization effect depends on the turnaround time of write I/O to the doublewrite buffer. You should choose another, extremely fast device (“not HDD”, such as a BBU-DRAM disk). The sequential-write effect is almost nothing, because an extremely large amount of write I/O goes to the single 2MB area. So an HDD with a BBU write cache larger than 2MB, used only for the doublewrite buffer, may also be good.

Yasufumi Kinoshita, 2010/07/07 23:29

Sorry for so many posts. In that case (a 2MB BBU HDD used only for the doublewrite buffer), increasing the doublewrite buffer beyond 2MB should have a bad effect. “Bigger is not always better.”

patches/percona_innodb_doublewrite_path.txt · Last modified: 2010/07/07 08:30 by stephane.combaudon
 
Except where otherwise noted, content on this wiki is licensed under the following license:GNU Free Documentation License 1.2