There is also a second fix for what we believe is a bug in InnoDB, where it blocks queries when it does not need to (I will refer to it as the “sync fix”). In this post, however, I will focus on innodb_flush_neighbor_pages.
By default, InnoDB flushes so-called neighbor pages, which are often not really neighbors at all.
Say we want to flush page P. InnoDB looks at an area of 128 pages around page P and flushes every page in that area that is dirty. To illustrate, say we have an area of memory like this:
...D...D...D....P....D....D...D....D where each dot is a page that does not need flushing, each “D” is a dirty page that InnoDB will flush, and P is our page.
So, as a result of how this works, instead of performing 1 random write, InnoDB performs 8 random writes.
This is quite far from the original intention, which was to flush as many pages as possible in a single sequential write.
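To make the default behavior concrete, here is a small sketch (not InnoDB source code; the area size and its alignment to an area boundary are assumptions for illustration) of how the “area” mode picks pages to flush along with P:

```python
# Simplified sketch of the default "area" neighbor flushing: every dirty
# page in the 128-page area containing P is flushed together with P,
# even when those pages are nowhere near P on disk.

AREA = 128  # flush-neighbor area size in pages (assumption for this sketch)

def area_neighbors(dirty, p, area=AREA):
    """Return the page numbers flushed along with page p in 'area' mode.

    dirty -- set of dirty page numbers in the buffer pool
    p     -- the page we actually need to flush
    """
    lo = (p // area) * area  # assume the area is aligned to an area boundary
    hi = lo + area
    return sorted(q for q in range(lo, hi) if q in dirty or q == p)

# Mirrors the picture in the post: scattered dirty pages around P.
dirty = {3, 7, 11, 21, 26, 30, 35}
print(area_neighbors(dirty, 16))  # -> [3, 7, 11, 16, 21, 26, 30, 35]
```

Each of the returned pages that is not adjacent to another one costs a separate random write, which is the problem described above.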
So we added a new innodb_flush_neighbor_pages=cont mode. With it, only a truly sequential write is performed. That is, in this case:
...D...D...D..DDDPD....D....D...D....D only the following pages will be flushed:
...D...D...D..FFFFF....D....D...D....D (marked as “F”)
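In the same spirit, here is a sketch (again an illustration, not the actual server code) of the “cont” selection: only the run of dirty pages contiguous with P is flushed, so the whole flush is one sequential write:

```python
def cont_neighbors(dirty, p):
    """Return the pages flushed along with page p in 'cont' mode:
    the maximal run of dirty pages immediately adjacent to p."""
    run = [p]
    # extend the run to the left while pages stay dirty
    q = p - 1
    while q in dirty:
        run.insert(0, q)
        q -= 1
    # extend the run to the right while pages stay dirty
    q = p + 1
    while q in dirty:
        run.append(q)
        q += 1
    return run

dirty = {14, 15, 17}              # the ..DDDPD.. pattern around p = 16
print(cont_neighbors(dirty, 16))  # -> [14, 15, 16, 17]
```

Dirty pages outside the contiguous run are simply left for later flushing instead of being turned into extra random writes.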
Besides “cont”, in Percona Server 5.5.19
innodb_flush_neighbor_pages also accepts the values “area” (the default) and “none” (recommended for SSDs).
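For reference, the option is set like any other InnoDB variable in my.cnf (a minimal fragment; the option name is as given in the post, and the comment values are the three accepted settings):

```
[mysqld]
# Percona Server 5.5.19+; accepted values: area (default), cont, none
innodb_flush_neighbor_pages = cont
```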
What kind of effect does it have? Let’s run some benchmarks.
First, results from an HP ProLiant server.
As you can see, with “cont” we get a stable line. And even with the default innodb_flush_neighbor_pages, Percona Server has smaller dips than MySQL.
Now, to show the effect of the “sync fix”, let’s compare Percona Server 5.5.18 (without the fix) and 5.5.19 (with it).
You can see that the fix keeps queries running in cases where previously there was a “hard” stop and no queries went through.
The previous result may give you the impression that “cont” guarantees a stable line, but unfortunately that is not always the case.
Here are the results (throughput and response time) from a Cisco UCS 250 server:
As you can see, on this server there are longer and deeper periods when MySQL gets stuck in flushing, and in such cases
innodb_flush_neighbor_pages=cont only relieves the problem rather than solving it completely.
Which, I believe, is still better than a complete stop for a significant amount of time.
The raw results, scripts, and various CPU/IO metrics are available from our Benchmarks Launchpad.