November 23, 2014

Is there room for more MySQL IO Optimization?

I prefer to run MySQL with innodb_flush_method=O_DIRECT in most cases – it makes sure there is no overhead of double buffering and I can save the limited amount of file system cache I would normally have on database server for those things which need to be cached — system files, binary log, FRM files, MySQL MyISAM system tables etc. Starting MySQL 5.5 MySQL uses asynchronous IO which should allow it to load IO subsystem very effectively being able to issue many outstanding requests which when can be merged on OS level or RAID controller so we should expect at least as good performance from O_DIRECT as from buffered mode ? It turns out it is not always the case.

I came to this take by accident so there is not a lot of science in coming up with it but I think it is still pretty well representative. I have an old test server having 4*7200RPM SATA hard drives in RAID10. It has 8GB of RAM and I’m running Percona Server 5.5.28-29.1 on it with 6GB allocated for innodb_buffer_pool which with other memory needs leaves roughly 1GB for OS Cache.

Here is pt-summary snippet:

I have and old table which was not optimized for prolonged period of time, so it reflect natural state of the table as it would be in the production over time:

The table is approximately 30GB in size including indexes which is way more than buffer pool size and OS cache which is available. I run CHECK TABLE on this table which essentially performs data and indexes scan which will be reflective to large full table scan and index scan queries, though this is really not the point.

The point is I have discovered with the buffered IO mode the query was taking almost half the time of time it took it to run in O_DIRECT mode. This can’t be explained by extra cache as 1GB of OS cache compared to 6B of buffer pool means there is very little, if anything to gain from this and so the performance gains are likely to come from the Read Ahead operating system is doing instead. This could be confirmed with number of innodb_data_reads (values over 100 second intervals)

with number of data reads spiking to over 2000 per second – a lot more than this software RAID could do in real IOPs.

I started to experiment with different read_ahead configuration. Percona Server still allows me to enable Random Read_Ahead which is not available in MySQL 5.5 (different version is added back in MySQL 5.6) so I tried that as well as disabling read ahead all together. In both cases for this given workload the performance was best with both read_ahead types enabled but it remained much better in buffered mode. Take a look at the graph:

chart_1

What this tells me is there is a lot of untapped potential in MySQL IO as I think we should be getting very close numbers for buffered and unbuffered IO for read only workloads, if anything unbuffered performance should be better as it has less overhead with double buffering as well as more information MySQL has about data which will be accessed next – in case of full table scan, index scans etc the information about blocks which will be needed next is known well in advance with very high probability. Prefetch based on something along these lines could be very valuable addition.

MySQL 5.6 has more changes in IO code including what should be much improved read-ahead and I will check its behavior next. Lets see if situation is dramatically different.

The good news though is even though performance advantages of buffered mode can be dramatic for some workloads such cases are rare – most OLTP kind of workloads indeed work better with O_DIRECT mode. The RAID controller also can make significant difference here as O_DIRECT only applies to operating system logic while RAID controller may be doing some form of read-ahead on its, making OS read ahead help unneeded. Well there is a lot to experiment with even in regards to such a minor item – Database Scientists will never have their job complete.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Paul says:

    Hey Peter,

    I’m admittedly not overly familiar with these configuration options nor the inner workings of MySQL, so I have a quick question. I’ve been taught that DBMSs use their own buffer pools so they don’t rely on the OS’s file caching/buffer management, and can therefore tweak things, such as cache invalidation techniques (LRU/MRU/etc).

    I’d just like to know if my understanding of O_DIRECT is correct – as far as I can tell, it means that InnoDB no longer uses it’s own buffer pool, but instead hands over all the I/O caching to the OS’s filesystem? And therefore instead of pinning pages and running its own buffer management, it just sends I/O requests to the OS and lets it handle what’s kept in memory and what’s not?

    Thanks!

  2. Paul,

    It is actually opposite. By default Innodb will use buffered IO so it will both use OS buffer and its internal cache called buffer pool. If you use O_DIRECT Innodb will use Direct (unbuffered) IO for database files and hence only use its internal cache. There is no option for Innodb to use OS cache only, other than setting up very small innodb buffer pool.

  3. Dimitri says:

    Peter,

    will be great to give details about the filesystem you’ve used, as well some simples from the “iostat” output.. – while I’ve also observed that filesystems are generally doing prefetch of data blocks more faster than InnoDB, in you particular case it’s also possible that filesystem is involving more parallel I/O reads than InnoDB (and iostat output should give an answer)..

    also, it’s not fine to draw diagrams starting form non zero values (and you’re using 80 instead of 0) — it gives a wrong perception of the ratio in potential gains, and it’s not good to give a wrong idea to a reader ;-)

    then regarding O_DIRECT — I think the main problem here is a generally poor FS cache memory management in Linux + endless I/O buffering on many levels.. – while the most optimal should be combination of buffered I/O reads + direct I/O writes (like it’s doing QFS for ex.) — let’s hope Linux will become better over a time ;-)

    Rgds,
    -Dimitri

  4. Dimitri,

    This was on XFS. I’m looking more into it and will provide more details. Yes I think the fact filesystem is doing more paralel reads and/or read ahead is a fact though the question is Why MySQL is not doing that ?

    Regarding is diagram from Zero values – I have not found how to make Google Charts where I had data to do it. It annoys me too. Next time I’ll probably do everything in Excel instead.

    Regarding O_DIRECT I do not quite get your point. If there is endless buffering etc would not the performance in buffered mode be a lot worse not other way around ? Also the workload in this case was surely IO bound based on CPU usage stats.

    I was going to see if I can get the repeatable test case with some different data ie sysbench table which can be easier for you or anyone else willing to play with to repeat.

  5. Adant says:

    Peter,

    You can enable random read head from 5.5.16, although not the same as Percona’s variable.
    It is turned off by default.

    http://dev.mysql.com/doc/refman/5.6/en/innodb-performance.html


    Random read-ahead is a technique that predicts when pages might be needed soon based on pages already in the buffer pool, regardless of the order in which those pages were read. If 13 consecutive pages from the same extent are found in the buffer pool, InnoDB asynchronously issues a request to prefetch the remaining pages of the extent. This feature was initially turned off in MySQL 5.5. It is available once again starting in MySQL 5.1.59 and 5.5.16 and higher, turned off by default. To enable this feature, set the configuration variable innodb_random_read_ahead.

    see also : http://blogs.innodb.com/wp/2011/07/reintroducing-random-readahead-in-innodb/
    Bug 12356373 – PERFORMANCE REGRESSION FROM 5.1 TO 5.5 : GROUP BY

    Best regards,

    Arnaud

  6. Adant,

    Thanks. Sorry for confusion I thought it was only added back in MySQL 5.6

    Is MySQL 5.5.16+ read ahead version is the same as added in MySQL 5.6 or as described here:
    http://blogs.innodb.com/wp/2011/07/reintroducing-random-readahead-in-innodb/

  7. Peter,

    >Is MySQL 5.5.16+ read ahead version is the same as added in MySQL 5.6 or as described here:
    >http://blogs.innodb.com/wp/2011/07/reintroducing-random-readahead-in-innodb/ ?

    Absolutely. The confusion may have come from the missing entry in the innodb parameters for 5.1 and 5.5.
    http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html

    I filed a doc bug to fix that : Bug 16092383 – DOCUMENT INNODB_RANDOM_READ_HEAD IN INNODB PARAMETERS

    Best regards,

    Arnaud

  8. Dimitri says:

    Hi Peter,

    what I meant about buffered I/O on Linux:
    – reads are fine (as data prefetch on FS is usually pretty good)
    – writes are problematic, as you never know what is really written (and you may find many cases where independently how often you’re calling fsync() within your code, the real sync to the disk will be involved not more frequent than once per second ;-))
    – then an overall FS cache management (having no limits in RAM) is very painful.. (several times I saw MySQL server start taking over 40min for ex. just because Linux was slow to decide which page to remove from FS cache and let them to be allocated to the InnoDB buffer pool, etc.)

    so, O_DIRECT is giving a workaround here from all these problems (and again, should be used combined with O_SYNC flag to be sure data were really moved to the disk on writes — see http://dimitrik.free.fr/blog/archives/2012/11/mysql-performance-linux-io-and-fusionio-part-2.html for details).. And in most cases you’ll be more *safe* when using O_DIRECT, while may still go faster with buffered I/O.. – but there is no silver bullet, have to test, that’s all ;-)

    Rgds,
    -Dimitri

  9. repls says:

    Hi Peter,
    I have two questions:
    1.
    As i know the variable innodb_read_io_threads is used to do pre-reading. but you memtioned the read_ahead feature is removed in MySQL 5.5, so the question comming: why read_ahead is disabled in MySQL 5.5 but the default value for innodb_read_io_threads is still 4 rather 0 ? or innodb_read_io_threads not only do pre-reading but also normal read(non-pre-reading)?

    2.
    Do you have a test on these two cases, and if you do , then how about their performance?
    case1: innodb_flush_method=O_DRIECT + read ahead in database level
    case2: innodb_flush_method=default(buffered IO) + read ahead in OS level

    Best regards

    Repls

  10. Repl,

    innodb_read_io_threads are used to issue requests and service completions as starting MySQL 5.5 you will normally use asynchronous IO. In MySQL 5.5 random read_ahead was disabled as it was causing performance issues with “linear” read ahead still enabled by default. In later MySQL 5.5 versions (starting 5.5.16) random read ahead was actually added back though it is still disabled by default.

    In this given test case2 was showing better performance while for a lot of real production use O_DIRECT works better.

    My theory is as follows – in case of concurrent inserts by multiple threads the data may appear in semi sequential order, for example logical rows will be in pages like 2,1,3,5,6,9,12,11 – which is increasing but with gaps and out of order. I think this does not trigger sequential read ahead and triggers random read ahead too late to be much of performance gain.

  11. Hi Peter,

    Since you are doing a CHECK TABLE, which will basically entail a table scan, so I wonder how much the linear_read_ahead has an impact on its own as compared to random_read_ahead. It must be worthwhile to tune innodb_read_ahead_threshold and see if that gives any positive benefit for linear_read_ahead alone. Lower values of innodb_read_ahead_threshold should trigger a linear_read_ahead much sooner. Though it would only trigger the read_ahead of the next extent and would not have an impact on how pages are read from the current extent as triggering a read_ahead of pages from the current extent can only be induced by random_read_ahead.

  12. Alfie John says:

    Was ALL_O_DIRECT removed in Percona 5.6? I’m getting “Unrecognized value”.

  13. Laurynas Biveinis says:

    Alfie –

    the RC releases don’t have ALL_O_DIRECT, but it will be added back.

Speak Your Mind

*