I prefer to run MySQL with innodb_flush_method=O_DIRECT in most cases – it makes sure there is no overhead of double buffering and I can save the limited amount of file system cache I would normally have on database server for those things which need to be cached — system files, binary log, FRM files, MySQL MyISAM system tables etc. Starting MySQL 5.5 MySQL uses asynchronous IO which should allow it to load IO subsystem very effectively being able to issue many outstanding requests which when can be merged on OS level or RAID controller so we should expect at least as good performance from O_DIRECT as from buffered mode ? It turns out it is not always the case.
I came to this take by accident so there is not a lot of science in coming up with it but I think it is still pretty well representative. I have an old test server having 4*7200RPM SATA hard drives in RAID10. It has 8GB of RAM and I’m running Percona Server 5.5.28-29.1 on it with 6GB allocated for innodb_buffer_pool which with other memory needs leaves roughly 1GB for OS Cache.
Here is pt-summary snippet:
# Percona Toolkit System Summary Report ######################
Date | 2013-01-03 22:29:39 UTC (local TZ: EST -0500)
Hostname | smt2
Uptime | 6 days, 6:21, 4 users, load average: 0.00, 0.01, 0.05
System | Supermicro; PDSMi; v0123456789 (Other)
Service Tag | 0123456789
Platform | Linux
Release | Ubuntu 12.04.1 LTS (precise)
Kernel | 3.2.0-35-generic
Architecture | CPU = 64-bit, OS = 64-bit
Threading | NPTL 2.15
SELinux | No SELinux detected
Virtualized | No virtualization detected
# Processor ##################################################
Processors | physical = 1, cores = 2, virtual = 2, hyperthreading = no
Speeds | 2x1200.000
Models | 2xIntel(R) Pentium(R) D CPU 3.00GHz
Caches | 2x2048 KB
I have and old table which was not optimized for prolonged period of time, so it reflect natural state of the table as it would be in the production over time:
mysql> show table status like "post_channel" \G
*************************** 1. row ***************************
Create_time: 2012-12-31 16:24:54
1 row in set (0.02 sec)
The table is approximately 30GB in size including indexes which is way more than buffer pool size and OS cache which is available. I run CHECK TABLE on this table which essentially performs data and indexes scan which will be reflective to large full table scan and index scan queries, though this is really not the point.
The point is I have discovered with the buffered IO mode the query was taking almost half the time of time it took it to run in O_DIRECT mode. This can’t be explained by extra cache as 1GB of OS cache compared to 6B of buffer pool means there is very little, if anything to gain from this and so the performance gains are likely to come from the Read Ahead operating system is doing instead. This could be confirmed with number of innodb_data_reads (values over 100 second intervals)
| Innodb_data_reads | 54561 |
| Innodb_data_reads | 38277 |
| Innodb_data_reads | 25895 |
| Innodb_data_reads | 31363 |
| Innodb_data_reads | 170816 |
| Innodb_data_reads | 207432 |
| Innodb_data_reads | 156896 |
| Innodb_data_reads | 19059 |
| Innodb_data_reads | 16805 |
with number of data reads spiking to over 2000 per second – a lot more than this software RAID could do in real IOPs.
I started to experiment with different read_ahead configuration. Percona Server still allows me to enable Random Read_Ahead which is not available in MySQL 5.5 (different version is added back in MySQL 5.6) so I tried that as well as disabling read ahead all together. In both cases for this given workload the performance was best with both read_ahead types enabled but it remained much better in buffered mode. Take a look at the graph:
What this tells me is there is a lot of untapped potential in MySQL IO as I think we should be getting very close numbers for buffered and unbuffered IO for read only workloads, if anything unbuffered performance should be better as it has less overhead with double buffering as well as more information MySQL has about data which will be accessed next – in case of full table scan, index scans etc the information about blocks which will be needed next is known well in advance with very high probability. Prefetch based on something along these lines could be very valuable addition.
MySQL 5.6 has more changes in IO code including what should be much improved read-ahead and I will check its behavior next. Lets see if situation is dramatically different.
The good news though is even though performance advantages of buffered mode can be dramatic for some workloads such cases are rare – most OLTP kind of workloads indeed work better with O_DIRECT mode. The RAID controller also can make significant difference here as O_DIRECT only applies to operating system logic while RAID controller may be doing some form of read-ahead on its, making OS read ahead help unneeded. Well there is a lot to experiment with even in regards to such a minor item – Database Scientists will never have their job complete.