It has taken years to get proper integration between the operating system kernel, device drivers, and hardware so that caches and IO modes behave correctly. I remember us having a lot of trouble with fsync() not flushing the hard drive write cache, meaning data could potentially be lost on power failure. Happily, most of these issues are resolved now on “real hardware,” and I’m pretty confident running InnoDB with either the default (fsync-based) or O_DIRECT innodb_flush_method. Virtualization, however, adds yet another layer, and we need to question again whether IO is really durable in virtualized environments. My simple testing shows this may not always be the case.
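For reference, here is what the two approaches look like at the system-call level — a minimal C sketch (error handling omitted; the file name and the 16KB page size are just illustrative):

/* Two ways a database can try to make a page write durable.
 * Sketch only: error handling trimmed, names are illustrative. */
#define _GNU_SOURCE          /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    char page[16384];                    /* one 16KB page, like InnoDB */
    memset(page, 0, sizeof(page));

    /* Strategy 1: buffered write, then fsync() to ask the kernel to push
     * the data (and, on a well-behaved stack, the drive write cache)
     * to stable storage. */
    int fd = open("testfile", O_WRONLY | O_CREAT, 0644);
    write(fd, page, sizeof(page));
    fsync(fd);                           /* the durability point */
    close(fd);

    /* Strategy 2: O_DIRECT bypasses the OS page cache; the buffer must
     * be aligned (typically to 512B or 4KB). Whether the data is really
     * on disk when write() returns depends on every layer below being
     * honest -- which is exactly the question here. */
    fd = open("testfile", O_WRONLY | O_DIRECT);
    void *buf;
    posix_memalign(&buf, 4096, sizeof(page));
    memset(buf, 0, sizeof(page));
    write(fd, buf, sizeof(page));
    close(fd);
    free(buf);
    return 0;
}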
I’m comparing O_DIRECT and fsync() single-page writes to a 1MB file using SysBench on Ubuntu with ext4, running inside VirtualBox 4.0.4 on Windows 7, on my desktop computer with a pair of 7200 RPM hard drives in RAID1. Because there is no write cache, I expect no more than a bit over 100 writes per second: even when there is no disk seek, we still have to wait for the platter to complete a full rotation before the next write can land. I’m, however, getting rather bizarre results:
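To put a number on that expectation:

7200 rotations/min ÷ 60 = 120 rotations/sec
one full rotation = 1/120 sec ≈ 8.3 ms per write

So a single thread doing one durable write per rotation tops out at roughly 120 writes per second, which is where the “a bit over 100” figure comes from.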
Using fsync()
pz@ubuntu:~/test$ sysbench --num-threads=1 --test=fileio --file-num=1 --file-test-mode=rndwr --file-total-size=1M --max-requests=10000000 --max-time=60 --file-fsync-freq=1 run
sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
1 files, 1Mb each
1Mb total file size
Block size 16Kb
Number of random requests for random IO: 10000000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  0 Read, 1343 Write, 1343 Other = 2686 Total
Read 0b  Written 20.984Mb  Total transferred 20.984Mb  (357.62Kb/sec)
   22.35 Requests/sec executed

Test execution summary:
    total time:                          60.0863s
    total number of events:              1343
    total time taken by event execution: 0.0808
    per-request statistics:
         min:                                  0.04ms
         avg:                                  0.06ms
         max:                                  0.34ms
         approx.  95 percentile:               0.06ms

Threads fairness:
    events (avg/stddev):           1343.0000/0.00
    execution time (avg/stddev):   0.0808/0.00
Ignore the per-request response times here, as SysBench times only the writes, not the fsync() calls. 22 fsync()-paced requests per second is pretty bad; it works out to about 45 ms per synchronous write, well above the raw rotational latency, but since filesystem journaling can turn each fsync() into several serialized disk operations, I assume it can be realistic once that overhead is accounted for.
Now let’s see how it looks using O_DIRECT:
pz@ubuntu:~/test$ sysbench --num-threads=1 --test=fileio --file-num=1 --file-test-mode=rndwr --file-extra-flags=direct --file-total-size=1M --max-requests=10000000 --max-time=60 run
sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 16384
1 files, 1Mb each
1Mb total file size
Block size 16Kb
Number of random requests for random IO: 10000000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  0 Read, 33900 Write, 339 Other = 34239 Total
Read 0b  Written 529.69Mb  Total transferred 529.69Mb  (8.8278Mb/sec)
  564.98 Requests/sec executed

Test execution summary:
    total time:                          60.0019s
    total number of events:              33900
    total time taken by event execution: 37.5364
    per-request statistics:
         min:                                  0.10ms
         avg:                                  1.11ms
         max:                                  259.69ms
         approx.  95 percentile:               5.31ms

Threads fairness:
    events (avg/stddev):           33900.0000/0.00
    execution time (avg/stddev):   37.5364/0.00
I would expect results rather similar to the fsync() test, yet we’re getting numbers more than 20 times better… surely too good to be true. That means I can be fairly sure the system is lying about write completion when O_DIRECT IO is used.
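If you want to check a system yourself without SysBench, timing raw O_DIRECT writes against the rotational floor gives a quick signal — a minimal C sketch, where the file name, page size, and iteration count are arbitrary choices of mine:

/* Time single-page O_DIRECT rewrites of the same offset. On a cacheless
 * 7200 RPM disk each should cost close to a full rotation (~8.3 ms);
 * sub-millisecond averages mean some layer is acknowledging writes from
 * a cache. Sketch only: error handling trimmed. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    int fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    void *buf;
    posix_memalign(&buf, 4096, 16384);   /* O_DIRECT needs aligned IO */
    memset(buf, 0, 16384);

    struct timespec t0, t1;
    int n = 1000;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++)
        pwrite(fd, buf, 16384, 0);       /* rewrite the same page */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) +
                  (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.2f ms/write, %.0f writes/sec\n",
           secs * 1000 / n, n / secs);
    close(fd);
    free(buf);
    return 0;
}

On a drive that truly commits each write, the average should sit near the full-rotation time; an average far below that, like the 1.11 ms seen above, points at buffering somewhere in the stack.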
What is my takeaway from this? I did not have time to research whether the problem is related to VirtualBox itself or to some configuration issue, and things may well be working correctly in your environment. The point is that virtualization adds complexity, and there are at least some cases where you may be lied to about IO completion. So if you’re relying on the system being able to recover from a power failure or a VM crash, make sure to test it carefully.