During preparation of Percona-XtraDB template to run in RightScale environment, I noticed that IO performance on EBS volume in EC2 cloud is not quite perfect. So I have spent some time benchmarking volumes. Interesting part with EBS volumes is that you see it as device in your OS, so you can easily make software RAID from several volumes.
So I created 4 volumes ( I used m.large instance), and made:
RAID0 on 2 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 2 -l 0 /dev/sdj /dev/sdk
RAID0 on 4 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 4 -l 0 /dev/sdj /dev/sdk /dev/sdl /dev/sdm
RAID5 on 3 volumes as:
mdadm -C /dev/md0 --chunk=256 -n 3 -l 5 /dev/sdj /dev/sdk /dev/sdl
RAID10 on 4 volumes in two steps:
mdadm -v --create /dev/md0 --chunk=256 --level=raid1 --raid-devices=2 /dev/sdj /dev/sdk
mdadm -v --create /dev/md1 --chunk=256 --level=raid1 --raid-devices=2 /dev/sdm /dev/sdl
mdadm -v --create /dev/md2 --chunk=256 --level=raid0 --raid-devices=2 /dev/md0 /dev/md1
And also in Linux you can create tricky RAID10,f2 (you can read what is this here http://www.mythtv.org/wiki/RAID)
mdadm -C /dev/md0 --chunk=256 -n 4 -l 10 -p f2 /dev/sdj /dev/sdk /dev/sdk /dev/sdm
and also I tested IO on single volume.
I used xfs filesystem mounted with noatime,nobarrier options
and for benchmark I used sysbench fileio modes on 16GB file with next script:
for size in 256M 16G; do
for mode in seqwr seqrd rndrd rndwr rndrw; do
./sysbench --test=fileio --file-num=1 --file-total-size=$size prepare
for threads in 1 4 8 16; do
echo PARAMS $size $mode $threads > sysbench-size-$size-mode-$mode-threads-$threads
./sysbench --test=fileio --file-total-size=$size --file-test-mode=$mode\
--max-time=60 --max-requests=10000000 --num-threads=$threads --init-rng=on \
--file-num=1 --file-extra-flags=direct --file-fsync-freq=0 run \
>> sysbench-size-$size-mode-$mode-threads-$threads 2>&1
./sysbench --test=fileio --file-total-size=$size cleanup
So tested modes: seqrd (sequential read), seqwr (sequential write), rndrd (random read), rndwr (random write), rndrw (random read-write). And sysbench uses 16KB pagesize to emulate work of InnoDB with 16KB pagesize.
Raw results you may find in Google Docs https://spreadsheets.google.com/ccc?key=0AjsVX7AnrCYwdFlBVW9KWVJGUGFqeVdpUHY0Y0VXYXc&hl=en
, but let me show most interesting results from my point of view. On graphs I show requests / second (more is better) and response time in ms for 95% cases (less is better).
What I see from the results is that if you are looking for IO performance in EC2/EBS environment it’s definitely worth to consider some RAID setup.
RAID5 does not show benefits comparing with others, and RAID10,f2 is worse than RAID10.
But speaking RAID0 vs RAID10 it’s your call. For sure in regular server I’d never suggest RAID0 for database, but speaking about EBS I am not sure what guarantee Amazon gives here. I’d expect under EBS volume there already exists redundant array, and it may not worth to add additional redundancy, but I am not sure in that.
For now I’d consider RAID10 on 4 – 10 volumes.
And of course to get benefit from multi-threading IO in MySQL you need to use XtraDB or MySQL 5.4 ®
However there may be small problem with backup over EBS. On single EBS volume you can just do snapshot, but on several volumes it may be tricky. But in this case you may consider LVM snapshots or XtraBackup