In many cases I speculate about how things should work based on what they do, and in a number of cases this has led me to form too good an impression of a technology, only to run into a completely unanticipated bug or performance bottleneck. This is exactly what happened with LVM.
A number of customers have reported that LVM imposes a very high penalty when snapshots are enabled (let alone if you try to run a backup at the same time), so I decided to look into it.
I used the sysbench fileio test, as our concern here is general I/O performance – it is not something MySQL-specific.
I tested on RHEL5, on a RAID10 volume with 6 hard drives (BBU disabled), though the problem can be seen on a variety of other systems too (I just do not have comparable numbers for all of them).
/tmp/sysbench --test=fileio --num-threads=1 --init-rng=on --max-time=60 --file-num=1 --file-total-size=8G --file-extra-flags=direct --file-test-mode=rndwr run
The performance without an LVM snapshot was 159 io/sec, which is quite expected for a single thread with no BBU. With an LVM snapshot enabled the performance was 25 io/sec – about 6 times lower!
I honestly do not understand what LVM could be doing to make things this slow – copy-on-write (COW) should require 1 read and 2 writes, or maybe 3 writes (if we assume metadata is updated each time), so how could it reach 6 times?
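As a sanity check, here is the back-of-the-envelope arithmetic. This is a sketch of my cost model, not something confirmed from the LVM source – the 3-IOs-per-COW-write figure is my assumption:

```python
# Back-of-the-envelope COW cost model.
# Assumption: every IO costs roughly the same on this RAID10 volume.
baseline_iops = 159   # measured: random writes, no snapshot
snapshot_iops = 25    # measured: same test with a snapshot active

# First write to an uncopied chunk: read the old data, write it to the
# snapshot area, then write the new data = 3 IOs
# (4 if metadata is flushed every time).
ios_per_cow_write = 3

expected_iops = baseline_iops / ios_per_cow_write
observed_slowdown = baseline_iops / snapshot_iops

print(round(expected_iops))         # ~53 io/sec expected from the model
print(round(observed_slowdown, 1))  # ~6.4x actually observed
```

So even the pessimistic 3-IO model predicts roughly 53 io/sec, while we measured 25 – about twice worse than the model allows.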
It looks like it is time to dig further into LVM internals – and, well, maybe I'm missing something here. I do not have good insight into what is really happening inside, only how it looks from the user's side.
Interestingly enough, vmstat confirms the 1-read-and-2-writes theory:
r  b   swpd     free  buff   cache  si so  bi  bo   in  cs us sy id wa st
0 1 0 24132256 73252 8248788 0 0 259 590 1271 531 0 0 92 8 0
0 1 0 24135976 73284 8244964 0 0 413 938 1427 761 0 0 87 12 0
0 1 0 24139572 73308 8241300 0 0 399 905 1412 736 0 0 87 12 0
0 1 0 24143416 73352 8237396 0 0 409 927 1416 739 0 0 87 12 0
As you can see there are about twice as many writes as reads.
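For the record, here is how I read those samples – in vmstat's standard column order, bi is blocks read in and bo is blocks written out:

```python
# The four vmstat samples from above; fields follow vmstat's standard
# order, so column 8 is bi (blocks read in) and column 9 is bo (written out).
samples = """\
0 1 0 24132256 73252 8248788 0 0 259 590 1271 531 0 0 92 8 0
0 1 0 24135976 73284 8244964 0 0 413 938 1427 761 0 0 87 12 0
0 1 0 24139572 73308 8241300 0 0 399 905 1412 736 0 0 87 12 0
0 1 0 24143416 73352 8237396 0 0 409 927 1416 739 0 0 87 12 0
""".splitlines()

bi = sum(int(line.split()[8]) for line in samples)
bo = sum(int(line.split()[9]) for line in samples)
print(round(bo / bi, 2))  # ~2.27 – roughly two writes per read
```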
SMALL FILE RUN
Next I decided to check how things improve when writes hit the same places over and over again – my assumption was that the overhead would gradually go to zero as all pages get copied, after which writes can just proceed normally.
/tmp/sysbench --test=fileio --num-threads=1 --init-rng=on --max-time=60 --file-num=1 --file-total-size=64M --file-extra-flags=direct --file-test-mode=rndwr run
With this run I got approximately 200 io/sec without an LVM snapshot, while with a snapshot I got:
33.20 Requests/sec executed
46.08 Requests/sec executed
70.79 Requests/sec executed
123.68 Requests/sec executed
157.66 Requests/sec executed
163.50 Requests/sec executed
(All were 60 second runs)
As you can see, performance indeed improves, though significant overhead remains, and the progress is much slower than I would have anticipated. Before the last run about 400MB had been written to the file in total (random writes) – 6x the file size – and yet we still saw some 20% regression compared to the run with no snapshot.
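If writes land uniformly at random, simple probability says almost every chunk should have been copied well before that point. This sketch assumes uniform random writes and that LVM's COW unit matches the write size, which may well not hold:

```python
import math

# Expected fraction of chunks never hit after n random writes over m
# chunks is (1 - 1/m)**n, which is approximately exp(-n/m).
file_size_mb = 64
written_mb = 400           # roughly 6x the file size by the last run

still_uncopied = math.exp(-written_mb / file_size_mb)
print(f"{still_uncopied:.2%}")  # ~0.19% of chunks should still need COW
```

Under that model the first-touch COW overhead should be essentially gone, yet we still measured a ~20% regression – which suggests there is some residual per-write cost in the snapshot path beyond copying each chunk once.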
NO O_DIRECT RUNS
As you might know, O_DIRECT often takes quite a special path in the Linux kernel, so I did a couple of other runs. The first run syncs after each request instead of using O_DIRECT:
/tmp/sysbench --test=fileio --num-threads=1 --init-rng=on --max-time=60 --file-num=1 --file-total-size=8G --file-fsync-freq=1 --file-test-mode=rndwr run
This run gave 162 io/sec without a snapshot and 32 io/sec with a snapshot. The numbers are a bit better than with O_DIRECT, but the gap is still astonishing.
The final run emulates how InnoDB does buffer pool flushes – calling fsync every 100 writes rather than after each request:
/tmp/sysbench --test=fileio --num-threads=1 --init-rng=on --max-time=60 --file-num=1 --file-total-size=8G --max-requests=100000000 --file-fsync-freq=100 --file-test-mode=rndwr run
This gets some 740 req/sec without a snapshot and 240 req/sec with one. In this case we get close to the expected 3x difference.
The numbers are much higher in this case because, even though we have a single thread, the OS is able to submit multiple requests at the same time (and the drives can execute them in parallel). I expect that with a BBU in this system we would see similar results for the other runs.
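The arithmetic for that last run, assuming the same 3-IOs-per-COW-write model as above:

```python
# With fsync only every 100 writes, the kernel can keep many writes (and
# their COW copies) in flight, so the slowdown converges toward the raw
# 3-IOs-per-write cost of the COW model.
no_snapshot_rps = 740
with_snapshot_rps = 240

print(round(no_snapshot_rps / with_snapshot_rps, 1))  # ~3.1x, close to 3x
```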
So creating an LVM snapshot can indeed cause tremendous overhead – in the benchmarks I've done it ranges from 3x to 6x. It is, however, worth noting that this is the worst-case scenario – many workloads have writes going to the same locations over and over again (e.g. InnoDB's circular logs), in which case the overhead is quickly reduced. Still, that takes some time, and I would expect any write-heavy system to experience a "performance shock" when an LVM snapshot is created – greatly reduced capacity which then improves as fewer and fewer pages actually need to be copied on write.
Because of this behavior you may consider not starting the backup instantly after the LVM snapshot is created, but allowing it to settle a bit before the further overhead of copying the data is added.
The question I had is: how can LVM backups work for so many users? The reality is that for many applications write speed is not so critical, so they can sustain this performance drop, particularly during slow periods (which often have 2x-3x lower performance requirements).
So we'll do some research around LVM, and I hope to do more benchmarks – for example, I'm very curious how good the read speed from a snapshot is (in particular for sequential file reads).
If you've done LVM performance tests yourself, or will be repeating mine (parameters posted above), please let me know.