For the same customer for whom I am exploring ZFS for backups, the twin server uses regular LVM and XFS. On this twin, I have set up mylvmbackup for a more conservative backup approach. I quickly found some odd behavior: the backup was taking much longer than I expected. It was not the first time I had seen this, but here it was obvious. So I recorded some metrics during a backup: the bi column (blocks read in) from vmstat and the percentage of COW space used from lvs. The COW space is the copy-on-write area used by LVM to record modified blocks as they were at the beginning of the snapshot. Upon a read from the snapshot, LVM must scan that list to check whether a newer version of the block exists. The backup itself was done with mylvmbackup, with pbzip2 as the compressor.
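The metric collection itself can be scripted along these lines. This is a minimal sketch: the volume group and snapshot names (vg0/mysql-snap) and the 10-second interval are assumptions, not the actual setup, and on recent LVM releases the lvs field may be called data_percent instead of snap_percent.

```python
#!/usr/bin/env python3
"""Sample the vmstat bi column and the COW usage of an LVM snapshot."""
import subprocess
import time

VG, SNAP = "vg0", "mysql-snap"  # hypothetical volume group / snapshot names
INTERVAL = 10                   # seconds between samples (an assumption)

def vmstat_bi():
    # "vmstat INTERVAL 2": the second sample averages over the interval
    out = subprocess.check_output(["vmstat", str(INTERVAL), "2"], text=True)
    return int(out.strip().splitlines()[-1].split()[8])  # 9th column is "bi"

def cow_percent():
    # lvs reports snapshot COW usage as "snap_percent"
    out = subprocess.check_output(
        ["lvs", "--noheadings", "-o", "snap_percent", f"{VG}/{SNAP}"],
        text=True)
    return float(out.strip())

while True:  # vmstat blocks for INTERVAL seconds, so the loop self-paces
    print(f"{time.strftime('%H:%M:%S')} bi={vmstat_bi()} cow%={cow_percent():.1f}")
```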
As the recorded metrics show, the processing of the COW space has a huge impact on read performance. For this database, the backup time was 11h; but if I stop the slave and let it settle for 10 minutes so that the insert buffer is cleared, the backup time drops to a bit less than 3h, and could probably be lower still with a faster compressor, since the bottleneck is then the CPU overhead of pbzip2, with all cores at 100%.
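In practice, the quiesce-then-backup sequence could look roughly like this. It is a sketch only, assuming a local mysql client with credentials already configured and a working mylvmbackup setup:

```python
#!/usr/bin/env python3
"""Stop replication, let InnoDB settle, then run the backup."""
import subprocess
import time

def mysql(sql):
    # assumes a local mysql client with credentials in ~/.my.cnf
    subprocess.run(["mysql", "-e", sql], check=True)

mysql("STOP SLAVE;")        # no more replicated writes hitting the datadir
time.sleep(600)             # ~10 min for the insert buffer to be merged
try:
    subprocess.run(["mylvmbackup"], check=True)  # snapshot + archive
finally:
    mysql("START SLAVE;")   # resume replication even if the backup failed
```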
So, for large filesystems, if you plan to use LVM snapshots, keep in mind that read performance degrades as the COW space fills up, and it might be a good idea to reduce the number of writes during the backup. You could also compress the backup in a second stage if you have the storage capacity, as sketched below.
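A minimal sketch of that two-stage approach, with hypothetical paths, could be:

```python
#!/usr/bin/env python3
"""Two-stage backup: raw copy while the snapshot lives, compress later."""
import subprocess

SRC = "/mnt/mysql-snapshot"        # hypothetical mounted snapshot
STAGE = "/backup/mysql-raw.tar"    # uncompressed staging archive

# Stage 1: plain tar while the snapshot exists -- cheap on CPU, so the
# snapshot (and its growing COW space) lives as briefly as possible
subprocess.run(["tar", "-cf", STAGE, "-C", SRC, "."], check=True)
# ...unmount and lvremove the snapshot here...
# Stage 2: compress at leisure, with no COW read penalty
subprocess.run(["pbzip2", STAGE], check=True)  # produces mysql-raw.tar.bz2
```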
UPDATE: I ran a comparison on the twin server, which runs ZFS, and I was able to pull data from the snapshot at about 120 MB/s. The reason for this, I believe, is that LVM works at the block level and has no knowledge of files, while ZFS operates at the file level and is able to perform read-ahead.
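Since ZFS exposes snapshots under the hidden .zfs/snapshot directory, the read throughput is easy to check with a few lines like these; the pool, dataset, and snapshot names here are hypothetical:

```python
#!/usr/bin/env python3
"""Measure sequential read throughput from a file in a ZFS snapshot."""
import time

# hypothetical pool/dataset/snapshot; .zfs/snapshot is the standard location
PATH = "/tank/mysql/.zfs/snapshot/backup/ibdata1"
CHUNK = 1 << 20  # read 1 MiB at a time

start, total = time.time(), 0
with open(PATH, "rb") as f:
    while True:
        buf = f.read(CHUNK)
        if not buf:
            break
        total += len(buf)
print(f"{total / (time.time() - start) / 1e6:.1f} MB/s")
```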