ZFS on Linux and MySQLYves Trudeau
I am currently working with a large customer and I am involved with servers located in two data centers, one with Solaris servers and the other one with Linux servers. The Solaris side is cleverly setup using zones and ZFS and this provides a very low virtualization overhead. I learned quite a lot about these technologies while looking at this, thanks to Corey Mosher.
On the Linux side, we recently deployed a pair on servers for backup purpose, boxes with 64 300GB SAS drives, 3 raid controllers and 192GB of RAM. These servers will run a few slave instances each of production database servers and will perform the backups. The write load is not excessive so a single server can easily handle the write load of all the MySQL instances. The original idea was to configure them with raid-10 + LVM, making sure to stripe the LV when we need to and align the partition correctly.
We got decent tpcc performance, nearly 37k NoTPM using 5.6.11 and xfs. Then, since ZFS on Linux is available and there is in house ZFS knowledge, we decided to reconfigure one of the server and give ZFS a try. So I trashed the raid-10 arrays, configure JBODs and gave all those drives to ZFS (30 mirrors + spares + OS partition mirror) and I limited the ARC size to 4GB. I don’t want to start a war but ZFS performance level was less than half of xfs for the tpcc test and that’s maybe just normal. We didn’t try too hard to get better performance because we already had more than enough for our purpose and some ZFS features are just too useful for backups (most apply also for btrfs). Let’s review them.
ZFS does snapshot, like LVM but… since it is a copy on write filesystem, the snapshots are free, no performance penalty. You can easily run a server with hundreds of snapshots. With LVM, your IO performance drops to 33% after the first snapshot so keeping a large number of snapshots running is simply not an option. With ZFS you can easily have:
- one snapshot per day for the last 30 days
- one snapshot per hour for the last 2 days
- one snapshot per 5min for the last 2 hours
and that will be perfectly fine. Since starting a snapshot take less than a second, you could even be more zealous. Pretty interesting to speed up point in time recovery when you dataset is 700GB. If you google a bit with “zfs snapshot script” you’ll many scripts ready for the task. Snapshots work best with InnoDB, with MyISAM you’ll have to start the snapshot while holding a “flush tables with read lock” and the flush operation will take some time to complete.
ZFS can compress data on the fly and it is surprisingly cheap. In fact the best tpcc results I got were when using compression. I still have to explain this, maybe it is related to better raid controller write cache use. Even the fairly slow gzip-1 mode works well. The tpcc database, which contains a lot of random data that doesn’t compress well showed a compression ration of 1.70 with gzip-1. Real data will compress much more. That gives us much more disk space than we expected so even more snapshots!
With ZFS each record on disk has a checksum. If a cosmic ray flip a bit on a drive, instead of crashing InnoDB, it will be caught by ZFS and the data will be read from the other drive in the mirror.
Better availability and disk usage
On purpose, I allocated mirror pairs using drives from different controllers. That way, if a controller dies, the storage will still be working. Also, instead of having 1 or 2 spare drives per controller, I have 2 for the whole setup. A small but yet interesting saving.
All put together, ZFS on Linux is a very interesting solution for MySQL backup servers. All backup solutions have an impact on performance with ZFS the impact is up front and the backups are almost free.