When backing up MySQL databases, most people compress them. This can make good sense in terms of backup and recovery speed as well as space needed, or it can be a serious bottleneck, depending on the circumstances and the approach used.
First, I should mention that this question mainly arises for medium and large databases. For databases below 100GB in size, compression performance is usually not the problem (though the backup's impact on server performance may well be).
We also assume here that the backup is done at the physical level (cold backup, slave backup, InnoDB hot backup or snapshot backup), as this is the only practical approach at this point for databases of decent size.
There are two important compression questions to decide for a backup: where to do the compression (on the source or the target server, if you back up over the network) and which compression software to use.
Compression on the source server is the most typical approach, and it works well, though it takes extra CPU resources on the source server in addition to IO resources, and that CPU may not be available, especially for a CPU-bound MySQL load. The benefit in this case is lower space requirements if you're keeping a local copy as well, and lower network bandwidth requirements if you're backing up to network storage.
Compression on the destination server offloads the source server (though the destination may run out of CPU itself if it is the target for multiple backups), at the cost of higher network bandwidth requirements, since the uncompressed backup has to be transferred.
What about the compression tool? The classical tool used for backup compression is gzip: it exists almost everywhere, it is stable, and it is relatively fast.
In many cases, however, it is not fast enough and becomes the bottleneck for the whole backup process.
Recently I did a little benchmark compressing a 1GB binlog file with gzip (compressing from the OS cache and redirecting the output to /dev/null, so we only measure compression speed). On the test box, with an Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz, gzip compressed this file in 48 seconds (with default options), resulting in a 260MB compressed file. This gives a compression speed of about 21MB/sec, clearly much less than even a single SATA hard drive can read sequentially. The file takes about 10 seconds to decompress, meaning the compressed file is read at 26MB/sec during decompression. This is again much less than a hard drive's sequential read performance, though the more relevant limit is that decompression produces only about 100MB/sec of uncompressed data to write.
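If you want to repeat this kind of measurement on your own data, the sketch below times gzip compression and decompression entirely in memory, which, like the /dev/null redirect, leaves disk IO out of the picture. It is a minimal sketch, assuming the file fits in RAM; the helper name gzip_throughput is hypothetical. Note that Python's gzip.compress defaults to level 9, while the gzip command line defaults to level 6, so the sketch passes level 6 explicitly to mirror "default options".

```python
import gzip
import time


def gzip_throughput(data: bytes, level: int = 6) -> tuple[float, float, int]:
    """Compress and decompress `data` in memory with gzip at `level`.

    Returns (compress_mb_s, decompress_mb_s, compressed_bytes). Both speeds
    are expressed in MB of *uncompressed* data per second, matching the
    1GB / 48 s = ~21MB/sec style of calculation.
    """
    mb = len(data) / (1024 * 1024)

    start = time.perf_counter()
    packed = gzip.compress(data, compresslevel=level)
    compress_secs = max(time.perf_counter() - start, 1e-9)  # avoid divide-by-zero

    start = time.perf_counter()
    restored = gzip.decompress(packed)
    decompress_secs = max(time.perf_counter() - start, 1e-9)

    if restored != data:  # sanity check: the round trip must be lossless
        raise RuntimeError("gzip round trip mismatch")
    return mb / compress_secs, mb / decompress_secs, len(packed)
```

To get the archive read speed quoted above (26MB/sec) rather than the uncompressed output rate, divide the compressed size by the decompression time instead; the two differ by exactly the compression ratio.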
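Such performance also means that if your goal is faster transfer over a local network, default gzip compression will not speed things up on a standard point-to-point 1Gbit connection: at about 21MB/sec, the compressor is far slower than the roughly 100MB/sec the link itself can carry.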
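The reasoning can be made concrete with a small calculation. In a streaming compress-and-send pipeline, the slower stage sets the pace, and every MB on the wire carries "ratio" MB of original data. This is a simplified model that assumes compression and transfer overlap perfectly; the 100MB/sec link figure is the rough usable rate for 1Gbit, and the function name is hypothetical.

```python
def effective_rate(compress_mb_s: float, ratio: float, link_mb_s: float) -> float:
    """Uncompressed MB/sec moved through a compress -> network pipeline.

    Assumes compression and transfer overlap fully (a streaming pipe), so the
    slower stage sets the pace. `ratio` is uncompressed size / compressed size;
    each MB on the wire carries `ratio` MB of original data.
    """
    return min(compress_mb_s, link_mb_s * ratio)


LINK_MB_S = 100.0  # rough usable throughput of a point-to-point 1Gbit link

# No compression: the link is the only limit.
plain = effective_rate(float("inf"), 1.0, LINK_MB_S)
# Default gzip from the benchmark: 1024MB -> 260MB at about 21MB/sec.
gzip_default = effective_rate(21.0, 1024 / 260, LINK_MB_S)
```

Here plain evaluates to 100.0 and gzip_default to 21.0: the link could carry almost 400MB/sec of original data at gzip's ratio, but the compressor only feeds it 21MB/sec, so the transfer ends up slower than sending the data uncompressed.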
If we try gzip -1 for the fastest compression, the same file compresses to 320MB in 27 seconds. This gives us 37MB/sec, which is a lot better but still not quite enough. Also note the serious jump in compressed file size. In this example, though, we used a MySQL binary log file, which often contains many similar events; this could be the reason for such a large size difference between compression levels. Decompression takes about the same 10 seconds, which gives about 32MB/sec of archive read speed and the same 100MB/sec of uncompressed data.
Do we have any faster alternatives to gzip? There are actually quite a few, but I like LZO, which I have been playing with since the late 1990s and which is a rather active project. There is also a gzip-like command-line compressor using the LZO library, called LZOP, which makes it an easy drop-in replacement.
The LZOP binary I got was built against LZO 1.0; the more recent version 2.0 promises further performance improvements, especially on 64-bit systems.
With LZO's default compression, the file compressed in 10.5 seconds, resulting in a 390MB compressed file. This gives a compression speed of 97MB/sec, which is good enough to compress all the data you can read from a single drive. The file decompresses in 3.7 seconds, which gives 105MB/sec of read speed from the archive media and 276MB/sec of write speed to the hard drive. This means restoring from a backup compressed with LZO will often be as fast as, or faster than, restoring from an uncompressed one.
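Why a compressed backup can restore faster than an uncompressed one is easiest to see by treating the restore as a pipeline: read the archive, decompress, write the files. The sketch below plugs in the LZO and gzip figures from the benchmarks; the 70MB/sec archive read speed and 200MB/sec target write speed are hypothetical hardware numbers chosen only for illustration, and restore_rate is a hypothetical helper.

```python
def restore_rate(read_mb_s: float, ratio: float,
                 decompress_out_mb_s: float, write_mb_s: float) -> float:
    """Uncompressed MB/sec achievable when restoring a backup.

    The restore is a pipeline: read the archive, decompress, write the files.
    The slowest stage sets the pace. Reading is scaled by `ratio`
    (uncompressed / compressed size) because each MB read expands on output.
    """
    return min(read_mb_s * ratio, decompress_out_mb_s, write_mb_s)


# Hypothetical hardware: archive media reads 70MB/sec, target writes 200MB/sec.
READ, WRITE = 70.0, 200.0

plain = restore_rate(READ, 1.0, float("inf"), WRITE)          # uncompressed copy
lzo = restore_rate(READ, 1024 / 390, 276.0, WRITE)            # LZO: 390MB, 276MB/sec out
gzip_default = restore_rate(READ, 1024 / 260, 100.0, WRITE)   # gzip: 260MB, 100MB/sec out
```

With these numbers the uncompressed copy is read-bound at 70MB/sec, gzip is capped by its roughly 100MB/sec decompressor output, and LZO reaches about 184MB/sec because every MB read from the archive expands to about 2.6MB. If the write side were the slowest stage, all three would converge on the write speed.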
LZO also has a "-1" option for even faster compression, which gave rather interesting results. The file compressed in 10.0 seconds (102MB/sec) and was 385MB in size, so this faster setting actually compressed the file slightly better while being about 5% faster. The decompression speed was about the same. The results will likely vary with the data being compressed, but it looks like LZO already uses relatively fast compression by default.
With a real server-grade CPU, performance should be even better, meaning you should exceed the roughly 100MB/sec you can pass through 1Gbit Ethernet. In that case you can actually use LZO compression to speed up data transfer between boxes (e.g. together with netcat).
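The lzop-plus-netcat approach is a shell pipeline, but the underlying pattern, compressing in fixed-size chunks and writing each piece to the network as it is produced, can be sketched in Python. LZO is not in Python's standard library, so this sketch swaps in zlib at level 1 (the same deflate algorithm gzip -1 uses); it illustrates the streaming structure only and will not reach LZO's speed. Passing a socket's makefile("wb") object as the destination is an assumption about how you would wire it up; the function names are hypothetical.

```python
import io
import zlib
from typing import BinaryIO


def stream_compress(src: BinaryIO, dst: BinaryIO, level: int = 1,
                    chunk_size: int = 1 << 20) -> int:
    """Compress `src` into `dst` chunk by chunk, in gzip format.

    Memory use stays bounded by `chunk_size`, so `dst` can be a socket file
    streaming to another box. Returns the number of compressed bytes written.
    """
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)  # wbits=31: gzip header/trailer
    written = 0
    while block := src.read(chunk_size):
        out = comp.compress(block)
        dst.write(out)
        written += len(out)
    tail = comp.flush()  # emit any buffered data plus the gzip trailer
    dst.write(tail)
    return written + len(tail)


def stream_decompress(src: BinaryIO, dst: BinaryIO, chunk_size: int = 1 << 20) -> None:
    """Receiving side: decompress a gzip stream from `src` into `dst`."""
    decomp = zlib.decompressobj(31)  # expect the gzip wrapper
    while block := src.read(chunk_size):
        dst.write(decomp.decompress(block))
    dst.write(decomp.flush())


def loopback_test(data: bytes) -> bytes:
    """Compress then decompress through in-memory streams (no network)."""
    wire = io.BytesIO()
    stream_compress(io.BytesIO(data), wire, chunk_size=4096)
    wire.seek(0)
    restored = io.BytesIO()
    stream_decompress(wire, restored)
    return restored.getvalue()
```

Because the output is ordinary gzip format, the receiving box could equally pipe it through gunzip instead of stream_decompress. The walrus operator in the loops requires Python 3.8 or later.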
Note also that my benchmarks include the overhead of reading the file (from the file cache) and piping the output to /dev/null. This overhead is constant, so the true difference in compression speed is even larger; however, since most backup operations need to read and write the data anyway, they naturally incur this fixed overhead too.
UPDATE: It looks like people are wondering how BZIP2 compares, so I checked it before deleting this particular file. BZIP2 compression for this file took 298 seconds, which is just 3.4MB/sec, though the compressed file was just 174MB in size. Decompression took 78 seconds, which means the compressed data was read at 2.2MB/sec and the output was produced at 13MB/sec.
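To see the speed versus size trade-off between gzip levels and bzip2 on your own data, the standard-library bz2 and gzip modules are enough for a rough side-by-side test. This is a sketch under the same in-memory assumption as a simple benchmark; the codec labels mimic command-line flags (bzip2 defaults to level 9, as does bz2.compress), and the compare helper is hypothetical. Timings will of course differ from the numbers above.

```python
import bz2
import gzip
import time

# label -> (compress function, decompress function)
CODECS = {
    "gzip -6": (lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
    "gzip -1": (lambda d: gzip.compress(d, compresslevel=1), gzip.decompress),
    "bzip2 -9": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
}


def compare(data: bytes) -> dict[str, tuple[float, float, int]]:
    """Return {codec: (compress_secs, decompress_secs, compressed_bytes)}."""
    results = {}
    for name, (pack, unpack) in CODECS.items():
        t0 = time.perf_counter()
        packed = pack(data)
        t1 = time.perf_counter()
        restored = unpack(packed)
        t2 = time.perf_counter()
        if restored != data:  # verify outside the timed region
            raise RuntimeError(f"{name}: round trip mismatch")
        results[name] = (t1 - t0, t2 - t1, len(packed))
    return results
```

Printing compressed size next to len(data) / compress_secs for each entry gives the same two numbers the article compares: how much space you save and how many MB/sec you pay for it.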
With all of these archivers it is possible to use parallel compression to get better speed, though this also means higher load, which can be an issue if you're not using a dedicated server for backups.
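One simple way to parallelize gzip is to split the input into chunks, compress each chunk as an independent gzip member on its own thread, and concatenate the results; the gzip format allows multiple members back to back, and gunzip reads them as one stream. This is a simplified sketch, not how dedicated parallel compressors such as pigz work internally. It relies on CPython's zlib releasing the GIL while compressing, and because each chunk starts with an empty dictionary, the ratio is slightly worse than single-threaded gzip. The function name and default chunk size are hypothetical.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data: bytes, workers: int = 4, chunk_size: int = 8 << 20,
                  level: int = 6) -> bytes:
    """Compress `data` as a multi-member gzip stream using several threads.

    Each chunk becomes an independent gzip member; concatenated members form
    a valid gzip file that gunzip (and gzip.decompress) can read back.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so members are joined in the right sequence
        members = pool.map(lambda c: gzip.compress(c, compresslevel=level), chunks)
        return b"".join(members)
```

The workers argument is the knob for the load concern above: on a shared database server you would keep it well below the number of cores so MySQL keeps some CPU headroom.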
I should also note that for mysqldump backups, tools with better but slower compression typically make sense: dumping takes longer, and loading into the database takes much longer anyway, so the overall impact of compression is smaller than for physical-level backups.