Playing with last version of xtrabackup and compress it I noticed that gzip is unacceptable slow for both compression and decompression operations. Actually Peter wrote about it some time ago, but I wanted to review that data having some new information. In current multi-core word the compression utility should utilize several CPU to speedup operation, and another my requirement was the ability to work with stdin / stdout, so I could do scripting something like: innobackupex –stream | compressor | network_copy.
My research gave me next list: pigz (parallel gzip), pbzip2 (parallel bzip2), qpress ( command line utility for QuickLZ) and I wanted to try LZO (as lzop 1.03 command line + LZO 2 libraries). Actually lzop does not support parallel operations, but it is know to have good decompression speed even with 1 thread. UPDATE 17-Mar-2009: I added lzma results also by request from comments.
For compression test I took ~12GB of InnoDB data files generated by tpcc benchmark with 100 warehouses.
I tested 1, 2, 4 parallel threads for tools that support it and different level of compression ( 1,2,3 for qpress; -1 and -5 for other tools)
The raw results are available here http://spreadsheets.google.com/ccc?key=pOIo5aX59b6biPZ0QTVMXHg&hl=en, and I copy table in place in case if Google stops to work.
To summarize results:
- pbzip2 obviously show good compression, but the speed of processing is too slow. What is interesting on Level 5 the compression is worse than in pigz Level 5
- pigz is good for compression and faster than pbzip2 but still not so fast; however multi-threaded processing may be OK, especially if you need to keep compatibility, e.g. copy result on boxes where only standard gzip available
- qpress is not so good in compression ration, but speed is impressive, and maybe we will ship xtrabackup with this compression
- LZO is even faster in decompression than qpress, but I would like to see parallel version. There is the patch for it, but it did not apply clean to lzop 1.02, so I skipped it
- In my opinion in all cases Level 1 of compression shows better tradeoff between size of archive and compression/decompression time
There is no obvious winner, it depends on what is more important for you – size or time, but having this data we can make decision.