Compression for InnoDB backup

Playing with the latest version of xtrabackup and its compression options, I noticed that gzip is unacceptably slow for both compression and decompression. Peter actually wrote about this some time ago, but I wanted to revisit that data with some new information. In the current multi-core world a compression utility should use several CPUs to speed up the operation, and my other requirement was the ability to work with stdin / stdout, so I could script something like: innobackupex –stream | compressor | network_copy.

My research gave me the following list: pigz (parallel gzip), pbzip2 (parallel bzip2), qpress (a command-line utility for QuickLZ), and I also wanted to try LZO (as the lzop 1.03 command-line tool with the LZO 2 libraries). lzop does not support parallel operation, but it is known for good decompression speed even with a single thread. UPDATE 17-Mar-2009: I added lzma results as well, by request from the comments.
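The intended pipeline can be sketched as follows. The host name and paths are placeholders, not from a real setup, and plain gzip stands in for pigz in the runnable line (pigz produces gzip-compatible output, so the stdin/stdout contract is the same):

```shell
# The streaming idea (placeholder host and paths):
#   innobackupex --stream=tar ./ | pigz -1 -p 4 | ssh backup-host 'cat > backup.tar.gz'
# The same stdin/stdout filter contract, demonstrated with plain gzip:
printf 'InnoDB page data' | gzip -1 | gzip -d
# -> InnoDB page data
```

Any tool that reads stdin and writes stdout can be dropped into the middle of that pipe, which is exactly the requirement above.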


For the compression test I took ~12GB of InnoDB data files generated by the tpcc benchmark with 100 warehouses.

I tested 1, 2, and 4 parallel threads for the tools that support it, and different compression levels (1, 2, 3 for qpress; -1 and -5 for the other tools).
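The post does not give the exact commands used, so here is a hypothetical sketch of this kind of measurement. A 1 MB zero-filled file stands in for the 12 GB of InnoDB data, and gzip stands in for the tools under test (e.g. `pigz -p 4 -1` would be the 4-thread, level-1 run):

```shell
# Hypothetical timing harness (stand-in file and tool, not the post's actual commands)
head -c 1048576 /dev/zero > /tmp/tpcc.ibd        # 1 MB sample instead of 12 GB
t0=$(date +%s)
gzip -1 < /tmp/tpcc.ibd > /tmp/tpcc.ibd.gz       # swap in pigz/pbzip2/qpress/lzop here
t1=$(date +%s)
echo "time: $((t1 - t0)) s, compressed: $(wc -c < /tmp/tpcc.ibd.gz) bytes"
```

The speed figures in the table below then follow from size divided by elapsed time.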

The raw results are available here http://spreadsheets.google.com/ccc?key=pOIo5aX59b6biPZ0QTVMXHg&hl=en, and I copy the table here in case the Google spreadsheet ever becomes unavailable.

| tool   | threads | level | compressed size, MB | compression ratio | compression time, sec | compression speed, MB/s | decompression time, sec | decompression speed, MB/s |
|--------|---------|-------|---------------------|-------------------|-----------------------|-------------------------|-------------------------|---------------------------|
| qpress | 1 | 1 | 6,058.93 | 0.52 | 109  | 55.59  | 92  | 65.86 |
| qpress | 1 | 2 | 5,892.62 | 0.51 | 201  | 29.32  | 123 | 47.91 |
| qpress | 1 | 3 | 5,885.01 | 0.51 | 473  | 12.44  | 84  | 70.06 |
| qpress | 2 | 1 | 6,058.93 | 0.52 | 65   | 93.21  | 66  | 91.80 |
| qpress | 2 | 2 | 5,892.62 | 0.51 | 110  | 53.57  | 112 | 52.61 |
| qpress | 2 | 3 | 5,885.01 | 0.51 | 245  | 24.02  | 84  | 70.06 |
| qpress | 4 | 1 | 6,058.93 | 0.52 | 48   | 126.23 | 66  | 91.80 |
| qpress | 4 | 2 | 5,892.62 | 0.51 | 64   | 92.07  | 68  | 86.66 |
| qpress | 4 | 3 | 5,885.01 | 0.51 | 130  | 45.27  | 65  | 90.54 |
| pigz   | 1 | 1 | 4,839.97 | 0.42 | 438  | 11.05  | 129 | 37.52 |
| pigz   | 1 | 5 | 3,460.31 | 0.30 | 763  | 4.54   | 121 | 28.60 |
| pigz   | 2 | 1 | 4,839.97 | 0.42 | 213  | 22.72  | 109 | 44.40 |
| pigz   | 2 | 5 | 3,460.31 | 0.30 | 379  | 9.13   | 104 | 33.27 |
| pigz   | 4 | 1 | 4,839.97 | 0.42 | 107  | 45.23  | 112 | 43.21 |
| pigz   | 4 | 5 | 3,460.31 | 0.30 | 190  | 18.21  | 103 | 33.60 |
| LZOP   | 1 | 1 | 5,831.25 | 0.50 | 184  | 31.69  | 83  | 70.26 |
| LZOP   | 1 | 5 | 5,850.16 | 0.50 | 179  | 32.68  | 87  | 67.24 |
| pbzip2 | 1 | 1 | 4,154.41 | 0.36 | 1594 | 2.61   | 597 | 6.96  |
| pbzip2 | 1 | 5 | 4,007.07 | 0.34 | 1702 | 2.35   | 644 | 6.22  |
| pbzip2 | 2 | 1 | 4,154.41 | 0.36 | 800  | 5.19   | 605 | 6.87  |
| pbzip2 | 2 | 5 | 4,007.07 | 0.34 | 844  | 4.75   | 648 | 6.18  |
| pbzip2 | 4 | 1 | 4,154.41 | 0.36 | 399  | 10.41  | 602 | 6.90  |
| pbzip2 | 4 | 5 | 4,007.07 | 0.34 | 421  | 9.52   | 645 | 6.21  |
| LZMA   | 1 | 1 | 3,623.66 | 0.31 | 1454 | 2.49   | 501 | 7.23  |
| LZMA   | 1 | 5 | NA       | NA   | not done in 2h | NA | NA | NA  |

To summarize the results:

  • pbzip2 obviously shows good compression, but it is far too slow. Interestingly, at level 5 its compression is actually worse than pigz at level 5
  • pigz compresses well and is faster than pbzip2, though still not fast; multi-threaded operation may make it acceptable, especially if you need to keep compatibility, e.g. to copy the result to boxes where only the standard gzip is available
  • qpress is not as good in compression ratio, but its speed is impressive, and maybe we will ship xtrabackup with this compression
  • LZO is even faster at decompression than qpress, but I would like to see a parallel version. There is a patch for that, but it did not apply cleanly to lzop 1.02, so I skipped it
  • In my opinion, in all cases compression level 1 offers the best tradeoff between archive size and compression/decompression time

There is no obvious winner; it depends on what is more important to you, size or time, but with this data we can make a decision.
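As a cross-check on the table, both speed columns are simply the compressed size divided by the elapsed time. Recomputing one row (qpress, 4 threads, level 1) with awk:

```shell
# Recompute the speed columns for the qpress / 4 threads / level 1 row;
# both speeds divide the compressed size (MB) by the measured time (seconds).
awk 'BEGIN {
  size = 6058.93; ctime = 48; dtime = 66   # MB and seconds, from the table
  printf "compress %.2f MB/s, decompress %.2f MB/s\n", size/ctime, size/dtime
}'
# -> compress 126.23 MB/s, decompress 91.80 MB/s (matching the table)
```

Note that because the divisor is the compressed size, tools with stronger compression (pbzip2, lzma) look slower at decompression per megabyte of archive than they would per megabyte of restored data.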


Comments (18)

  • morgan

The interesting thing is that decompression doesn’t seem to get the same speed boosts from added threads that compression does. I’ve always thought that decompression should be faster than compression, but in almost all of your 4-threaded tests that’s not the case.

    March 16, 2009 at 5:05 pm
  • Chip Turner

    Very nice comparison of parallel compression choices. This is a fun kind of analysis to perform.

    Ultimately the goal is probably to get the data off of the database as quickly as possible. It would be interesting to see compression_time + ultimate_size / network_speed to get the total time to actually get the data off of the machine and thereby have a completed backup. I imagine qpress 4.1 would still be optimal. Also worth factoring in is what rate xtrabackup can provide data to the algorithm; so long as the algorithm is faster than xtrabackup, you can decide strictly on space, right?

    Of course, sometimes you want to optimize for network data copied (preferring higher compression ratios) or less impact to the machine you are backing up (preferring fewer parallel cores, or more throughput when running nice’d).

    Are you just deciding a default/recommended algorithm for a pluggable system or will what you decide be the only option?

    March 16, 2009 at 8:09 pm
  • Geoffrey Lee

    I think it would be interesting to include p7zip in your benchmark as well. 7-Zip has been well known for its multi-threading support. http://en.wikipedia.org/wiki/Lempel-Ziv-Markov_chain_algorithm#7-Zip_reference_implementation

    March 16, 2009 at 8:56 pm
  • Vadim

    Chip,

We actually provide a stream which can be compressed with any tool you want; that’s why my requirement was to accept stdin and output to stdout.

    March 16, 2009 at 9:44 pm
  • Vadim

    Geoffrey Lee,

I need a tool that accepts stdin and outputs to stdout, see the comment above. I was not able to pipe data into p7zip, i.e. cat files.tar | p7zip > files.tar.7zip

    March 16, 2009 at 9:46 pm
  • slavik

Vadim, use lzma instead of p7zip
(http://tukaani.org/lzma/)

    March 16, 2009 at 11:57 pm
  • Mark

    Vadim,

I use p7zip on some of our production machines where size matters, plus it can encrypt the file at almost no performance hit. On an 8-core machine, it can be quite fast as it can use all of the cores at once (make sure to either ‘nice’ it or use it off-peak because it can really slow things down!). It can be made to read from stdin using the “-si” option:

    cat files.tar | 7z a [various options] -si files.tar.7z

I’ve found that using a compression level of 3 gives a good balance between compressed size and time spent compressing, or you can set it to 1 and it will be even faster. For a multi-gigabyte database backup (mysqldump file), it can reduce the file to approximately 70% the size of a gzip’ed one in about the same amount of time.

    March 17, 2009 at 12:50 am
  • Baron Schwartz

    Hmmm, does it have almost no performance hit, or does it really slow things down? I’m confused.

    March 17, 2009 at 5:58 am
  • Dennis Birkholz

I think Mark meant that 7z slows the machine down so extremely that the additional encryption doesn’t hit the performance any more 😉

I would like to see an lzma benchmark too, but I am not sure if it supports multi-core processing.

    March 17, 2009 at 8:25 am
  • slavik

Baron,
Mark meant that using AES encryption does not affect the time spent on compression.
This is typical for modern processors: the scheduler can’t always use all the cores’ power (cache misses, IO bottlenecks, kernel tasks and so on), so a small amount of CPU resources (but enough for encryption) is always available.

    March 17, 2009 at 9:16 am
  • Vadim

    slavik,

I added results for LZMA. With compression level 5 it was not able to finish in 2h, so I stopped it.

    March 17, 2009 at 11:34 am
  • Mark

    That will teach me to post at 1 in the morning!

    Yes, I meant that 7zip does take more resources than gzip, but adding encryption doesn’t add any *more*. As for the performance hit, the machines I use 7zip on have distinct busy and non-busy times, so I can schedule a 25-minute, 7zipped backup during a non-busy time fairly easily. I realize this isn’t necessarily normal for most servers though, for the rest of our off-line backups we use gzip. After looking at these results I’m thinking about lzo instead, especially since it’s less of a hit on a busy machine. Unfortunately, I see that version 2 doesn’t have a nice, gzip-like executable. Since we use 64-bit machines almost exclusively, and v2 promises better performance on 64-bit, does anyone have a link to a command-line archiver that can use lzo v2?

    March 17, 2009 at 4:46 pm
  • Vadim

    Mark,

For me on Ubuntu 8.10, where I ran the tests, lzop comes linked with the LZO v2 libraries.

So even if your distribution ships lzop with LZO v1, you can probably compile it linked against v2.

    March 17, 2009 at 10:31 pm
  • Steven Roussey

    lzjb

    Reminds me of discussions in this post:

    ZFS & MySQL/InnoDB Compression Update
    http://blogs.smugmug.com/don/2008/10/13/zfs-mysqlinnodb-compression-update/

    March 18, 2009 at 9:43 pm
  • slavik

    Vadim,
the lzma results look too ugly – can you post the system spec?
I tested compress/decompress with Windows 7-Zip on a quad-core AMD 9500 with 8GB RAM, and I got 20 MB/s decompression speed (quite close to the speed of a regular old 160GB PATA drive) and 8 MB/s compression (fast mode).
I will try later on a similar system under Linux and post the results.

    March 18, 2009 at 10:24 pm
  • Vadim

    slavik,

it is a Dell PowerEdge R900, 4x quad-core

    vendor_id : GenuineIntel
    cpu family : 6
    model : 15
    model name : Intel(R) Xeon(R) CPU E7320 @ 2.13GHz
    stepping : 11
    cpu MHz : 2127.881
    cache size : 2048 KB

    with 32GB of RAM

    March 19, 2009 at 9:33 am
  • slavik

I found that LZMA can’t scale: with -1 it can use only 1 thread, and with -5 (or bigger) only 2 worker threads.
“Sets multithread mode. If you have a multiprocessor or multicore system, you can get a increase with this switch. 7-Zip supports multithread mode only for LZMA compression and BZip2 compression / decompression. If you specify {N}, for example mt=4, 7-Zip tries to use 4 threads. LZMA compression uses only 2 threads.” http://www.bugaco.com/7zip/MANUAL/switches/method.htm
In my tests on an AMD 9950 with 2GB of RAM: 4 MB/s compression, and about 8 MB/s decompression.
I think these are the results of terrible optimization in the Unix port.

    March 21, 2009 at 12:27 am
  • Snarky

The decompression speed computation is deceiving. You should divide by the size of the uncompressed data, not the compressed one, because otherwise the better the compression, the worse the decompression speed will look, when that is not actually the case.

    June 23, 2009 at 12:38 pm

Comments are closed.
