EmergencyEMERGENCY? Get 24/7 Help Now!

Real World Compression


Posted on:

|

By:


PREVIOUS POST
NEXT POST
Share Button

Benchmarking is a tricky thing, especially when it comes to compression. Some data compresses quite well while other data does not compress at all. Storing jpeg images in a BLOB column produces 0% compression, but storing the string “AAAAAAAAAAAAAAAAAAAA” in a VARCHAR(20) column produces extremely high (and unrealistic) compression numbers.

This week I was assisting a TokuDB customer understand the insertion performance of TokuDB versus InnoDB and MyISAM for their actual data. The table contained a single VARCHAR(50), multiple INTEGER, one SET, one DECIMAL, and a surrogate primary key.  To support a varied query workload they needed 6 indexes.

Here is an obfuscated schema of the table:


And the on-disk file sizes after loading the data:

Storage Engine Size (GB)
MyISAM 25.0
InnoDB 26.0
TokuDB 2.2

 

It’s important to note that one of the TokuDB indexes was defined as clustering, meaning that a second full copy of the table is stored.  Clustering indexes are helpful in that they always satisfy a query when used; there is no need to retrieve non-indexed column values from the primary key index. Clustering indexes can make queries significantly faster.  MyISAM and InnoDB do not support clustering secondary indexes.

TokuDB achieved 10.8x compression versus InnoDB.  This is in line with other tests on compression and performance that we have demonstrated vs. InnoDB (see http://www.tokutek.com/2012/04/tokudb-v6-0-even-better-compression).

The benefits of high compression are much more than buying smaller disks (or SSDs). Disk IO (reads and writes) is 10.8x more efficient, plus backups that employ file system snapshots are 10.8x smaller.
­­­
­­­

Share Button
PREVIOUS POST
NEXT POST


Tags:

, , , , , ,

Categories:
Tokutek, TokuView


Comments
  • You started out talking about insertion performance, but then changed the topic to comparing database size after insertion.

    How did the insertion performance compare between the two?

    Reply

    • Tim Callaghan Post author

      Good point. We’ve done quite a bit of benchmarking comparing our data loading performance here, here, and here. In this particular use-case, the schema and workload are very similar to iiBench (but with more secondary indexes) so the insertion performance difference versus InnoDB is even better.

      Reply

  • Did you try InnoDB with row_format=COMPRESSED and different key_block_size settings?

    Reply

    • Tim Callaghan Post author

      I’ve run InnoDB compression tests before. I’m sure I could have achieved 2x compression with key_block_size=8 and possibly 4x with key_block_size=4. Few people use InnoDB compression due to the performance penalty of splits and recompress operations, so I didn’t include it in these tests.

      Reply

Leave a Reply

Percona’s widely read Percona Data Performance blog highlights our expertise in enterprise-class software, support, consulting and managed services solutions for both MySQL® and MongoDB® across traditional and cloud-based platforms. The decades of experience represented by our consultants is found daily in numerous and relevant blog posts.

Besides specific database help, the blog also provides notices on upcoming events and webinars.

Want to get weekly updates listing the latest blog posts? Subscribe to our blog now! Submit your email address below.

No, thank you. Please do not ask me again.