October 23, 2014

Adventures in archiving

Archive can solve disk storage issues - but you must always consider the caveats and bugs related to a storage engine before going ahead and using it.

Archive can solve disk storage issues – but you must always consider the caveats and bugs related to a storage engine before going ahead and using it.

One of our Remote DBA service clients recently had an issue with size on disk for a particular table; in short this table was some 25 million rows of application audit data with an on disk size of 345GB recorded solely for the purposes of debugging which may or may not occur.

Faced with the task of both reducing the size on disk and ensuring that the data is always available at all times (firmly ruling out off device archive), we began to explore the available options for the client.

We settled on the following options for further testing.

  1. InnoDB’s inbuilt row compression
  2. MySQL’s inbuilt compress() function leveraging zlib to provide compression of data.
  3. MySQL’s Archive storage engine.

Tooling

The tooling phase consisted of producing scripts to generate representative payloads for comparison, in this case this would be some IMDB XML data sliced to sit within the VARCHAR(1000) fields per client requirements.

First off the schema tables.

As can be seen we have the following:

  • archive_table (The only change here is the Engine)
  • compressed_data (The change here being VARCHAR(1000) -> BLOB for the intentions of using compress() )
  • compressed_row (The row format being set to compressed, ensure you have innodb_file_format Barracuda as Antelope does not support compression)
  • non_compressed_data (Our test control)

Now we needed to generate the ‘payload’ data for this I quickly authored a bash script for the task to provide an approximate 1M rows per table.

Scripts are available at GitHub

The script adventures_in_archiving_load_schema.sh will first drop and create the schema adventures_in_archiving.

It will then attempt to regenerate the payload file adventures_in_archiving_data.sql (4GB) if it does not exist in the current folder.

Subsequently loading the payload into the adventures_in_archiving schema, this is by no means a fast process however, taking some 4 minutes for data generation and a subsequent 63 minutes for import.

Results

Size on diskEngineCompression Ratio
7.1MArchive173.07:1
304MInnoDB row_format=compressed key_block_size=44.04:1
648Mcompress()1.897:1
1.2GInnoDB1:1

Immediately Archive seemed like the obvious choice there are however bugs to be taken into consideration..

Due diligence

What follows is a list reduced to the most likely to affect production deployments.

  • Use of Auto Increment is not recommended. #37182  #37871 #40216
  • If disk level is already critical and the originating table is MyISAM there may be data loss if there is not enough space to complete the operation.  #37648
  • Use of INSERT DELAYED is not recommended. #48339
  • Use of statement based replication is not recommended. #48343
  • Flush buffers before attempting a table checksum. #57188
  • Repair tools will not function on Archive tables. #62054

Summary

Archive in this instance is the solution to the issue of disk storage however you must always take into account the caveats and bugs related to a storage engine before going ahead and using it. Also you need to be aware that Archive does not have good performance.

e.g.

Also you must flush the buffer for example by running a select on the table before inserts are written to disk.

Followup

A few alternative suggestions have been made, as such there will be a follow up post taking a quick look at the size on disk for TokuDB and Infobright, soon.

Comments

  1. Steve Jackson says:

    With compression ratios like 173,000 : 1, I would say that this logging table would benefit from simple normalization?

  2. @Steve there’s not much to normalize in this example; short of parsing the xml into relevant fields discarding data that’s not required, which I agree would be a far more efficient model of storage if at all possible.

    Whilst not the case in this example, I have in the past seen it where the data has to be stored in an unmodified format, and retained for several years as part of a legally binding agreement.

    If you can normalize the data coming in, you may even see better storage efficiency; if you can not then the approach outlined in the post may be an option.

  3. Great article.

    Don’t forget to mention the drawback about Archive that it’s not transactional.

    What about ZFS compression (for example on FreeBSD) or other filesystem compression options?

  4. Daniël, ZFS is also coming to Linux. For example, it’s available for Ubuntu.

    http://zfsonlinux.org/faq.html

  5. mike morse says:

    We use Infobright (free version), more or less true to the 10:1 compression ratio it claims, and great performance on aggregated type queries (sum, count, etc..) due to it’s columnar nature. The main drawback is the limits they place on the free version, namely no update, delete, insert, etc… basically you can create, load (data infile), and drop. Also some considerable changes in transforming schemas from regular innodb to brighthouse storage engine, so in the case of a daily db reload, you’ll need to do some work in writing a usable ETL script. For situations like transaction logging where there are no update/deletes, you can simply do a daily load of the new data.

  6. Steve Jackson says:

    @David. Regardless, there is clearly a hell of a lot of redundancy in the data if ARCHIVE can achieve such high compression ratios. I would guess only a couple of bytes difference between docs – Thus, I suspect that the docs are too big to get enough of them into a page to allow good innodb compression. With such obviously high similarity, I would recommend to the client that they store diffs instead of the actual documents. They can store a base document, and then many diffs. If many of the differences are the same, or no difference at all, then they may compress very well, as they are much smaller and you will get more of them on a page… just a thought… without seeing some example data, I am only speculating.

  7. David, great write up. I’ve run the test on TokuDB v6.6.4 using each of the available compression types, results were as follows:

    lzma : size on disk = 10.06M, compression ratio = 118.85:1
    zlib : size on disk = 20.06M, compression ratio = 59.61:1
    quicklz : size on disk = 36.06M, compression ratio = 33.16:1

  8. @SteveJackson. I think you meant to say 173.07:1, not 173,000:1.

  9. Steve Jackson says:

    @Tim. Depends what country you’re from.

    Do we mean “one hundred and seventy three thousand and 7 to one”? or “173 point 07 to 1″?

    I am so used to use . now for thousands, living in scandinavia for too long.

    Anyway… the 173 : 1 is still a very high compression ratio, and stick by my previous comment

    I personally prefer the ease of access to the data which InnoDB compression allows, but then you have to do a bit of planning that results in similar data being clustered to allow for optimal page compression ratios, i.e write your app that way. I understand that its not always that simple…

  10. @Steve. Good point about the commas and periods, I hadn’t thought of that.

    TokuDB gives you the “ease of access” of InnoDB with extremely high compression ratios and more (full disclosure, I’m the VP of Engineering at Tokutek).

  11. Patrick Domack says:

    I struggled with this for awhile myself, archiving syslog into mysql. Archive wasnt a solution cause of indexing.
    Daily packing myisam tables worked but not very good compression, and alittle sloppy.
    When i redid this system, i put it on solaris and used gzip filesystem compression on it, getting a good 10:1 ratio currently. I dont notice any spead isssues doing searchs

  12. eRadical says:

    I’m doing something a little complicated in an in-house analytics system… I bet for an audit system might work as well.

    ETL process writes in InnoDB hourly tables (table_YYYY_MM_DD_HH).
    A daily process moves hourly tables into daily ones (table_YYYY_MM_DD).
    A monthly process move daily ones into montly ones (table_YYYY_MM). After this it alters the table to ARCHIVE.

    Of course disks where this is stored are slower and slower as we approach the archive phase.

    In an audit system where I helped we did a similar thing but only using the daily table (table_YYYY_MM_DD)… tables older than 5 days would be altered by a cronjob to ARCHIVE. They only kept it for 120 days so it was acceptable.

  13. Holger Haag says:

    Just been playing around with innodb compression and KEY_BLOCK_SIZE. One should really try *all* permitted values.

    Most tables I’ve tested look like this:

    unompcressed -> 11M
    KEY_BLOCK_SIZE=1 -> 2M
    KEY_BLOCK_SIZE=2 -> 3M
    KEY_BLOCK_SIZE=4 -> 3M
    KEY_BLOCK_SIZE=8 -> 5M
    KEY_BLOCK_SIZE=16 -> 10M

    so compression level goes down with higher values of KEY_BLOCK_SIZE

    but one specific table shows the following compression characteristics:

    uncompressed -> 404M
    KEY_BLOCK_SIZE=1 -> ERROR 1118 (42000): Row size too large (> 8126)
    KEY_BLOCK_SIZE=2 -> 284M
    KEY_BLOCK_SIZE=4 -> 308M
    KEY_BLOCK_SIZE=8 -> 232M
    KEY_BLOCK_SIZE=16 -> 364M

    here the optimum value is 8 with 4 and 16 providing worse compression

    Here’s a simple way of probing the different values (Note: if alter table fails due to ERROR 1118 (42000): Row size too large (> 8126), the default vaule of KEY_BLOCK_SIZE=will be applied, which is 8 on my system).

    #drop table if exists $table_dynamic;
    create table $table_dynamic like $table;
    alter table $table_dynamic row_format=dynamic;
    insert into $table_dynamic select * from $table limit 100000;

    #drop table if exists $table_01;
    create table $table_01 like $table;
    alter table $table_01 row_format=compressed key_block_size=1;
    insert into $table_01 select * from $table limit 100000;

    #drop table if exists $table_02;
    create table $table_02 like $table;
    alter table $table_02 row_format=compressed key_block_size=2;
    insert into $table_02 select * from $table limit 100000;

    #drop table if exists $table_04;
    create table $table_04 like $table;
    alter table $table_04 row_format=compressed key_block_size=4;
    insert into $table_04 select * from $table limit 100000;

    #drop table if exists $table_08;
    create table $table_08 like $table;
    alter table $table_08 row_format=compressed key_block_size=8;
    insert into $table_08 select * from $table limit 100000;

    #drop table if exists $table_16;
    create table $table_16 like $table;
    alter table $table_16 row_format=compressed key_block_size=16;
    insert into $table_16 select * from $table limit 100000;

  14. @Holger, the downside to going with smaller key_block_size values is a significant performance hit in OLTP applications, as the compression misses, splits, and re-compresses get very expensive.

  15. Holger Haag says:

    @Ted – that’s right, of course. But checking file sizes was a quick thing to do and an end-to-end benchmark (application level) is a little bit more work. But we’re preparing that and will also try to include TokuDB :)

  16. InnoDB’s row compression also uses zlib, Archive storage engine
    also makes use of it and so does inbuilt (as pointed in the
    post).

    The difference in compression ratios may be due to different
    reasons – buffer size, different algorithm being used (zlib takes
    different options here, for InnoDB and others it is not optional
    but hardcoded), compression level and so on.

Speak Your Mind

*