Trying Archive Storage Engine

Today I noticed one of server used for web request profiling stats logging is taking about 2GB per day for logs, which are written in MyISAM table without indexes. So I thought it is great to try how much archive storage engine could help me in this case.

Original Table:

Lets see what table compression gives us:

Surprise. It would not tell you how much space table takes. Quite unfortunate especially as it should not be hard to do – There are no indexes and information about size on disk and compression buffer stage in memory should be readily available. So we have to go and check size of on disk file:

-rw-rw—- 1 mysql mysql 19 Nov 12 11:31 requests_061111.ARM
-rw-rw—- 1 mysql mysql 984628803 Nov 12 11:29 requests_061111.ARZ
-rw-rw—- 1 mysql mysql 9162 Nov 12 11:00 requests_061111.frm

Can’t say I’m impressed we have only about 2 times compression rate for this table which is heavily redundant (urls, referers, user agents etc)

To check if this is fair compression rate I also run gzip requests_061111.MYD and I got file which as about 520MB in size – almost half the size of archive table. I guess this is due to incremental compression which Archive engine uses, still it should have done much better. Might be configuration option should be added to allow compression buffer to be larger and so compression more efficient. I also run OPTIMIZE table and size stayed the same – not a big surprise as I got this table by converting MyISAM table rather than incremental insertions.

I also compared it to compressed MyISAM – this would take 1250MB so Archive does compress a bit better than myisampack.

I also checked how much table would take if compressed with bzip2 – it took a while to compress but result was 310MB 40% better than compressed by gzip. I wish Archive storage engine would specify archiving method and compression level. Some people may like fast compression while others would prefer to have smallest size possible.

Now regarding full table scan speed – in my case running simple full table scan query on the table changed from 5 min to about 4 minutes, less than table size reduction which is surprising. May be this is because server is loaded and having concurrent IO which randomizes otherwise sequential IO. It would be interesting to know which buffer size Archive storage engine using for reads – I could not find it as difference could be due to larger read_buffer_size used by MyISAM.

Summary: Archive storage engine does compresses your data and should be good for log storage or storing something which needs no updates and no indexes, however compression ratio is not perfect, configuration and stats could be improved.

Share this post

Comments (7)

  • AMCT

    Nice stuff.

    November 12, 2006 at 12:00 am
  • Brian Aker

    Hi Peter!

    What version of the server are you using? More information should be available in 5.0 for Archive then what is produced above.


    November 12, 2006 at 10:05 pm
  • peter


    This is 4.1.21
    Good it was fixed in 5.0. Any changes regarding compression ratio ?

    November 12, 2006 at 10:12 pm
  • Andrew Aksyonoff

    I suppose it’s compressing the records independently; hence not-so-good compression ratio.

    November 17, 2006 at 1:24 am
  • peter


    It does not compress row by row this would make it unusable for small row sizes. It does use 32K buffer as Brian tells me , which is kind of small to get good compression ratio.

    November 17, 2006 at 9:03 am
  • Indra

    6 years later, any better?

    I guess the compression ratio isnt good because it needs to be selected as quick as possible (fast compression).
    How bout to compress the *.myd as small as possible but then voila, need to do select?

    September 14, 2012 at 10:48 am

Comments are closed.

Use Percona's Technical Forum to ask any follow-up questions on this blog topic.