October 21, 2014

Trying Archive Storage Engine

Today I noticed one of server used for web request profiling stats logging is taking about 2GB per day for logs, which are written in MyISAM table without indexes. So I thought it is great to try how much archive storage engine could help me in this case.

Original Table:

Lets see what table compression gives us:

Surprise. It would not tell you how much space table takes. Quite unfortunate especially as it should not be hard to do – There are no indexes and information about size on disk and compression buffer stage in memory should be readily available. So we have to go and check size of on disk file:

-rw-rw—- 1 mysql mysql 19 Nov 12 11:31 requests_061111.ARM
-rw-rw—- 1 mysql mysql 984628803 Nov 12 11:29 requests_061111.ARZ
-rw-rw—- 1 mysql mysql 9162 Nov 12 11:00 requests_061111.frm

Can’t say I’m impressed we have only about 2 times compression rate for this table which is heavily redundant (urls, referers, user agents etc)

To check if this is fair compression rate I also run gzip requests_061111.MYD and I got file which as about 520MB in size – almost half the size of archive table. I guess this is due to incremental compression which Archive engine uses, still it should have done much better. Might be configuration option should be added to allow compression buffer to be larger and so compression more efficient. I also run OPTIMIZE table and size stayed the same – not a big surprise as I got this table by converting MyISAM table rather than incremental insertions.

I also compared it to compressed MyISAM – this would take 1250MB so Archive does compress a bit better than myisampack.

I also checked how much table would take if compressed with bzip2 – it took a while to compress but result was 310MB 40% better than compressed by gzip. I wish Archive storage engine would specify archiving method and compression level. Some people may like fast compression while others would prefer to have smallest size possible.

Now regarding full table scan speed – in my case running simple full table scan query on the table changed from 5 min to about 4 minutes, less than table size reduction which is surprising. May be this is because server is loaded and having concurrent IO which randomizes otherwise sequential IO. It would be interesting to know which buffer size Archive storage engine using for reads – I could not find it as difference could be due to larger read_buffer_size used by MyISAM.

Summary: Archive storage engine does compresses your data and should be good for log storage or storing something which needs no updates and no indexes, however compression ratio is not perfect, configuration and stats could be improved.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. AMCT says:

    Nice stuff.

  2. Brian Aker says:

    Hi Peter!

    What version of the server are you using? More information should be available in 5.0 for Archive then what is produced above.

    Cheers,
    -Brian

  3. peter says:

    Brian,

    This is 4.1.21
    Good it was fixed in 5.0. Any changes regarding compression ratio ?

  4. I suppose it’s compressing the records independently; hence not-so-good compression ratio.

  5. peter says:

    Andrew,

    It does not compress row by row this would make it unusable for small row sizes. It does use 32K buffer as Brian tells me , which is kind of small to get good compression ratio.

  6. Indra says:

    6 years later, any better?

    I guess the compression ratio isnt good because it needs to be selected as quick as possible (fast compression).
    How bout to compress the *.myd as small as possible but then voila, need to do select?

Speak Your Mind

*