Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

The MySQL ARCHIVE storage engine – Alternatives

February 24, 2014

Author

Share this Post:

In my previous post I pointed out that the existing ARCHIVE storage engine in MySQL may not be the one that will satisfy your needs when it comes to effectively storing large and/or old data. But are there any good alternatives? As the primary purpose of this engine is to store rarely accessed data in disk space efficient way, I will focus here on data compression abilities rather then on performance.

The InnoDB engine provides compressed row format, but is it’s efficiency even close to the one from that available in archive engine? You can also compress MyISAM tables by using myisampack tool, but that also means a table will be read only after such operation.

Moreover, I don’t trust MyISAM nor Archive when it comes to data durability. Fortunately along came a quite new (open source since April 2013) player into this field – TokuDB! It seems to provide an excellent compression ratios, but also it’s fully ACID compliant, and does not have any of the limitations present in Archive, so it’s functionality is much more like InnoDB! This may allow you also to store production data on SSD drives, which disk space cost is still higher then on traditional disks, where otherwise it could be too expensive.

To better illustrate what choice do we have, I made some very simple disk savings comparison of all the mentioned variants.
I have used an example table with some scientific data fetched from here (no indexes):

CREATE TABLE `table1` (
  `snp_id` int(11) DEFAULT NULL,
  `contig_acc` varchar(32) DEFAULT NULL,
  `contig_ver` tinyint(4) DEFAULT NULL,
  `asn_from` int(11) DEFAULT NULL,
  `asn_to` int(11) DEFAULT NULL,
  `locus_id` int(11) DEFAULT NULL,
  `locus_symbol` varchar(128) DEFAULT NULL,
  `mrna_acc` varchar(128) DEFAULT NULL,
  `mrna_ver` int(11) DEFAULT NULL,
  `protein_acc` varchar(128) DEFAULT NULL,
  `protein_ver` int(11) DEFAULT NULL,
  `fxn_class` int(11) DEFAULT NULL,
  `reading_frame` int(11) DEFAULT NULL,
  `allele` text,
  `residue` text,
  `aa_position` int(11) DEFAULT NULL,
  `build_id` varchar(4) NOT NULL,
  `ctg_id` int(11) DEFAULT NULL,
  `mrna_start` int(11) DEFAULT NULL,
  `mrna_stop` int(11) DEFAULT NULL,
  `codon` text,
  `protRes` char(3) DEFAULT NULL,
  `contig_gi` int(11) DEFAULT NULL,
  `mrna_gi` int(11) DEFAULT NULL,
  `mrna_orien` tinyint(4) DEFAULT NULL,
  `cp_mrna_ver` int(11) DEFAULT NULL,
  `cp_mrna_gi` int(11) DEFAULT NULL,
  `verComp` varchar(7) NOT NULL
)

CREATE TABLE `table1` (

`snp_id` int(11) DEFAULT NULL,

`contig_acc` varchar(32) DEFAULT NULL,

`contig_ver` tinyint(4) DEFAULT NULL,

`asn_from` int(11) DEFAULT NULL,

`asn_to` int(11) DEFAULT NULL,

`locus_id` int(11) DEFAULT NULL,

`locus_symbol` varchar(128) DEFAULT NULL,

`mrna_acc` varchar(128) DEFAULT NULL,

`mrna_ver` int(11) DEFAULT NULL,

`protein_acc` varchar(128) DEFAULT NULL,

`protein_ver` int(11) DEFAULT NULL,

`fxn_class` int(11) DEFAULT NULL,

`reading_frame` int(11) DEFAULT NULL,

`allele` text,

`residue` text,

`aa_position` int(11) DEFAULT NULL,

`build_id` varchar(4) NOT NULL,

`ctg_id` int(11) DEFAULT NULL,

`mrna_start` int(11) DEFAULT NULL,

`mrna_stop` int(11) DEFAULT NULL,

`codon` text,

`protRes` char(3) DEFAULT NULL,

`contig_gi` int(11) DEFAULT NULL,

`mrna_gi` int(11) DEFAULT NULL,

`mrna_orien` tinyint(4) DEFAULT NULL,

`cp_mrna_ver` int(11) DEFAULT NULL,

`cp_mrna_gi` int(11) DEFAULT NULL,

`verComp` varchar(7) NOT NULL

)

ARCHIVE storage engine

mysql >show table status like 'table1'G
*************************** 1. row ***************************
           Name: table1
         Engine: ARCHIVE
        Version: 10
     Row_format: Compressed
           Rows: 19829016
 Avg_row_length: 11
    Data_length: 221158267
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: 2013-12-22 23:58:51
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.28 sec)

-rw-rw----. 1 przemek przemek 211M Dec 22 23:58 table1.ARZ

mysql >show table status like 'table1'G

*************************** 1. row ***************************

Name: table1

Engine: ARCHIVE

Version: 10

Row_format: Compressed

Rows: 19829016

Avg_row_length: 11

Data_length: 221158267

Max_data_length: 0

Index_length: 0

Data_free: 0

Auto_increment: NULL

Create_time: NULL

Update_time: 2013-12-22 23:58:51

Check_time: NULL

Collation: latin1_swedish_ci

Checksum: NULL

Create_options:

Comment:

1 row in set (0.28 sec)

-rw-rw----. 1 przemek przemek 211M Dec 22 23:58 table1.ARZ

TokuDB engine, default compression

mysql >show table status like 'table1'G
*************************** 1. row ***************************
           Name: table1
         Engine: TokuDB
        Version: 10
     Row_format: tokudb_zlib
           Rows: 19829016
 Avg_row_length: 127
    Data_length: 2518948412
Max_data_length: 9223372036854775807
   Index_length: 0
      Data_free: 6615040
 Auto_increment: NULL
    Create_time: 2013-12-23 00:03:47
    Update_time: 2013-12-23 00:12:14
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.13 sec)

-rwxrwx--x. 1 przemek przemek 284M Dec 23 00:12 _b_tokudb_table1_main_32_1_18_B_0.tokudb

mysql >show table status like 'table1'G

*************************** 1. row ***************************

Name: table1

Engine: TokuDB

Version: 10

Row_format: tokudb_zlib

Rows: 19829016

Avg_row_length: 127

Data_length: 2518948412

Max_data_length: 9223372036854775807

Index_length: 0

Data_free: 6615040

Auto_increment: NULL

Create_time: 2013-12-23 00:03:47

Update_time: 2013-12-23 00:12:14

Check_time: NULL

Collation: latin1_swedish_ci

Checksum: NULL

Create_options:

Comment:

1 row in set (0.13 sec)

-rwxrwx--x. 1 przemek przemek 284M Dec 23 00:12 _b_tokudb_table1_main_32_1_18_B_0.tokudb

TokuDB engine, highest compression

mysql [localhost] {msandbox} (b_tokudb) > show table status like 'table1'G
*************************** 1. row ***************************
           Name: table1
         Engine: TokuDB
        Version: 10
     Row_format: tokudb_lzma
           Rows: 19829016
 Avg_row_length: 127
    Data_length: 2518948412
Max_data_length: 9223372036854775807
   Index_length: 0
      Data_free: 6950912
 Auto_increment: NULL
    Create_time: 2013-12-23 00:43:47
    Update_time: 2013-12-23 00:49:14
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: row_format=TOKUDB_LZMA
        Comment: 
1 row in set (0.01 sec)

-rwxrwx--x. 1 przemek przemek 208M Dec 23 00:49 _b_tokudb_sql_980_2_main_1b92_2_18.tokudb

mysql [localhost] {msandbox} (b_tokudb) > show table status like 'table1'G

*************************** 1. row ***************************

Name: table1

Engine: TokuDB

Version: 10

Row_format: tokudb_lzma

Rows: 19829016

Avg_row_length: 127

Data_length: 2518948412

Max_data_length: 9223372036854775807

Index_length: 0

Data_free: 6950912

Auto_increment: NULL

Create_time: 2013-12-23 00:43:47

Update_time: 2013-12-23 00:49:14

Check_time: NULL

Collation: latin1_swedish_ci

Checksum: NULL

Create_options: row_format=TOKUDB_LZMA

Comment:

1 row in set (0.01 sec)

-rwxrwx--x. 1 przemek przemek 208M Dec 23 00:49 _b_tokudb_sql_980_2_main_1b92_2_18.tokudb

(btw, did you notice how the file name changed after altering with different compression?
It’s no longer reflecting the real table name, so quite confusing 🙁 )

InnoDB engine, uncompressed

mysql > show table status like 'table1'G
*************************** 1. row ***************************
           Name: table1
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 19898159
 Avg_row_length: 117
    Data_length: 2343567360
Max_data_length: 0
   Index_length: 0
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2014-01-01 16:47:03
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.42 sec)

-rw-rw----. 1 przemek przemek 2.3G Jan  1 16:37 table1.ibd

mysql > show table status like 'table1'G

*************************** 1. row ***************************

Name: table1

Engine: InnoDB

Version: 10

Row_format: Compact

Rows: 19898159

Avg_row_length: 117

Data_length: 2343567360

Max_data_length: 0

Index_length: 0

Data_free: 4194304

Auto_increment: NULL

Create_time: 2014-01-01 16:47:03

Update_time: NULL

Check_time: NULL

Collation: latin1_swedish_ci

Checksum: NULL

Create_options:

Comment:

1 row in set (0.42 sec)

-rw-rw----. 1 przemek przemek 2.3G Jan 1 16:37 table1.ibd

InnoDB engine, compressed with default page size (8kB)

mysql > show table status like 'table1'G
*************************** 1. row ***************************
           Name: table1
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 19737546
 Avg_row_length: 59
    Data_length: 1171783680
Max_data_length: 0
   Index_length: 0
      Data_free: 5767168
 Auto_increment: NULL
    Create_time: 2014-01-01 18:51:22
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED
        Comment: 
1 row in set (0.31 sec)

-rw-rw----. 1 przemek przemek 1.2G Jan  1 18:51 table1.ibd

mysql > show table status like 'table1'G

*************************** 1. row ***************************

Name: table1

Engine: InnoDB

Version: 10

Row_format: Compressed

Rows: 19737546

Avg_row_length: 59

Data_length: 1171783680

Max_data_length: 0

Index_length: 0

Data_free: 5767168

Auto_increment: NULL

Create_time: 2014-01-01 18:51:22

Update_time: NULL

Check_time: NULL

Collation: latin1_swedish_ci

Checksum: NULL

Create_options: row_format=COMPRESSED

Comment:

1 row in set (0.31 sec)

-rw-rw----. 1 przemek przemek 1.2G Jan 1 18:51 table1.ibd

InnoDB engine, compressed with 4kB page size

mysql > show table status like 'table1'G
*************************** 1. row ***************************
           Name: table1
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 19724692
 Avg_row_length: 30
    Data_length: 592445440
Max_data_length: 0
   Index_length: 0
      Data_free: 3932160
 Auto_increment: NULL
    Create_time: 2014-01-01 19:41:12
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=4
        Comment: 
1 row in set (0.03 sec)

-rw-rw----. 1 przemek przemek 584M Jan  1 19:41 table1.ibd

mysql > show table status like 'table1'G

*************************** 1. row ***************************

Name: table1

Engine: InnoDB

Version: 10

Row_format: Compressed

Rows: 19724692

Avg_row_length: 30

Data_length: 592445440

Max_data_length: 0

Index_length: 0

Data_free: 3932160

Auto_increment: NULL

Create_time: 2014-01-01 19:41:12

Update_time: NULL

Check_time: NULL

Collation: latin1_swedish_ci

Checksum: NULL

Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=4

Comment:

1 row in set (0.03 sec)

-rw-rw----. 1 przemek przemek 584M Jan 1 19:41 table1.ibd

MyISAM engine, uncompressed

mysql > show table status like 'table1'G
*************************** 1. row ***************************
           Name: table1
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 19829016
 Avg_row_length: 95
    Data_length: 1898246492
Max_data_length: 281474976710655
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2013-12-23 11:02:28
    Update_time: 2013-12-23 11:03:45
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment: 
1 row in set (0.01 sec)

-rw-rw----. 1 przemek przemek 1.8G Dec 23 11:03 table1.MYD

mysql > show table status like 'table1'G

*************************** 1. row ***************************

Name: table1

Engine: MyISAM

Version: 10

Row_format: Dynamic

Rows: 19829016

Avg_row_length: 95

Data_length: 1898246492

Max_data_length: 281474976710655

Index_length: 1024

Data_free: 0

Auto_increment: NULL

Create_time: 2013-12-23 11:02:28

Update_time: 2013-12-23 11:03:45

Check_time: NULL

Collation: latin1_swedish_ci

Checksum: NULL

Create_options:

Comment:

1 row in set (0.01 sec)

-rw-rw----. 1 przemek przemek 1.8G Dec 23 11:03 table1.MYD

MyISAM engine, compressed (myisampack)

mysql > show table status like  'table1'G
*************************** 1. row ***************************
           Name: table1
         Engine: MyISAM
        Version: 10
     Row_format: Compressed
           Rows: 19829016
 Avg_row_length: 42
    Data_length: 848098828
Max_data_length: 281474976710655
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2013-12-23 11:02:28
    Update_time: 2013-12-23 11:03:45
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: 853535317
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

-rw-rw----. 1 przemek przemek 809M Dec 23 11:03 table1.MYD

mysql > show table status like 'table1'G

*************************** 1. row ***************************

Name: table1

Engine: MyISAM

Version: 10

Row_format: Compressed

Rows: 19829016

Avg_row_length: 42

Data_length: 848098828

Max_data_length: 281474976710655

Index_length: 1024

Data_free: 0

Auto_increment: NULL

Create_time: 2013-12-23 11:02:28

Update_time: 2013-12-23 11:03:45

Check_time: NULL

Collation: latin1_swedish_ci

Checksum: 853535317

Create_options:

Comment:

1 row in set (0.00 sec)

-rw-rw----. 1 przemek przemek 809M Dec 23 11:03 table1.MYD

Compression summary table

Engine	Compression	Table size [MB]
InnoDB	none	2272
InnoDB	KEY_BLOCK_SIZE=8	1144
InnoDB	KEY_BLOCK_SIZE=4	584
MyISAM	none	1810
MyISAM	compressed with myisampack	809
Archive	default	211
TokuDB	ZLIB	284
TokuDB	LZMA	208

So the clear winner is TokuDB, leaving InnoDB far behind. But this is just one test – the results may be very different for your specific data.

To get even better idea, let’s compare several crucial features available in mentioned storage engines

Feature	Archive	MyISAM (compressed)	InnoDB	TokuDB
DML	only INSERTs	no	yes	yes
Transactions	no	no	yes	yes
ACID	no	no	yes	yes
Indexes	no	yes	yes	yes
Online DDL	no	no	yes *	yes **

* – since version 5.6, with some limitations
** – supports add/drop indexes, add/drop/rename columns and expand int, char, varchar and varbinary data types

Summary

TokuDB seems to be an excellent alternative when it comes to disk space usage efficiency, but this is not the only reason why you should try it perhaps.
You may want to check these articles too:

- Percona Server 5.6.14 with TokuDB 7.1 Experimental build

- Benchmarking Percona Server TokuDB vs InnoDB

- Using TokuDB: A Guided Walk Through a TokuDB Implementation

- Archive vs Percona XtraDB vs Tokutek TokuDB LOAD DATA performance

- Adventures in archiving

- Data Analytics: MyISAM vs ARCHIVE vs Infobright ICE vs Infobright IEE

0 0 votes

Article Rating

18 Comments

Oldest

Newest Most Voted

Shlomi Noach

12 years ago

Worth noting that TokuDB does not support foreign keys.
On compression it’s worth noting that ALTER TABLE into TokuDB with high compression is way faster than ALTE RTABLE into innodb with mild compression.

Last, though I perfectly understand the topic and cause of this post (finding an engine with good compression), calling TokuDB “an ARCHIVE alternative” is so amusing (“what? no ARCHIVE left on the shelf? Gosh, I guess I’ll have to do with the supported, faster, smaller, transactional, acid compliant, online-ddl, open source, cheap wannabe engine called TokuDB”)…

Last-last, worth noting that mixture of InnoDB & TokuDB requires carefull memory allocations.

Normann

12 years ago

im waiting for an Percona Server 5.6.x with TokuDB GA to test it 🙂

Tim Callaghan

12 years ago

Nice writeup/analysis, just a few clarifications about TokuDB:

– Yes, file names are obfuscated after a slow alter operation, but you can always get a mapping of database/table to file name using information_schema.tokudb_file_map
– TokuDB also supports online DDL for “expanding” integer and char/varchar/varbinary column types.

Lastly, strictly comparing raw compression without measuring it’s impact on performance can be misleading. Do you have performance numbers for this test?

Justin Swanhart

12 years ago

Shlomi,

For some use-cases it is a MyISAM or an InnoDB replacement. For others, where FK are an absolute deal breaker, it can not be an InnoDB alternative.

So discussing TokuDB in the context of ARCHIVE replacement does not mean it is only good as an ARCHIVE replacement. As you noted, we’ve talked about (and I imagine will continue to talk about) TokuDB in the context of many workloads.

It should of course be noted that TokuDB does not support the WORM (write-once, read many) properly of ARCHIVE.

Author

Przemysław Malkowski

12 years ago

‘@Schlomi, @Justin, thank you for valuable notes. Indeed I didn’t want to go into much details, since this post was meant to focus mostly on compression aspect.

@Tim, thank you for clarifications, however I could not find the details about online expanding data types in TokuDB, could you point me to the documentation?
Unfortunately I did not take a notes for this particular test performance numbers as it was not the goal of this post, but at least I can confirm altering to TokuDB was much faster then to compressed InnoDB. Besides there were posts on that already here, like: http://www.mysqlperformanceblog.com/2013/08/29/considering-tokudb-as-an-engine-for-timeseries-data/

Tim Callaghan

12 years ago

We refer to it as “hot column expansion” and it’s covered in section 3.4 of our documentation (we are currently reworking our documentation, after which I’ll be able to give a URL to the exact section), for now I’ll just past in the relevant section text:

“Hot column expansion operations are only supported to char, varchar, varbinary, and integer data types. Hot column expansion is not supported if the given column is part of the primary key or any secondary keys.”

Author

Przemysław Malkowski

12 years ago

‘@Tim, thank you, I’ve updated the post accordingly.
Btw. the ability to online expand int to bigint of an auto_increment PK would be awesome to have.

Justin Swanhart

12 years ago

It would have been nice if InnoDB has added the “change log” for their ALTER TABLE as an open interface in the server than any engine could use. As it stands, TokuDB could probably create a changelog table similar to the FlexCDC log tables, then apply the changes after the ALTER completed. New transactions might have to read from the log too. This would be very flexible and similar to the InnoDB change buffer.

Vladislav Rastrusny

12 years ago

What can be the reason for storing data both compressed and on SSD? If you store data compressed, probably, you don’t need it very often (because decompression will take CPU cycles). So, you can store it on a slower storage like regular HDD, right?

Nils

12 years ago

You’d probably want to check out the TokuDB documentation, especially tokudb_cache_size if you just want to use it for archiving, as it would reserve half of physical memory otherwise.

Author

Przemysław Malkowski

12 years ago

‘@Vladislav, TokuDB uses compression by default and according to Tokutek, you don’t have to make a compromise between speed and disk space savings. So my point was that this engine may be a good choice not only for archiving old data, but also for usual workloads. Check out this for instance: http://www.tokutek.com/2012/09/three-ways-that-fractal-tree-indexes-improve-ssd-for-mysql/

@Nils, you are right, and I expect everyone at least reads documentation before using a new toy in a production environment 😉

Rick James

12 years ago

Which 5.x version did you measure against? Oracle recently made some significant improvements in InnoDB compression, both in speed and compressibility.

MariaDB already includes TokuDB, so you don’t need to wait for Percona. https://mariadb.com/kb/en/mariadb-5536-release-notes/

Tim Callaghan

12 years ago

‘@RickJames, MySQL 5.6 can reduce the performance hit of compression by via dynamic padding, but keep in mind that any padding of the 16K block itself will eat into the InnoDB cache and thus mean less cache hits. It’s a trade-off. Also, the only improvement to compressibility that I’ve seen is that you can now define the zlib level, but increasing it will increase CPU consumption and likely lower performance, again a trade-off.

As for TokuDB’s availability in MariaDB right now, that is true. There is partial TokuDB functionality in MariaDB 5.5, and full functionality in MariaDB 10. See their knowledge base for more information at http://mariadb.com/kb/en/tokudb-differences/

Author

Przemysław Malkowski

12 years ago

‘@Rick, this was on InnoDB 5.5.30 and Tokudb 7.1.0. The same table on InnoDB 5.6.16 is innodb_compression_level=6 (default):
1072MB with KEY_BLOCK_SIZE=8
592MB with KEY_BLOCK_SIZE=4
Changing compression level to 9 did not change anything in result size in this case.

Paul Kamp

12 years ago

‘@Przemysław, nice write up. Have you looked at other database engines and how they compare across a number of different metrics? InnoDB and MyISAM are fairly mature.

I started working with WiredTiger recently and they are showing some interesting results with their preliminary testing with MySQL when compared to InnoDB and LevelDB.

http://wiredtiger.com/products/performance/

I’d like to learn a bit more about how performance dynamics may change across applications and functions. What have you seen?

Brano

11 years ago

TokuDB seems like really good engine. I wish I could use it on OSX (devel) and FREEBSD (server).

amar

8 years ago

toku db seems to be very efficient than those of others.

saloni shirodkar

8 years ago

We are not currently using the Archive storage engine, but have done some research on it. One requirement of our application is that it stores lots of historical information for long periods of time. That data is never updated, but needs to be selected occasionally.