In my previous post I pointed out that the existing ARCHIVE storage engine in MySQL may not be the one that will satisfy your needs when it comes to effectively storing large and/or old data. But are there any good alternatives? As the primary purpose of this engine is to store rarely accessed data in a disk-space-efficient way, I will focus here on data compression abilities rather than on performance.
The InnoDB engine provides a compressed row format, but is its efficiency even close to that of the ARCHIVE engine? You can also compress MyISAM tables with the myisampack tool, but that also means the table becomes read-only after such an operation.
Moreover, I trust neither MyISAM nor Archive when it comes to data durability. Fortunately, a fairly new player (open source since April 2013) has arrived in this field: TokuDB! It seems to provide excellent compression ratios, yet it is fully ACID compliant and has none of the limitations present in Archive, so its functionality is much closer to InnoDB! This may also let you store production data on SSD drives, whose cost per gigabyte is still higher than that of traditional disks and which could otherwise be too expensive.
To better illustrate the choices we have, I made a very simple comparison of the disk savings for all the variants mentioned above.
I used an example table with some scientific data fetched from here (no indexes):
```sql
CREATE TABLE `table1` (
  `snp_id` int(11) DEFAULT NULL,
  `contig_acc` varchar(32) DEFAULT NULL,
  `contig_ver` tinyint(4) DEFAULT NULL,
  `asn_from` int(11) DEFAULT NULL,
  `asn_to` int(11) DEFAULT NULL,
  `locus_id` int(11) DEFAULT NULL,
  `locus_symbol` varchar(128) DEFAULT NULL,
  `mrna_acc` varchar(128) DEFAULT NULL,
  `mrna_ver` int(11) DEFAULT NULL,
  `protein_acc` varchar(128) DEFAULT NULL,
  `protein_ver` int(11) DEFAULT NULL,
  `fxn_class` int(11) DEFAULT NULL,
  `reading_frame` int(11) DEFAULT NULL,
  `allele` text,
  `residue` text,
  `aa_position` int(11) DEFAULT NULL,
  `build_id` varchar(4) NOT NULL,
  `ctg_id` int(11) DEFAULT NULL,
  `mrna_start` int(11) DEFAULT NULL,
  `mrna_stop` int(11) DEFAULT NULL,
  `codon` text,
  `protRes` char(3) DEFAULT NULL,
  `contig_gi` int(11) DEFAULT NULL,
  `mrna_gi` int(11) DEFAULT NULL,
  `mrna_orien` tinyint(4) DEFAULT NULL,
  `cp_mrna_ver` int(11) DEFAULT NULL,
  `cp_mrna_gi` int(11) DEFAULT NULL,
  `verComp` varchar(7) NOT NULL
);
```
```
mysql >show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: ARCHIVE
        Version: 10
     Row_format: Compressed
           Rows: 19829016
 Avg_row_length: 11
    Data_length: 221158267
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: 2013-12-22 23:58:51
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.28 sec)

-rw-rw----. 1 przemek przemek 211M Dec 22 23:58 table1.ARZ
```
```
mysql >show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: TokuDB
        Version: 10
     Row_format: tokudb_zlib
           Rows: 19829016
 Avg_row_length: 127
    Data_length: 2518948412
Max_data_length: 9223372036854775807
   Index_length: 0
      Data_free: 6615040
 Auto_increment: NULL
    Create_time: 2013-12-23 00:03:47
    Update_time: 2013-12-23 00:12:14
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.13 sec)

-rwxrwx--x. 1 przemek przemek 284M Dec 23 00:12 _b_tokudb_table1_main_32_1_18_B_0.tokudb
```
```
mysql [localhost] {msandbox} (b_tokudb) > show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: TokuDB
        Version: 10
     Row_format: tokudb_lzma
           Rows: 19829016
 Avg_row_length: 127
    Data_length: 2518948412
Max_data_length: 9223372036854775807
   Index_length: 0
      Data_free: 6950912
 Auto_increment: NULL
    Create_time: 2013-12-23 00:43:47
    Update_time: 2013-12-23 00:49:14
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: row_format=TOKUDB_LZMA
        Comment:
1 row in set (0.01 sec)

-rwxrwx--x. 1 przemek przemek 208M Dec 23 00:49 _b_tokudb_sql_980_2_main_1b92_2_18.tokudb
```
(By the way, did you notice how the file name changed after altering the table to a different compression? It no longer reflects the real table name, which is quite confusing 🙁 )
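For reference, the two TokuDB variants above can be created with statements along these lines. This is a sketch, not a record of the exact commands used for this test: the ROW_FORMAT values match the Row_format values shown in the output, but repeating ENGINE=TokuDB in the second statement to force a full rebuild (so existing rows are rewritten with LZMA, not only newly written ones) is an assumption. So is the availability of the information_schema.TokuDB_file_map table, which maps internal .tokudb file names back to tables in the TokuDB versions I know of; its columns differ between versions, hence the SELECT *.

```sql
-- Convert the table to TokuDB with zlib compression
ALTER TABLE table1 ENGINE=TokuDB ROW_FORMAT=TOKUDB_ZLIB;

-- Switch to LZMA; ENGINE=TokuDB is repeated on the assumption that it forces
-- a full rebuild, so existing rows (not only new ones) get recompressed
ALTER TABLE table1 ENGINE=TokuDB ROW_FORMAT=TOKUDB_LZMA;

-- Find out which internal .tokudb file belongs to which table
SELECT * FROM information_schema.TokuDB_file_map;
```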
```
mysql > show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 19898159
 Avg_row_length: 117
    Data_length: 2343567360
Max_data_length: 0
   Index_length: 0
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2014-01-01 16:47:03
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.42 sec)

-rw-rw----. 1 przemek przemek 2.3G Jan 1 16:37 table1.ibd
```
```
mysql > show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 19737546
 Avg_row_length: 59
    Data_length: 1171783680
Max_data_length: 0
   Index_length: 0
      Data_free: 5767168
 Auto_increment: NULL
    Create_time: 2014-01-01 18:51:22
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED
        Comment:
1 row in set (0.31 sec)

-rw-rw----. 1 przemek przemek 1.2G Jan 1 18:51 table1.ibd
```
```
mysql > show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: InnoDB
        Version: 10
     Row_format: Compressed
           Rows: 19724692
 Avg_row_length: 30
    Data_length: 592445440
Max_data_length: 0
   Index_length: 0
      Data_free: 3932160
 Auto_increment: NULL
    Create_time: 2014-01-01 19:41:12
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=4
        Comment:
1 row in set (0.03 sec)

-rw-rw----. 1 przemek przemek 584M Jan 1 19:41 table1.ibd
```
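The two compressed InnoDB variants above can be produced with statements along these lines. This is a minimal sketch assuming MySQL 5.5/5.6, where a compressed table needs its own tablespace file and the Barracuda file format; `innodb_file_format` was deprecated in 5.7 and removed in 8.0, where Barracuda is the only format. SET GLOBAL changes are not kept across a restart, so put them in my.cnf if you want them permanent.

```sql
-- Prerequisites for compressed InnoDB tables in MySQL 5.5/5.6:
-- one .ibd file per table and the Barracuda file format
SET GLOBAL innodb_file_per_table = ON;
SET GLOBAL innodb_file_format = 'Barracuda';

-- Rebuild with the default compressed page size (KEY_BLOCK_SIZE=8)
ALTER TABLE table1 ENGINE=InnoDB ROW_FORMAT=COMPRESSED;

-- Rebuild again with 4 KB compressed pages (the smallest InnoDB file in this test)
ALTER TABLE table1 ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
```

The Create_options field in SHOW TABLE STATUS is a quick way to confirm which settings a table actually got.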
```
mysql > show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 19829016
 Avg_row_length: 95
    Data_length: 1898246492
Max_data_length: 281474976710655
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2013-12-23 11:02:28
    Update_time: 2013-12-23 11:03:45
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.01 sec)

-rw-rw----. 1 przemek przemek 1.8G Dec 23 11:03 table1.MYD
```
```
mysql > show table status like 'table1'\G
*************************** 1. row ***************************
           Name: table1
         Engine: MyISAM
        Version: 10
     Row_format: Compressed
           Rows: 19829016
 Avg_row_length: 42
    Data_length: 848098828
Max_data_length: 281474976710655
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2013-12-23 11:02:28
    Update_time: 2013-12-23 11:03:45
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: 853535317
 Create_options:
        Comment:
1 row in set (0.00 sec)

-rw-rw----. 1 przemek przemek 809M Dec 23 11:03 table1.MYD
```
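A note on measuring: Data_length is not comparable across engines. For TokuDB it reports 2518948412 bytes for both compression variants, far more than the 284M and 208M files actually on disk, so for TokuDB the file size is the figure to compare. Rows is also only an estimate for InnoDB, which is why it differs between the three InnoDB outputs while the other engines report exactly 19829016. When checking many tables, the same fields can be pulled from information_schema.TABLES with a query like this sketch (it assumes the table lives in the current default database):

```sql
-- Size statistics for table1 in the current database.
-- Caveat: for TokuDB, data_length is not the compressed on-disk size,
-- and for InnoDB, table_rows is an estimate.
SELECT table_name,
       engine,
       row_format,
       table_rows,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'table1';
```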
Compression summary table
| Engine | Compression | Table size [MB] |
|---|---|---|
| InnoDB | none | 2272 |
| InnoDB | KEY_BLOCK_SIZE=8 | 1144 |
| InnoDB | KEY_BLOCK_SIZE=4 | 584 |
| MyISAM | none | 1810 |
| MyISAM | compressed with myisampack | 809 |
| Archive | default | 211 |
| TokuDB | ZLIB | 284 |
| TokuDB | LZMA | 208 |
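To put the summary into relative terms, the sketch below divides the uncompressed InnoDB size by each variant's size. It uses only the numbers from the table above, typed in by hand, so no real tables are needed to run it.

```sql
-- Sizes (MB) copied from the summary table;
-- ratio = uncompressed InnoDB size / size of the variant
SELECT engine,
       compression,
       size_mb,
       ROUND(2272 / size_mb, 1) AS times_smaller_than_innodb
FROM (
          SELECT 'InnoDB' AS engine, 'none' AS compression, 2272 AS size_mb
UNION ALL SELECT 'InnoDB',  'KEY_BLOCK_SIZE=8', 1144
UNION ALL SELECT 'InnoDB',  'KEY_BLOCK_SIZE=4',  584
UNION ALL SELECT 'MyISAM',  'none',             1810
UNION ALL SELECT 'MyISAM',  'myisampack',        809
UNION ALL SELECT 'Archive', 'default',           211
UNION ALL SELECT 'TokuDB',  'ZLIB',              284
UNION ALL SELECT 'TokuDB',  'LZMA',              208
) AS sizes
ORDER BY size_mb;
```

With these figures, LZMA-compressed TokuDB comes out about 10.9 times smaller than uncompressed InnoDB, just ahead of Archive at 10.8, while the best InnoDB variant (KEY_BLOCK_SIZE=4) reaches 3.9.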
So the clear winner is TokuDB, leaving InnoDB far behind. But this is just one test – the results may be very different for your specific data.
To get an even better idea, let’s compare several crucial features of the mentioned storage engines:
| Feature | Archive | MyISAM (compressed) | InnoDB | TokuDB |
|---|---|---|---|---|
| DML | only INSERTs | no | yes | yes |
| Transactions | no | no | yes | yes |
| ACID | no | no | yes | yes |
| Indexes | no (only on an AUTO_INCREMENT column) | yes | yes | yes |
| Online DDL | no | no | yes * | yes ** |
* – since version 5.6, with some limitations
** – supports add/drop indexes, add/drop/rename columns and expand int, char, varchar and varbinary data types
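To illustrate the last row, here is a sketch of non-blocking schema changes on the two transactional engines. Each part applies only while the table uses that engine. The index and column names are hypothetical, and the TokuDB part assumes the hot indexing and hot column addition behaviour described in the TokuDB documentation, where a hot index build goes through CREATE INDEX with `tokudb_create_index_online` enabled.

```sql
-- InnoDB (MySQL 5.6+): ask for an in-place build that allows concurrent DML;
-- if that combination is not supported, the statement errors out instead
ALTER TABLE table1 ADD INDEX idx_snp_id (snp_id), ALGORITHM=INPLACE, LOCK=NONE;

-- TokuDB: hot (online) index creation uses CREATE INDEX with this session setting
SET SESSION tokudb_create_index_online = ON;
CREATE INDEX idx_locus_id ON table1 (locus_id);

-- TokuDB: hot column addition (hypothetical column, for illustration only)
ALTER TABLE table1 ADD COLUMN note VARCHAR(64) DEFAULT NULL;
```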
TokuDB seems to be an excellent alternative when it comes to disk space efficiency, but this is perhaps not the only reason why you should try it.
You may want to check these articles too: