Buy Percona ServicesBuy Now!

Compressed columns with dictionaries

In 5.6.33-79.0 Percona Server has been extended with a new per-column compression feature. It is a data type modifier, independent from user-level SQL and InnoDB data compression, that causes the data stored in the column to be compressed on writing to storage and decompressed on read. For all other purposes the data type is identical to the one without modifier, i.e. no new data types are created. Compression is done by using the zlib library.

Additionally it is possible to pre-define a set of strings for each compressed column to achieve a better compression ratio on a relatively small individual data items.

This feature provides:

  • a better compression ratio for text data which consist of a large number of predefined words (e.g. JSON or XML) using compression methods with static dictionaries,
  • in contrast to InnoDB row compression method it provides a way to select which columns in the table must be compressed.

This feature is based on a patch provided by Weixiang Zhai.

Specifications

The feature is limited to InnoDB/XtraDB storage engine and to:

  • BLOB (including TINYBLOB, MEDIUMBLOB, LONGBLOG),
  • TEXT (including LONGTEXT),
  • VARCHAR (including NATIONAL VARCHAR), and
  • VARBINARY columns.

The syntax to declare a compressed column is using an extension of an existing COLUMN_FORMAT modifier: COLUMN_FORMAT COMPRESSED. If this modifier is applied on an unsupported column type or storage engine, an error is returned.

The compression can be specified:

  • at the table create time: CREATE TABLE ... (..., foo BLOB COLUMN_FORMAT COMPRESSED, ...);
  • or a table can be modified to compress/decompress a column later: ALTER TABLE ... MODIFY [COLUMN] ... COLUMN_FORMAT COMPRESSED, or ALTER TABLE ... CHANGE [COLUMN] ... COLUMN_FORMAT COMPRESSED.

To decompress a column, a COLUMN_FORMAT value any other than COMPRESSED can be specified: FIXED, DYNAMIC, or DEFAULT. If there is a column compression/decompression request in an ALTER TABLE, it is forced to the COPY algorithm.

Two new variables: innodb_compressed_columns_zip_level and innodb_compressed_columns_threshold have been implemented.

Compression dictionary support

To achieve better compression ration on relatively small individual data items, it is possible to pre-define compression dictionary, which is a set of strings for each compressed column.

Compression dictionaries can be represented as a list of words in a form of a string (comma or any other character can be used as a delimiter although not required). In other words, a,bb,ccc, a bb ccc and abbccc will have the same effect. However, the latter is more space-efficient. Quote symbol quoting is handled by regular SQL quoting. Maximum supported dictionary length is 32506 bytes (zlib limitation).

The compression dictionary is stored in a new system InnoDB table. As this table is of the data dictionary kind, concurrent reads are allowed, but writes are serialized, and reads are blocked by writes. Table read through old read views are unsupported, similarly to InnoDB internal DDL transactions.

Example

In order to use the compression dictionary you’ll need first to create it. This can be done by running:

mysql> SET @dictionary_data = 'one' 'two' 'three' 'four';
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE COMPRESSION_DICTIONARY numbers (@dictionary_data);
Query OK, 0 rows affected (0.00 sec)

To create a table that has both compression and compressed dictionary support you should run:

mysql> CREATE TABLE t1(
        id INT,
        a BLOB COLUMN_FORMAT COMPRESSED,
        b BLOB COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY numbers
      ) ENGINE=InnoDB;

Following example shows how to insert a sample of JSON data into the table:

SET @json_value =
 '[\n'
 ' {\n'
 ' "one" = 0,\n'
 ' "two" = 0,\n'
 ' "three" = 0,\n'
 ' "four" = 0\n'
 ' },\n'
 ' {\n'
 ' "one" = 0,\n'
 ' "two" = 0,\n'
 ' "three" = 0,\n'
 ' "four" = 0\n'
 ' },\n'
 ' {\n'
 ' "one" = 0,\n'
 ' "two" = 0,\n'
 ' "three" = 0,\n'
 ' "four" = 0\n'
 ' },\n'
 ' {\n'
 ' "one" = 0,\n'
 ' "two" = 0,\n'
 ' "three" = 0,\n'
 ' "four" = 0\n'
 ' }\n'
 ']\n'
;
mysql> INSERT INTO t1 VALUES(0, @json_value, @json_value);
Query OK, 1 row affected (0.01 sec)

INFORMATION_SCHEMA Tables

This feature implemented two new INFORMATION_SCHEMA tables.

table INFORMATION_SCHEMA.XTRADB_ZIP_DICT
Columns:
  • id (BIGINT(21)_UNSIGNED) – dictionary ID
  • name (VARCHAR(64)) – dictionary name
  • zip_dict (BLOB) – compression dictionary string

This table provides a view over the internal compression dictionary table. SUPER privilege is required to query it.

table INFORMATION_SCHEMA.XTRADB_ZIP_DICT_COLS
Columns:
  • table_id (BIGINT(21)_UNSIGNED) – table ID from INFORMATION_SCHEMA.INNODB_SYS_TABLES
  • column_pos (BIGINT(21)_UNSIGNED) – column position (starts from 0 as in INFORMATION_SCHEMA.INNODB_SYS_COLUMNS)
  • dict_id (BIGINT(21)_UNSIGNED) – dictionary ID

This table provides a view over the internal table that stores the mapping between the compression dictionaries and the columns using them. SUPER privilege is require to query it.

Limitations

Compressed columns cannot be used in indices (neither on their own nor as parts of composite keys).

Note

CREATE TABLE t2 AS SELECT * FROM t1 will create a new table with a compressed column, whereas CREATE TABLE t2 AS SELECT CONCAT(a,'') AS a FROM t1 will not create compressed columns.

At the same time, after executing CREATE TABLE t2 LIKE t1 statement, t2.a will have COMPRESSED attribute.

ALTER TABLE ... DISCARD/IMPORT TABLESPACE is not supported for tables with compressed columns. To export and import tablespaces with compressed columns, you need to uncompress them first with: ALTER TABLE ... MODIFY ... COLUMN_FORMAT DEFAULT.

mysqldump command line parameters

By default, with no additional options, mysqldump will generate a MySQL compatible SQL output.

All /*!50633 COLUMN_FORMAT COMPRESSED */ and /*!50633 COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY <dictionary> */ won’t be in the dump.

When a new option enable-compressed-columns is specified, all /*!50633 COLUMN_FORMAT COMPRESSED */ will be left intact and all /*!50633 COLUMN_FORMAT COMPRESSED WITH COMPRESSION_DICTIONARY <dictionary> */ will be transformed into /*!50633 COLUMN_FORMAT COMPRESSED */. In this mode the dump will contain the necessary SQL statements to create compressed columns, but without dictionaries.

When a new enable-compressed-columns-with-dictionaries option is specified, dump will contain all compressed column attributes and compression dictionary.

Moreover, the following dictionary creation fragments will be added before CREATE TABLE statements which are going to use these dictionaries for the first time.

/*!50633 DROP COMPRESSION_DICTIONARY IF EXISTS <dictionary>; */
/*!50633 CREATE COMPRESSION_DICTIONARY <dictionary>(...); */

Two new options add-drop-compression-dictionary and skip-add-drop-compression-dictionary will control if /*!50633 DROP COMPRESSION_DICTIONARY IF EXISTS <dictionary> */ part from previous paragraph will be skipped or not. By default, add-drop-compression-dictionary mode will be used.

When both enable-compressed-columns-with-dictionaries and --tab=<dir> (separate file for each table) options are specified, necessary compression dictionaries will be created in each output file using the following fragment (regardless of the values of add-drop-compression-dictionary and skip-add-drop-compression-dictionary options).

/*!50633 CREATE COMPRESSION_DICTIONARY IF NOT EXISTS <dictionary>(...); */

Downgrade scenario

If it is necessary to perform Percona Server downgrade from a version 5.6.33-79.0 (or newer) to a version older than 5.6.33-79.0 and if user databases have one or more table with compressed columns, there are two options to do this safely:

  1. Use mysqldump in compatible mode (no compressed columns extensions must be specified).
  2. Manually remove the COMPRESSED attribute from all columns which have it via ALTER TABLE ... MODIFY ... COLUMN_FORMAT DEFAULT before updating server binaries. In this case, the downgraded server can start safely with old data files.

Version Specific Information

  • 5.6.33-79.0 Feature implemented in Percona Server 5.6

System Variables

variable innodb_compressed_columns_zip_level
Command Line:Yes
Config File:Yes
Scope:Global
Dynamic:Yes
Variable Type:Numeric
Default Value:6
Range:0-9

This variable is used to specify the compression level used for compressed columns. Specifying 0 will use no compression, 1 the fastest and 9 the best compression. Default value is 6.

variable innodb_compressed_columns_threshold
Command Line:Yes
Config File:Yes
Scope:Global
Dynamic:Yes
Variable Type:Numeric
Default Value:96
Range:1 - 2^64-1 (or 2^32-1 for 32-bit release)

By default a value being inserted will be compressed if its length exceeds innodb_compressed_columns_threshold bytes. Otherwise, it will be stored in raw (uncompressed) form.

Please also notice that because of the nature of some data, its compressed representation can be longer than the original value. In this case it does not make sense to store such values in compressed form as Percona Server would have to waste both memory space and CPU resources for unnecessary decompression. Therefore, even if the length of such non-compressible values exceeds innodb_compressed_columns_threshold, they will be stored in an uncompressed form (however, an attempt to compress them will still be made).

This parameter can be tuned in order to skip unnecessary attempts of data compression for values that are known in advance by the user to have bad compression ratio of their first N bytes.

Visit Percona Store


General Inquiries

For general inquiries, please send us your question and someone will contact you.