Thoughts About Column Compression, with Optional Predefined Dictionary

November 10, 2016
Author
Shahriyar Rzayev
Share this Post:

column compressionThis blog discusses column compression with an optional predefined dictionary.

Compression, more compression with different algorithms, compress again, compress multiple times! 🙂 Compression is a hot topic in our lives.

In general, testing new things is great if the processes are well-described and easy to follow. Let’s try to think like a QA engineer: the first golden rule of QA is “everything is buggy, life is full of bugs: good night bugs, good morning bugs, hello my old bug friends.”

The second golden rule of QA is “OK, now let’s find a way to catch a bug — but remember that your methods can be buggy, too.”

Remember: always test! No bugs, no happiness!

When you start to test, the first goal is getting an idea of what is going on. This blog will demonstrate a test scenario for column compression with an optional predefined dictionary. For reference on column compression, read Compressed columns with dictionaries.”

To begin, let’s set up a basic environment:
The installation process requires installing Percona Server which is already documented here -> PS 5.6 installation

Secondly, find an already existing test: xtradb_compressed_columns_ibd_sizes.test.

Third, write a simple script to get started:

As you might notice, column compression might be with or without a compression dictionary. The visible difference, of course, is in the size of the tables. If you want to compress columns based on a predefined dictionary, you should create it with frequently used data. It is possible to create an empty dictionary, but it will have no effect. (See here: #1628231.)

The result of running this script is:

100.000 rows tables

t1 -> uncompressedt2 -> compressed column, t3 -> compressed column with compression dictionary, ‘names2’ dictionary

t2 -> compressed column,

t3 -> compressed column with compression dictionary, ‘names2’ dictionary.

Table size difference:

After running an optimize table:

The resulted size:

I want more:

Using  ROW_FORMAT=COMPRESSED requires innodb_file_format  to be >  Antelope. But this is not true for COLUMN_FORMAT.

Again, check the size:

Question: How do I get information about column compression dictionary and tables? Answer: tables from information_schema:

Question: How do I drop the compression dictionary? Answer: if it is in use, you will get:

Before dropping it, make sure there are no tables using the dictionary. There is an extreme condition where you are unable to drop the dictionary (see #1628824).

Question: Great! How about mysqldump? Answer: read here: mysqldump.

I might make this the topic of a dedicated post. Thanks for reading!

Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Far
Enough.

Said no pioneer ever.
MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.
© 2026 Percona All Rights Reserved