September 17, 2014

To pack or not to pack – MyISAM Key compression

MyISAM storage engine has key compression which makes its indexes much smaller, allowing better fit in caches and so improving performance dramatically. Actually packed indexes not a bit longer rows is frequent reason of MyISAM performing better than Innodb. In this article I’ll get in a bit more details about packed keys and performance implications it causes.

First lets see how key compression works. Key compression applies on the block boundaries – first index value is stored fully while for following value index will contain value of the bytes sames as in previous cases + the all bytes which are different. So lets say you have page starting with index values “aaaaaa” and “aaaaab”. They will be encoded as “aaaaaa”, “<5>b”. The similar compression applies to data pointer for the same key value. It also can be used for numbers as numbers stored with high byte first. This is obviously a bit of simplification but it should be enough to understand performance implications of this approach.

Note trailing spaces for strings are not stored in the index even if it is not packed.

Compressed blocks need to be treated differently. For uncompressed index blocks MySQL can do binary search inside the page – kind of jumping in the middle and looking at the key value and repeating it then repeating it for left or right half. For compressed block this is not going to work and MySQL will need to scan keyblock from the start uncompressing keys to find matching key value. This means the following will apply:

  • Forward range/index scan will be fast (read_next) as MyISAM will remember previous position and will not uncompress block from the start to retrieve next value
  • Reverse range/index scan will be slower (read_prev) as MyISAM will have to start bock uncompressing over to get previous value
  • Random single row lookup by index (ie joins) will also suffer as key block needs to be uncompressed for each of these
  • Larger pages means slower index lookup as you need to scan half the page in average to find the key value

So how you control which indexes are packed and which are not ? Unfortunately at this point you can’t set compression on per index basics. Neither you can set index block size for the keys. So you can’t look at your index and say “OK. This is going to be typically scanned forward, so I can pack it” The only settings available is PACK_KEYS=0|1|DEFAULT. Value 0 disables compression for all keys. Value 1 forces it for all keys. Value DEFAULT will use key compression for character column type but not for numerical and other column types.

Also according to my tests it looks there is something special in how integer primary key is handled. I can’t see the index size difference between DEFAULT and 1 values for such key. However it I change it to “key” the size reduces if PACK_KEYS=1. Might be it is some kind of special hack :)
The fact MyISAM will by default compress string keys but not integer keys is the biggest reason joins using string keys is so much slower compared to integer keys. Integer keys still best but for Innodb for example difference is not so large.

Let us try couple of benchmarks now. I’ve created simple table with 100.000 rows with columns. And I set id=c to keep it simple:

CREATE TABLE t1 (
id int(10) unsigned NOT NULL ,
c varchar(20) NOT NULL default ”,
KEY c (c),
KEY id (id)
) ENGINE=MyISAM

Index size:

  • PACK_KEYS=DEFAULT – 1550K
  • PACK_KEYS=1 – 1453K
  • PACK_KEYS=0 – 8176K

As we can see difference between 1 and DEFAULT is rather small for this case, as integers are relatively short.

Now lets do some benchmarks. I’m testing 4 types of queries:

  1. select count(*) from t1, t1 t2 where t1.c=t2.c - simple join by string column
  2. select count(*) from t1, t1 t2 where t1.id=t2.id - same but using id column
  3. select c from t1 where c like “%abc%” order by c limit 1 - forward index scan
  4. select c from t1 where c like “%abc%” order by c desc limit 1 – reverse index scan
  • PACK_KEYS=DEFAULT Q1: 3.50s Q2: 0.35s Q3: 0.12s Q4: 1.26s
  • PACK_KEYS=0 Q1: 1.25s Q2:0.40s Q3: 0.14s Q4: 0.16s
  • PACK_KEYS=1 Q1: 3.50s Q2:1.55s Q3: 0.12s Q4: 1.26s

So we can see theory goes pretty much in hand with practice here. With DEFAULT configuration lookups for integer keys are fast and forward scans are. It is however slow to do reverse scan on the string index.

It is interesting to see PACK_KEYS=0 does not always improve performance even if data fits in memory. Take a look at Q3 for. The the reason perhaps is – smaller index side, which means less traffic to memory. With modern CPUs it might be actually faster to do a bit of uncompression than to traverse large memory areas. Reverse index scan performance got some 8 times faster and join on string value is almost 3 times faster.
With PACK_KEYS=1 we got join by integer key performing almost 5 times slower than it was – this is something really to watch out for.

Note – these numbers are for CPU bound workload. If your load is disk bound there are more reasons to have keys packed so cache hit is better. Especially if you can get hot indexes in memory by packing them it is almost no brainer

Summary:

  • Key prefix compression is cool feature and may give great space savings.
  • Defaults for MyISAM are good for most cases
  • Avoid joining on strings, it is much slower anyway
  • Watch for reverse index scans (see Handler_read_prev) these are much slower for packed keys
  • Be careful with PACK_KEYS=1 as it can slow down your integer joins a lot
  • Be careful with PACK_KEYS=0 as it can blow up your index size 10x times.
About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. ivanhoe says:

    I understand this is an old post, but it’s still very helpful.
    However, one thing got me a bit puzzled: I’ve read & heard numerous times that mysql wouldn’t use indices for queries with LIKE ‘%abc%’ condition, which sort of made sense as it couldn’t go down the btree without knowing the first character… and indeed, whenever I tested it (with explain), it wouldn’t. But you suggest it does use index for you in this case? What’s the catch, am I missing something here?

  2. kL says:

    pack_keys doesn’t seem to do anything for me (MySQL 5.0).

    I have table with index in reverse order (primary key(col DESC)). If I run myisampack, it generates 1kb file which doesn’t work. myisamchk rebuilds the index, and rebuilt index file has sensible size, but still doesn’t work – all I get is “126 Incorrect key file for table”.

  3. arun says:

    can u post some numbers for compressed rows vs. uncompressed rows, pleaes?

  4. It was just in time that I read your article and gain important information.

    I have done a test to turn all the pack_keys in a Discuz! database into 1, in a middle size forum, it will reduce about 4MB index. But I find not obvious performance increasement. That is when I became confused about the introduction of MySQL, saying it will be faster for reading, which is what we need in Discuz! Forum application.

    I do some small tests and find that the speed is not increasing(besides, in my database, maybe there is very few string indexes, the speed declining is also not obvious). Then I search and find your article to confirm my idea.

    Your detail and profound statement and convining test is highly appericated.

    I also have a question, that you said “Be careful with PACK_KEYS=0 as it can blow up your index size 10x times.”, I think this is the general situation, but I am confused with a table, whose index declines when PACK_KEYS=0(changed from DEFAULT).

    MyISAM seems to set a limit to the mixinum of index data, in our kind of forum, this is not always apperciated. Because the number of the tables is considerable but many tables contain very little information(maybe during the long history, the actual data the table contain will be less than 4KB, but the index steaily remains 2MB or 4MB), will there be any solution to combat with this issue?

  5. Jayaram says:

    After pack the table using myisampack , while using the table in joins its give the error as ‘table is markeda as carshed and should be repaired’

    After repair the table eaither by using myisamchk –safe-recover or by using repiar table statement the data will be lost.

    Please give the solution

Speak Your Mind

*