I have written before – MyISAM Does Not Scale, or it does quite well – that the two main things stopping you are table locks and the global mutex on the Key Cache.
Table locks are not an issue for read-only workloads, and write-intensive workloads can be dealt with by using many tables, but the Key Cache mutex will still haunt you. If you know your MySQL history you may think Key Cache scalability was fixed with the new Key Cache in MySQL 4.1, and indeed previously it did not even scale to a single CPU, as the global lock was held for the whole duration of IO. In MySQL 4.1 the lock is held only while a key block (1KB to 4KB) is being copied from the Key Cache to a thread-local buffer, but this is still a terrible contention spot, in particular on systems with many CPU cores.
Happily there is a solution, or at least half of one.
If you have chosen to use multiple tables to solve the table locks problem, you can also use multiple key caches to reduce or virtually eliminate key cache contention. Too bad you can only map a single table to a single key cache – it would be much more helpful if you could use multiple key caches for the same table, for example caching even/odd key blocks or something similar, or simply keep a hash of locks instead of a single one.
When you decide to use multiple key caches the questions are how many to use, what sizes to allocate, and how to map tables to them. One simple solution I use is to create a separate key cache for each actively accessed table (assuming there are only a few of them), allocating key cache space proportional to each table's size and load, but no more than its index size (assuming table sizes are relatively static).
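As a quick reminder of the mechanics used throughout this post: a named key cache is created simply by setting its size, tables are mapped to it with CACHE INDEX, and LOAD INDEX INTO CACHE can optionally preload an index. The cache name hot_cache below is just an example:

```sql
-- Create (or resize) a named key cache; the name is arbitrary
SET GLOBAL hot_cache.key_buffer_size = 256*1024*1024;

-- Map a table's indexes to that cache
CACHE INDEX test.a IN hot_cache;

-- Optionally preload the table's index blocks into the cache
LOAD INDEX INTO CACHE test.a;
```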
To get accurate information about table usage I will use the Percona Patches, which provide the TABLE_STATISTICS table:
mysql> select * from information_schema.table_statistics where table_schema='test' and table_name in ('a','b');
+--------------+------------+-----------+--------------+------------------------+
| TABLE_SCHEMA | TABLE_NAME | ROWS_READ | ROWS_CHANGED | ROWS_CHANGED_X_INDEXES |
+--------------+------------+-----------+--------------+------------------------+
| test         | b          |    589824 |       589824 |                 589824 |
| test         | a          |   2949111 |       589824 |                 589824 |
+--------------+------------+-----------+--------------+------------------------+
2 rows in set (0.00 sec)
For table sizes we can use the traditional TABLES table:
mysql> select table_schema,table_name,index_length from information_schema.tables where table_schema='test' and table_name in ('a','b');
+--------------+------------+--------------+
| table_schema | table_name | index_length |
+--------------+------------+--------------+
| test         | a          |     39514112 |
| test         | b          |     28390400 |
+--------------+------------+--------------+
2 rows in set (0.01 sec)
Now with a bit of INFORMATION_SCHEMA magic, and a bit of waiting on the “efficient” Information Schema query execution (as you may guess, we just need to join the two previous result sets), we can get information about relative table index sizes and their relative usage. I simply summed rows read and rows changed, but you can surely use a different formula if you like.
mysql> select t.table_name,index_length/(select sum(index_length) from information_schema.tables where table_schema='test' and t.table_name in ('a','b')) s_ratio, (rows_read+rows_changed)/(select sum(rows_read+rows_changed) from information_schema.table_statistics where table_schema='test' and t.table_name in ('a','b')) u_ratio from information_schema.table_statistics ts join information_schema.tables t on t.table_name=ts.table_name and t.table_schema=ts.table_schema where t.table_schema='test' and t.table_name in ('a','b');
+------------+---------+---------+
| table_name | s_ratio | u_ratio |
+------------+---------+---------+
| a          |  0.5818 |  0.7488 |
| b          |  0.4180 |  0.2496 |
+------------+---------+---------+
2 rows in set (3 min 23.67 sec)
A bit more query hacking and we get a query which returns the statements to initialize the key buffers according to table sizes and activity (in this case weighted 50-50, though you may use another formula), while keeping the total key buffer size within the budget (4000000000 in this case) and each buffer within the actual index size:
select concat("SET GLOBAL ", t.table_schema,"_",t.table_name,".key_buffer_size=",round(least(index_length,(index_length/(select sum(index_length) from information_schema.tables where table_schema="test" and t.table_name in ("a","b")) + (rows_read+rows_changed)/(select sum(rows_read+rows_changed) from information_schema.table_statistics where table_schema="test" and t.table_name in ("a","b")))/2*4000000000)),";") cmd from information_schema.table_statistics ts join information_schema.tables t on t.table_name=ts.table_name and t.table_schema=ts.table_schema where t.table_schema="test" and t.table_name in ("a","b");
+---------------------------------------------+
| cmd                                         |
+---------------------------------------------+
| SET GLOBAL test_a.key_buffer_size=39514112; |
| SET GLOBAL test_b.key_buffer_size=28390400; |
+---------------------------------------------+
2 rows in set (2 min 31.68 sec)
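If the INFORMATION_SCHEMA join is too slow to run regularly, the same sizing formula is easy to apply outside of SQL. Here is a minimal Python sketch of the allocation logic (table names, index sizes, and the 4000000000-byte budget are taken from the example above; the 50-50 weighting is the same):

```python
# Key buffer sizing formula from the query above: each table gets a
# share of the total budget proportional to
#   0.5 * (its index size ratio) + 0.5 * (its usage ratio),
# capped at its actual index size.

def key_buffer_sizes(tables, budget):
    """tables: {cache_name: (index_length, rows_read, rows_changed)}"""
    total_index = sum(t[0] for t in tables.values())
    total_usage = sum(t[1] + t[2] for t in tables.values())
    sizes = {}
    for name, (index_length, rows_read, rows_changed) in tables.items():
        s_ratio = index_length / total_index
        u_ratio = (rows_read + rows_changed) / total_usage
        share = (s_ratio + u_ratio) / 2 * budget
        sizes[name] = round(min(index_length, share))
    return sizes

# Numbers from the INFORMATION_SCHEMA output shown earlier
stats = {
    "test_a": (39514112, 2949111, 589824),
    "test_b": (28390400, 589824, 589824),
}
for name, size in key_buffer_sizes(stats, 4000000000).items():
    print(f"SET GLOBAL {name}.key_buffer_size={size};")
```

With a 4GB budget and two small indexes, both caches are capped at their index sizes, matching the query output above.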
Pass this via SELECT INTO OUTFILE, or pipe it to mysql directly as explained here, to create the key caches.
Now you can use a much simpler query to generate the statements which assign tables to the key caches:
mysql> select concat("CACHE INDEX ",table_schema,".",table_name," IN ",table_schema,"_",table_name,";") from information_schema.tables where table_schema='test' and table_name in ("a","b");
+-------------------------------------------------------------------------------------------+
| concat("CACHE INDEX ",table_schema,".",table_name," IN ",table_schema,"_",table_name,";") |
+-------------------------------------------------------------------------------------------+
| CACHE INDEX test.a IN test_a;                                                             |
| CACHE INDEX test.b IN test_b;                                                             |
+-------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
Having gone through this multiple key cache creation exercise, complex or not, you probably wonder how much performance gain to expect. In fact it can be very significant.
For a CPU-bound workload on 16 cores, inserting data into about 20 tables, I have seen gains as high as 10x compared to using a single shared key cache of the same size.