I have written before – MyISAM Does Not Scale, or it does quite well – that the two main things stopping you are table locks and the global mutex on the Key Cache.
Table locks are not an issue for read-only workloads, and write-intensive workloads can be dealt with by using many tables, but the Key Cache mutex will still haunt you. If you know your MySQL history you may think Key Cache scalability was fixed with the new Key Cache in MySQL 4.1, and indeed previously it did not even scale to a single CPU, as the global lock was held for the whole duration of IO. In MySQL 4.1 the lock is held only while a key block (1KB to 4KB) is being copied from the Key Cache to a thread-local buffer, but this is still a terrible contention spot, in particular on systems with many CPU cores.
Happily there is a solution, or at least half of one.
If you have chosen to use multiple tables to solve the table locks problem, you can also use multiple key caches to reduce or virtually eliminate key cache contention. Too bad you can only map a single table to a single key cache – it would be much more helpful if you could use multiple key caches for the same table, for example caching even/odd key blocks or something similar, or simply keep a hash of locks instead of a single one.
When you decide to use multiple key caches the questions are how many to use, what sizes to allocate, and how to map tables to them. One simple solution I use is to create a separate key cache for each actively accessed table (assuming there are only a few of them), allocating key cache space proportional to each table's size and load, but no more than its index size (assuming table sizes are relatively static).
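As a quick reminder of the mechanics used throughout this post: a named key cache is created simply by setting its size, tables are mapped to it with CACHE INDEX, and LOAD INDEX INTO CACHE can optionally preload an index. The cache name hot_cache below is just an example:

```sql
-- Create (or resize) a named key cache; the name is arbitrary
SET GLOBAL hot_cache.key_buffer_size = 256*1024*1024;

-- Map a table's indexes to that cache
CACHE INDEX test.a IN hot_cache;

-- Optionally preload the table's index blocks into the cache
LOAD INDEX INTO CACHE test.a;
```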
To get accurate information about table usage I will use the Percona Patches, which provide the TABLE_STATISTICS table:
mysql> select * from information_schema.table_statistics where table_schema='test' and table_name in ('a','b');
+--------------+------------+-----------+--------------+------------------------+
| TABLE_SCHEMA | TABLE_NAME | ROWS_READ | ROWS_CHANGED | ROWS_CHANGED_X_INDEXES |
+--------------+------------+-----------+--------------+------------------------+
| test         | b          |    589824 |       589824 |                 589824 |
| test         | a          |   2949111 |       589824 |                 589824 |
+--------------+------------+-----------+--------------+------------------------+
2 rows in set (0.00 sec)
For table sizes we can use the traditional TABLES table:
mysql> select table_schema,table_name,index_length from information_schema.tables where table_schema='test' and table_name in ('a','b');
+--------------+------------+--------------+
| table_schema | table_name | index_length |
+--------------+------------+--------------+
| test         | a          |     39514112 |
| test         | b          |     28390400 |
+--------------+------------+--------------+
2 rows in set (0.01 sec)
Now with a bit of INFORMATION_SCHEMA magic, and a bit of waiting on the “efficient” Information Schema query execution (as you may guess, we just need to join the two previous result sets), we can get information about relative table index sizes and their relative usage. I simply summed rows read and rows changed, but you can surely use a different formula if you like.
mysql> select t.table_name,index_length/(select sum(index_length) from information_schema.tables where table_schema='test' and t.table_name in ('a','b')) s_ratio, (rows_read+rows_changed)/(select sum(rows_read+rows_changed) from information_schema.table_statistics where table_schema='test' and t.table_name in ('a','b')) u_ratio from information_schema.table_statistics ts join information_schema.tables t on t.table_name=ts.table_name and t.table_schema=ts.table_schema where t.table_schema='test' and t.table_name in ('a','b');
+------------+---------+---------+
| table_name | s_ratio | u_ratio |
+------------+---------+---------+
| a          |  0.5818 |  0.7488 |
| b          |  0.4180 |  0.2496 |
+------------+---------+---------+
2 rows in set (3 min 23.67 sec)
A bit more query hacking and we get a query which returns the statements to initialize the key buffers according to table sizes and activity (in this case weighted 50-50, though you may use another formula), while keeping the total key buffer size within the budget (4000000000 in this case) and each buffer within the actual index size:
select concat("SET GLOBAL ", t.table_schema,"_",t.table_name,".key_buffer_size=",round(least(index_length,(index_length/(select sum(index_length) from information_schema.tables where table_schema="test" and t.table_name in ("a","b")) + (rows_read+rows_changed)/(select sum(rows_read+rows_changed) from information_schema.table_statistics where table_schema="test" and t.table_name in ("a","b")))/2*4000000000)),";") cmd from information_schema.table_statistics ts join information_schema.tables t on t.table_name=ts.table_name and t.table_schema=ts.table_schema where t.table_schema="test" and t.table_name in ("a","b");
+---------------------------------------------+
| cmd                                         |
+---------------------------------------------+
| SET GLOBAL test_a.key_buffer_size=39514112; |
| SET GLOBAL test_b.key_buffer_size=28390400; |
+---------------------------------------------+
2 rows in set (2 min 31.68 sec)
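If the INFORMATION_SCHEMA join is too slow to run regularly, the same sizing formula is easy to apply outside of SQL. Here is a minimal Python sketch of the allocation logic (table names, index sizes, and the 4000000000-byte budget are taken from the example above; the 50-50 weighting is the same):

```python
# Key buffer sizing formula from the query above: each table gets a
# share of the total budget proportional to
#   0.5 * (its index size ratio) + 0.5 * (its usage ratio),
# capped at its actual index size.

def key_buffer_sizes(tables, budget):
    """tables: {cache_name: (index_length, rows_read, rows_changed)}"""
    total_index = sum(t[0] for t in tables.values())
    total_usage = sum(t[1] + t[2] for t in tables.values())
    sizes = {}
    for name, (index_length, rows_read, rows_changed) in tables.items():
        s_ratio = index_length / total_index
        u_ratio = (rows_read + rows_changed) / total_usage
        share = (s_ratio + u_ratio) / 2 * budget
        sizes[name] = round(min(index_length, share))
    return sizes

# Numbers from the INFORMATION_SCHEMA output shown earlier
stats = {
    "test_a": (39514112, 2949111, 589824),
    "test_b": (28390400, 589824, 589824),
}
for name, size in key_buffer_sizes(stats, 4000000000).items():
    print(f"SET GLOBAL {name}.key_buffer_size={size};")
```

With a 4GB budget and two small indexes, both caches are capped at their index sizes, matching the query output above.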
Pass this via SELECT INTO OUTFILE, or pipe it to mysql directly as explained here, to create the key caches.
Now you can use a much simpler query to generate the statements which assign tables to the key caches:
mysql> select concat("CACHE INDEX ",table_schema,".",table_name," IN ",table_schema,"_",table_name,";") from information_schema.tables where table_schema='test' and table_name in ("a","b");
+-------------------------------------------------------------------------------------------+
| concat("CACHE INDEX ",table_schema,".",table_name," IN ",table_schema,"_",table_name,";") |
+-------------------------------------------------------------------------------------------+
| CACHE INDEX test.a IN test_a;                                                             |
| CACHE INDEX test.b IN test_b;                                                             |
+-------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
Having gone through this multiple key cache creation exercise, complex or not, you probably wonder how much performance gain to expect. In fact it can be very significant.
For a CPU-bound workload on 16 cores, inserting data into about 20 tables, I have seen gains as high as 10x compared to using a single shared key cache of the same size.