November 24, 2014

table_cache negative scalability

A couple of months ago FreshBooks published a post about getting great performance improvements by lowering the table_cache variable, so I decided to investigate what is really happening here.

The “common sense” approach to tuning caches is to make them as large as you can if you have enough resources (such as memory). With MySQL, however, common sense does not always work – we have seen performance issues with a large query_cache_size, and sort_buffer_size and read_buffer_size may not give you better performance if you increase them either. I found this also applies to some other buffers.

Even with this previous experience of surprising behavior, I did not expect such an issue with table_cache – LRU cache management is a classic problem with well-known scalable algorithms, and I would have expected Monty to implement one of them.

To run the test I created 100,000 empty tables containing a single integer column and no indexes, and then ran SELECT * FROM tableN in a loop. Each table in this case is accessed only once, so on any but the first run each access requires a table replacement in the table cache based on the LRU logic.
MySQL Sandbox helped me to test this with different servers easily.
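The sketch below shows the idea – it is not the exact script I used, and the t1..tN table names and procedure names are purely illustrative:

-- Sketch: create N empty single-column MyISAM tables t1 .. tN, then read
-- each of them exactly once per pass, so on every pass after the first
-- each access forces a table cache replacement.
DELIMITER //
CREATE PROCEDURE create_tables(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i <= n DO
    SET @ddl = CONCAT('CREATE TABLE t', i, ' (c INT) ENGINE=MyISAM');
    PREPARE s FROM @ddl; EXECUTE s; DEALLOCATE PREPARE s;
    SET i = i + 1;
  END WHILE;
END//
CREATE PROCEDURE scan_tables(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i <= n DO
    SET @q = CONCAT('SELECT * FROM t', i);
    PREPARE s FROM @q; EXECUTE s; DEALLOCATE PREPARE s;
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;

CALL create_tables(100000);   -- run once
CALL scan_tables(100000);     -- time this pass on the second and later runs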

I did the test on CentOS 5.3, a Xeon E5405, 16GB of RAM, and an EXT3 file system on a SATA hard drive.

MySQL 5.0.85
Creating the 100,000 tables took around 3 min 40 sec, which is about 450 tables/sec – this indicates that “fsync” is lying on this test system, as the default sync_frm option is used.

With the default table_cache=64, accessing all the tables takes 12 seconds, which is almost 8,500 tables/sec – a great speed. We can also note significant writes to disk during this read-only benchmark. Why? Because for MyISAM tables the table header has to be modified each time the table is opened. In this case the performance was so great because the data for all 100,000 tables (the first block of the index) was placed close together on disk as well as fully cached, which made the header updates very fast. In production systems where the table headers are not in the OS cache you will often see significantly lower numbers – 100 tables/sec or less.

With a significantly larger table_cache=16384 (and an appropriately adjusted number of open files) the same operation takes 660 seconds, which is 151 tables/sec – around 50 times slower. Wow. That is quite a slowdown. The load becomes very CPU bound in this case, and it looks like some of the table_cache algorithms do not scale well.

The absolute numbers are also very interesting – 151 tables/sec is not that bad taken by itself. So if your table cache is serving the “normal” case and you are able to bring the miss rate (Opened_tables) down to 10/sec or less by using a large table_cache, you should do so. However, if you have so many tables that you still see 100+ misses/sec while your data (at least the table headers) is well cached, so the cost of a table cache miss is not very high, you may be better off with a significantly reduced table cache size.
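A quick way to see which side of this trade-off you are on is to sample Opened_tables twice and look at how fast it grows; something along these lines (a rough sketch, the 10-second interval is arbitrary):

-- Estimate the table cache miss rate: sample Opened_tables twice,
-- 10 seconds apart, and divide the difference by the interval.
SHOW GLOBAL STATUS LIKE 'Opened_tables';   -- first sample
SELECT SLEEP(10);                          -- wait 10 seconds
SHOW GLOBAL STATUS LIKE 'Opened_tables';   -- second sample
-- (second value - first value) / 10 = table cache misses per second.
-- Around 10/sec or less: a larger table_cache is worth it.
-- Still 100+/sec with table headers well cached: a smaller cache may win.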

The next step for me was to see if the problem was fixed in MySQL 5.1 – in this version the table cache was significantly redone and split into table_open_cache and table_definition_cache, so I assumed the behavior might be different as well.

MySQL 5.1.40
I started testing with the default table_open_cache=64 and table_definition_cache=256 – the read took about 12 seconds, very close to MySQL 5.0.85.
As I increased table_definition_cache to 16384 the result remained the same, so this variable is not causing the bottleneck. However, increasing table_open_cache to 16384 causes the scan to take about 780 seconds, which is a bit worse than MySQL 5.0.85. So the problem is not fixed in MySQL 5.1; let's see how MySQL 5.4 behaves.
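For the record, both of these caches are dynamic in MySQL 5.1, so switching between the configurations above does not require a restart. A sketch using the values from this test:

-- MySQL 5.1: both caches can be resized at runtime (values from the test above).
SET GLOBAL table_definition_cache = 16384;  -- had no effect on this workload
SET GLOBAL table_open_cache       = 16384;  -- this is what triggers the slowdown
-- Note: open_files_limit is not raised automatically, so it has to be set
-- high enough at server startup for a table_open_cache this large.
FLUSH TABLES;  -- start each timed run from an empty table cache
SHOW GLOBAL VARIABLES LIKE 'table%cache';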

MySQL 5.4.2
MySQL 5.4.2 has a higher default table_open_cache, so I took it down to 64 so we can compare apples to apples. It performs the same as MySQL 5.0 and MySQL 5.1 with a small table cache.
With table_open_cache increased to 16384 the test took 750 seconds, so the problem exists in MySQL 5.4 as well.

So the problem is real and it is not fixed even in the performance-focused MySQL 5.4. As we can see, large table_cache (or table_open_cache) values can indeed cause significant performance problems. Interestingly enough, InnoDB has a very similar task of managing its own cache of file descriptors (set by innodb_open_files). As time allows I should test whether Heikki knows how to implement an LRU properly so it does not have problems with large numbers. We'll see.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Hi Peter,

    Interesting! We have a setup with a few tens of thousands of MyISAM tables (it looks like you tested with InnoDB?).

    It’s been a few years since I tested, but we only got decent performance after setting the table cache really high. I wonder if it makes a difference that your tables were empty?

    – ask

  2. peter says:

    Ask,

    In fact I tested MyISAM. For InnoDB the table_cache is not as relevant, as it keeps all tables open anyway.

    How many misses per second are you getting, and how large is your data? The point is that from the CPU standpoint a large table cache scales pretty badly; however, even with a large number of tables in the cache it still may be cheaper to invalidate a table in the cache than to really open it.

    In the case of MyISAM, closing an old table and opening a new one means about 2 write operations (modifying the index header), which will also cause 2 reads (because the key block is typically just 1K) when the OS cache is under pressure. This means that in the IO-bound case larger caches make sense.

    There is one more difference – I tested with a uniform access distribution across tables; in real life you will most likely see a skewed distribution with some tables accessed more than others, which is a completely different picture.

    In the common case a large table cache is indeed better. I just wanted to show there are indeed cases, like FreshBooks, which benefit from a cache decrease.

    I believe in their case the miss ratio did not change dramatically as table_cache was increased – so they got the problem and little benefit.

  3. Walter says:

    Hi Peter,

    very interesting. I'd like to see some more data on how the scaling performs, e.g. table_cache values of 1000, 4000 and 8000 or so. That way we can get a better feel for what might be an optimal number. I would not be surprised if e.g. 1000 or less would still perform fast, maybe even faster than 64?

    Walter

  4. peter says:

    Walter,

    You surely are welcome to repeat the test with different table cache sizes and different data access distributions.
    I predict that in this workload (because it always causes misses as long as table_cache is less than 100,000) there will be no gains from miss rate reduction as table_cache is increased, and I expect a semi-linear performance regression with increasing table cache size.

  5. Alex says:

    Is it possible to “play around” with the table_cache setting without restarting MySQL?

    I currently use 2048 on an R900 / 64GB; we have 180 tables and about 300 to 400 queries per second. So far performance is not an issue, but it's always good to stay ahead of possible bottlenecks :)

  6. Robert says:

    I played with table_open_cache in 5.1 a few months ago for the same reason. I have about 1900 tables distributed over 32 databases which are randomly accessed.

    I also started with a size of 64 and increased the exponent of the power of two with every test, so I tested 64, 128, 256, 512, 1024… The best value for my setup was 512, judging by the Opened_tables rate. Perhaps it's a coincidence, but for my setup I selected a value of approximately 1/4 of the number of tables.

    But it depends on the usage of the tables. In such a benchmark, as Peter has done it, all tables are in use. In the real world, not all tables are constantly being accessed.

    To repeat Peter's conclusion: use a small value of table(_open)_cache. If you have a high number of (re-)opened tables, increase the value and see what happens.

  7. Peter,

    Thanks for digging further into this one! At FreshBooks we're still running a tiny table cache — 128 on our 32GB R900s. But historical problems with our schema mean that we're looking at millions of tables, unfortunately, which is why the “make it much, much larger” approach didn't work for us.

    I suspect that for a reasonably huge number of tables (10k’s?) you can still come up with a table cache that caches enough of them that you get more benefits from the cached ones being held open than from the cost of the LRU algorithm — in our case, though, we had an *unreasonable* number and the cache would never be “large enough” — and that’s the case where smaller is better.

    Alex: SET GLOBAL table_cache = [value]

    Cheers,

    -Rich from FreshBooks

  8. Mark Callaghan says:

    Nice work. I think the penalty for reopening InnoDB tables includes the cost of resampling index cardinality and that can do several disk reads per index. I hope Percona fixes this because I would rather not learn that code myself.

  9. peter says:

    Mark,

    Reopening tables in InnoDB never happens, as it always keeps tables open (I guess you meant it does this because the cost of re-sampling is too high). The innodb_dict_size_limit option in the Percona patches allows changing this behavior. I agree we should change this behavior.

  10. peter says:

    Rich,

    Thanks. You confirmed my expectations. Somebody asked me how many tables FreshBooks had and I guessed millions; it turns out I was close.

    Indeed, the math is what one would expect – increasing table_cache in such a case should decrease the miss ratio (just a bit) for the small expense of a bit of memory (maybe some 8KB per table cache entry, so 16,384 entries would be on the order of 128MB). So it comes as a huge surprise when this small decrease in miss ratio comes at such a huge increase in CPU time.

  11. peter says:

    Also,

    On setting table_cache online: most of the time you can decrease it online; however, increasing it may cause issues because MySQL will not raise the open files limit in this case.
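    A quick check before raising it at runtime (just an illustration):

    -- Decreasing the table cache online is safe:
    SET GLOBAL table_cache = 64;
    -- Before increasing it online, check the file descriptor headroom,
    -- since open_files_limit cannot be changed without a restart:
    SHOW GLOBAL VARIABLES LIKE 'open_files_limit';
    SHOW GLOBAL STATUS LIKE 'Open_files';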

  12. Wagner Bianchi says:

    Hi Peter,

    In my database schema I have many tables – the exact number is 853 – and all of these tables are accessed all the time. Going by what the manual explains, every connected thread will use the same table that was already opened, i.e., the one in the cache, and this is controlled by the value of table_cache.

    If I have 1000 active users or simultaneous connections on my MySQL server, is it faster to let the cache swap tables in and out than to configure a high value for the table_cache variable? And what about comparing Open_tables and Opened_tables?

    If I have misunderstood, please correct me.

    Thanks.

  13. peter says:

    Wagner,

    In your case the number of tables is tiny and I trust you can fit them all in the cache easily – you can use a large enough table cache. On the other hand, I must say I did not check how table_cache size affects hits; I'm assuming only misses/invalidations cause the slowdown, but I have not tested it.

  14. Chris says:

    I don't think 853 tables is tiny. Why so many? Are some of them duplicates providing duplicate functionality to different groups of users? Why not combine like data in a single table for all users? Or am I missing something?

  15. Wagner Bianchi says:

    Maybe this number is tiny compared with the number in the test Peter did, which was 100,000.

    Regards.

  16. Igor says:

    I think table_cache values between 1,000 and 5,000 are of much higher practical relevance for the vast majority of MySQL users (those that are not hosting providers, etc.).

  17. Sergei Golubchik says:

    I see you didn't bother to submit a bug report about it.

    Am I the only reader of your blog who wants this fixed?
    http://bugs.mysql.com/49177

  18. Mark Callaghan says:

    @Sergei – I want it fixed and I shared that with MySQL.

    @Peter – whether or not InnoDB keeps a file descriptor open for the tablespace, index cardinality stats (~8 leaf blocks per index) are sampled when MySQL creates a table handler instance for an InnoDB table. That is the cost I mentioned above.

  19. Kostja says:

    And as you can see from the analysis in the bug report, this is not a table cache problem, but a MyISAM one.

  20. DaveJ says:

    Great Post, Thank you all for your info.

    Question:

    After kicking off mk-table-checksum in the middle of the night I start getting errors such as:

    100414 0:21:52 [ERROR] Error in accept: Too many open files
    100414 0:21:52 [ERROR] /usr/local/mysql/bin/mysqld: Can’t open file: ‘./wc_miehr_posasmp/panel_decert.frm’ (errno: 24)
    100414 0:21:52 [ERROR] /usr/local/mysql/bin/mysqld: Can’t open file: ‘./wc_miehr_posasmp/panel_decert.frm’ (errno: 24)

    [root@wcdb /db]# perror 24
    OS error code 24: Too many open files

    I have this many MyISAM tables:

    [root@xxx /db/mydata]# find -L . -name “*.MYI”|wc
    13713 13713 558523

    Number of possible needed fd’s: 13713 * 2 = 27426

    I have this many InnoDB tables:

    [root@xxx /db/mydata]# find -L . -name “*.ibd”|wc
    31014 31014 1144412

    Innodb is handled differently right? So 31014 table handles are always opened and would not be the cause of the errno: 24?

    If I add the two, 31014 + 27426 = 58440 os file descriptors.

    So –

    1) Do I get errno: 24 because I have open_files_limit set to 40000?
    2) Could you explain the difference between MyISAM and InnoDB and the difference in how the OS handles opened files?
    3) Would converting the MyISAM tables to InnoDB fix or help the os ulimit max open tables issue?

    Thanks for your help!

    Server info:

    Server version: 5.0.79-enterprise-log MySQL Enterprise Server (Commercial)

    my.cnf:

    [mysqld]
    server-id = 2
    port = 3306
    basedir = /usr/local/mysql
    datadir = /db/mydata
    tmpdir = /db-logs/ibdlogs/dbtmp
    pid-file = /db/mysql.pid
    log-error = /db/mysql.err
    log-slow-queries = /db/slow-queries.log
    log-bin = /db-logs/bin-logs/mysql-logs/mysql-bin
    expire_logs_days = 30 # Expire logs after in X days.
    long_query_time = 2
    thread_cache_size = 100
    max_connections = 400
    table_cache = 20000
    open_files_limit = 40000
    max_binlog_size = 500M
    max_allowed_packet = 20M
    tmp_table_size = 100M
    max_heap_table_size = 100M
    query_cache_size = 100M
    default_table_type = MYISAM
    log_long_format

    # Innodb
    innodb_status_file = 1
    innodb_file_per_table = 1
    innodb_data_home_dir = /db2/innodb
    innodb_log_group_home_dir = /db-logs/ibdlogs
    innodb_data_file_path = ibdata1:10M:autoextend
    innodb_buffer_pool_size = 30G
    innodb_log_buffer_size = 8M
    innodb_additional_mem_pool_size = 8M
    innodb_log_file_size = 256M
    innodb_log_files_in_group = 2
    innodb_thread_concurrency = 16
    innodb_open_files = 20000
    innodb_flush_method = O_DIRECT
    innodb_flush_log_at_trx_commit = 2
    transaction_isolation = READ-COMMITTED

  21. Chris Howells says:

    This has bitten me once or twice too. It still seems to be an issue in the most recent Percona release of MySQL, 5.5.14.

    A patch has been written; have you thought about including it in Percona Server?

    http://bugs.mysql.com/bug.php?id=49177
    http://lists.mysql.com/commits/121507

  22. sudeep says:

    I have opened only one table, but if I run SHOW GLOBAL STATUS LIKE 'Opened_tables' it is showing 100 open tables.

  23. David Keith says:

    Sudeep, that is because you used Opened_tables, try:
    show global status like 'Open_tables';

    And you will most likely see only one.

  24. Rick James says:

    The table cache used to be scanned linearly; has that ever been fixed? I see that 5.6.8 now defaults to 2000 — without a fix, this could lead to some degradation?

    Peter, your blog mentions the defunct version 5.4. I’m here because the antique mysql_tuner is pointing to this blog. Can you add some updated information?

    Some useful computed metrics (from GLOBAL STATUS):

    Open_tables / table_open_cache
    open_files_limit / table_open_cache
