October 21, 2014

MySQL opening .frm even when table is in table definition cache

or… “the case of Stewart recognizing parameters to the read() system call in strace output”.

Last week, a colleague asked a question:

I have an instance of MySQL with 100 tables and the table_definition_cache set to 1000. My understanding of this is that MySQL won’t revert to opening the FRM files to read the table definition, but we can see from strace:

So, why is this? It turns out that this triggered a memory for me from several years ago. I’ve since discovered the blog post in which I mention it: drop table fail (on the road to removing the FRM). That blog post is from 2008, almost three years ago to the day.

Since we completely reworked how metadata works in Drizzle, it has enabled us to do some truly wonderful things, including more in depth testing of the server. Amazingly enough, spin-offs from this work included being able to find out and then test that the ENUM limit of 65,535 has never been true (but now is in Drizzle), produce a CREATE TABLE statement that took over four minutes to execute and get a more complete view of how the Storage Engine API is called.

But back to what the above strace shows. In MySQL 5.5 you can find in sql/datadict.cc a function named dd_frm_type(). In MySQL 5.1, for some reason yet unknown to humans, it lives in sql/sql_view.cc as mysql_frm_type(). What this code snippet does is:

  • open the FRM
  • read 10 bytes (“header”)
  • check if it’s a view by doing a string compare for “TYPE=VIEW\n” being the first bytes of the FRM file. This is due to VIEWs being stored as the plain text of the SQL query inside the FRM file instead of the normal binary format FRM.
  • some legacy check for a generic table type (I think, I haven’t gone back into the deep history of the FRM file format to confirm)
  • return the fourth byte for the DB_TYPE. i.e. what storage engine it is.

We can ignore the upper limit on number of storage engines for MySQL and understanding the relationship between the range of numbers for dynamic assignment and what this means for on-disk compatibility of data directories is left as an exercise for the reader.

This code is called from several code paths in the server:

  • DROP TABLE
  • RENAME TABLE
  • DROP VIEW
  • open table
  • filling INFORMATION_SCHEMA tables (I think it is actually the TABLES table, but didn’t look closely)
An example of how this is used is that in the DROP TABLE code path, MySQL uses this magic byte to work out which Storage Engine to ask to drop the table. The main consequence of this bit of code is that MySQL may cause unnecessary disk IO for information it already has cached (often at least twice – in InnoDB itself and in the table_definition_cache).

Further reading:

About Stewart Smith

Stewart Smith has a deep background in database internals including MySQL, MySQL Cluster, Drizzle, InnoDB and HailDB. he is also one of the founding core developers of the Drizzle database server. He served at Percona from 2011-2014. He is a former Percona employee.

Comments

  1. Stewart,

    So how frequent does this opening frm will happen ? Will it happen only when the table_open_cache miss happens or will it happen on all table opens ?

    I would point our opening and reading few bytes from .frm should come from OS cache as the working set is very small in most cases, yet it can get expensive with NFS

  2. the MySQL open_table() code path isn’t the most obvious one in some cases…. it looks as though if it’s not a temporary table and we have a lock on the table, we’ll check to see if it’s a VIEW, and this is the dd_frm_type() call – so not the most common code path in open_table(), but one that some apps could hit often.

    On highly loaded servers however, things like the binlog will probably push a lot of FRMs out of the OS cache.

  3. oh, and this is before open table cache is checked.

  4. grace says:

    Stewart,
    does number of records in the table affects the RENAME function performance?

  5. No, not at all. For the most part RENAME TABLE just renames the appropriate files and changes the appropriate rows in the InnoDB data dictionary.

    Other engines may vary of course.

  6. Davi Arnaut says:

    Certain statements (such as truncate, drop table) should work even though the table is corrupted (and can’t be opened).

  7. this will depend on the storage engine of course.. for file based ones such as MyISAM this is certainly true. But for certain types of InnoDB data dictionary corruption, it could be more of a problem.

  8. Interesting. Now this may seem obvious, but is it not possible to cache this information in a cache akin to table_definition_cache or in that cache itself ?

Speak Your Mind

*