October 24, 2014

Slow DROP TABLE

It is a known fact that ext3 is not the most efficient file system out there and for example file removals can be painfully slow and cause a lot of random I/O. However, as it turns out, it can sometimes have a much more severe impact on the MySQL performance that it would seem. When or why?

When you run

there are several things that need to happen – write lock on a table so it cannot be used by any other thread, the data file(s) removal by the storage engine and of course in the end MySQL has to destroy the definition file (.frm). That’s not all that happens, there is one other thing:

The entire code which removes a table is surrounded by

mutex. The mutex is used in various places inside MySQL, but primarily when tables are being opened or closed. It means that when

is locked, no query can be executed because they are stopped from accessing any table.

Here is when the slow file removal on the ext3 file system starts to be a pain. Deleting a 10GB file can take a few seconds and if that is a MySQL table, the mutex remains locked for all that time stalling all queries:

I tried some alternative approaches to trick MySQL into removing smaller files with

to minimize the effect, such as:

  • TRUNCATE TABLE large_table; ALTER TABLE large_table ENGINE=…; DROP TABLE large_table;
  • TRUNCATE TABLE large_table; OPTIMIZE TABLE large_table; DROP TABLE large_table;

Unfortunately as it turned out each of the administrative commands like

or

one way or another uses

mutex when the old table files are deleted:

The only alternative seems to be changing the file system. To XFS for example, which handles file removals much more efficiently:

EXT3

XFS

A better solution through MySQL internals could be to simulate the table drop by renaming the corresponding data file or files and physically removing them outside of the mutex lock. However it may not be that simple, because the actual removal is performed by the storage engine, so it’s not something MySQL code can control.

This is certainly not a common situation, but may become a problem to anyone when it’s the least expected (e.g. dropping old unused tables).

About Maciej Dobrzanski

Maciek is a former Percona employee.
An IT consultant with the primary focus on systems, databases and application stacks performance and scalability. Expert on open source technologies such as Linux, BSD, Apache, nginx, MySQL, and many more. Co-author of dba square - a blog about how to manage, scale, and optimize MySQL performance!

Comments

  1. Olivier B. says:

    And what about ext4 ? Since some months all my new MySQL servers use only ext4 instead of xfs ; is it a wrong choice ?

  2. Maciej Dobrzanski says:

    Oliver,

    I have seen ext4 with extents enabled to show a significant improvement over ext3 in large file removals, but still far behind xfs. It was a while ago though, so things could have changed since then.

    Maciek

  3. Toby says:

    Surely the correct solution is, as you say, to change the metadata (quickly) and let the file system operations happen asynchronously. Changing filesystems to solve this sounds pretty backasswards. (That said, how does reiser3 do here?)

  4. A solution from a completely different field.

    The developers of mythtv (a TV recording system) achieved better performing deletes of large files on ext3 through the “slow delete” feature. Typically TV recordings are 1-12GB per hour, so a movie could be 20+GB in size. Without the feature, a delete of the file on ext3 would lock up the entire system for about 20-30 seconds.

    The “slow delete” simply truncates the files in chunks in the background.

  5. Maciej Dobrzanski says:

    Toby,

    I haven’t used or even seen reiserfs in ages, but if I remember correctly it is not good with larger files.

    Maciek

  6. Maciej Dobrzanski says:

    Richard,

    I think that’s probably another step which could be considered to reduce the load spikes, however the primary problem is doing it inside a global mutex lock.

    Maciek

  7. Patrick E says:

    Does this also apply to temp tables? Seems like this could be a huge problem and would possibly cause a large sort to block other arbitrary select queries?

  8. pat says:

    Generally, temp tables are going to show up in /tmp which is usually tempfs. I wouldn’t expect to see slow delete mechanics at work in that case since you’re basically memory mapped although I’ll admit to not having run a test :).

  9. Olivier B. says:

    For me there is 3 cases where tables are created in /tmp :
    – when you use “create temporary table”
    – when MySQL need a temporary table which use BLOB or TEXT column
    – when MySQL need a temporary table which is bigger than tmp_table_size/max_heap_table_size

    Right ?

    And because of this last case, we can’t use /dev/shm to store temporary tables : it may use too much memory.

  10. pat says:

    Maybe I’m wrong here (its been a while since I looked seriously at filesystems), but isn’t tempfs always implemented as virtual memory on linux?

    http://en.wikipedia.org/wiki/TMPFS

  11. Ryan Huddleston says:

    This issue has been a big problem for us for a long time as we constantly create and drop schemas. I worked around the issue by having a special method that drops tables one at a time and sleeps if the process count gets too high on the master or slaves. Though the solution is far from ideal as large tables still lock things up for too long.

    See the mysql Worklog task for this at: forge.mysql.com/worklog/task.php?id=3983

  12. Patrick E says:

    On our production redhat machines at least it seems to be ext3, not sure if that’s default or a quirk of our particular setup though.

    /dev/vg00/lvtmp /tmp ext3 defaults 1 2

    And indeed tmpfs is virtual memory (RAM+SWAP) backed (http://en.wikipedia.org/wiki/TMPFS), however for a dedicated mysql machine that doesn’t make too much sense, since if you had the ram you’d just increase your tmp_table_size etc to use that ram directly i imagine. I don’t mind that these large sorts use disk, but i do mind if it blocks my other select queries for no good reason. As a general rule, locks should absolutely never be held around any I/O operation…

  13. Maciej Dobrzanski says:

    Patrick E,

    I haven’t examined the sources for temp tables, so I do not know if they suffer from a similar problem. I suppose since they do not require such locking, because there is no possibility for any concurrent access to them, MySQL may do things differently. However obviously if there is a large temporary table located on ext3 partition, it will cause a lot of random I/O on delete thus affecting the system performance.

    Maciek

  14. Maciej Dobrzanski says:

    pat,

    Actually typically /tmp is created on the system storage (i.e. local disk) using whatever file system chosen for the installation. To boost some occasional but rather smallish temporary tables some people move it to tmpfs (which I think is unreasonable design, because you would rather want MySQL tmpdir on tmpfs rather than the system-wide /tmp).

    Maciek

  15. Patrick E says:

    To explain our rationale for not using tmpfs, and using a disk store like ext3:

    1) Our database is typically 100% cached (both data and indexes), 16 – 20 GB in a few hundred MyISAM tables (out of 24 GB RAM on the machine). It is 99.9999% reads on a very static but highly random set of data.
    2) Most sorts are small or medium sized, we have max_heap_table_size=100M and tmp_table_size=100M in order to do these in memory
    3) We have occasional large (2-4 GB +) sorts that we’d like to spill to disk. It’s okay if these are somewhat slow. I/O contention is not an issues since all reads are cached and the few writes we have are not performance critical.

    I’m just wondering if in this case this lock will bite us for these large sorts (which afaics, we definitely do need on disk, since our ram is mostly dedicated to caching the db / indexes). I’d go spelunking, but have no experience with the mysql codebase really. Just a concerned user here :-)

  16. peter says:

    Maciek,

    I should note this does not only applies to MyISAM tables but to Innodb tables in innodb_file_per_table mode as well.

    I recently spoke to the customer which simply can’t drop 400G table because it stalls everything for long time.

  17. Ryan Huddleston says:
  18. Georgi Alexandrov says:

    Maciej,

    you’re testing this with ext3’s journal data, ordered data or write-back mode? As the default is ordered data and the closest behavior to XFS can be achieved via the write-back mode (which btw will become the default soon).

  19. Michael Will says:

    Note that ext4 as it is included in RHEL5.3 is clearly marked as a technology preview, not production quality. But since they refused to support XFS in favor of the outdated ext3 for years now, the best bet for heavy I/O machines is Novell SLES anyways :-)

    Michael

  20. Garrett says:

    I’ve found doing a TRUNCATE on a MYISAM table will resize the .MYD file without a lock. You can then drop the table instanstly. With InnoDB tables, the .IBD file doesn’t get resized/deleted until you OPTIMIZE or DROP.

  21. Mark Cotner says:

    If you’re using percona xtradb binaries you can use this option to reduce the pain.

    http://www.percona.com/doc/percona-server/5.1/performance/innodb_lazy_drop_table.html

    ‘njoy,
    Mark

Speak Your Mind

*