It is a known fact that ext3 is not the most efficient file system out there; for example, file removals can be painfully slow and cause a lot of random I/O. However, as it turns out, it can sometimes have a much more severe impact on MySQL performance than it would seem. When and why?

When you drop a table, there are several things that need to happen: a write lock is placed on the table so it cannot be used by any other thread, the data file(s) are removed by the storage engine, and in the end MySQL has to destroy the table definition file (.frm). But that is not all that happens; there is one other thing:

The entire code path that removes a table runs under a global mutex. The same mutex is used in various places inside MySQL, but primarily when tables are being opened or closed. This means that while it is locked, no query can be executed, because every thread is blocked from accessing any table.

This is where the slow file removal on the ext3 file system starts to be a pain. Deleting a 10GB file can take a few seconds, and if that file belongs to a MySQL table, the mutex remains locked for all that time, stalling all queries.

I tried some alternative approaches to trick MySQL into removing smaller files, to minimize the effect, such as:

  • TRUNCATE TABLE large_table; ALTER TABLE large_table ENGINE=…; DROP TABLE large_table;
  • TRUNCATE TABLE large_table; OPTIMIZE TABLE large_table; DROP TABLE large_table;

Unfortunately, as it turned out, each of these administrative commands one way or another holds the same mutex while the old table files are deleted.

The only alternative seems to be changing the file system, to XFS for example, which handles file removals much more efficiently.
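For a rough sense of the difference, you can time the raw unlink yourself on each file system (a scaled-down sketch; file size and mount points are illustrative):

```shell
# Create a test file with real allocated blocks (a sparse file would
# delete instantly and prove nothing), then time its removal.
# Run the same commands on an ext3 mount and an XFS mount to compare.
tmp=$(mktemp -d)       # in a real test, put this on the mount under test
dd if=/dev/zero of="$tmp/bigfile" bs=1M count=64 2>/dev/null   # scale count up to 10240 for a 10GB test
time rm "$tmp/bigfile"
rmdir "$tmp"
```

On ext3 the removal time grows with the number of allocated blocks; on extent-based file systems such as XFS it should stay much flatter.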



A better solution, through MySQL internals, could be to simulate the table drop by renaming the corresponding data file or files and physically removing them outside of the mutex lock. However, it may not be that simple, because the actual removal is performed by the storage engine, so it is not something the MySQL server code can fully control.
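At the file-system level, the idea sketched above looks like this (illustrative paths; a real fix would have to live inside the server or storage engine):

```shell
# Simulate "drop = cheap rename now, expensive unlink later":
# the rename is a metadata-only operation, so the part that would run
# under the mutex is instant; the slow block freeing happens afterwards.
tmp=$(mktemp -d)
mkdir "$tmp/db" "$tmp/purge"
dd if=/dev/zero of="$tmp/db/large_table.MYD" bs=1M count=16 2>/dev/null
mv "$tmp/db/large_table.MYD" "$tmp/purge/large_table.MYD"  # the "drop": instant
rm "$tmp/purge/large_table.MYD" &                          # slow removal, outside any lock
wait
rmdir "$tmp/purge" "$tmp/db" "$tmp"
```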

This is certainly not a common situation, but it may become a problem when it is least expected (e.g. when dropping old, unused tables).


Comments (23)

  • Olivier B.

    And what about ext4? For some months now all my new MySQL servers have used ext4 instead of xfs; is that a wrong choice?

    June 16, 2009 at 11:47 am
  • Maciej Dobrzanski


    I have seen ext4 with extents enabled show a significant improvement over ext3 in large file removals, but it was still far behind xfs. It was a while ago though, so things could have changed since then.


    June 16, 2009 at 12:24 pm
  • Toby

    Surely the correct solution is, as you say, to change the metadata (quickly) and let the file system operations happen asynchronously. Changing filesystems to solve this sounds pretty backasswards. (That said, how does reiser3 do here?)

    June 16, 2009 at 9:54 pm
  • Richard Dale

    A solution from a completely different field.

    The developers of MythTV (a TV recording system) achieved better-performing deletes of large files on ext3 through the “slow delete” feature. Typically TV recordings are 1-12GB per hour, so a movie could be 20+GB in size. Without the feature, deleting such a file on ext3 would lock up the entire system for about 20-30 seconds.

    The “slow delete” simply truncates the files in chunks in the background.
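    A scaled-down sketch of that chunked-truncate idea with plain coreutils (sizes are illustrative; MythTV’s actual implementation differs):

```shell
# Shrink the file step by step before the final unlink, so no single
# operation has to free all of the blocks at once.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/recording" bs=1M count=8 2>/dev/null
chunk=$((2 * 1024 * 1024))               # 2MB per step (real chunks would be much larger)
size=$(stat -c %s "$tmp/recording")
while [ "$size" -gt "$chunk" ]; do
  size=$((size - chunk))
  truncate -s "$size" "$tmp/recording"   # frees a bounded number of blocks
done                                     # a real slow delete would sleep between steps
rm "$tmp/recording"
rmdir "$tmp"
```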

    June 17, 2009 at 12:54 am
  • Maciej Dobrzanski


    I haven’t used or even seen reiserfs in ages, but if I remember correctly it is not good with larger files.


    June 17, 2009 at 1:56 am
  • Maciej Dobrzanski


    I think that’s probably another step which could be considered to reduce the load spikes; however, the primary problem is doing it inside a global mutex lock.


    June 17, 2009 at 2:01 am
  • Patrick E

    Does this also apply to temp tables? Seems like this could be a huge problem and would possibly cause a large sort to block other arbitrary select queries?

    June 17, 2009 at 10:47 am
  • pat

    Generally, temp tables are going to show up in /tmp, which is usually tmpfs. I wouldn’t expect to see slow delete mechanics at work in that case, since you’re basically memory mapped, although I’ll admit to not having run a test :).

    June 17, 2009 at 11:09 am
  • Olivier B.

    For me there are 3 cases where tables are created in /tmp:
    – when you use “create temporary table”
    – when MySQL needs a temporary table which uses a BLOB or TEXT column
    – when MySQL needs a temporary table which is bigger than tmp_table_size/max_heap_table_size

    Right?

    And because of this last case, we can’t use /dev/shm to store temporary tables: it may use too much memory.

    June 17, 2009 at 12:23 pm
  • pat

    Maybe I’m wrong here (it’s been a while since I looked seriously at filesystems), but isn’t tmpfs always implemented as virtual memory on Linux?

    June 17, 2009 at 1:15 pm
  • Ryan Huddleston

    This issue has been a big problem for us for a long time, as we constantly create and drop schemas. I worked around it by having a special method that drops tables one at a time and sleeps if the process count gets too high on the master or slaves. The solution is far from ideal, though, as large tables still lock things up for too long.

    See the mysql Worklog task for this at:

    June 17, 2009 at 2:17 pm
  • Patrick E

    On our production redhat machines at least it seems to be ext3, not sure if that’s default or a quirk of our particular setup though.

    /dev/vg00/lvtmp /tmp ext3 defaults 1 2

    And indeed tmpfs is virtual memory (RAM+swap) backed. However, for a dedicated MySQL machine that doesn’t make too much sense, since if you had the RAM you’d just increase your tmp_table_size etc. to use that RAM directly, I imagine. I don’t mind that these large sorts use disk, but I do mind if they block my other select queries for no good reason. As a general rule, locks should absolutely never be held around any I/O operation…

    June 17, 2009 at 2:52 pm
  • Maciej Dobrzanski

    Patrick E,

    I haven’t examined the sources for temp tables, so I do not know if they suffer from a similar problem. I suppose that since they do not require such locking (there is no possibility of any concurrent access to them), MySQL may do things differently. However, obviously, if there is a large temporary table located on an ext3 partition, it will cause a lot of random I/O on delete, thus affecting system performance.


    June 17, 2009 at 3:15 pm
  • Maciej Dobrzanski


    Actually, /tmp is typically created on the system storage (i.e. a local disk) using whatever file system was chosen for the installation. To boost occasional but rather smallish temporary tables, some people move it to tmpfs (which I think is unreasonable design, because you would rather want the MySQL tmpdir on tmpfs than the system-wide /tmp).


    June 17, 2009 at 3:20 pm
  • Patrick E

    To explain our rationale for not using tmpfs, and using a disk store like ext3:

    1) Our database is typically 100% cached (both data and indexes), 16 – 20 GB in a few hundred MyISAM tables (out of 24 GB RAM on the machine). It is 99.9999% reads on a very static but highly random set of data.
    2) Most sorts are small or medium sized; we have max_heap_table_size=100M and tmp_table_size=100M in order to do these in memory
    3) We have occasional large (2-4 GB+) sorts that we’d like to spill to disk. It’s okay if these are somewhat slow. I/O contention is not an issue since all reads are cached and the few writes we have are not performance critical.

    I’m just wondering if in this case this lock will bite us for these large sorts (which afaics, we definitely do need on disk, since our ram is mostly dedicated to caching the db / indexes). I’d go spelunking, but have no experience with the mysql codebase really. Just a concerned user here 🙂

    June 17, 2009 at 3:42 pm
  • peter


    I should note that this applies not only to MyISAM tables but to InnoDB tables in innodb_file_per_table mode as well.

    I recently spoke to a customer who simply can’t drop a 400GB table because it stalls everything for a long time.

    June 17, 2009 at 4:34 pm
  • Ryan Huddleston

    June 17, 2009 at 8:28 pm
  • Georgi Alexandrov


    Are you testing this with ext3’s data=journal, data=ordered, or data=writeback mode? The default is data=ordered, and the closest behavior to XFS can be achieved via data=writeback (which, btw, will become the default soon).

    June 30, 2009 at 12:32 am
  • Michael Will

    Note that ext4 as it is included in RHEL5.3 is clearly marked as a technology preview, not production quality. But since they refused to support XFS in favor of the outdated ext3 for years now, the best bet for heavy I/O machines is Novell SLES anyways 🙂


    June 30, 2009 at 3:41 pm
  • Garrett

    I’ve found that doing a TRUNCATE on a MyISAM table will resize the .MYD file without a lock. You can then drop the table instantly. With InnoDB tables, the .IBD file doesn’t get resized/deleted until you OPTIMIZE or DROP.

    December 9, 2009 at 1:43 pm
  • Mark Cotner

    If you’re using Percona XtraDB binaries you can use this option to reduce the pain.


    May 24, 2012 at 8:59 am
  • Robert Lu

    You can create a hard link to the table’s data file, and then drop the table.

    The drop will only decrement the file’s link count, so it doesn’t lock the table for very long.

    After that, you can delete the hard link, which doesn’t lock the table at all.
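    The trick relies on hard-link counts, which can be demonstrated with a plain file standing in for the table’s data file (paths and names are illustrative; on a real server the first rm is the DROP TABLE):

```shell
# With two hard links, removing one name is instant: the data blocks
# stay allocated until the last link is gone.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/large_table.ibd" bs=1M count=16 2>/dev/null
ln "$tmp/large_table.ibd" "$tmp/large_table.ibd.bak"   # link count is now 2
rm "$tmp/large_table.ibd"          # stands in for DROP TABLE: fast, blocks not freed
ls -l "$tmp/large_table.ibd.bak" >/dev/null            # data still reachable
rm "$tmp/large_table.ibd.bak"      # the real (potentially slow) space release
rmdir "$tmp"
```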

    March 1, 2017 at 10:20 am
  • Murali Krishna

    This is still a problem even after 10 years. I recently experienced it: performance degraded when dropping a database hosted along with 10 other databases on the node, and almost everything stalled for 3 minutes, with connections rapidly increasing from 80 to 500. I am using MySQL 5.5.19.

    July 19, 2019 at 3:15 pm
