November 23, 2014

How fast is FLUSH TABLES WITH READ LOCK?

A week or so ago at the MySQL conference, I visited one of the backup vendors in the Expo Hall. I started to chat with them about their MySQL backup product. One of the representatives told me that their backup product uses FLUSH TABLES WITH READ LOCK, which he admitted takes a global lock on the whole database server. However, he proudly told me that it only takes a lock for “a couple of milliseconds.” This is a harmful misconception that many backup vendors seem to hold dear.

The truth is, this command can take a lock for an indeterminate amount of time. It might complete in milliseconds on a test system in a laboratory, but I have seen it take an extremely long time on production systems, measured in many minutes, or potentially even hours. And during this time, the server will get completely blocked (not just read-only!) To understand why, let’s look at what this command actually does. There are several important parts of processing involved in the command.

Requesting the lock

The FLUSH TABLES WITH READ LOCK command immediately requests the global read lock. As soon as this happens, even before the lock is granted to it, all other processes attempting to modify anything in the system are locked out. In theory, this might not seem so bad because after all, the command acquires only a read lock. Other commands that need only a read lock can coexist with this. However, in practice, most tables are both read and written. The first write query to each table will immediately block against the requested global read lock, and subsequent read queries will block against the write query’s requested table lock, so the real effect is that the table is exclusively locked, and all new requests into the system are blocked. Even read queries!

Waiting for the lock

Before the FLUSH TABLES WITH READ LOCK command can successfully acquire the lock, anything else that currently holds the lock must finish what it’s doing. That means that every currently running query, including SELECT queries, must finish. So if there is a long-running query on the system, or an open transaction or another process that holds a table lock, the FLUSH TABLES WITH READ LOCK command itself will block until the other queries finish and all locks are released. This can take a very long time. It is not uncommon for me to log on to a customer’s system and see a query that has been running for minutes or hours. If such a query were to begin running just before the FLUSH TABLES WITH READ LOCK command is issued, the results could be very bad.

Here’s one example of what the system can look like while this process is ongoing:

Notice that connection 6 can’t even log in because it was a MySQL command-line client that wasn’t started with -A, and it’s trying to get a list of tables and columns in the current database for tab-completion. Note also that “Flushing tables” is a misnomer — connection 5 is not flushing tables yet. It’s waiting to get the lock.

Flushing tables

After the FLUSH TABLES WITH READ LOCK command finally acquires the lock, it must begin flushing data. This does not apply to all storage engines. However, MyISAM does not attempt to flush its own data to the disk during normal processing. It relies on the operating system to flush the data blocks to disk when it decides to. As a result, a system that has a lot of MyISAM data might have a lot of dirty blocks in the operating system buffer cache. This can take a long time to flush. During that time, the entire system is still locked. After all the data is finished, the FLUSH TABLES WITH READ LOCK command completes and sends its response to the client that issued it.

Holding the lock

The final part of this command is the duration during which the lock is held. The lock is released with UNLOCK TABLES or a number of other commands. Most backup systems that use FLUSH TABLES WITH READ LOCK are performing a relatively short operation inside of the lock, such as initiating a filesystem snapshot. So in practice, this often ends up being the shortest portion of the operation.

Conclusion

A backup system that is designed for real production usage must not assume that FLUSH TABLES WITH READ LOCK will complete quickly. In some cases, it is unavoidable. This includes backing up a mixture of MyISAM and InnoDB data. But many installations do not mix their data this way, and should be able to configure a backup system to avoid this global lock. There is no reason to take a lock at all for backing up only InnoDB data. Completely lock-free backups are easy to take. Backup vendors should build this capability into their products.

About Baron Schwartz

Baron is the lead author of High Performance MySQL.
He is a former Percona employee.

Comments

  1. I have a story about this, actually. One morning I woke up to find my website (twcenter.net) was down. I figured out that everything was waiting on MySQL, so I restarted it to see if that would fix it without any further downtime. It did, so I went on with my life — thankfully just a game fan site, no real money lost.

    Then I received an e-mail, saying “Backup failed: MySQL has gone away” or something to that effect. Eventually I realized what had happened. The previous day I had run some statistical query or other on a little-used log table. The query took too long to complete, so I aborted with Ctrl-C. In fact, in that version of MySQL, this only caused the client to exit, and left the query running in the background for hours.

    That night, the backup script had run FLUSH TABLES WITH READ LOCK. It immediately acquired locks on all tables — effectively write locks, as you say — except this one log table. It held those locks waiting for the lock on the log table, shutting the site down until I came around to fix it manually, because the statistics-gathering query I ran was taking many hours to run. Some convoluted thing with a subquery, if I remember correctly.

    I’ve since moved to InnoDB for everything, and use mysqldump –single-transaction for backups. Hopefully more people will start using InnoDB now that Oracle is making it the default engine in 5.5!

  2. It should be worthwhile to note that even for InnoDB-only systems, using FLUSH TABLES WITH READ LOCK may be a requirement: in the case you want to do incremental backups, hence must store the master position.
    That is, if your’e doing “mysqldump –single-transaction –master-data” – you’re good. But a mylvmbackup would have to use the FLUSH TABLES… in order to store the master’s log and pos.

  3. peter says:

    Shlomi,

    If you’re using LVM for backups for Innodb only tables there are 2 other methods to find your binary log position, if you’re performing backup on master.
    The last binary log file on the snapshot and its size will show the position. Also when Innodb does crash recovery it prints the binary log position.
    This unfortunately does not work if you are backing up from the slave. You may hack around with master.info but I have not really looked into it.

  4. Peter,
    Thanks!
    With regard to method #1: this is assuming sync_binlog=1, right? Otherwise innodb transaction log is not in sync with binary log?
    And, actually, the same for method #2?

    Shlomi

  5. peter says:

    Shlomi,

    Yes you’re right. Regarding position which Innodb prints on recovery – it is logical position so it will match backup, however it is possible at the time backup is taken the binary log is not yet flushed up to that point, though it is not really that important – it will be in a few seconds.

    In general there is some of black magic here no question. If you can afford flush tables with readlock you may not want to bother.

  6. How exactly do I get the master binlog position consistent with a transaction snapshot, in a way which does NOT involve FLUSH TABLES WITH READ LOCK?

    My understanding is that mysqldump –single-transaction –master-data does a FLUSH TABLES WITH READ LOCK, only so that it can do a BEGIN TRANSACTION; SHOW MASTER STATUS then immediately UNLOCK TABLES.

    This means we have to “stop the world” even if only for a short time, but the world may take quite a long time to stop, as noted.

    Does MySQL have any plans to fix this, or is there a workaround I’m not aware of?

  7. Sergei Golubchik says:

    Okay. But xtrabackup takes a global read lock too, as I’ve been told. Why does it do it ?

  8. Kostja says:

    Baron,
    when you say that backup systems should not need any locks, you’re not taking into account DDL operations.
    Backup needs to stop DDL at least, which is not transactional on most systems, so some sort of lock is still necessary, even if the data can be backed up fully online.

    I agree, however, that using FLUSH TABLES WITH READ LOCK for backups is a misnomer. This entire command is a MySQL and MyISAM specific hack.
    To help the matters a little bit in 5.5 we implement FLUSH TABLES WITH READ LOCK, so that you at least don’t have to wait for all tables for your backups, which in many cases only touch a part of database instance.

  9. Kostja says:

    WordPress ate part of my formatting. In 5.5 we implement FLUSH TABLES table1, table2, table3 WITH READ LOCK.

  10. Kostja says:

    Mark,
    InnoDB does store master binlog position in its system tables. I believe Marc Callaghan at some point worked on a patch that implemented START TRANSACTION WITH CONSISTENT SNAPSHOT statement, that would print the binlog position as part of it. This should be sufficient for hot backups of InnoDB. But one still needs some sort of metadata lock to protect the system against DDL.

  11. Sergei,

    XtraBackup doesn’t do that. The wrapper script innobackupex, which is a patched copy of innobackup (which comes from InnoDB) does that. innobackupex has an option to disable this behavior, but innobackup doesn’t have such an option.

  12. Kostja says:

    Baron,
    this only means that your backup tool is not DDL-safe. A table may be dropped in MySQL half-way through the external tool backing it up.

  13. Sergei Golubchik says:

    Baron,

    Vadim said that it does. Of course, it could be that he meant innobackupex, but didn’t say that.

    Are you saying that XtraBackup takes the binlog position from the xtradb logs ?

  14. Mark Callaghan says:

    How do you get the binlog offset when using Xtrabackup?

    START TRANSACTION WITH CONSISTENT INNODB SNAPSHOT gets the binlog offset without blocking for mysqldump backup. See http://bugs.mysql.com/bug.php?id=48124

  15. Venu says:

    Actually instead of issuing FlUSH TABLES WITH READ LOCK on the first call; instead one should issue FLUSH TABLES and may be another FLUSH TABLES followed by the READ LOCK as the 3rd call; so that if there is any long running queries; then rest of them are not blocked…and everything can run as usual.. the only tradeoff is if there is new query in the middle of flushes.

    FLUSH TABLES;
    FLUSH TABLES;
    FLUSH TABLES WITH READ LOCK;

  16. Vadim says:

    Sergei,

    I meant innobackupex, and this is what usually used to make backups.

    xtrabackup binary by itself copies only .ibd and ibdata1, and does not execute FLUSH TABLES WITH READ LOCK, but I
    do not consider it as regular way to perform backups.

  17. peter says:

    Vadim,

    I think there are couple of good and easy ideas in this thread :)
    We can add a feature to Percona Server to return binary log position while starting transaction. It should be easy enough. I know Heikki was not very eager to do anything to improve backup in MySQL because this would canibalize innodb hot backup sales. Now it is not the problem any more.
    Indeed this however would NOT be very safe regarding DDL but I believe most systems do not need DDL during backups so it is mainly what if it happens issue ? I belive same as MySQL has read_only option we can have something like backup mode which does not allow to create/alter/drop any non temporary tables.

  18. peter says:

    Kostya,

    There are 3 ways of backup (at least) – in case you’re using LVM DDLs are completely fine.

  19. Vadim says:

    Peter,

    In this post
    http://www.mysqlperformanceblog.com/2008/06/09/estimating-undo-space-needed-for-lvm-snapshot/

    you say “do not place [binary] logs on LVM”

    which means you can’t use get binary logs without FLASH TABLES

  20. There has been some talk of ways to start a transaction and get binary log position:
    http://bugs.mysql.com/bug.php?id=48124

  21. Vadim says:

    Peter,

    We can port START TRANSACTION WITH CONSISTENT INNODB SNAPSHOT to Percona Server,
    and I also wait for Facebook’s fix of broken group commit. That will be really great combination.

  22. peter says:

    Vadim,

    You can still get binary log position with LVM backup as Innodb will print it during crash recovery :)

  23. swhiteman says:

    Hey all, didn’t see any mention here of flush_time, which as I understand it has a major effect on the “…might have a lot of dirty blocks in the operating system buffer cache…” note. Speeding up the automatic flushes in the hours before the backup, then setting back to zero or whatever you normally use, can optimize this part of the FTWRL process. In fact on a Windows install the flush_time is, of necessity, so low that dirty blocks during backup are unlikely to be an issue (of course the periodic flushing on Win can make the server perform suboptimally in general, but that isn’t under discussion right now!).

  24. Sergei Golubchik says:

    I thought that the idea of innodb hotbackup was to copy the date and logs without communicating with the server at all. So, I don’t quite understand why you guys discuss different ways of getting the binlog position from the server instead of doing it the hotbackup way – just copy the files. After all, InnoDB records binlog position in the log, you can get it from there after you copied the files.

  25. Setting flush_time on a server that is heavily used will destroy performance server-wide every flush_time seconds. It would be better to have a tool that periodically looks at SHOW OPEN TABLES and flushes tables one at a time. It would be even better to just tell the operating system to lower the fraction of dirty blocks in its cache. There are a lot of bugs with various operating systems and filesystems that make approaches such as running “sync” really bad. See Richard Hipp’s slideshows about “the great fsync() bug” for example.

  26. peter says:

    Sergei,

    Innodb Hot Backup and XtraBackup indeed does a hot copy without any communication with server – this works for Innodb part of the story. There is however also .frm, “mysql” database and binary log position. You can get binary log position from the innodb log file indeed however you still need to have a consistent copy of .frm files and mysql database to have solution which works in all cases. In case you know there is no DDL are being run and no changes to mysql database (such as new stored procedures, no new users etc) you do not have to take the lock. This looks dirty but it can work in practice for a lot of installations.

  27. swhiteman says:

    I wasn’t trying to debate the merits of flush_time, but like it or not, there are thousands of installs flushing every 30m. Of course, comparatively few of them are in prod and need to worry at all about FTWRL delays, but I thought this variance should be kept in mind.

  28. arun says:

    FLUSH TABLES WITH READ LOCK; also tries to close open files that obviously takes time too. I did not double check it using lsof etc but the post http://www.geeksww.com/tutorials/database_management_systems/mysql/installation/mysql_tablecache_informationschema_and_open_files_limit.php says so.

    However, the things I do not understand: Is there really a way to take online backups of MyISAM tables without issuing FLUSH TABLES WITH READ LOCK command and/or setting flush_time (performance killer)? As far as I know, xtrabackup does the same thing, please correct me if I am wrong.

    Also, I wasn’t clear about data being flushed to disk 100% ie. leaving the OS buffers and getting flushed to disk after the FLUSH TABLES WITH READ LOCK command returns? If it still exists in OS buffer then backups might be different from the actual data, so what is the proper solution in that case?

  29. schmurx says:

    Hello,

    i know this topic is quite old but maybe someone can help me. I would like to know why flush tables with read lock decreases performance on queries started before the lock command. In my case queries take up 5-10 times as long with a waiting flush tables with read lock.

    Thanks in advance

  30. jon says:

    MySql sucks. There I said it.

Speak Your Mind

*