When taking backups, FLUSH TABLES WITH READ LOCK is issued before the non-InnoDB files are backed up, to ensure that the backup is consistent. FLUSH TABLES WITH READ LOCK can be issued even while a query that has been executing for hours is still running. In that case everything gets locked up in the Waiting for table flush or Waiting for master to send event state. Killing the FLUSH TABLES WITH READ LOCK does not correct this either; the only way to get the server operating normally again is to kill the long-running queries that blocked it in the first place. This means that if there are long-running queries, FLUSH TABLES WITH READ LOCK can get stuck, leaving the server in read-only mode until those queries complete.
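To prevent this from happening, two things have been implemented: innobackupex can wait for a good moment to issue the global lock, and it can kill the queries that prevent the lock from being acquired.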
A good moment to issue the global lock is one when no long queries are running. But waiting a long time for such a moment is not always a good approach, as it extends the time needed for the backup to take place. To prevent innobackupex from waiting too long to issue FLUSH TABLES WITH READ LOCK, a new option has been implemented: innobackupex --lock-wait-timeout limits the waiting time. If a good moment to issue the lock does not occur within this time, innobackupex gives up and exits with an error message, and the backup is not taken. A value of zero turns this feature off (this is the default).
Another possibility is to specify the type of query to wait on, using the innobackupex --lock-wait-query-type option. Possible values are all and update. When all is used, innobackupex waits for all long-running queries (those whose execution time is longer than allowed by innobackupex --lock-wait-threshold) to finish before running FLUSH TABLES WITH READ LOCK. When update is used, innobackupex waits only for UPDATE/ALTER/REPLACE/INSERT queries to finish.
Although the time a specific query needs to complete is hard to predict, we can assume that queries which have already been running for a long time are unlikely to complete soon, while queries which have been running for a short time will likely complete shortly. innobackupex uses the value of the innobackupex --lock-wait-threshold option to decide which queries count as long-running and are likely to block the global lock for a while.
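The selection rule described above can be sketched as a small shell filter. This is only an illustration under stated assumptions, not how innobackupex is implemented: blocking_queries is a hypothetical helper, and its input is assumed to be tab-separated ID, TIME and INFO columns, such as the mysql client prints in batch mode from INFORMATION_SCHEMA.PROCESSLIST.

```shell
#!/bin/sh
# Hypothetical helper, not part of innobackupex.
# Reads "ID<TAB>TIME<TAB>INFO" rows on stdin and prints the IDs of the
# queries that would be treated as long-running under the rule above.
#   $1 = threshold in seconds (cf. --lock-wait-threshold)
#   $2 = query type, "all" or "update" (cf. --lock-wait-query-type)
blocking_queries() {
    awk -F '\t' -v threshold="$1" -v qtype="$2" '
        $2 + 0 > threshold + 0 {
            # For "update", keep only UPDATE/ALTER/REPLACE/INSERT statements.
            if (qtype == "all" || toupper($3) ~ /^ *(UPDATE|ALTER|REPLACE|INSERT)/)
                print $1
        }'
}

# Possible usage (assumes a configured mysql client):
#   mysql -N -B -e "SELECT ID, TIME, INFO FROM INFORMATION_SCHEMA.PROCESSLIST
#                   WHERE COMMAND = 'Query'" | blocking_queries 40 update
```

With a threshold of 40 and the type update, a SELECT that has been running for 50 seconds is ignored, while an UPDATE that has been running for 45 seconds is reported.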
The second option is to kill all the queries which prevent the global lock from being acquired. In this case, all queries that have been running longer than the FLUSH TABLES WITH READ LOCK are possible blockers. Although all of these queries can be killed, additional time can be given for short-running queries to complete, using the innobackupex --kill-long-queries-timeout option. This option specifies how long to wait for queries to complete; once that time is reached, all the running queries are killed. The default value is zero, which turns this feature off.
The innobackupex --kill-long-query-type option can be used to specify whether all queries, or only SELECT queries, that are preventing the global lock from being acquired are killed. To use this option, the xtrabackup user should have the PROCESS and SUPER privileges.
Running innobackupex with the following options:
$ innobackupex --lock-wait-threshold=40 --lock-wait-query-type=all --lock-wait-timeout=180 --kill-long-queries-timeout=20 --kill-long-query-type=all /data/backups/
will cause innobackupex to spend no longer than 3 minutes waiting for all queries older than 40 seconds to complete. After FLUSH TABLES WITH READ LOCK is issued, innobackupex will wait 20 seconds for the lock to be acquired. If the lock is still not acquired after 20 seconds, it will kill all queries which have been running longer than the FLUSH TABLES WITH READ LOCK.
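Since innobackupex gives up with an error when --lock-wait-timeout expires without a good moment for the lock, a scheduled job may want to try again later. The following is a minimal sketch, assuming that a failed run returns a non-zero exit status; retry_backup is a hypothetical helper, and the attempt count and pause are arbitrary values.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of Percona XtraBackup.
# Runs a command up to $1 times, sleeping $2 seconds between attempts.
# Assumption: the command exits non-zero when the backup was not taken.
retry_backup() {
    attempts=$1
    pause=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "backup attempt $n failed" >&2
        n=$((n + 1))
        # Pause only if another attempt will follow.
        if [ "$n" -le "$attempts" ]; then
            sleep "$pause"
        fi
    done
    return 1
}

# Possible usage: up to 3 attempts, 10 minutes apart.
#   retry_backup 3 600 innobackupex --lock-wait-threshold=40 \
#       --lock-wait-query-type=all --lock-wait-timeout=180 \
#       --kill-long-queries-timeout=20 --kill-long-query-type=all /data/backups/
```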
This feature has been implemented in Percona XtraBackup 2.1.4.