Redesign of –lock-ddl-per-table in Percona XtraBackup

Redesign of -lock-ddl-per-table in Percona XtraBackupMySQL 5.7, alongside other many improvements, brought bulk load for creating an index (WL#7277 to be specific), which made ADD INDEX operations much faster by disabling redo logging and making the changes directly to tablespace files. This change requires extra care for backup tools. To block DDL statements on an instance, Percona Server for MySQL implemented LOCK TABLES FOR BACKUP. Percona XtraBackup (PXB) uses this lock for the duration of the backup. This lock does not affect DML statements.

MySQL 5.7 doesn’t have an option to block an instance against DDL and to allow all MDL’s. Thus, Percona XtraBackup has also implemented –lock-ddl-per-table. Before we go into other details, let’s understand how –lock-ddl-per-table works up to now:

  1. PXB starts its work by parsing and copying all redo logs after the checkpoint mark.
  2. Fork a dedicated thread that will keep following new redo log entries.
  3. Get a list of all tablespaces that it will require to copy.
  4. Iterate through the list of tablespaces, for each tablespace, it does the following:
    •  Query INFORMATION_SCHEMA.INNODB_SYS_TABLES or in case of a 8.0 server INFORMATION_SCHEMA.INNODB_TABLES check which table or tables belong to that tablespace ID and take an MDL on the underlying table or tables in case of a shared tablespace.
    • Copy the tablespace .ibd file.

This approach works on the promise that if an MLOG_INDEX_LOAD event (Redo log event generated by bulk load operations to notify backup tools that changes to data files have been omitted from redo log) is encountered by the redo follow thread, it’s safe to continue as tablespaces that we have already copied are protected by MDL and the  MLOG_INDEX_LOAD event is for a tablespace that is yet to be copied.

This promise is not always correct and can lead to inconsistent backup; here are a few examples:

Full-Text Index

Full-Text Index has its own tablespace:

With the current approach, PXB will try to run a SELECT on FTS_000000000000002e_0000000000000508_INDEX_1 for example, which is not something we can do. Here the underlying table that this FTS belongs to may or may not have been protected by MDL, which can cause the FTS index to be copied without protection.

Full-Text Index has a defined name schema, on the above situation we can easily extract the table ID by translating the first 16 characters after FTS_ from hex to decimal, which would give us the underlying table that the FTS belongs to:

FTS_0000000000000529 translates to table_id 1321, however, as you can see above, the FTS in question belongs to a temporary table. That is because when FTS is created for the first time, it has to rebuild the table to create the FTS_DOC_ID column. Again, it is not possible to copy the files under MDL protection.

New Table Added in the Middle of the Backup

Because of the per-table MDL acquisition, if a table has been created after PXB has gathered the list of tablespaces, it’s .ibd will not be copied. Instead, it will be recreated as part of the –prepare phase based on the data added to redo logs. As you can imagine, if the changes are redo-skipped after recreating the table based on redo information, the table will be incomplete.

Shared Tablespaces

Once a shared tablespace is parsed by the lock-ddl-per-table function, it will get a list of all tables created on that tablespace and acquire the MDL on those tables, however, there is no tablespace level MDL, which means there is nothing blocking a new table to be created on this tablespace. If the tablespace has already been copied, this will follow the previous point and be recreated at –prepare phase by parsing redo logs.

Best/Worst Case Scenario

The outcome of such inconsistent backups can be unknown. In the best-case scenario, you will get a crash either in the backup/prepare phase or when using the server. Yes, a crash is a best-case scenario because you will notice the issue right away.

In the worst-case scenario, data will be missed without you noticing it. Here instead of explaining let’s reproduce it.

Get a brand-new instance of MySQL/Percona Server for MySQL 5.7 and download Percona XtraBackup 2.4 prior to 2.4.21 (same can be reproducible with MySQL/PSMySQL 8 and PXB8).

In order to reproduce this scenario, we will be using gdb to pause PXB execution of the backup at a certain time, however, one can do the same by having a big table, which will take some time to copy.

This will start the backup. On a separate session, connect to MySQL and execute:

Back to gdb session, execute:

Your backup will complete. You can now prepare it and restore it on MySQL. Now try to query a.tb1 using the below queries:

Here is an example: