Eventual Consistency in MySQL

We’re told that foreign key constraints are the best way to enforce logical referential integrity (RI) in SQL, preventing rows from becoming orphaned.  But then we learn that the enforcement of foreign keys incurs a significant performance overhead.1,2

MySQL allows us to set FOREIGN_KEY_CHECKS=0 to disable enforcement of RI when the overhead is too high.  But if you later restore enforcement, MySQL does not immediately scan all your data to verify that you haven’t broken any references while the enforcement was disabled.  That’s undoubtedly the best choice, but it leaves our database in a strange state where RI constraints are active, but we cannot assume that all our data satisfies RI.  At some point we do want to perform some quality control, to verify consistency.  How can we do that?

Quality Control Queries

We need to check for orphaned rows in every parent-child relationship in your database.  That is, do an exclusion join from child table to its referenced parent table, and if no matching parent row is found, then the child is an orphan.  For example, we have a parent table Foo and a child table Bar:

You can find orphaned rows in Bar:

If the result set of this query is empty, then there are no orphaned rows.

But there are probably hundreds of foreign key relationships in any complex database.  We can find all foreign keys by querying the INFORMATION_SCHEMA. The KEY_COLUMN_USAGE contains information about both primary keys and foreign keys. If the REFERENCED_* columns are non-null, it’s a foreign key.

This shows that test.Bar has columns X,Y which reference test.Foo columns A,B.

Generating SQL with SQL

Now that we know the information in KEY_COLUMN_USAGE, we can write a query to fetch every distinct KEY_COLUMN_USAGE.CONSTRAINT_NAME where the REFERENCED_* columns are non-null.  From that information, we can generate an exclusion-join query for each foreign key relationship:

It can be a dizzying exercise to write SQL queries that in turn output valid SQL queries, but with a little care and good testing, we can do it.

Using The Quality Control Query

We now have a query that can verify the referential integrity between Foo and Bar.  If we run this query and it returns a non-empty result set, it shows us which rows in Bar are orphans.

That shows that the quality control query can find orphans.  Because we have a way to generate quality control queries automatically, we can run them at any time: