October 24, 2014

Verifying backup integrity with CHECK TABLES

An attendee to Espen’s recent webinar asked how to check tables for corruption. This kind of ties into my recent post on InnoDB’s handling of corrupted pages, because the best way to check for corruption is with CHECK TABLES, but if a page is corrupt, InnoDB will crash the server to prevent access to the corrupt data. As mentioned in that post, this can only be changed by changing InnoDB.

So how are you supposed to check for corruption that might be introduced by bad hardware, a bug, or so forth?

It’s a great question. The answer I would give for most cases is “check your backups for corruption instead of your live server.” You need to do this anyway — a backup that isn’t checked is a ticking time bomb. You need to verify (at least periodically) that your backups are recoverable.

The usual procedure goes like this: copy your backup somewhere, start a server instance on it, and run CHECK TABLES. You can use the mysqlcheck program to do this conveniently.

You could also use innochecksum, which doesn’t require starting the server. But it only verifies that each page’s checksum matches the page’s data, it doesn’t do all the other checks that are built into InnoDB (making sure that the LSNs are sane, for example).

How often? As often as possible. Some people refresh their dev/staging environment every day with last night’s backup, which is a great way to make failures obvious, as long as you verify that it truly does happen (e.g. what if it fails and you keep running with yesterday’s without knowing it?). If you can’t do it daily, then weekly is perfectly acceptable to most people. I’m not saying a specific interval should/ought to be your goal, I’m just remarking on what a lot of people seem to feel good about.

About Baron Schwartz

Baron is the lead author of High Performance MySQL.
He is a former Percona employee.

Comments

  1. I would emphasize that this method can only be trusted when the backup method is a file system snapshot (e.g. with LVM).
    The use of Xtrabackup or InnoDB Hot Backup is based on software which reads the InnoDB data. Thus, it is in itself part of the system, and is susceptible to such corruptions. A file system snapshot is blind to internal issues, and blindly reflects your data.

  2. I could ask around, but I’m fairly sure that XtraBackup and InnoDB Hot Backup could potentially have failure modes that introduce things you could catch with a full-server CHECK. But, you make a good point that it’s more likely they would fail in a way that only an external verification could catch. In this case you’d need some kind of checks against the data itself, e.g. “does each payment have an invoice with a matching amount.”

  3. Shlomi,

    In theory even LVM can introduce problems.. though it is unlikely of course. In reality I believe checking backups taken with Percona Xtrabackup is rather safe because first it checks pages during copying so some corruptions will be spotted during backup already. Second by having logically corrupted data applying logs may assert on prepare. But even if it worked, say you had logical corruption in rarely used region it would be unlikely to be repaired by Xtrabackup backup process.

  4. Peter,
    If Xtrabackup made the backup with no problem, and data recovery on new server worked with no problem – I agree it is highly unlikely that there was a problem in the first place.
    I’m talking about the opposite direction:
    If there is a suspicion for InnoDB corruption, then I suggest Xtrabackup is not the tool to check this with. Because it, in itself, is an InnoDB-software in that it reads/writes InnoDB pages, working with the internals of the engine. So it is possible that in the event of InnoDB corruption, the Xtrabackup would make it even more corrupted, so to speak. The restore process would possibly work on something which was not the exact copy of the data at the time of backup.
    Whereas I expect a file system snapshot to deliver exactly what it sees, regardless of InnoDB or no InnoDB issues.

    In the event of permanent physical error the above has no meaning, I guess. I’m guessing taking checksums on pages or some parity? I don’t actually know if InnoDB has these?

    BTW, I find it very difficult to win the CAPTCHA. Pretty sure I’m not a robot.

Speak Your Mind

*