Lost innodb tables, xfs and binary grep

Before I start the story about the data recovery case I worked on yesterday, here’s a quick tip – having a database backup does not mean you can restore from it. Always verify your backup can actually be used to restore the database! If not automatically, do it manually, at least once a month. No, seriously – in most of the recovery cases I have worked on, customers did have some sort of backup, but it just wasn’t working, wasn’t complete, or some such. Someone set it up and never bothered to check whether it still worked after a while.

Anyway, this post is not really about backups but rather about a few interesting things I learned during this last recovery case.

First, some facts about the system and how data was lost:

  • MySQL had a dedicated partition on XFS file system
  • Server was running innodb_file_per_table
  • There was a production master and two slaves, all with the same settings
  • Developer accidentally ran DROP DATABASE X on the wrong machine (production master)
  • All slaves followed and dropped their copy of the data
  • The important tables were all InnoDB
  • Having a backup, the customer first attempted to restore from it on the production master

Luckily (or rather, unfortunately) the backup only had table definitions but not the data, so no data was written to the file system. Mind, however, that restoring a backup could have been fatal had it written some junk data, as that would have overwritten the deleted files. Now, here’s what I learned while working on this case:

Recovering from XFS is possible. Just a month ago we had a team meeting in Mallorca where we went through various data loss scenarios. One of them was deleted files on xfs – we all agreed on a few things:

  • recovering files from xfs is hard, if at all possible
  • we had no recovery cases on xfs, most likely because whoever is using xfs is smart enough to have backups set up properly

Now I’m not picking on the customer or anything – they did indeed have a backup set up; it’s just that some (the most important) tables weren’t backed up. We did not try any of the file recovery tools for xfs – apparently they all target specific file types, and sure enough InnoDB is not one of the supported formats. What we did instead was simply run page_parser on the (already) unmounted file system, treating it as a raw device. I was surprised how amazingly simple and fast it was (did you know that the latest version of page_parser identifies pages by the infimum and supremum records?) – the 10GB partition was scanned in about 5 minutes, and all 4GB of InnoDB pages were successfully written to a separate partition. That’s the easy part though – you run page_parser, wait, and see what you get.
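To give an idea of what such a scan does – this is an illustration, not page_parser’s actual code – here is a minimal sketch that walks a raw image in 16KB steps (the default uncompressed InnoDB page size) and flags pages containing the literal “infimum”/“supremum” system-record strings; the file path and page size are assumptions:

```python
import sys

PAGE_SIZE = 16 * 1024  # default uncompressed InnoDB page size

def scan_image(path):
    """Return byte offsets of 16KB-aligned pages that contain
    both InnoDB system-record marker strings."""
    offsets = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            page = f.read(PAGE_SIZE)
            if len(page) < PAGE_SIZE:
                break
            # every uncompressed InnoDB index page carries these
            # literal strings in its infimum/supremum records
            if b"infimum" in page and b"supremum" in page:
                offsets.append(offset)
            offset += PAGE_SIZE
    return offsets

if __name__ == "__main__":
    for off in scan_image(sys.argv[1]):
        print(f"candidate InnoDB page at offset {off:#x}")
```

The real tool does considerably more (checksum validation, splitting pages out by index ID), but the core idea – treat the device as an opaque byte stream and look for page-level signatures – is the same.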

Had the InnoDB data dictionary not been overwritten by the attempt to restore from backup, the second part would have been quite easy too. But it was overwritten, so I could no longer identify the correct PK index ID for specific tables by simply mapping data dictionary records to index records. Instead, I had to grep for specific character sequences across all the pages. Note, however, that this only works for text in uncompressed text columns (varchar, char, text) – but what if the tables don’t have any text columns at all? Then read on.

GNU grep won’t match binary strings. This isn’t news – I kind of knew grep couldn’t look for binary “junk”, but I really needed it to. Why? Well, here are a few of the scenarios we went through yesterday:

1. There was a rather big table with integer and enum columns only, where we knew a rather unique PK value – something like 837492636 – so we needed a way to find the pages that matched it. InnoDB internally stores an integer in 4 bytes rather than the 10 bytes it would occupy stored as a sequence of characters, so “grep -r 837492636 /dir” would not have worked.

2. There was another table, a small one with 4 smallint columns, where all we could match on was a sequence of numbers from a single record – the customer knew there was at least one row with the following sequence: 7, 3, 7, 8. Matching on any one of the numbers alone would be insane, as it would match virtually all of the pages, while matching on the numbers as a sequence of characters would not work for many reasons.

This is where I found bgrep, which was exactly the tool for the task. In case number one, I just converted the number 837492636 to its big-endian byte representation, 0x31EB1F9C, and ran “bgrep 31EB1F9C /dir” – there were only about 10 other matches across the 4 gigabytes of pages, some of them probably from secondary index pages, but when you only have that many pages to check, it’s really simple.
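To double-check the conversion – and as a fallback when bgrep isn’t at hand – the same search is a few lines of Python. This is a minimal sketch, not the actual bgrep; it reads whole files into memory, which is fine for 16KB page files:

```python
import os
import struct
import sys

def bgrep(needle, root):
    """Report (path, offset) for every occurrence of a byte
    pattern in every file under root."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            pos = data.find(needle)
            while pos != -1:
                hits.append((path, pos))
                pos = data.find(needle, pos + 1)
    return hits

if __name__ == "__main__":
    # 837492636 as a big-endian 4-byte integer: 0x31EB1F9C
    needle = struct.pack(">I", 837492636)
    for path, offset in bgrep(needle, sys.argv[1]):
        print(f"{path}: offset {offset:#x}")
```

One caveat worth flagging as an assumption: for a signed INT column InnoDB flips the sign bit so that rows compare correctly as raw bytes, in which case the first stored byte would be 0xB1 rather than 0x31 – if one needle finds nothing, it is worth trying the other.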

The second case seemed somewhat complicated, but it really wasn’t – all of the columns were fixed size, 2 bytes each, so the thing we had to look for was this sequence: 0007000300070008. I was expecting a lot of false matches, but in fact I ended up with only one match, pointing exactly to the right page and thus the right index ID.
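The needle for this case can be built the same way; a quick sketch, assuming four consecutive big-endian 2-byte values as described above:

```python
import struct

# four smallint (2-byte) values, stored big-endian back to back
needle = struct.pack(">4H", 7, 3, 7, 8)
print(needle.hex())  # 0007000300070008
```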

The other thing I would note about bgrep – it was much faster than matching text using grep, so if you have a lot of data to scan and a choice between matching text and matching a number, matching the number using bgrep may work much better.

We are considering shipping bgrep as part of the Percona recovery toolset, with some additional converters so we can match against various date/time columns as well.


Comments (13)

  • Using XtraBackup on NFS for MySQL backups | MySQL Performance Blog

    […] are completely ready to use. (You should test your backups even more than that — we’ve blogged before about that. Let us know if you want help with […]

    November 9, 2010 at 12:00 am
  • bob

    “cat -v” can sometimes help when trying to grep binary files

    November 9, 2010 at 4:31 am
  • Aurimas Mikalauskas

    Bob, I’m not sure you could have used cat in this case – I don’t think it would represent numbers in a way you can grep them. You could use “od” or “hexdump” however, though it’s still not as convenient and fast.

    November 9, 2010 at 4:55 am
  • LenZ

    Awesome work, thanks for sharing this! How much of the customer’s data were you able to recover this way?

    November 9, 2010 at 5:37 am
  • Aurimas Mikalauskas

    Thanks, Lenz. It is always a tricky question when it comes to recovery. Customers usually don’t ask us to assess that, and it’s not an easy question to answer anyway, unless you know exactly how much data you should have had. Either way, I should have mentioned in the post that all of the tables the customer asked for were recovered (recovery is done manually, table by table), and the amount of data recovered was, at least as it seemed to the customer, correct. There is a chance some data was overwritten, but it was negligible (if anything at all) compared to what was restored.

    November 9, 2010 at 6:13 am
  • John Laur

    The method I use to verify that my production backups are working is to restore them to our staging and development database servers as part of the normal course of test and dev. We use a complex environment with NDB, mixed storage engines, and replication. Making the restore process integral to some part of day-to-day work ensures that if it stops working it gets noticed very quickly.

    November 10, 2010 at 2:56 pm
  • John Laur

    Oh yeah we store all of it on xfs too so there you have some more corroboration for your point “whoever is using xfs, is smart enough to have backups set up properly”

    Personally I wouldn’t even bother attempting to recover from xfs; I did a recovery on reiserfs with tailing once and that was enough for me! Kudos to you guys.

    November 10, 2010 at 3:01 pm
  • Baron Schwartz

    John, congratulations — your recovery testing is not typical, sadly. I think testing once a month is way too seldom. I would not go longer than a week. Especially for a fast-growing app, when data size might grow a lot from day to day. See comments on Morgan’s previous post: http://www.mysqlperformanceblog.com/2010/11/08/an-argument-for-not-using-mysqldump-in-production/

    My least favorite experience with reiserfs was after a power outage. Afterwards, the server ran fsck all day and all night for a fairly small volume — probably 100GB or less, as I recall. In the morning, the filesystem had nothing on it; fsck had moved everything to the “I don’t know what this data is, here’s a snippet” directory. That soured me on reiserfs. A good filesystem should be able to recover from something like that, and reasonably fast.

    November 10, 2010 at 3:31 pm
  • Aurimas Mikalauskas

    Baron, sure – by “at least once a month” I only meant that once a month is much better than never and if you (seem to) not have time to implement something smart around that like John has, having that done manually every now and then still leaves a great chance of catching a problem early on. YMMV though 🙂

    November 11, 2010 at 4:48 am
  • Aurimas Mikalauskas

    John, that’s indeed a very nice approach to it. I’ve only seen very few customers do it that way though.

    November 11, 2010 at 4:50 am
  • Baron Schwartz

    Aurimas, yeah I didn’t mean to disagree with you — it’s hard enough to convince people to test backups at all 🙂

    November 11, 2010 at 5:52 am

