Lost innodb tables, xfs and binary grepAurimas Mikalauskas
Before I start a story about the data recovery case I worked on yesterday, here’s a quick tip – having a database backup does not mean you can restore from it. Always verify your backup can be used to restore the database! If not automatically, do this manually, at least once a month. No, seriously – in most of the recovery cases I worked on, customers did have some sort of backup, but it just wasn’t working, complete and what not. Someone set it up and never bothered to check if it still works after a while.
Anyway, this post is not really about the backups but rather about few interesting things I learned during last recovery case.
First, some facts about the system and how data was lost:
- MySQL had a dedicated partition on XFS file system
- Server was running innodb_file_per_table
- There was a production master and two slaves, all had same setting
- Developer accidentally ran DROP DATABASE X on the wrong machine (production master)
- All slaves followed and dropped their copy of the data
- The important tables were all InnoDB
- Having a backup, customer has first attempted to restore from backup on the production master
Luckily (or rather, unfortunately) backup only had table definitions but not the data so no data was written to file system. Mind however that restoring a backup could have been fatal if it would have written some junk data as that would have overwritten the deleted files. Now, here’s what I learned while working on this case:
Recovering from XFS is possible. Just a month ago we had a team meeting in Mallorca where we went through various data loss scenarios. One of them was deleted files on xfs – we all agreed on few things:
- recovering files from xfs is hard, if at all possible
- we had no recovery cases on xfs, most likely because:
- whoever is using xfs, is smart enough to have backups set up properly
Now I’m not picking on the customer or anything – indeed they did have a backup set up, it’s just that some (most important) tables weren’t backed up. We did not try any of the file recovery tools for xfs – apparently they are all targeting specific file types and sure enough InnoDB is not one of the supported files. What we did is we simply ran page_parser on the (already) unmounted file system treating it as a raw device. I was surprised how amazingly simple and fast it was (did you know that latest version of page_parser identifies pages by infimum and supremum records?) – 10G partition was scanned in like 5 minutes and all 4G of innodb pages were successfully written to a separate partition. That’s the easy part though – you run page parser, wait and see what you get.
If InnoDB Data Dictionary was not overwritten by an attempt to restore from the backup, actually second part would’ve been quite easy too, but it was so I could no longer identify correct PK id for specific tables by just mapping data dictionary table records to index records. Instead I had to grep for specific character sequences against all pages. Note however that only works for text in uncompressed text columns (varchar, char, text) but what if tables don’t have any text columns at all? Then, you read further.
GNU grep won’t match binary strings. This isn’t new, I kind of knew grep couldn’t look for binary “junk”, but I really needed it to. Why? Well, here’s few of the scenarios we’ve gone through yesterday:
1. There was this rather big table with integer and enum columns only, where we knew a rather unique PK, well something like 837492636 so we needed a way to find pages that match it. InnoDB would internally store integers in 4-bytes rather than 10 bytes if it were stored as a sequence of characters, so “grep -r 837492636 /dir” would not have worked.
2. There was another table, a small one with 4 smallint columns where all we could match on was a sequence of numbers from a single record – customer knew that there was at least one row with the following sequence: 7, 3, 7, 8. Matching by any of the numbers would be insane as it would match all of the pages while matching on numbers as a sequence of characters would not work for many reasons.
This is where I found bgrep which was exactly the tool for the task. In the case number one, I have just converted number 837492636 to it’s binary representation 0x31EB1F9C and ran “bgrep 31EB1F9C /dir” – there were only like 10 other matches across the 4 gigabytes of pages, some of them probably from the secondary pages, but when you only have that many pages to check it’s really simple.
Second case seemed somewhat complicated, but it really wasn’t – all of the columns were fixed size – 2bytes each, so the thing we had to look for was this sequence: 0007000300070008. I was expecting a lot of mismatches but in fact I ended up with only one match pointing exactly to the right page and so the right index id.
The other thing I would note about bgrep – it was so much faster than matching text using grep, so if you happen to have a lot of data to scan and you have to choose between matching text and number, matching a number using bgrep may work much better.
We are considering shipping bgrep as part of percona recovery toolset, with some additional converters so we can match against various date/time columns as well.