EmergencyEMERGENCY? Get 24/7 Help Now!

File carving methods for the MySQL DBA

 | December 23, 2014 |  Posted In: Insight for DBAs, MySQL, Security


File carving methods for the MySQL DBAThis is a long overdue blog post from London’s 44con Cyber Security conference back in September. A lot of old memories were brought to the front as it were; the one I’m going to cover in this blog post is: file carving.

So what is file carving? despite the terminology it’s not going to be a full roast dinner; unless you have an appetite for data which as you’re here I’m assuming you have.

The TL;DR of “what is file carving” is taking a target blob of data (often a multi GB / TB file) and reducing it in to targeted pieces of data, this could be for instance grabbing all the jpeg images in a packet capture / mysqldump; or pulling that single table/schema out of a huge mysqldump with –all-databases (if you’re not using mydumper you really should it avoids issues like this!) aka “Sorting the wheat from the chaff”.

Let’s take for example at the time of writing this post I am looking to extract a single schema out of one such mysqldump –all-database file of around 2GB (2GB of course isn’t large however it’s large enough to give a practical example; the methods for larger files are of course the same). So where to start?

You’ll need the following tools installed:

  1. xxd (you can substitute xxd for od, hexer or any other hex editing / viewing tool you are comfortable with, just make sure it can handle very large files)
  2. grep

Let’s carve out the mysql schema

dbusby@kali:~$ xxd yourdumpfile.sql | grep 'mysql' -B5 | grep 'ASE' -A2 -B2
00003c0: 6e74 2044 6174 6162 6173 653a 2060 6d79 nt Database: my
00003d0: 7371 6c60 0a2d 2d0a 0a43 5245 4154 4520 sql
00003e0: 4441 5441 4241 5345 202f 2a21 3332 3331 DATABASE /*!3231
00003f0: 3220 4946 204e 4f54 2045 5849 5354 532a 2 IF NOT EXISTS*
0000400: 2f20 606d 7973 716c 6020 2f2a 2134 3031 / mysql /*!40

Wonderful so we have some hex representation of the sql dumpfile why on earth do we want the hex? we need to define our offsets. In short our offsets are the position of the start and end of the chunk we intend to carve from the file.

From the above our start offset is 00003d9 at the start of CREATE DATABASE; for those unfamiliar with hexdump outputs I recommend looking at the tool hexer a vi like tool and pressing v to enter visual selection mode select a few characters and you’ll not something as follows “visual selection:  0x000003d9 – …”.

You can of course work out the range visually from the above, 00003d0 is the start of the line, each alphanumeric pair is a single byte the byte offset notation is hexedecimal 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f.

00003d0: 7371 6c60 0a2d 2d0a 0a43 5245 4154 4520 sql.--..CREATE
00003d0 == s, 00003d1 == q, 00003d2 == l And so on, we can easily verify this using xxd
dbusby@kali:~$ xxd -s 0x3d9 yourdumpfile.sql | head -n3
00003d9: 4352 4541 5445 2044 4154 4142 4153 4520 CREATE DATABASE
00003e9: 2f2a 2133 3233 3132 2049 4620 4e4f 5420 /*!32312 IF NOT
00003f9: 4558 4953 5453 2a2f 2060 6d79 7371 6c60 EXISTS*/

right so now we need the end offset, as above we establish a search pattern as the schema data we're carving is in the midst of a larger file we can look for the start of the dump for the next schema.

dbusby@kali:~$ xxd -s 0x3d9 yourdumpfile.sql | grep '--' -A5 | grep C -A2 -B2 | less
0083b19: 2043 7572 7265 6e74 2044 6174 6162 6173 Current Databas
0083b29: 653a 2060 7065 7263 6f6e 6160 0a2d 2d0a e:

I’ve piped into less here as there were many matches to the grep patterns.

From the above we can see a potential offset of 0x83b19 however we want to “backtrack” a few bytes to before the — comment start.

dbusby@kali:~$ xxd -s 0x83b14 yourdumpfile.sql | head -n1
0083b14: 2d2d 0a2d 2d20 4375 7272 656e 7420 4461 --.-- Current Da

Excellent we have our offsets starting at 0x3d9 ending at 0x83b14 we need to now convert base16 (hexidecimal) into base10 fortunatly we can do this usinc the bc utility very easily however we will need to fully expand and make upper case our offsets.

dbusby@kali:~$ echo 'ibase=16;00003D9' | bc
dbusby@kali:~$ echo 'ibase=16;0083B14' | bc
dbusby@kali:~$ echo '539412-985' | bc
dbusby@kali:~$ dd if=yourdumpfile.sql of=mysql.sql skip=985 bs=1 count=538427
538427+0 records in
538427+0 records out
538427 bytes (538 kB) copied, 1.08998 s, 494 kB/s

Let’s discuss this a little; what we have done here is convert our start offset to a base10 count of bytes to offset by when using dd (skip=985) we then convert the end offset to its base10 byte position, and by removing the startoffset base10 value this gives us the size of the chunk we are carving.

We now put this into a dd command line, and voila! we have a mysql.sql file which contains only the mysqldump data.

I hope this post helps somewhat to demystify file carving; the above techniques can be applied to any for of file carving need and is not limited only to mysql files.

David Busby

Information Security Architect

One Comment

  • It’s easy to say “if you’re not using mydumper you really should”, and I’d really like to be able to use mydumper myself as well. I firmly believe it’s what mysqldump should have evolved into ten to fifteen years ago.
    However, as of the last version I tested, which I think was 0.60, mydumper does not handle VIEWs or MyISAM_MRG tables correctly, and also as of that last-tested version, myloader does not work properly with MySQL 5.6. So while it’s a wonderful improvement over mysqldump, it’s not necessarily safe to be used in production yet, especially with MySQL 5.6.

Leave a Reply