September 18, 2014

Faster Point In Time Recovery with LVM2 Snapshots and Binary Logs

LVM snapshots are a powerful way of taking a consistent backup of your MySQL databases, but did you know that you can now restore directly from a snapshot (plus binary logs for point-in-time recovery) after that ‘Oops’ moment? Let me quickly show you how.

This how-to assumes that you already have a decent knowledge of LVM and snapshots and are using LVM2 >= 2.02.58 (January 2010), which is when the --merge option was added to the lvconvert binary. Base installs of Ubuntu 11.04 (Natty) and CentOS 5.7 include packages with this feature; previous releases might also provide them via supplemental repositories, e.g. updates on CentOS. If you are using InnoDB, it is also important that your transaction logs (ib_logfile*) are on the same logical volume as the data files. If they are not, the restored data files and the logs can disagree on the LSN, which can trigger crash recovery and still leave you with inconsistent data.
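
To check the version requirement above, something like the following sketch could be used. The helper names version_ge and parse_lvm_version are my own, and the comparison relies on GNU sort's -V option, so treat it as an illustration rather than a portable test.

```shell
# Succeeds (exit 0) if version $1 is at least version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extracts the numeric LVM version from `lvm version` output read on stdin.
# The relevant line looks like "  LVM version:     2.02.98(2) (...)".
parse_lvm_version() {
    sed -n 's/^ *LVM version: *\([0-9][0-9.]*\).*/\1/p'
}

# Example (needs LVM installed, so it is left commented out):
# installed=$(lvm version | parse_lvm_version)
# if version_ge "$installed" 2.02.58; then
#     echo "lvconvert --merge should be available"
# fi
```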

Now, assume I have the following logical volumes: mysql-data for my datadir and mysql-logs for my binary logs. I also have the latest snapshot of the mysql-data logical volume, named ‘mysql-data-201202230157’, taken with a script* I put together to make sure I have a consistent snapshot of the MySQL data files. Restoring the snapshot alone may not be enough, since many more events may have occurred between the time of the snapshot and the time you discover the problem. So it is really important that your binary logs are on a different LV or, if they are on the same LV, that you copy them somewhere else before restoring.
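
For readers without the script, here is a rough sketch of one common way to take a consistent snapshot. It is not the script mentioned above. It assumes the mysql client's system (\!) command is available and works in batch mode, and the volume group vg0, the 5G snapshot size and the output path are made-up examples.

```shell
# Prints a SQL batch that holds a global read lock while the snapshot is
# created, so the data files and the reported binlog coordinates line up.
# Arguments: volume group, origin LV, snapshot name, snapshot size.
snapshot_sql() {
    printf '%s\n' 'FLUSH TABLES WITH READ LOCK;'
    printf '%s\n' 'SHOW MASTER STATUS;'
    # \! asks the mysql client to run a shell command in the same session,
    # which keeps the lock held while lvcreate runs.
    printf '%s\n' "\\! lvcreate --snapshot --size $4 --name $3 /dev/$1/$2"
    printf '%s\n' 'UNLOCK TABLES;'
}

# Example (run as root; vg0, 5G and the output file are hypothetical):
# snapshot_sql vg0 mysql-data "mysql-data-$(date +%Y%m%d%H%M)" 5G \
#     | mysql > /root/snapshot-coordinates.txt
```

Everything runs in one client session because the read lock is released as soon as the session that took it disconnects.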

Suppose that sometime after the snapshot, I accidentally dropped the salaries table from the employees database! (Oops, I could have an angry mob of employees who may not get their salaries on time!)

With the last snapshot I have above, I should be able to restore up to just before the DROP statement. If you use my LVM snapshot script*, it also records the binary log coordinates at the time the snapshot was taken and saves them to a file specified as a variable in the script. Below are the binary log coordinates for the last snapshot on my list above.

Using the coordinates above, I can start searching for the position of the DROP statement so we can skip it after restoring the snapshot. Using the command below and some inline searches, I was able to pinpoint the position of the DROP statement: it is at 336797160.
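
If you would rather not search by eye, a small filter over mysqlbinlog's text output can report the offset for you. This is only a sketch: find_event_pos is a hypothetical helper, and it assumes each event is introduced by a "# at <position>" line, as mysqlbinlog prints for statement-based events.

```shell
# Reads mysqlbinlog text output on stdin and prints the "# at <pos>" offset
# of every event containing a line that matches the pattern (case-insensitive).
find_event_pos() {
    awk -v pat="$1" '
        /^# at [0-9]+$/ { pos = $3; next }       # remember the latest event offset
        tolower($0) ~ tolower(pat) { print pos } # report it when the pattern matches
    '
}

# Example, starting from the snapshot coordinates:
# mysqlbinlog --start-position=336796712 mysql-bin.00022 | find_event_pos 'drop table'
```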

Now, let’s restore our data from the snapshot. Under the hood, the restore is really a “rollback” to the state of the volume when the snapshot was taken. We will do this using lvconvert’s --merge option, which merges the snapshot back into the original LV.
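
As a rough sketch, the whole restore sequence, including stopping MySQL and unmounting the volume first, might look like the commands printed below. The volume group vg0, the mount point /var/lib/mysql and the service name mysql are assumptions, and the helper only prints the commands so they can be reviewed before anything is run.

```shell
# Prints the restore sequence; review it, then run the commands as root.
# Arguments: volume group, origin LV, snapshot LV, mount point.
merge_plan() {
    printf '%s\n' \
        "service mysql stop" \
        "umount $4" \
        "lvconvert --merge /dev/$1/$3" \
        "mount /dev/$1/$2 $4" \
        "service mysql start"
}

# vg0 and /var/lib/mysql are hypothetical; the LV names are the ones used here.
merge_plan vg0 mysql-data mysql-data-201202230157 /var/lib/mysql
```

Note that the snapshot LV no longer exists once the merge completes.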

You should shut down MySQL first, then unmount the logical volume holding the MySQL data. This way you don’t have to deactivate and reactivate the original logical volume to start the merge. So let’s see if our salaries table is restored.

Success! Your salaries data is back. But we still have to apply the changes made after the snapshot, skipping the DROP statement. You should take another snapshot now, in case you fail to skip the DROP statement! I know for a fact that on every MySQL restart the logs are flushed and a new binary log is created. Looking at the current binary logs after restoring the snapshot, I know I only have to apply mysql-bin.00022, starting from position 336796712 and skipping the DROP statement at position 336797160:
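
One way to express this replay is as two mysqlbinlog passes: one that stops right before the DROP, and one that resumes at the event after it. The sketch below only prints the commands. replay_plan is a hypothetical helper, and the position of the event following the DROP (NEXT_POS) is not given here, so you would have to read it from the DROP event's end_log_pos in the mysqlbinlog output.

```shell
# Prints the two replay commands around a skipped event.
# Arguments: binlog file, snapshot position, position of the event to skip,
#            position of the event right after the skipped one.
replay_plan() {
    printf '%s\n' \
        "mysqlbinlog --start-position=$2 --stop-position=$3 $1 | mysql" \
        "mysqlbinlog --start-position=$4 $1 | mysql"
}

# Example (NEXT_POS is a placeholder you must look up yourself):
# replay_plan mysql-bin.00022 336796712 336797160 "$NEXT_POS"
```

The first pass leaves out the DROP because --stop-position stops at the first event whose position is equal to or greater than the given value.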

So, as you can see, I now have consistent data and my salaries table back.

Because leaving a production server with active snapshots can affect performance, this is not really an advisable backup solution. If your server goes up in flames, develops hardware problems, or hits an LVM bug, your snapshots are useless. Also, you cannot test-restore your snapshots: they are one-time use!

However, if you can tolerate the extra IO overhead, e.g. on a development or staging server, then this is still a valid backup method, on top of your regularly tested (offsite) backups of course.

Another ideal use case for this method is when you are planning to execute a long-running ALTER or a server upgrade: it can be a good, quick rollback procedure in case something fails during the operation.
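
For that rollback use case, the life cycle of the safety snapshot could be sketched as below. guarded_change is a hypothetical helper that only prints the LVM command for each phase, and the 10G snapshot size and the vg0 volume group are assumptions. MySQL still has to be stopped and the volume unmounted before the rollback merge, as described earlier.

```shell
# Prints the LVM command for one phase of a guarded change.
# Arguments: phase (before|commit|rollback), volume group, origin LV, snapshot name.
guarded_change() {
    case "$1" in
        before)   echo "lvcreate --snapshot --size 10G --name $4 /dev/$2/$3" ;;  # take the safety snapshot
        commit)   echo "lvremove -f /dev/$2/$4" ;;        # change succeeded: drop the snapshot
        rollback) echo "lvconvert --merge /dev/$2/$4" ;;  # change failed: roll the LV back
        *)        echo "usage: guarded_change before|commit|rollback VG LV SNAP" >&2
                  return 1 ;;
    esac
}

# Example:
# guarded_change before vg0 mysql-data pre-alter
# ... run the ALTER or upgrade ...
# guarded_change commit vg0 mysql-data pre-alter
```

Dropping the snapshot promptly after a successful change keeps the extra IO overhead short-lived.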

* While writing this blog I hacked together a quick shell script to create snapshots and uploaded it here. It is by no means perfect, but you can use lvmsnap.sh snapshot to create snapshots. The restore functionality is not finished, but you can use it for quick testing :) e.g. lvmsnap.sh restore <snapshot-timestamp>

About Jervin Real

Jervin is a member of Percona's Rapid Response Consulting team. When you come to Percona for consulting, chances are he'll be greeting you first. His primary role is to make sure customer issues are handled efficiently and professionally. Jervin joined Percona in April 2010.

Comments

  1. sen says:

    Assume we clone the data from a standby server (S1), which replicates from an active master (M), by running STOP SLAVE on S1, in order to create a new standby server (S2), like M -> S1 -> S2

    The data will be consistent with this approach, but when starting S2 it goes into InnoDB crash recovery and takes a long time to start.

    How can we overcome this?

  2. Jervin Real says:

    @sen,

    It depends on how the clone was taken. If you can post your procedure and logs in detail on the forums, we can try to help you there.

  3. sen says:

    Thanks Jervin,

    I have posted here in detail:

    http://forum.percona.com/index.php?t=msg&goto=9575&#msg_9575
