November 28, 2014

Thoughts on Innodb Incremental Backups

For regular Innodb “hot” backups we use LVM or other snapshot-based technologies with pretty good success. Incremental backups, however, remain a problem.

First, why do you need incremental backups at all? Why not just take a full backup daily? The answer is space: if you want to keep several generations to restore from, storing many full copies of a large database is inefficient, especially if it changes by only a couple of percent per day.
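To make the space argument concrete, here is a rough model. The numbers are purely hypothetical (a 500 GB database, 30 daily generations, 2% of pages changed per day), and the incremental scheme is assumed to keep one full base copy plus one page-level delta per later day.

```python
def full_copies_gb(db_gb: float, generations: int) -> float:
    """Space needed to keep `generations` independent full backups."""
    return db_gb * generations


def incremental_gb(db_gb: float, generations: int, daily_change: float) -> float:
    """Space for one full base copy plus one delta per later generation.

    daily_change is the fraction of pages modified per day; a page changed
    several times in one day is still stored only once in that day's delta.
    """
    return db_gb + (generations - 1) * db_gb * daily_change


# Hypothetical example: 500 GB database, 30 generations, 2% of pages changed per day
print(full_copies_gb(500, 30))        # 15000
print(incremental_gb(500, 30, 0.02))  # about 790
```

Under these assumptions, a month of history costs roughly 15 TB as full copies versus well under 1 TB as base plus deltas.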

The solution MySQL offers, replaying the binary log, works in theory but is not very useful in practice because catching up can take far too long. Even if your updates are very light and a full day's worth can be replayed within an hour, replaying a month's worth of binary logs will take about 30 hours… and you would typically have much heavier update traffic.
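The estimate is simple arithmetic: replay time grows linearly with the number of days of logs and shrinks only with how much faster than real time the server can apply them. The function and its `speedup` parameter are hypothetical names used for illustration.

```python
def replay_hours(days_of_logs: float, speedup: float) -> float:
    """Hours needed to replay `days_of_logs` days of binary logs, where
    `speedup` is how many times faster than real time the server applies them."""
    return days_of_logs * 24 / speedup


print(replay_hours(30, 24))  # 30.0: a day of updates replays in 1 hour
print(replay_hours(30, 4))   # 180.0: heavier traffic, a day replays in 6 hours
```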

Another solution is rdiff, which is a great general-purpose tool, though with Innodb in particular you can do much better.

Innodb pages carry a great deal of internal information that is helpful for incremental backups. There is essentially a page version (the page LSN) that allows a quick check of whether the page is newer, there is a page checksum, and finally each page stores its own offset (its page number, i.e. where it belongs in the data file).
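These fields live in the FIL header at the start of every page and in the 8-byte trailer at its end. The sketch below reads them, assuming the default 16 KB uncompressed page size and the classic page layout (big-endian; stored checksum at byte 0, page number at byte 4, page LSN at byte 16; the last 4 bytes of the page repeat the low 32 bits of the LSN). It only reads the stored checksum: verifying it would require reimplementing Innodb's checksum algorithm, which is not shown here.

```python
import struct

PAGE_SIZE = 16 * 1024         # assumption: default 16 KB uncompressed pages
FIL_PAGE_SPACE_OR_CHKSUM = 0  # 4 bytes: stored page checksum
FIL_PAGE_OFFSET = 4           # 4 bytes: page number within the tablespace
FIL_PAGE_LSN = 16             # 8 bytes: LSN of the page's last modification


def parse_page(page: bytes) -> dict:
    """Extract the backup-relevant header fields from one Innodb page."""
    if len(page) != PAGE_SIZE:
        raise ValueError(f"expected {PAGE_SIZE} bytes, got {len(page)}")
    (checksum,) = struct.unpack_from(">I", page, FIL_PAGE_SPACE_OR_CHKSUM)
    (page_no,) = struct.unpack_from(">I", page, FIL_PAGE_OFFSET)
    (lsn,) = struct.unpack_from(">Q", page, FIL_PAGE_LSN)
    # The trailer's last 4 bytes hold the low 32 bits of the LSN; a mismatch
    # suggests a partially written (torn) page.
    (trailer_lsn_low,) = struct.unpack_from(">I", page, PAGE_SIZE - 4)
    return {
        "checksum": checksum,
        "page_no": page_no,
        "file_offset": page_no * PAGE_SIZE,  # where the page belongs in the file
        "lsn": lsn,
        "trailer_ok": (lsn & 0xFFFFFFFF) == trailer_lsn_low,
    }


def is_newer(page: bytes, since_lsn: int) -> bool:
    """The quick 'has this page changed?' test: compare the page LSN."""
    return parse_page(page)["lsn"] > since_lsn
```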

Using this data, it should be easy to implement a very efficient yet simple incremental backup for Innodb.

Much like rdiff, the tool could either update the backup in place while storing the rollback changes or, when dealing with a read-only compressed backup, create a roll-forward recovery log, which can also be compressed easily.
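One way to update the backup in place while keeping the rollback changes is sketched below: before a page in the backup is overwritten with a page from the delta (a plain concatenation of changed pages), its old image is appended to a rollback file. The rollback format here is made up for illustration (an 8-byte original file size, then records of a 4-byte page number plus the old page). The page number is stored explicitly because a never-used, all-zero page does not carry its real page number in its header. The sketch assumes 16 KB pages and that the delta contains each page at most once.

```python
import os
import struct

PAGE_SIZE = 16 * 1024   # assumption: default 16 KB uncompressed pages
FIL_PAGE_OFFSET = 4     # page number field in the FIL header (4 bytes, big-endian)
RECORD = 4 + PAGE_SIZE  # hypothetical rollback record: page number + old page image


def apply_with_rollback(backup_path: str, delta_path: str, rollback_path: str) -> None:
    """Apply a page delta in place, saving each overwritten page to rollback_path."""
    with open(backup_path, "r+b") as backup, \
            open(delta_path, "rb") as delta, \
            open(rollback_path, "wb") as rb:
        old_size = backup.seek(0, os.SEEK_END)  # seek returns the new position
        rb.write(struct.pack(">Q", old_size))   # lets rollback drop appended pages
        while True:
            new_page = delta.read(PAGE_SIZE)
            if len(new_page) < PAGE_SIZE:       # end of delta
                break
            (page_no,) = struct.unpack_from(">I", new_page, FIL_PAGE_OFFSET)
            offset = page_no * PAGE_SIZE
            if offset + PAGE_SIZE <= old_size:  # page existed before: save old image
                backup.seek(offset)
                rb.write(struct.pack(">I", page_no) + backup.read(PAGE_SIZE))
            backup.seek(offset)
            backup.write(new_page)


def rollback(backup_path: str, rollback_path: str) -> None:
    """Undo apply_with_rollback: restore saved pages, then cut off appended ones."""
    with open(rollback_path, "rb") as rb, open(backup_path, "r+b") as backup:
        (old_size,) = struct.unpack(">Q", rb.read(8))
        while True:
            record = rb.read(RECORD)
            if len(record) < RECORD:
                break
            (page_no,) = struct.unpack_from(">I", record, 0)
            backup.seek(page_no * PAGE_SIZE)
            backup.write(record[4:])
        backup.truncate(old_size)
```

Keeping a chain of such rollback files lets the single up-to-date backup be stepped back one generation at a time.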

All the tool would need to do is go through the pages of each Innodb file and write every page that changed since the previous backup to a separate file. Because pages already contain their own position, there is no need for complex “diff” metadata.
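A minimal sketch of that scan, assuming 16 KB uncompressed pages, a consistent copy of the file (e.g. from an LVM snapshot), and a hypothetical `since_lsn` parameter marking the previous backup. Choosing `since_lsn` is the subtle part: it should be no higher than the LSN up to which the previous backup is known to be complete (for instance its checkpoint LSN), not simply the highest page LSN seen, because pages still dirty in memory at snapshot time may later be flushed with LSNs below that maximum. The sketch does not verify page checksums.

```python
import struct

PAGE_SIZE = 16 * 1024  # assumption: default 16 KB uncompressed pages
FIL_PAGE_LSN = 16      # page LSN field in the FIL header (8 bytes, big-endian)


def write_delta(datafile_path: str, delta_path: str, since_lsn: int,
                page_size: int = PAGE_SIZE) -> int:
    """Copy every page modified after since_lsn into delta_path, unchanged.

    The delta is just a concatenation of whole pages; each page carries its
    own page number, so no extra "diff" metadata is written.
    Returns the number of pages copied.
    """
    copied = 0
    with open(datafile_path, "rb") as src, open(delta_path, "wb") as out:
        while True:
            page = src.read(page_size)
            if len(page) < page_size:  # end of file (a trailing partial page is ignored)
                break
            (lsn,) = struct.unpack_from(">Q", page, FIL_PAGE_LSN)
            if lsn > since_lsn:
                out.write(page)
                copied += 1
    return copied
```

To produce the compressed roll-forward log mentioned above, the output could be opened with `gzip.open(delta_path, "wb")` instead of `open`; the page format itself does not change.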

For recovery, we can simply read this file of new pages and put each page back in its original place.
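Recovery is the mirror image of the scan: read the delta page by page, take the page number from each page's header, and write the page at page number × page size. A sketch under the same assumptions (16 KB pages, the delta being a plain concatenation of whole pages):

```python
import struct

PAGE_SIZE = 16 * 1024  # assumption: default 16 KB uncompressed pages
FIL_PAGE_OFFSET = 4    # page number field in the FIL header (4 bytes, big-endian)


def apply_delta(backup_path: str, delta_path: str, page_size: int = PAGE_SIZE) -> int:
    """Write every page from the delta back to its own position in the backup.

    No index is needed: each page's header says where it belongs.
    Returns the number of pages applied.
    """
    applied = 0
    with open(delta_path, "rb") as delta, open(backup_path, "r+b") as backup:
        while True:
            page = delta.read(page_size)
            if len(page) < page_size:  # end of delta file
                break
            (page_no,) = struct.unpack_from(">I", page, FIL_PAGE_OFFSET)
            backup.seek(page_no * page_size)
            backup.write(page)  # writing past EOF extends the file; any gap reads as zeros
            applied += 1
    return applied
```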

Of course, this means that .frm files, Innodb log files and the MyISAM system tables still need to be copied in full, but they typically make up only a small portion of an Innodb database.
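A backup driver would have to decide, file by file, which path to take. A hypothetical helper, assuming the usual MySQL data directory naming (ibdata* for the shared tablespace, per-table .ibd files with innodb_file_per_table, ib_logfile* for the redo logs):

```python
from pathlib import PurePath


def copy_mode(path: str) -> str:
    """Return 'incremental' for Innodb data files, 'full' for everything else."""
    name = PurePath(path).name
    # Shared tablespace (ibdata1, ibdata2, ...) and per-table .ibd files are page-based
    if name.startswith("ibdata") or name.endswith(".ibd"):
        return "incremental"
    # .frm definitions, ib_logfile* redo logs, MyISAM .MYD/.MYI files, etc.
    return "full"
```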

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Michael says:

    I’ve recommended it many times before here in the comments section:

    (No, I have no association with this company). We use a product called CDP by a company named R1Soft. http://www.r1soft.com/

    It works EXTREMELY well for incremental mysql backups (no need for lvm, etc). We’ve been using it for a year, and it “just works”.

    I highly recommend you try it.

    Michael

  2. Andrew Aksyonoff says:

    > The Innodb pages have great deal of information helpful for their incremental backup in their internal.

    I wonder if that information can be *properly* accessed by an external tool, or whether the tool will need to be built into MySQL.

  3. peter says:

    This information is stored in the files, so an external tool can access it perfectly well. The LVM Backup or other snapshot suggested is to ensure the state of files is atomic.

  4. Andrew Aksyonoff says:

    > The LVM Backup or other snapshot suggested is to ensure the state of files is atomic.

    Yep, that’s exactly what I meant. So the tool would have to work with LVM snapshots, correct? Just wondering.

  5. peter says:

    Michael,

    Thanks. Some of our customers use it, and we have been meaning to give it a thorough test for some time: what kind of overhead does it have, what are its performance characteristics, etc.

    At the same time I really would like to see some Open Source solution as an alternative.

  6. Brian says:

    Michael, I also reached out to R1Soft to see if they were interested in giving us access to a system to evaluate, so that we could knowledgeably recommend it when appropriate (we do not just repeat sales brochures to our clients). There was no reply.

  7. Michael Moody says:

    Peter –

    The things I like:
    1. Can be dropped into an existing system (no LVM required, no refactoring storage to make it work)
    2. Very little performance overhead, reads at the block level, doesn’t push anything out of the disk cache (to us, a very important concern)
    3. Can be used to set up a slave or secondary master without any downtime
    4. Enables me to take backups as often as I want (I take backups every 3 hours, though I could do it more often)
    5. Only needs a short table lock, and then an unlock, and it’s able to backup from there
    6. Supports both Windows and Linux for a homogeneous environment

    Things I don’t like:
    1. Only supports ext2/3 and reiserfs (currently)

    @Brian – I’m surprised; they’re very quick to respond to technical problems, so maybe sales is a different story. I’ll be happy to show you around our copy if you want; I can easily create a user for you.

    Michael

  8. Brooke says:

    Hi!

    I did a Google search and found this :) We would be very excited to get anyone a copy of the software who would like to try it out.

    Please contact me directly at brooke@r1soft.com

    We are looking forward to your evaluation and feedback!

    Brooke
