Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Just how useful are binary logs for incremental backups?

July 21, 2009

Author

Morgan Tocker

Insight for DBAs

Share this Post:

We’ve written about replication slaves lagging behind masters before, but one of the other side effects of the binary log being serialized, is that it also limits the effectiveness of using it for incremental backup.Â Let me make up some numbers for the purposes of this example:

We have 2 Servers in a Master-Slave topology.

The database size is 100 GB (same tables on each).

The slave machine barely keeps up with the master (at 90% capacity during peak, 75% during offpeak)

The peak window is 12 hours, the offpeak window is 12 hours.

Provided that the backup method was raw data files, it shouldn’t take much more than 30 minutes to restore 100GB (50MB/s), but to replay one day of binary logsÂ it would take an additional 20 hours ((12 * 0.9) + (12 * 0.75) = 19h48m).

If you wanted to do something like setup a new slave with a 24-hour old backup, and apply the binary logs continuously until it catches up, you will be waiting almost 5 days until that happens (each day has 24hr-19h48m = 4h12m “free” capacity, 19h48m/4h12m = 4.7 days).

So what are the solutions?
If you are using all InnoDB tables, an XtraBackup incremental backup should be much faster than using binary logs.Â You can understand Vadim’s excitement when he announced this feature a few months ago.

If you are using multiple storage engines, then your options are to either try and time delay slaves (to keep them close to up to date), or hope that it’s not often that you need to restore!Â Eventually this problem should be lessened by a MySQL Server feature – parallel execution on slaves.Â Lets hope that it can get enough testers so that it makes it into a new release very quickly.

0 0 votes

Article Rating

6 Comments

Oldest

Newest Most Voted

Admin

Peter Zaitsev

16 years ago

Morgan,

I would mention if you’re running your slave at 90% capacity during peak load you’re looking for trouble big deal.

I would highly recommend slave thread busy no more than 30-50% – this makes sure you have at least some time to take an action as load or data size growths. With 90% you can wake up one morning and see things never catch up 🙂

Daniellek

16 years ago

Hardcore (but working) solution:
Setup: Master+2 slaves (one slave is “production”, and the other is “backup”)

mysql-backup stop
tar czvf ….
mysql-backup start

Author

Morgan Tocker

16 years ago

‘@Peter – Yes, I chose a pretty harsh set of number for the purpose of illustration – but there are a lot of people in this category. If we follow your advice, and repeat the math with 50% slave load (peak), 30% slave load (off peak), it doesn’t look as bad:

To replay logs: (12 * 0.50) + (12 * 0.30) = 9h36m
For slave to come online: (each day has 24hr-9h36m = 14h24m â€œfreeâ€ capacity, will be online in less than a day).

@Daniellek – I think your solution is a good one. Always have a slave with no load up to date and ready to clone from. You still need to remember that it doesn’t count as a backup. There’s a range of problems (i.e. accidental/malicious delete) that you need to build a strategy for and cope with.

Shlomi Noach

16 years ago

Just to mention that Xtrabackup incremental backups do not backup MyISAM/ARCHIVE tables (yet), and so provide incremental backup only fo InnoDB tables.

Justin Swanhart

16 years ago

I’m curious why the implementation notes in that workload preclude UPDATE. UPDATE can be logically decomposed into DELETE followed by INSERT. This doesn’t inhibit transaction serialization.

Robert Hodges

16 years ago

Hi Morgan,

Have you tried Tungsten Replicator? It has built-in time-delay replication. Also, it now has built-in backup/restore support (checked in; will be in an official build shortly). I’m going to look at putting in support for invoking XtraBackup as it seems pretty good, especially the incremental capacity.

Finally, as Peter mentioned, running replication anywhere near full capacity is asking for trouble. One thing we are planning for later in the year in Tungsten is to split events into different “channels” for replication in parallel. Parallel replication seems to be particularly helpful when you have events that are truly slow, such as DDL, or for replication of data that are already sharded.