MySQL Distributed Logical Backups: a Proof of Concept

January 9, 2020
Author
Daniel Guzmán Burgos
Share this Post:

The importance of having periodic backups is a given in Database life. There are different flavors: binary ones (Percona XtraBackup), binlog backups, disk snapshots (lvm, ebs, etc) and the classic ones: logical backups, the ones that you can take with tools like mysqldump, mydumper, or mysqlpump. Each of them with a specific purpose, MTTRs, retention policies, etc.

Another given is the fact that taking backups can be a very slow task as soon as your datadir grows: more data stored, more data to read and backup. But also, another fact is that not only does data grow but also the amount of MySQL instances available in your environment increases (usually). So, why not take advantage of more MySQL instances to take logical backups in an attempt to make this operation faster?

Distributed Backups (or Using all the Slaves Available)

The idea is simple: instead of taking the whole backup from a single server, use all the servers available. This Proof of Concept is focused only on using the replicas on a Master/Slave(s) topology. One can use the Master too, but in this case, I’ve decided to leave it alone to avoid adding the backup overhead.

Tests!

On a Master/3-Slaves topology:

Graph from the Orchestrator GUI
Graph from the Orchestrator GUI

With a small datadir of around 64GB of data (without the index size) and 300 tables (schema “sb”):

Using the 3 replicas, the distributed logical backup with mysqldump took 6 minutes, 13 seconds:

Same backup type on a single replica took 11 minutes, 59 seconds:

In other words:

The distributed one was 48% faster!

And this is a fairly small dataset. Worth the shot. So, how does it work?

Concepts

The logic is simple and can be divided into stages. 

Stage 1: Preparation

  • Find out how many replicas there are available
  • Find out the number of tables in the schema you want to take a backup of
  • Divide the number of tables between all the available replicas. The resultant chunks will be the tables each replica will backup.

Stage 2: Guarantee Consistency

  • Prevent the Master from executing operations that change the binlog position. Typically this is done with FLUSH TABLES WITH READ LOCK, but this PoC is using the cool feature of LOCK BINLOG FOR BACKUP available on Percona Server for MySQL and is way less disruptive.
  • Find the most up-to-date replica
  • Make all the other replicas match the most up to date one with START SLAVE UNTIL
  • Fire up a mysqldump per replica with the correspondent chunk of tables and use –lock-for-backup (another Percona Server feature)

The full script can be found here:

https://github.com/nethalo/parallel-mysql-backup/blob/master/dist_backup.sh

Worth to note that the script has its own log that will describe every step, it looks like this:

Requirements

Some basic requirements:

  • Since the tool uses the command SHOW SLAVE HOSTS, it is mandatory to set the variable report_host, which if you are using Orchestrator, you most likely have it set already.
  • The host set in the “report_host” variable should be one that is accessible. For example, an IP or a host that can actually be resolved (DNS, editing /etc/hosts file).
  • No Replication Filters on any of the replicas involved. This to guarantee data consistency.
  • The script currently should be run locally in the Master server.
  • It only works on Percona Server due to the usage of Backup Locks.
  • MySQL user credentials are expected to be available in the home dir inside the .my.cnf file.

We Would Like Your Feedback!

Interesting or not?

  • Is this something that would come handy for your backup operations?
  • Is there something else you would like to see from the script?
  • Is there something missing?

With this being a Proof of Concept, it lacks features that eventually (if this becomes a more mature tool) will arrive, like:

    • Adding weights to the slaves so the distribution can be modified
    • Option to use the Master as one of the backup servers, if desired
    • Use FTWRL when the server is not Percona Server
    • Use MyDumper/MysqlPump with multi-threads instead of MySQLDump
    • Etc…

Let us know in the comments section!

Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

Far
Enough.

Said no pioneer ever.
MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.
© 2026 Percona All Rights Reserved