MySQL Distributed Logical Backups: a Proof of Concept

January 9, 2020

Author

Daniel Guzmán Burgos

Periodic backups are a given in database life. They come in different flavors: binary backups (Percona XtraBackup), binlog backups, disk snapshots (LVM, EBS, etc.), and the classic logical backups, the ones you take with tools like mysqldump, mydumper, or mysqlpump. Each serves a specific purpose and has its own MTTR, retention policy, and so on.

Another given is that taking backups becomes very slow as your datadir grows: the more data stored, the more data to read and back up. But data is not the only thing that grows; the number of MySQL instances in your environment usually increases too. So why not take advantage of those extra instances to make logical backups faster?

Distributed Backups (or Using all the Slaves Available)

The idea is simple: instead of taking the whole backup from a single server, use all the servers available. This Proof of Concept focuses only on using the replicas in a Master/Slave(s) topology. One can use the Master too, but in this case I’ve decided to leave it alone to avoid adding the backup overhead to it.

Tests!

On a Master/3-Slaves topology:

With a small datadir of around 64GB of data (without the index size) and 300 tables (schema “sb”):

+--------------+--------+--------+-----------+----------+-----------+----------+
| TABLE_SCHEMA | ENGINE | TABLES | ROWS      | DATA (M) | INDEX (M) | TOTAL(M) |
+--------------+--------+--------+-----------+----------+-----------+----------+
| meta         | InnoDB | 1      |         0 | 0.01     | 0.00      |   0.01   |
| percona      | InnoDB | 1      |         2 | 0.01     | 0.01      |   0.03   |
| sb           | InnoDB | 300    | 295924962 | 63906.82 |   4654.68 | 68561.51 |
| sys          | InnoDB | 1      |         6 | 0.01     | 0.00      |   0.01   |
+--------------+--------+--------+-----------+----------+-----------+----------+

Using the 3 replicas, the distributed logical backup with mysqldump took 6 minutes, 13 seconds:

[root@mysql1 ~]# ls -lh /data/backups/20200101/
total 56G
-rw-r--r--. 1 root root 19G Jan  1 14:37 mysql2.sql
-rw-r--r--. 1 root root 19G Jan  1 14:37 mysql3.sql
-rw-r--r--. 1 root root 19G Jan  1 14:37 mysql4.sql
[root@mysql1 ~]# stat /data/backups/20200101/mysql2.sql
  File: '/data/backups/20200101/mysql2.sql'
  Size: 19989576285     Blocks: 39042144 IO Block: 4096   regular file
Device: 10300h/66304d   Inode: 54096034 Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 0/ root) Gid: (    0/ root)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2020-01-01 14:31:34.948124516 +0000
Modify: 2020-01-01 14:37:41.297640837 +0000
Change: 2020-01-01 14:37:41.297640837 +0000
 Birth: -

The same backup type on a single replica took 11 minutes, 59 seconds:

[root@mysql1 ~]# time mysqldump -hmysql2 --single-transaction --lock-for-backup sb > /data/backup.sql

real    11m58.816s
user    9m48.871s
sys     2m6.492s
[root@mysql1 ~]# ls -lh /data/backup.sql
-rw-r--r--. 1 root root 56G Jan  1 14:52 /data/backup.sql

In other words:

The distributed backup took about 48% less time!
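
The 48% figure is the share of wall-clock time saved, based on the two timings reported above (6 minutes 13 seconds versus 11 minutes 59 seconds). A quick shell check of that arithmetic follows; the helper name pct_less_time is made up for illustration and is not part of the PoC script:

```shell
# pct_less_time BASELINE_SECONDS NEW_SECONDS
# Prints the integer percentage of wall-clock time saved versus the baseline.
pct_less_time() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

single=$(( 11 * 60 + 59 ))       # single-replica dump: 719 seconds
distributed=$(( 6 * 60 + 13 ))   # distributed dump:    373 seconds

pct_less_time "$single" "$distributed"   # prints 48
```

Put differently, the distributed run finished in roughly half the time, not three times faster, even though three replicas were used.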

And this is a fairly small dataset. Worth a shot. So, how does it work?

Concepts

The logic is simple and can be divided into stages.

Stage 1: Preparation

- Find out how many replicas there are available

- Find out the number of tables in the schema you want to take a backup of

- Divide the number of tables between all the available replicas. The resulting chunks are the tables each replica will back up.
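
As a rough sketch of the division step, assuming an even split of the table count across replicas, the chunk offsets could be computed as below. The function name chunk_plan and the ceiling division used for remainders are assumptions made for illustration, not taken from the script:

```shell
# chunk_plan TOTAL_TABLES HOST...
# Prints one line per replica: "host chunkstart chunksize".
chunk_plan() {
  total=$1
  shift
  replicas=$#
  if [ "$replicas" -eq 0 ]; then
    echo "need at least one replica" >&2
    return 1
  fi
  # Ceiling division (assumption): a remainder must not leave tables undumped.
  size=$(( (total + replicas - 1) / replicas ))
  start=0
  for host in "$@"; do
    echo "$host $start $size"
    start=$(( start + size ))
  done
}

# 300 tables spread over the three replicas used in the tests.
chunk_plan 300 mysql3 mysql4 mysql2
```

With 300 tables and three replicas this gives offsets 0, 100 and 200, the same chunkstart values that show up in the script's log. Each offset and size could then feed a LIMIT clause over an ordered table list, which is presumably what the "Limit chunk OK" log lines refer to.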

Stage 2: Guarantee Consistency

- Prevent the Master from executing operations that change the binlog position. Typically this is done with FLUSH TABLES WITH READ LOCK, but this PoC uses LOCK BINLOG FOR BACKUP, a feature available in Percona Server for MySQL that is far less disruptive.

- Find the most up-to-date replica

- Make all the other replicas match the most up-to-date one with START SLAVE UNTIL

- Fire up a mysqldump per replica with the corresponding chunk of tables and use --lock-for-backup (another Percona Server feature)
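
The "find the most up-to-date replica and align the rest" steps can be sketched in plain shell. This is only an illustration under stated assumptions: the function name sync_statements is hypothetical, the input is one "host binlog_file position" line per replica (presumably the executed coordinates from SHOW SLAVE STATUS), and binlog file names are assumed to share a fixed-width numeric suffix so they sort correctly as text:

```shell
# sync_statements: reads "host binlog_file position" lines on stdin and
# prints, for every replica, the statement that brings it to the
# coordinates of the most advanced replica.
sync_statements() {
  coords=$(cat)
  # Most advanced replica = highest binlog file, then highest position.
  target=$(printf '%s\n' "$coords" | LC_ALL=C sort -k2,2 -k3,3n | tail -n 1)
  t_file=$(printf '%s\n' "$target" | awk '{print $2}')
  t_pos=$(printf '%s\n' "$target" | awk '{print $3}')
  printf '%s\n' "$coords" | while read -r host file pos; do
    echo "$host: STOP SLAVE; START SLAVE UNTIL MASTER_LOG_FILE = '$t_file', MASTER_LOG_POS = $t_pos"
  done
}

# Sample coordinates (made up, except for the target taken from the log).
sync_statements <<'EOF'
mysql3 mysql-bin.000358 895419795
mysql4 mysql-bin.000358 895419000
mysql2 mysql-bin.000357 999999999
EOF
```

Because the Master's binlog is locked at this point, no replica can move past the Master's position, so once every replica reaches the target coordinates they all hold the same data. A real run would also need to wait for each SQL thread to stop before starting the dumps.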

The full script can be found here:

https://github.com/nethalo/parallel-mysql-backup/blob/master/dist_backup.sh

It is worth noting that the script writes its own log describing every step. It looks like this:

[200101-16:01:19] [OK] Found 'mysql' bin
[200101-16:01:19] [Info] SHOW SLAVE HOSTS executed
[200101-16:01:19] [Info] Count tables OK
[200101-16:01:19] [Info] table list gathered
[200101-16:01:19] [Info] CREATE DATABASE IF NOT EXISTS percona
[200101-16:01:19] [Info] CREATE TABLE IF NOT EXISTS percona.metabackups
[200101-16:01:19] [Info] TRUNCATE TABLE percona.metabackups
[200101-16:01:19] [Info] Executed INSERT INTO percona.metabackups (host,chunkstart) VALUES('mysql3',0)
[200101-16:01:19] [Info] Executed INSERT INTO percona.metabackups (host,chunkstart) VALUES('mysql4',100)
[200101-16:01:19] [Info] Executed INSERT INTO percona.metabackups (host,chunkstart) VALUES('mysql2',200)
[200101-16:01:19] [Info] lock binlog for backup set
[200101-16:01:19] [Info] slave status position on mysql3
[200101-16:01:19] [Info] slave status file on mysql3
[200101-16:01:19] [Info] slave status position on mysql4
[200101-16:01:19] [Info] slave status file on mysql4
[200101-16:01:19] [Info] slave status position on mysql2
[200101-16:01:19] [Info] slave status file on mysql2
[200101-16:01:19] [Info] set STOP SLAVE; START SLAVE UNTIL MASTER_LOG_FILE = 'mysql-bin.000358', MASTER_LOG_POS = 895419795 on mysql3
[200101-16:01:20] [Info] set STOP SLAVE; START SLAVE UNTIL MASTER_LOG_FILE = 'mysql-bin.000358', MASTER_LOG_POS = 895419795 on mysql4
[200101-16:01:20] [Info] set STOP SLAVE; START SLAVE UNTIL MASTER_LOG_FILE = 'mysql-bin.000358', MASTER_LOG_POS = 895419795 on mysql2
[200101-16:01:20] [Info] Created /data/backups/20200101/ directory
[200101-16:01:20] [Info] Limit chunk OK
[200101-16:01:20] [Info] Tables list for mysql3 OK
[200101-16:01:20] [OK] Dumping mysql3
[200101-16:01:20] [Info] Limit chunk OK
[200101-16:01:20] [Info] Tables list for mysql4 OK
[200101-16:01:20] [OK] Dumping mysql4
[200101-16:01:20] [Info] Limit chunk OK
[200101-16:01:20] [Info] Tables list for mysql2 OK
[200101-16:01:20] [OK] Dumping mysql2
[200101-16:01:20] [Info] UNLOCK BINLOG executed
[200101-16:01:20] [Info] set start slave on mysql3
[200101-16:01:20] [Info] set start slave on mysql4
[200101-16:01:20] [Info] set start slave on mysql2
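
Note that UNLOCK BINLOG is logged in the same second the dumps start. That is consistent with each dump being launched in the background with a trailing ampersand (&), so the Master is released without waiting for the dumps to finish. A minimal sketch of that pattern follows; dump_replica and run_backup are hypothetical names, and a sleep stands in for the real mysqldump call so the example runs anywhere:

```shell
# dump_replica HOST: stand-in for the real dump. In the PoC this would be
# a mysqldump call along the lines of:
#   mysqldump -h"$host" --single-transaction --lock-for-backup sb <tables> > "$host.sql"
dump_replica() {
  sleep 1                       # pretend the dump takes a while
  echo "dump of $1 finished"
}

# run_backup HOST...: start one background dump per replica, release the
# Master immediately, then wait for every dump to complete.
run_backup() {
  for host in "$@"; do
    dump_replica "$host" &      # '&' returns at once; the dump keeps running
  done
  echo "UNLOCK BINLOG"          # stand-in for running UNLOCK BINLOG on the Master
  wait                          # block until all background dumps are done
  echo "all dumps finished"
}

run_backup mysql3 mysql4 mysql2
```

The unlock line is printed first and the per-host completion messages only arrive later, which mirrors the ordering in the log.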

Requirements

Some basic requirements:

- Since the tool uses the command SHOW SLAVE HOSTS, it is mandatory to set the report_host variable. If you are using Orchestrator, you most likely have it set already.

- The host set in the “report_host” variable should be one that is accessible. For example, an IP or a hostname that can actually be resolved (via DNS or an /etc/hosts entry).

- No Replication Filters on any of the replicas involved. This is to guarantee data consistency.

- The script currently should be run locally on the Master server.

- It only works on Percona Server due to the usage of Backup Locks.

- MySQL user credentials are expected to be available in the home dir inside the .my.cnf file.
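
The "no replication filters" requirement can be verified before a run. A minimal sketch, assuming the text output of SHOW SLAVE STATUS\G is piped in; the helper name list_replication_filters is hypothetical and not part of the PoC script:

```shell
# list_replication_filters: reads SHOW SLAVE STATUS\G output on stdin and
# prints "Field=value" for every replication filter field that is set.
# Empty output means no filters are configured on that replica.
list_replication_filters() {
  awk '
    $1 ~ /^Replicate_(Do_DB|Ignore_DB|Do_Table|Ignore_Table|Wild_Do_Table|Wild_Ignore_Table|Rewrite_DB):$/ {
      name = $1
      sub(/:$/, "", name)            # drop the trailing colon from the field name
      value = $0
      sub(/^[^:]*: */, "", value)    # keep only what follows "Field: "
      if (value != "") print name "=" value
    }
  '
}

# Possible usage against a replica (credentials taken from ~/.my.cnf):
#   mysql -h mysql2 -e 'SHOW SLAVE STATUS\G' | list_replication_filters
```

A wrapper could abort the backup whenever this prints anything for any of the replicas returned by SHOW SLAVE HOSTS.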

We Would Like Your Feedback!

Interesting or not?

- Is this something that would come in handy for your backup operations?

- Is there something else you would like to see from the script?

- Is there something missing?

With this being a Proof of Concept, it lacks features that eventually (if this becomes a more mature tool) will arrive, like:

- Adding weights to the slaves so the distribution can be modified

- Option to use the Master as one of the backup servers, if desired

- Use FTWRL when the server is not Percona Server

- Use MyDumper/MysqlPump with multi-threads instead of MySQLDump

- Etc…

Let us know in the comments section!

6 Comments

Andy Moore

6 years ago

Outside the box thinking from our friends at Percona

Brad Mickel

6 years ago

I really like this idea, but why do you wait until the backup is complete to unlock the binlog on the master? Couldn’t that be done after issuing the “START SLAVE UNTIL” commands? This way it allows the master to commit again, but since the slaves are stopped their backups would still be consistent.

Daniel Guzmán Burgos

6 years ago

Reply to Brad Mickel

Hi Brad, it doesn’t. The backup command has an ampersand (&) at the end, which means that command keeps running asynchronously in the background. The unlock happens within the same second.

Vaibhav

6 years ago

Simple and great idea.

Just to add a few things as suggestions:

Events and routines: dump events, procedures and functions as well.

Also, it would be great to have the hostname as part of each backup.

And last, it would be great if there were a check for replication filters before the backup runs.

Eric

6 years ago

Hi,
This is an interesting approach to splitting the load and operational impact of a “logical” backup across multiple instances.

My point of view:
Interesting
– when you don’t have advanced storage capabilities and want to limit the impact of operational tasks on production
– because it uses standard tools and architecture

But:
– doesn’t it make things more complex?
– implementation?
– if you need to restore your master (user issue –> deleted data –> replicated to the other slaves), you will need to restore your master and rebuild your replicas (DR is more complex, RTO is impacted)
– what is the cost of running multiple hosts and the additional storage for (in this scenario) 3 replicas?
– as you mention, there are disk/storage solutions; modern ones certainly have advanced features that can provide another magnitude of service, and in this context:
– snapshot: for TBs, it takes seconds (only the delta)
– clone: for TBs, it takes seconds to have a clone of the running (real-time) master MySQL instance
– time efficiency: very fast
– space efficiency: reads access the original blocks of the master database (shared resource). With a database of e.g. 1 TB you could have a clone taking only some MB/GB. You consume the same storage resources (CPU, disk, …), and to limit this you could make the clone from an older snapshot/backup.

Buchan

6 years ago

What you are really doing here is comparing a single serialized dump of all tables, to a parallel (n=3) dump of tables.

Could you test with mydumper ( https://github.com/maxbube/mydumper ), in order to differentiate between scaling via multiple connections dumping vs multiple servers?

Also, in my experience, parallelism in the dump part is less than half the problem, as restore from mysqldump typically scales much worse than creating the mysqldump.

Finally, the other problem that can occur is that, even though you have 100 tables, inevitably there will be one (or a few) table(s) that make up a large percentage of the entire database size (e.g. an audit trail table or similar). mydumper seems to have features to help in this case, e.g. by chunking the tables into parts.