At the end of 2021, I pushed the first Docker image to hub.docker.com. This was the first official image, and since then, we have been improving our testing and packaging procedures based on Docker, CircleCI, and GitHub Actions. However, when I'm coding, I'm not testing in Docker. But a couple of weeks ago, while reviewing an issue, I came across some interesting Docker use cases that I want to share.
First, we are going to review how to take a simple backup with MyDumper to warm you up:
```shell
docker run --name mydumper \
  --rm \
  -v ${backups}:/backups \
  mydumper/mydumper:v0.14.4-7 \
  sh -c "rm -rf /backups/data; \
         mydumper -h 172.17.0.5 \
           -o /backups/data \
           -B test \
           -v 3 \
           -r 1000 \
           -L /backups/mydumper.log"
```
You will find the backup files and the log in ${backups}. Then you can restore it using:
```shell
docker run --name mydumper \
  --rm \
  -v ${backups}:/backups \
  mydumper/mydumper:v0.14.4-7 \
  sh -c "myloader -h 172.17.0.4 \
           -d /backups/data \
           -B test \
           -v 3 \
           -o \
           -L /backups/myloader.log"
```
And if you want to do it faster, you can do it all at once:
```shell
docker run --name mydumper \
  --rm \
  -v ${backups}:/backups \
  mydumper/mydumper:v0.14.4-7 \
  sh -c "rm -rf /backups/data; \
         mydumper -h 172.17.0.5 \
           -o /backups/data \
           -B test \
           -v 3 \
           -r 1000 \
           -L /backups/mydumper.log ; \
         myloader -h 172.17.0.4 \
           -d /backups/data \
           -B test \
           -v 3 \
           -o \
           -L /backups/myloader.log"
```
We can remove the option that mounts a volume (-v ${backups}:/backups), as the data will reside inside the container and be discarded with it.
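As a sketch, the all-at-once command without the volume mount would look like this (same image tag and host IPs as above; the guard variable RUN_MYDUMPER_DEMO is a hypothetical switch added here so the snippet is safe to run where Docker and MySQL are not available):

```shell
# Dump and load in a single container without mounting a volume: the backup
# files exist only inside the container and vanish when it exits.
if [ -z "${RUN_MYDUMPER_DEMO:-}" ]; then
  # Skip the live run unless explicitly requested.
  echo "RUN_MYDUMPER_DEMO not set; skipping live run"
else
  docker run --name mydumper \
    --rm \
    mydumper/mydumper:v0.14.4-7 \
    sh -c "mydumper -h 172.17.0.5 \
             -o /backups/data \
             -B test \
             -v 3 \
             -r 1000 \
             -L /backups/mydumper.log ; \
           myloader -h 172.17.0.4 \
             -d /backups/data \
             -B test \
             -v 3 \
             -o \
             -L /backups/myloader.log"
fi
```

Since nothing is persisted on the host, this pattern only makes sense for one-shot migrations, not for keeping backups.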
Since version 0.14.4-7, I have built the Docker image with ZSTD instead of GZIP because it is faster. Other options that are always useful are --rows/-r and --chunk-filesize/-F. On the latest releases, you can pass '100:1000:0' to -r, which means:
- 100 is the minimum chunk size, in rows;
- 1000 is the starting chunk size, in rows;
- 0 means there is no upper limit on the chunk size.
In this case, we want small files to be sent to myloader as soon as possible, and we don't care about the number of files, so -F is set to 1.
In the next use case, we are going to stream the backup from mydumper to myloader through stdout, sending the content without sharing the backup directory:
```shell
docker run --name mydumper \
  --rm \
  -v ${backups}:/backups \
  mydumper/mydumper:v0.14.4-7 \
  sh -c "rm -rf /backups/data; \
         mydumper -h 172.17.0.5 \
           -o /backups/data \
           -B test \
           -v 3 \
           -r 100:1000:0 \
           -L /backups/mydumper.log \
           -F 1 \
           --stream \
           -c \
         | myloader -h 172.17.0.4 \
           -d /backups/data_tmp \
           -B test \
           -v 3 \
           -o \
           -L /backups/myloader.log \
           --stream"
```
In this case, backup files will be created in /backups/data, sent through the pipe, and stored in /backups/data_tmp until myloader imports each file, after which it removes it.
To optimize this procedure, we can now share the backup directory and set --stream to NO_STREAM_AND_NO_DELETE, which streams only the filename rather than the file's content, and does not delete the file, since we want it to be shared with myloader:
```shell
docker run --name mydumper \
  --rm \
  -v ${backups}:/backups \
  mydumper/mydumper:v0.14.4-7 \
  sh -c "rm -rf /backups/data; \
         mydumper -h 172.17.0.5 \
           -o /backups/data \
           -B test \
           -v 3 \
           -r 100:1000:0 \
           -L /backups/mydumper.log \
           -F 1 \
           --stream=NO_STREAM_AND_NO_DELETE \
           -c \
         | myloader -h 172.17.0.4 \
           -d /backups/data \
           -B test \
           -v 3 \
           -o \
           -L /backups/myloader.log \
           --stream"
```
As you can see, the directory is the same. Myloader will delete the files after importing them, but if you want to keep the backup files, you should use --stream=NO_DELETE.
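A minimal sketch of that variant, assuming the same image tag, host IPs, and shared directory as above (the RUN_MYDUMPER_DEMO guard is a hypothetical switch added here so the snippet is safe to run where Docker and MySQL are not available):

```shell
# Shared-directory stream where myloader keeps the backup files:
# --stream=NO_DELETE on the myloader side leaves the files in
# ${backups}/data after the import finishes.
if [ -z "${RUN_MYDUMPER_DEMO:-}" ]; then
  # Skip the live run unless explicitly requested.
  echo "RUN_MYDUMPER_DEMO not set; skipping live run"
else
  docker run --name mydumper \
    --rm \
    -v ${backups}:/backups \
    mydumper/mydumper:v0.14.4-7 \
    sh -c "rm -rf /backups/data; \
           mydumper -h 172.17.0.5 -o /backups/data -B test -v 3 \
             -r 100:1000:0 -F 1 -c \
             --stream=NO_STREAM_AND_NO_DELETE \
             -L /backups/mydumper.log \
           | myloader -h 172.17.0.4 -d /backups/data -B test -v 3 -o \
             --stream=NO_DELETE \
             -L /backups/myloader.log"
fi
```

This way, you get the restore on 172.17.0.4 and still keep a reusable copy of the backup in ${backups}/data on the host.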
The performance gain will vary depending on the database size and the number of tables. This can also be combined with another MyDumper feature, masquerading your backups, which allows you to build safer QA/testing environments.
MyDumper, which has already proven to be the fastest logical backup solution, now offers a simple and powerful way to migrate data in a Dockerized environment.
Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!