Lock, Stock and MySQL Backups: Data Guaranteed Webinar Follow Up Questions
Jervin Real
Hello again! On August 16, we delivered a webinar on MySQL backups. As always, we’ve had a number of interesting questions. Some of them we’ve answered on the webinar, but we’d like to share some of them here in writing.
What is the best way to maintain daily full backups, but selective restores omitting certain archive tables?
There are several ways this can be done, including (but not limited to) the following:
- Using logical dumps (e.g., mydumper, mysqlpump, mysqldump). This allows you to dump per table and thus restore selectively.
- Back up the important tables and the archive tables separately, so you can restore them separately as well. This is a better approach in my opinion: if the archive tables do not change often, you can back up only what has changed, which gives you more flexibility in backup size and speed. This approach works when consistency or inter-dependence between the archive and other tables isn’t necessary.
- Filesystem- or XtraBackup-based backups are another option. However, the restore process means you need to restore the full backup and discard what you do not need. This is especially important if your archive tables are using InnoDB (where metadata is stored in the main tablespace).
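As a rough sketch of the first option, the script below dumps one file per table with mysqldump and skips archive tables at restore time. The schema name `appdb`, the `archive_*` naming convention and the `./dump` directory are assumptions made for illustration, and credentials are expected to come from a client defaults file.

```shell
#!/bin/sh
# Sketch: one dump file per table, so archive tables can be skipped on restore.
# Assumptions (not from the webinar): schema "appdb", archive tables named
# "archive_*", credentials supplied through a client defaults file.
set -eu

DB="${DB:-appdb}"
OUT_DIR="${OUT_DIR:-./dump}"

# Succeeds when the table name matches the assumed archive naming convention.
is_archive_table() {
  case "$1" in
    archive_*) return 0 ;;
    *) return 1 ;;
  esac
}

# Dump every table into its own file. Each mysqldump call takes its own
# snapshot, so the tables are NOT consistent with each other.
dump_all_tables() {
  mkdir -p "$OUT_DIR"
  tables="$(mysql --batch --skip-column-names -e "SHOW TABLES FROM \`$DB\`")"
  for table in $tables; do
    mysqldump --single-transaction "$DB" "$table" > "$OUT_DIR/$table.sql"
  done
}

# Restore everything except the archive tables.
restore_without_archives() {
  for file in "$OUT_DIR"/*.sql; do
    table="$(basename "$file" .sql)"
    if is_archive_table "$table"; then
      echo "skipping $table" >&2
      continue
    fi
    mysql "$DB" < "$file"
  done
}
```

Call `dump_all_tables` from the daily backup job and `restore_without_archives` when you need the selective restore. If you need a consistent snapshot across tables, mydumper is probably a better fit than a per-table mysqldump loop.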
Can you recommend a good script on github for mysqlbinlog backup?
This is a shameless plug, but I would recommend the tool I wrote called pyxbackup. At the time it was written, binary log streaming with 5.6 was fairly new, so there weren’t many tools we could find or adopt that integrated closely with backups. Hence we wrote it from scratch.
mysqlbinlog can stream binary logs to a remote server. Isn’t simply copying the binlog to the remote location just as effective, especially if done frequently using a cron job that runs rsync?
True, though be aware of a few differences:
- rsync may not capture data that would have been flushed to disk from the filesystem cache.
- In case the source crashes, you could lose the last binary log(s) between the last rsync and the crash.
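For comparison, here is a minimal sketch of the streaming approach with mysqlbinlog. The host name, starting binary log file and destination directory are hypothetical, and credentials are assumed to come from a client defaults file or login path.

```shell
#!/bin/sh
# Sketch: keep a live copy of the binary logs on a backup host.
# Host and file names are hypothetical.
set -eu

# stream_binlogs HOST FIRST_LOG DEST_DIR
stream_binlogs() {
  host="$1"; first_log="$2"; dest_dir="$3"
  mkdir -p "$dest_dir"
  # --raw writes events in binary log format instead of decoding them;
  # --stop-never keeps the connection open and waits for new events;
  # with --raw, --result-file acts as a prefix for the output file names.
  mysqlbinlog --read-from-remote-server --host="$host" \
    --raw --stop-never \
    --result-file="$dest_dir/" \
    "$first_log"
}
```

For example, `stream_binlogs db1.example.com mysql-bin.000001 /backups/binlogs` would run until interrupted. Because events are received as they are written on the source, the window of loss after a crash should be much smaller than with a periodic rsync.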
How is it possible to create a compressed backup using xtrabackup directly to a volume with low capacity, considering that the --apply-log step is needed?
In the context of this question, we cannot stream backups for compression and do the apply-log phase at the same time. The backup needs to be complete for the apply-log phase to start. Hence compress, decompress, then apply-log. Make sure enough disk space is available for the dataset size, plus your backups if you want to be able to test your backups with apply-log.
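The compress, decompress, then apply-log sequence could look roughly like the sketch below. It uses `xtrabackup --prepare`, which is the xtrabackup equivalent of the innobackupex `--apply-log` step. The function names, paths and the simple free-space check are illustrative assumptions, not part of any tool.

```shell
#!/bin/sh
# Sketch: compressed streaming backup, then decompress and prepare elsewhere.
set -eu

# Free space, in KB, on the filesystem holding DIR.
free_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# has_room_for NEEDED_KB DIR: succeeds when DIR has at least NEEDED_KB free.
has_room_for() {
  [ "$(free_kb "$2")" -ge "$1" ]
}

# take_compressed_backup STREAM_FILE: only the compressed stream hits the disk.
take_compressed_backup() {
  # --target-dir is only used for temporary files when streaming.
  xtrabackup --backup --compress --stream=xbstream --target-dir=/tmp > "$1"
}

# prepare_compressed_backup STREAM_FILE WORK_DIR DATASET_KB
prepare_compressed_backup() {
  stream_file="$1"; work_dir="$2"; dataset_kb="$3"
  mkdir -p "$work_dir"
  # The uncompressed dataset has to fit; the compressed files add to that
  # until they are removed.
  if ! has_room_for "$dataset_kb" "$work_dir"; then
    echo "not enough space in $work_dir to decompress and prepare" >&2
    return 1
  fi
  xbstream -x -C "$work_dir" < "$stream_file"       # unpack the stream
  xtrabackup --decompress --target-dir="$work_dir"  # may need qpress installed
  xtrabackup --prepare --target-dir="$work_dir"     # the apply-log phase
}
```

The low-capacity volume only ever holds the compressed stream. The decompress and prepare steps can run on a different host or volume that has room for the full dataset.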
How can you keep connection credentials secure for automated backup?
- Tools like xtrabackup, mysqldump, mydumper and mysqlpump accept a client defaults file. You can store credentials in such a file and restrict access to it to only a few users on the system (including the backup user).
- Most of these tools also support login paths, if you do not want your credentials in a plain text file. This is not completely secure, as credentials from login paths can still be decoded.
- Another way we’ve seen is to store the credentials on a vault or similar medium, and use query tools that would return the username or password. For example, if you run xtrabackup on bash:
xtrabackup --password=$(/usr/bin/vault-query mysql-password) --backup
Of course, how you secure the account that can run the vault query command is another topic for discussion. 🙂
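To illustrate the first option, here is a small sketch that writes a client defaults file readable only by its owner and then points mysqldump at it. The function names and the file location are hypothetical.

```shell
#!/bin/sh
# Sketch: keep backup credentials in an owner-only client defaults file.
set -eu

# write_client_defaults FILE USER PASSWORD
# The password is written inside double quotes; a password that itself
# contains double quotes or backslashes would need extra escaping.
write_client_defaults() {
  file="$1"; user="$2"; password="$3"
  (
    umask 077   # a newly created file is readable by the owner only
    printf '[client]\nuser=%s\npassword="%s"\n' "$user" "$password" > "$file"
  )
  chmod 600 "$file"   # also tighten a file that already existed
}

# dump_with_defaults FILE
# --defaults-extra-file has to be the first option on the command line.
dump_with_defaults() {
  mysqldump --defaults-extra-file="$1" --single-transaction --all-databases
}
```

For the login path option, `mysql_config_editor set --login-path=backup --user=backup --password` prompts for the password and stores it obfuscated, after which tools that support it can be run with `--login-path=backup`.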
I missed the name of your github repo. Also for mysqlbinlog parsing? (same question)
See above (pyxbackup). For an example of a mysqlbinlog parsing library, see: https://github.com/Yelp/ybinlogp
Which one is faster between mydumper and 5.7 mysqlpump?
This is an interesting question, though it belongs to the “It Depends” category. 🙂 First, we have not benchmarked these two tools head to head. Second, because they take different approaches, one may be faster in a specific use case while the other is faster in another. For example, with the different lock granularity support in mydumper, it could be faster on InnoDB-only, highly concurrent workloads.
If we wanted to migrate a 2.5TB database over a VPN connection, which backup and restore method would you recommend? The method would need to be resilient. This would be for migrating an on premise db to a MySQL RDS instance at AWS.
Again, there could be a number of ways this might be achieved, but one we frequently go with is:
- Set up an EC2 instance that replicates from the original source.
- Once replication has caught up, stop replication and do a parallel dump of the data per table.
- Import the data into RDS per table, so you can monitor progress and failures and retry each table if necessary (hint: mydumper can also chunk).
- Once complete, configure RDS to replicate from EC2 to catch up on the remaining changes.
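The per-table dump and import with retries might be sketched as follows. The `retry` helper, the `dump/` directory layout, the chunk size and the thread counts are assumptions for illustration, and connection options for the EC2 replica are left out.

```shell
#!/bin/sh
# Sketch: per-table dump from the stopped EC2 replica, per-table import into
# RDS, retrying a table when its import fails.
set -eu

# retry ATTEMPTS COMMAND [ARGS...]
# Rerun COMMAND until it succeeds or ATTEMPTS is used up.
retry() {
  retry_max="$1"; shift
  retry_n=1
  until "$@"; do
    if [ "$retry_n" -ge "$retry_max" ]; then
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "${RETRY_DELAY:-30}"
  done
}

# dump_table DB TABLE: chunked, parallel dump of a single table.
dump_table() {
  mydumper --regex "^$1\.$2\$" --rows 500000 --threads 4 \
    --outputdir "dump/$1.$2"
}

# load_table DB TABLE: import one table's dump directory into RDS.
# --overwrite-tables drops a partially loaded table before retrying.
load_table() {
  myloader --host "${RDS_HOST:?set RDS_HOST to the RDS endpoint}" \
    --directory "dump/$1.$2" --threads 4 --overwrite-tables
}
```

A driver loop could then run `dump_table appdb "$t"` followed by `retry 3 load_table appdb "$t"` for each table name. For the last step, RDS provides the `mysql.rds_set_external_master` and `mysql.rds_start_replication` stored procedures to start replicating from the EC2 instance.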
Bonus: if you are migrating to Aurora, did you know you can use an XtraBackup-based backup directly?
What about if I have 1TB of data to backup and restore to a new server, how much time does it take, can we restore/stream at the same time while taking a backup?
Assuming you have direct access to the source server, XtraBackup is an excellent option here. Take the backup on the source and stream it to the new server. Once complete, prepare the backup on the new server and it should be ready for use. These instructions are mostly for provisioning new slaves, but most of the steps should apply for the same outcome.
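A minimal sketch of the stream-then-prepare flow over SSH is shown below. The SSH target and destination directory are hypothetical, and xtrabackup is assumed to be installed on both servers.

```shell
#!/bin/sh
# Sketch: stream a backup from the source straight onto the new server,
# then prepare it there.
set -eu

# stream_backup_to SSH_TARGET DEST_DIR
# Without pipefail only the ssh exit status is checked, so also look for
# "completed OK!" in the xtrabackup log output.
stream_backup_to() {
  # --target-dir is only used for temporary files when streaming.
  xtrabackup --backup --stream=xbstream --target-dir=/tmp |
    ssh "$1" "xbstream -x -C '$2'"
}

# prepare_on SSH_TARGET DEST_DIR
prepare_on() {
  ssh "$1" "xtrabackup --prepare --target-dir='$2'"
}
```

For example, run `stream_backup_to backup@new-db /var/lib/mysql` followed by `prepare_on backup@new-db /var/lib/mysql`. After that, fix the ownership of the data directory on the new server before starting MySQL.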
Is mydumper your product, and how long will it take to back up a few million rows of data?
No, mydumper is not official Percona software. Percona contributes to this software as it both benefits our customers and the community.
Will it lock my table during the process? How to restore the mydumper?
By default, the table will be locked. However, this is highly configurable. For example, if you are using a version of Percona Server for MySQL that supports Backup Locks, the lock time is significantly reduced. Additionally, depending on the backup requirements, you can skip locks altogether.
Mydumper comes with a complementary tool called myloader that does the opposite. It restores the resulting dumps into the destination server in parallel.
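As a rough illustration, a dump and restore pair might look like this. The thread count and directory are arbitrary, connection options are omitted, and `--trx-consistency-only` is used on the assumption that all tables are InnoDB.

```shell
#!/bin/sh
# Sketch: parallel dump with mydumper, parallel restore with myloader.
set -eu

# dump_db DATABASE OUT_DIR
# --trx-consistency-only keeps lock time short for InnoDB-only schemas.
# --no-locks would skip the lock entirely, at the cost of consistency.
dump_db() {
  mydumper --database "$1" --outputdir "$2" --threads 4 --trx-consistency-only
}

# restore_db DUMP_DIR
# --overwrite-tables drops tables that already exist before loading them.
restore_db() {
  myloader --directory "$1" --threads 4 --overwrite-tables
}
```

For example, `dump_db appdb /backups/appdb` on the source and `restore_db /backups/appdb` against the destination server.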
Thank you again for attending the webinar. If you were not able to make it, you can still watch the recording and view the slides here.
By the way, if you are attending Percona Live in Europe, Marcelo’s talk on continuous backup is an excellent follow-up to this webinar!