MongoDB Backup: How and When To Use PSMDB hotbackup and mongodb_consistent_backup

mongodb backupWe have many backup methods to backup a MongoDB database using native mongodump or external tools. However, in this article, we’ll take a look at the backup tools offered by Percona, keeping in mind the restoration scenarios for MongoDB replicaSet and Sharded Cluster environments. We’ll explore how and when to use the tool mongodb-consistent-backup from Percona lab to backup the database consistently in Sharded Cluster/replicaSet environments. We’ll also take a look at hotbackup, a tool that’s available in Percona Server for MongoDB (PSMDB) packages. 

Backup is done – What about Restore?

Those who are responsible for data almost always think about the methods needed to backup the database and store the backups securely. But they often fail to foresee the scenario where the backup needs to be used to restore data. For example, unfortunately, I have seen many companies schedule the backup of config files and shard servers separately, but they start and complete the backups at different times based on data volumes. But can we use that backup when we need to restore and start the cluster with it? The answer is no—well, maybe yes if you can tweak the metadata, but data inconsistency may occur. Using this backup schedule, the backup is not consistent for the whole cluster, and we don’t have a point where we can restore the data for all shards/config dbs so that we can start the cluster from that point. Consequently, we face a difficult situation where we really need to use that backup! 

Let’s explore the two tools/features available to backup MongoDB from Percona, and look at which method to choose based on your restoration plan. 

Hot backup for both replicaset and Sharded cluster

Note: PerconaLabs and Percona-QA are open source GitHub repositories for unofficial scripts and tools created by Percona staff. These handy utilities can help you save time and effort.

Percona software builds located in the PerconaLabs and Percona-QA repositories are not officially released software, and also aren’t covered by Percona support or services agreements.

The main problem with backup is maintaining consistency, as an application still writes to the DB while backup is going on. So to maintain the consistency throughout the backup, and get a reliable full backup of all data needed to restore the database, the backup tool needs to track changes via oplog as well.  Using the mongodump utility along with oplog backup would help to achieve this easily in a replicaSet environment since you will need consistency for that replicaSet alone.

But when we need a consistent backup of a Sharded cluster, then it is very difficult to achieve the total cluster consistency as it involvs the backup of all shards and config servers all together up to a particular point,  to reuse in failover cases. In this case, even if you use mongodump manually in each shard/config separately, and try to take a consistent backup of the total cluster when there are writes being made, it is a very tedious job.  The backup of each shard ends at different points based on different scenarios such as load, data volume etc.

To remedy this, we could take a consistent hot backup of the Sharded cluster by using our utility mongodb-consistent-backup – in other words, point-in-time backup for the sharded cluster environment. This utility internally uses mongodump and gets the oplog changes from each node until the backup from all data nodes and configs are complete. This ensures that there is consistency in the backup of a total Sharded Cluster! You have to make sure you are using replicaSet for your config server too.  In fact, this tool also helps you to take a consistent backup in the replicaSet environment. 

This utility is available in our Percona lab but please note that it is not yet supported officially. To install this package, please make sure you install all the dependency packages, and follow the steps mentioned in this link to complete the installation process.

If you have enabled authentication in your environment, then create a user like below:

The backup could be taken as follows by connecting one of the mongos node in the Sharded Cluster. Here mongos is running on 27051 port and the Cluster has one config replicaSet cfg and two Shards s1 and s2.

n – backup directory name to be created
l – backup directory
H – hostname
P – port
p – password
u – user
a – authentication database

The log, above, shows the backup pattern going on, and it captures the state of the oplog, and updates the changes. The same command could be used to connect the replicaSet by having a proper hostname. The tool also has the ability to identify whether it is a replicaSet or Sharded cluster before proceeding with the backup. This can be determined from the log output, as shown below, which is written by the tool when running the backup:

For sharding cluster:

For replicaSet:

You can check out a couple of our blogs here and here for more details about the utility.

Hot but Cold backup

You may be wondering about the title Hot but Cold backup. Yes, for Percona Server for MongoDB (PSMDB) packages, there is feature to take the binary hot backup using hotbackup. Those who know the MySQL world will already know about Percona XtraBackup which is our open source and free binary hot backup utility for MySQL. PSMDB hotbackup works in a similar way. When you use hotbackup to backup, then you will have a binary backup ready to start an instance with the backup directory. You don’t need to worry about restoring from scratch and recreating indices. However, this solution works for replicaset/standalone mongodb instances only. 

If you can plan well, then you could feasibly use this feature to backup a Sharded cluster by bringing down one of the secondaries from all shards/config servers at the same time (probably when there is low or no transaction writing), then start them on a different port and without the replicaSet variable option, so that those instances won’t rejoin their replicaSet. Now you can start the hotbackup in all instances, once they are finished. You can revert the changes in the config file and allow them to rejoin their replicaSet.

Cautionary notes: Please make sure you are using the low priority or hidden nodes for this purpose, so that the election is not triggered when they split/join back to the replicaSet and don’t use SIGKILL (kill -9) to stop the db as it shuts down the database abruptly. Also, please plan to have at least an equal amount of disk space to that of your shard. A hotbackup takes an approximately equal amount of space as your node. 

 My colleague Tim Vaillancourt has written a great blogpost on this. See here.  


So from the above two methods, now you have the option to choose the similar backup methods based on your RTO, RPO explained here. Hope this helps you! Please share your comments and feedback below, and tell me what you think!


Photo by Designecologist from Pexels


Share this post

Comments (3)


    What is the various auth mechanism supported by the tool. I have configured my mongodb in MONGODB-X509, can I use the tool with this configuration of auth?

    December 19, 2018 at 7:53 am
    • Vinodh Krishnaswamy Reply

      Hi Venkat,
      mongodb-consistent-backup uses “mongodump” internally. So it supports the authentication methods supported by MongoDB. From the doc:
      > * Support for MongoDB Authentication and SSL database connections

      The options available for SSL in the tool are as follows:

      –ssl.enabled Use SSL secured database connections to MongoDB hosts (default: false)
      –ssl.insecure Do not validate the SSL certificate and hostname of the server (default: false)
      –ssl.ca_file SSL.CA_FILE Path to SSL Certificate Authority file in PEM format
      –ssl.crl_file SSL.CRL_FILE Path to SSL Certificate Revocation List file in PEM or DER format
      –ssl.client_cert_file SSL.CLIENT_CERT_FILE Path to Client SSL Certificate file in PEM format (for optional client ssl auth)

      December 25, 2018 at 6:30 am

Leave a Reply