MongoDB Backup Best Practices

Why Take Backups?

Regular database backups are a crucial part of guarding against unintended data loss events. They are even more important for the restoration and continuation of business operations.

In this blog, we will be discussing different backup strategies and their use cases along with Pros and Cons and a few other tips.

Generally, there are two types of backups used with database technologies like MongoDB:

  • Logical Backups
  • Physical Backups

Additionally, when working with logical backups, we have the option of taking incremental backups, which capture the deltas (the data changes made between full backups) to minimize the amount of data lost in a disaster.

We will be discussing these two backup options, how to proceed with them, and which one suits better depending upon requirements and environment setup. 

Also, we will take a look at Percona Backup for MongoDB (PBM), our open-source backup utility custom-built to help avoid costs and proprietary software. PBM is a fully supported community backup tool capable of performing cluster-wide consistent backups in MongoDB for replica sets and sharded clusters.

Logical Backups

In a logical backup, data is dumped from the databases into backup files. With MongoDB, that means dumping the data into BSON-formatted files.

During a logical backup, the data is read from the server through the client API, then serialized and written to “.bson”, “.json”, or “.csv” backup files on disk, depending on the backup utility used.

MongoDB offers the following utility for taking logical backups:

Mongodump: Dumps the databases into “.bson” files, which can later be restored by re-inserting the dumped documents into the databases.

Note: If we don’t specify the database name or collection name explicitly in the “mongodump” command, the backup is taken for all databases or all collections, respectively. If “authorization” is enabled, we must also specify the “authenticationDatabase”.

Also, you should use “--oplog” with mongodump to capture the writes that happen while the backup is still running. Keep in mind that it won’t work with “--db” and “--collection”, since it only works for backups of the entire instance.
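As a minimal sketch of the rules in the note above, the helper below assembles a mongodump command line and rejects the flag combinations that do not work together. The function name, host name, user name, and output path are hypothetical placeholders; only the mongodump flags themselves come from the tool. Nothing is executed, the command is only printed.

```python
from typing import List, Optional


def build_mongodump_args(
    host: str,
    out_dir: str,
    username: Optional[str] = None,
    auth_db: str = "admin",
    db: Optional[str] = None,
    collection: Optional[str] = None,
    oplog: bool = False,
) -> List[str]:
    """Return an argv list for mongodump; nothing is executed here."""
    if collection and not db:
        raise ValueError("--collection requires --db")
    if oplog and (db or collection):
        # --oplog only works when dumping the whole instance.
        raise ValueError("--oplog cannot be combined with --db or --collection")

    args = ["mongodump", "--host", host, "--out", out_dir]
    if username:
        # With authorization enabled, name the database that holds the user.
        args += ["--username", username, "--authenticationDatabase", auth_db]
    if db:
        args += ["--db", db]
    if collection:
        args += ["--collection", collection]
    if oplog:
        args.append("--oplog")
    return args


# Hypothetical host, user, and path, used only for illustration.
full_dump = build_mongodump_args(
    "mongo-secondary.example.net:27017", "/backups/full",
    username="backupUser", oplog=True,
)
print(" ".join(full_dump))
```

Leaving out both `db` and `collection` produces a dump of everything, which is the only case where `oplog=True` is accepted.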

Pros:

  1. It can take the backup at a more granular level like a specific database or a collection which will be helpful during restoration.
  2. Does not require you to halt writes against a specific node where you will be running the backup. Hence, the node would still be available for other operations.

Cons:

  1. As it reads all data, it can be slow, and it requires disk reads for databases that are larger than the RAM available for the WiredTiger (WT) cache. The added WT cache pressure slows down performance.
  2. It captures only the index definitions in the metadata backup file, not the index data. Thus while restoring, all the indexes have to be built again for each collection after the collection has been reinserted. This will be done serially in one pass through the collection after the inserts have finished, so it can add a lot of time for big collection restores.
  3. The speed of the backup also depends on the allocated IOPS and the type of storage, since lots of reads and writes happen during this process.
  4. Logical backups such as mongodump are, in general, very time-consuming for large systems.

Best Practice Tip: It is always advisable to use secondary servers for backups to avoid unnecessary performance degradation on the PRIMARY node.

As we have different types of environment setups, we should approach each of them as follows.

  1. Replica set: Always prefer to run the backup on a secondary.
  2. Sharded cluster: Take a backup of the config server replica set and of each shard individually, using their secondary nodes.

Since we are discussing distributed database systems like sharded clusters, we should also keep in mind that we want our backups to be consistent at a point in time. (Replica set backups using mongodump are generally consistent when taken with “--oplog”.)

Let’s discuss a scenario where the application is still writing data and cannot be stopped for business reasons. Even if we take a backup of the config server and each shard separately, the backups of each shard will finish at different times because of data volume, data distribution, load, etc. Each backup therefore reflects a different point in time, so inconsistencies might occur while restoring.
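The per-component approach described above could be sketched as follows: one mongodump command per replica set (the config servers plus each shard), each reading from a secondary. The replica set names, host names, and output root are hypothetical. As the scenario above explains, running these commands separately does not give a cluster-wide consistent backup; the sketch only shows how the secondaries would be targeted.

```python
from typing import Dict, List


def secondary_dump_commands(
    replica_sets: Dict[str, List[str]], out_root: str
) -> List[List[str]]:
    """Build one mongodump argv per replica set, reading from a secondary."""
    commands = []
    for rs_name, members in sorted(replica_sets.items()):
        # readPreference=secondary routes the dump's reads away from the primary.
        uri = (
            f"mongodb://{','.join(members)}/"
            f"?replicaSet={rs_name}&readPreference=secondary"
        )
        commands.append([
            "mongodump",
            f"--uri={uri}",
            "--oplog",
            f"--out={out_root}/{rs_name}",
        ])
    return commands


# Hypothetical topology: a config server replica set and two shards.
topology = {
    "cfg": ["cfg1.example.net:27019", "cfg2.example.net:27019"],
    "shard0": ["s0a.example.net:27018", "s0b.example.net:27018"],
    "shard1": ["s1a.example.net:27018", "s1b.example.net:27018"],
}
for cmd in secondary_dump_commands(topology, "/backups/2020-09-18"):
    print(" ".join(cmd))
```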

Now comes the restoration part when dealing with logical backups. As with backups, MongoDB provides the following utility for restoration.

Mongorestore: Restores dump files created by “mongodump”. Index recreation takes place only after the data is restored, which costs additional memory and time.

To restore the oplog entries captured with the dump, we can add “--oplogReplay” to the mongorestore command.

Best Practice Tip: “--oplogReplay” can’t be used with the “--db” and “--collection” flags, as it only works while restoring all the databases.
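The same rule can be sketched for the restore side. The helper below is a hypothetical illustration that assembles a mongorestore command and refuses to combine “--oplogReplay” with a database or collection filter; the host and dump directory are placeholders.

```python
from typing import List, Optional


def build_mongorestore_args(
    host: str,
    dump_dir: str,
    db: Optional[str] = None,
    collection: Optional[str] = None,
    oplog_replay: bool = False,
) -> List[str]:
    """Return an argv list for mongorestore; nothing is executed here."""
    if oplog_replay and (db or collection):
        # Oplog replay only applies when restoring the full dump.
        raise ValueError("--oplogReplay cannot be combined with --db or --collection")

    args = ["mongorestore", "--host", host]
    if db:
        args += ["--db", db]
    if collection:
        args += ["--collection", collection]
    if oplog_replay:
        args.append("--oplogReplay")
    # --dir points at the directory that mongodump wrote.
    args.append(f"--dir={dump_dir}")
    return args


# Hypothetical host and path, used only for illustration.
print(" ".join(build_mongorestore_args(
    "mongo-new.example.net:27017", "/backups/full", oplog_replay=True,
)))
```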

Percona Backup for MongoDB

It is a distributed, low-impact solution for achieving consistent backups of MongoDB sharded clusters and replica sets. Percona Backup for MongoDB helps overcome the issues around consistency while taking backups of sharded clusters. Percona Backup for MongoDB is an uncomplicated command-line tool by design that lends itself well to backing up larger data sets. PBM uses the faster “s2” library and parallelized threads to improve speed and performance if extra threads are available as resources.

Some main advantages of PBM include the following: 

  • Enables Backups with replica set and sharded cluster consistency via oplog capture
  • Provides Distributed transaction consistency with MongoDB 4.2+
  • Back up anywhere – to the cloud (use any S3-compatible storage) or on-premise with a locally-mounted remote file system
  • Allows you to choose which compression algorithm to use. In some internal experiments, the “s2” library with snappy compression running parallelized with multiple threads was significantly faster than regular gzip. Caveat: this holds only as long as you have the additional resources available for running the parallel threads.
  • Records backup progress logging. For a large backup, you can track progress, including the upload rate in MB/s, in the pbm-agent node’s logs. A line is appended every minute showing bytes copied vs. total size for the current collection.
  • Allows for Point-in-Time Recovery (PITR), restoring a database up to a specific moment. PITR restores data from a backup and then replays, from oplog slices, all actions that happened to the data up to the specified moment.
  • PITR helps you prevent data loss during a disaster such as a crashed database, accidental deletion of data or dropped collections, or an unwanted update of multiple fields instead of a single one.
  • PBM is optimized to allow backups and have minimal impact on your production performance.

Best Practice Tip: Use PBM to time huge backup sets. Many people don’t realize how long it takes to back up very large data sets, and they are generally very surprised at how long it takes to restore them, especially when going into or out of storage types that may throttle bandwidth or network traffic.

Best Practice Tip: When running PBM from an unsupervised script, we recommend using a replica set connection string. A direct, or stand-alone style, connection string will fail if that mongod host happens to be unavailable or down temporarily.
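To illustrate the tip above, here is a small sketch that builds a replica set connection string listing every member, so the client can still connect when one host is down. The user, password, hosts, and replica set name are hypothetical; the resulting string could then be handed to PBM, for example through its MongoDB URI setting.

```python
from typing import List
from urllib.parse import quote_plus


def replica_set_uri(
    user: str,
    password: str,
    hosts: List[str],
    replica_set: str,
    auth_source: str = "admin",
) -> str:
    """Build a replica set connection string that lists all members."""
    if len(hosts) < 2:
        # A single host behaves like a direct connection and has no fallback.
        raise ValueError("list more than one replica set member")
    # Percent-encode credentials so characters like '@' or '/' stay valid.
    creds = f"{quote_plus(user)}:{quote_plus(password)}"
    return (
        f"mongodb://{creds}@{','.join(hosts)}/"
        f"?replicaSet={replica_set}&authSource={auth_source}"
    )


# Hypothetical credentials and hosts, used only for illustration.
print(replica_set_uri(
    "pbmuser", "p@ss/word",
    ["mongo1.example.net:27017", "mongo2.example.net:27017", "mongo3.example.net:27017"],
    "rs0",
))
```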

When a PBM backup is triggered, it tails and captures the oplog from the config server replica set and all the shards while the backup is still running, thus providing consistency once the backup is completed.

Apart from complete database backups, it can also take incremental backups when the “PITR” parameter is enabled. It does all this by running a “pbm-agent” process on the database (“mongod”) nodes of the cluster; the agents are responsible for both backups and restores.
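A typical sequence of PBM commands for this workflow might look like the sketch below: enable PITR, take a full backup, list what exists, and later restore to a chosen moment. The helper function names and the example timestamp are illustrative assumptions; check the flags against the PBM version you run. The commands are only assembled and printed, not executed.

```python
from datetime import datetime
from typing import List


def pbm_enable_pitr() -> List[str]:
    """Turn on oplog slicing so point-in-time recovery becomes possible."""
    return ["pbm", "config", "--set", "pitr.enabled=true"]


def pbm_backup(compression: str = "s2") -> List[str]:
    """Start a full backup with the chosen compression algorithm."""
    return ["pbm", "backup", f"--compression={compression}"]


def pbm_list() -> List[str]:
    """List backup snapshots and the available PITR ranges."""
    return ["pbm", "list"]


def pbm_restore_to(moment: datetime) -> List[str]:
    """Restore to a specific moment using a backup plus oplog slices."""
    stamp = moment.strftime("%Y-%m-%dT%H:%M:%S")
    return ["pbm", "restore", f"--time={stamp}"]


# Hypothetical restore target, used only for illustration.
for cmd in (
    pbm_enable_pitr(),
    pbm_backup(),
    pbm_list(),
    pbm_restore_to(datetime(2020, 9, 18, 10, 30, 0)),
):
    print(" ".join(cmd))
```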

As we can see below, the “pbm list” command shows the complete list of backups in the Backup snapshots section along with the incremental backups in the “PITR” section.

Below is the sample output :

Let’s take a look at the output of “pbm-agent” as well while it is taking the backup.

The last three lines of the above output mean that the full backup is completed and the incremental backup is started with a sleep interval of 10 minutes. This is an example of the Backup Progress Logging mentioned above.

We will be discussing more about Percona Backup for MongoDB in an upcoming blog post.  Until then, you can find more details on the Percona Backup for MongoDB Documentation page on our website.

Physical/Filesystem Backups

Physical backups involve snapshotting or copying the underlying MongoDB data files (“--dbpath”) at a point in time, allowing the database to recover cleanly using the state captured in the snapshotted files. They are instrumental in backing up large databases quickly, especially when used with filesystem snapshots, such as LVM snapshots, or block storage volume snapshots.

There are several general methods for taking filesystem-level backups, also known as physical backups.

  1. Manually copying the entire set of data files (for example, using rsync; the speed depends on network bandwidth)
  2. LVM-based snapshots
  3. Cloud-based disk snapshots (AWS / GCP / Azure or any other cloud provider)
  4. Percona Server for MongoDB also includes an integrated open-source Hot Backup system that creates a physical data backup on a running server without notable performance and operating degradation. You can find more information in the Percona Server for MongoDB Hot Backup documentation.

We’ll be discussing all these above options but first, let’s look at the Pros and Cons of Physical Backups over Logical backups.

Pros:

  1. They are at least as fast as, and usually faster than, logical backups.
  2. Can be easily copied over or shared with remote servers or attached NAS.
  3. Recommended for large datasets because of speed and reliability
  4. Can be convenient while building new nodes within the same cluster or new cluster

Cons:

  1. It is not possible to restore at a more granular level, such as a specific database or collection.
  2. Incremental backups cannot be achieved yet.
  3. A dedicated node is recommended for backup (it might be a hidden one), as achieving consistency requires halting writes or shutting down “mongod” cleanly on that node prior to the snapshot.

Below is the backup time consumption comparison for the same dataset:

DB Size: 267.6GB
Index Size: <1MB (since it was only on _id for testing)

=============================

  1. Percona Server for MongoDB’s Hot Backup:

Syntax:

Best Practice Tip: The backup path “backupDir” should be absolute. It also supports storing the backups on filesystem and AWS S3 buckets.
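As a hedged sketch of the tip above, the helper below builds the `createBackup` command document used by Percona Server for MongoDB's Hot Backup and refuses relative paths. The helper names and the example directory are hypothetical, and the command itself is only rendered as a shell snippet to be run against the `admin` database, not executed.

```python
import json
import posixpath


def create_backup_command(backup_dir: str) -> dict:
    """Build the Hot Backup command document for an absolute backupDir."""
    if not posixpath.isabs(backup_dir):
        raise ValueError("backupDir must be an absolute path")
    return {"createBackup": 1, "backupDir": backup_dir}


def as_shell_snippet(command: dict) -> str:
    """Render the command as it could be typed into the mongo shell."""
    return f"db.getSiblingDB('admin').runCommand({json.dumps(command)})"


# Hypothetical backup directory, used only for illustration.
print(as_shell_snippet(create_backup_command("/data/backups/hot-2020-09-18")))
```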

Notice the time taken by “Percona Hot Backup” was just 4 minutes approx. 

This is very helpful when rebuilding a node or spinning up new instances/clusters with the same dataset. The best part is it doesn’t compromise performance with locking of writes or other performance hits. 

Best Practice Tip: It is recommended to run it against the secondaries. 

  2. Filesystem Snapshot:

The approx time taken for the snapshot to be completed was only 4 minutes.
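The article does not show which snapshot mechanism was used for this test. As one possibility, an LVM-based snapshot on a dedicated node could follow the steps sketched below: halt writes, create the snapshot, then release the lock. The volume group, logical volume, snapshot name, and snapshot size are hypothetical, and the steps are only listed, not run.

```python
from typing import List, Tuple


def lvm_snapshot_steps(
    volume_group: str,
    logical_volume: str,
    snapshot_name: str,
    snapshot_size: str = "10G",
) -> List[Tuple[str, str]]:
    """Return (where to run it, command) pairs for an LVM snapshot backup."""
    origin = f"/dev/{volume_group}/{logical_volume}"
    lvcreate = (
        f"lvcreate --snapshot --size {snapshot_size} "
        f"--name {snapshot_name} {origin}"
    )
    return [
        # Flush pending writes and block new ones on this node.
        ("mongo shell", "db.fsyncLock()"),
        # Snapshot the volume that holds the MongoDB data files.
        ("os shell", lvcreate),
        # Release the lock from the same shell session that took it.
        ("mongo shell", "db.fsyncUnlock()"),
    ]


# Hypothetical volume names, used only for illustration.
for where, command in lvm_snapshot_steps("vg0", "mongodb", "mdb-snap01"):
    print(f"[{where}] {command}")
```

The snapshot size here is the space reserved for changes made while the snapshot exists, so it needs to cover the writes expected until the snapshot is copied off and removed.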

  3. Mongodump:

Results: As this quick example on the same dataset shows, both the filesystem-level snapshot and the Percona Server for MongoDB Hot Backup took only 3-5 minutes, whereas “mongodump” took almost 15 minutes to complete just 20% of the dump. Backing up the data with mongodump is therefore much slower than the other two options discussed. That is where the s2 compression and the parallelized threads of Percona Backup for MongoDB can help.
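To put those numbers side by side, here is a rough worked estimate. It assumes mongodump's progress stays linear for the rest of the dump, which is an assumption; the test only reports the first 20%.

```python
def extrapolate_total_minutes(elapsed_minutes: float, percent_done: float) -> float:
    """Estimate the total duration, assuming progress stays linear."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_minutes * 100 / percent_done


# Figures from the test above: about 15 minutes for 20% of the dump,
# versus about 4 minutes for the physical backup methods.
mongodump_total = extrapolate_total_minutes(15, 20)
physical_total = 4

print(f"Estimated mongodump total: {mongodump_total:.0f} minutes")
print(f"Roughly {mongodump_total / physical_total:.1f}x slower than the physical backups")
```

Under that assumption, the full mongodump would take about 75 minutes, close to 19 times longer than the roughly 4 minutes measured for the physical methods.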

Conclusion

The best method for taking backups depends on multiple factors like the type of infrastructure, environment, resources available, dataset size, load, etc. However, consistency and complexity also play a major role when taking backups of distributed database systems.

In general, for smaller instances, simple logical backups via mongodump are fine. As you reach somewhat larger database sizes, above around 100 GB, use backup methods like Percona Backup for MongoDB that include incremental backups and capture the oplogs in order to be able to perform Point-in-Time Recoveries and minimize potential data loss.

PBM allows you to backup to anywhere – in the cloud or on-prem, can handle your larger backups, and it is optimized to have minimal impact on your production performance. PBM is also faster due to the use of the “s2” compression method and using parallelized threads. Finally, PBM can overcome consistency issues often seen with replica set and sharded clusters by capturing the changes in the oplog. 

For very large systems, that is, once you reach around the 1 TB+ range, you should look to utilize physical filesystem-level snapshot backups. One available open-source tool for that is Percona Server for MongoDB, which has Hot Backup functionality built in for the default WiredTiger storage engine and takes around the same time as other physical snapshots.

Interested in trying Percona Backup for MongoDB? Download it for free! 
