MongoDB PIT Backups In DepthJon Tobin
In this blog is an in-depth discussion of MongoDB PIT backups.
Note: INTIMIDATION FREE ZONE!! This post is meant to give the reader most of the knowledge that is needed to understand the material. This includes basic MongoDB knowledge and other core concepts. I have tried to include links for further research where I can. Please use them where you need background and ask questions. We’re here to help!
In this two-part series, we’re going to fill in some of the gaps you might have to help you get awesome Point-in-Time (PIT) backups in MongoDB. These procedures will work for both MongoDB and Percona Server for MongoDB. This is meant to be a prequel to David Murphy’s MongoDB PIT Backup blog post. In that blog, David shows you how to take and to restore a dump up until a problem operation happened. If you haven’t read that post yet, hold off until you read this one. This foundation will help you better understand the how and why of the necessary steps. Let’s move onto some core concepts in MongoDB – but first, let me tell you what to expect in each part.
Blog 1 (this one): Core concepts – replica set backups, problem statement and solution
Blog 2: Getting Shardy – why backup consistency is tough and how to solve it
Replica Set (process name: mongod) – MongoDB uses replica sets to distribute data for DR purposes. Replica sets are made up of primaries, secondaries and/or arbiters. These are much like master/slave pairs in MySQL, but with one big caveat. There’s an election protocol included that also handles failover! That’s right, high availability (HA) too! So, in general, there is a “rule of three” when thinking about the minimum number of servers to put in your replica sets. This is necessary to avoid a split-brain scenario.
Oplog (collection name: local.oplog.rs) – The oplog is the log that records changes to the data on the MongoDB primary (secondaries also have an oplog). Much like MySQL, the non-primary members (secondaries) pull operations from the oplog and apply them to their own collections. Secondaries can pull from any member of the replica set that is ahead of them. The operations in the oplog are idempotent, meaning they always result in the same change to the database no matter how many times they’re performed.
Sharding (process name: mongos) – MongoDB also has built in horizontal scalability. This is implemented in a “shared nothing” architecture. A sharded cluster is made up of several replica sets. Each replica set contains a unique range of data. The data is distributed amongst replica sets based on a sharding key. There is also a sharding router (mongos) that runs as a routing umbrella over the cluster. In a sharded setup the application solely interfaces with the sharding router (never the replica sets themselves). This is the main function for scaling reads or writes in MongoDB. Scaling both takes very thoughtful design, but may not be possible.
Mongodump (utility name: mongodump) – MongoDB has built in database dump utility that can interface with mongod or mongos. Mongodump can also use the oplog of the server that it is run on to create a consistent point in time backup by using a “roll forward” strategy.
Mongorestore (utility name: mongorestore) – MongoDB has a built in database restore utility. Mongorestore is a rather simple utility that will replay binary dumps created by mongodump. When used with –oplogReplay when restoring a dump made with mongodump’s –oplog switch, it can make for a very functional backup facility.
OK, So What?
We’re going to first need to understand how backups work in a simple environment (a single replica set). Things are going to get much more interesting when we look at sharded clusters in the next post.
In a single replica set, things are pretty simple. There is no sharding router to deal with. You can get an entire data set by interacting with one server. The only problem that you need to deal with is the changes that are being made to the database while your mongodump is in process. If the concept of missed operations is a little fuzzy to you, just consider this simple use case:
We’re going to run a mongodump, and we have a collection with four documents:
We start mongodump on this collection. We’re also running our application at the same time, because we can’t take down production. Mongodump scans from first to last in the collection (like a range scan based on ID). In this case mongodump has backed up of all documents from id:1 through id:4
At this same moment in time, our application inserts id:3 into the collection.
Is the document with id:3 going to be included in the mongodump? The answer is: most likely not. The problem is that you would expect it to be in the completed dump. However, if you need to restore this backup, you’re going to lose id:3. Now, this is perfectly normal in Disaster Recovery scenarios. Knowing that this is happening is the key part. Your backups will have the consistency of swiss cheese if you don’t have a way to track changes being made while the backup is running. Unknown data loss is one of the worst situations one can be in. What we need is a process to capture changes while the backup is running.
Here’s where the inclusion of the –oplog flag is very important. The –oplog flag will allow mongodump to capture changes that are being made to the database while the backup is running. Since the oplog is idempotent, there is chance that we’ll change the data during a restore. This gives the mongodump a consistent snapshot of when the dump completes, like most “clone” type operations.
When running mongorestore, you can use the –oplogReplay option. Using oplog recovers to the point in time when the dump completed. Back to the use case, we may not capture id:3 on the first pass in this case, but as long as we’ve captured the oplog up until the backup completes, we’ll have id:3 available. When replaying the oplog during mongorestore, we will basically re-run the insert operation, completing the dataset. The oplog BSON timestamps all entries, so we know for sure until what point in time we’ve captured.
TIP: If you need to convert the timestamp to something human-readable, here’s something helpful
The Wrap Up
Now we have a firm understanding of the problem. Once we understand the problem, we can easily design a solution to ensure our backups have the integrity that our environment demands. In the next post, we’re going to step up the complexity by examining backups in conjunction with the most complex feature MongoDB has: sharding. Until then, post your feedback and questions in the comments section below.