ZFS For MongoDB Backups

We have successfully used ZFS for MySQL® backups, and MongoDB® is no different. Normally, backups are taken from a hidden secondary, either with mongodump, a WiredTiger hot backup, or filesystem snapshots. For the last of these, we will use ZFS instead of LVM2 and discuss its other potential benefits.

Preparation for initial snapshot

Before taking a ZFS snapshot, it is important to run db.fsyncLock(). This blocks writes so that the on-disk copy of the data is consistent, and it gives the server the time it needs to commit the journal to disk before the snapshot is taken.

My MongoDB instance is running on a ZFS volume, and we will take an initial snapshot of it.

Notice the addition of a short sleep between db.fsyncLock() and the snapshot. This ensures that even with the maximum storage.journal.commitIntervalMs of 500ms, we allow enough time for the data to be committed to disk. It is simply an extra layer of guarantee and may not be necessary if you have a very low journal commit interval.
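The lock, sleep, snapshot, unlock sequence can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the original commands: the dataset name, snapshot name, port and the 0.6 second pause are hypothetical, and it assumes mongosh and zfs are on the PATH and that the caller is allowed to run zfs snapshot.

```python
import subprocess
import time


def snapshot_commands(dataset, snapshot, port=27017, shell="mongosh"):
    """Build the lock, snapshot and unlock commands as argv lists."""
    base = [shell, "--port", str(port), "--quiet", "--eval"]
    lock = base + ["db.fsyncLock()"]
    snap = ["zfs", "snapshot", f"{dataset}@{snapshot}"]
    unlock = base + ["db.fsyncUnlock()"]
    return lock, snap, unlock


def take_snapshot(dataset, snapshot, port=27017, settle_seconds=0.6,
                  run=subprocess.run):
    """Lock writes, wait past the journal commit interval, snapshot, unlock."""
    lock, snap, unlock = snapshot_commands(dataset, snapshot, port)
    run(lock, check=True)
    try:
        # Hypothetical pause: slightly longer than the 500 ms maximum
        # storage.journal.commitIntervalMs.
        time.sleep(settle_seconds)
        run(snap, check=True)
    finally:
        # Always release the lock, even if the snapshot command fails.
        run(unlock, check=True)
```

Calling take_snapshot("tank/mongo", "full", port=28020) would run the three commands in order. The try/finally is a design choice so that a failed snapshot does not leave the secondary locked.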

Now I have a snapshot…

At this point, I have a snapshot I can use for a number of purposes.

  • Replicate a full and delta snapshot to remote storage or another region with tools like zrepl. This allows for an extra layer of redundancy and disaster recovery.
  • Use the snapshots to rebuild, replace or create new secondary nodes, or to refresh test/development servers regularly.
  • Use the snapshots to do point-in-time recovery (PITR). ZFS snapshots are cheap to create, so it is possible to take them even at five-minute intervals! This is actually my favorite use case and feature.
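To make the difference between a full and a delta replication concrete, here is a rough sketch of the zfs send and zfs receive pipeline that replication tools automate. It only builds the command line; the host name and dataset names are hypothetical, and it assumes ssh access to a remote host that also runs ZFS.

```python
import shlex


def send_pipeline(dataset, snapshot, remote_host, remote_dataset,
                  previous=None):
    """Return a shell pipeline that replicates one snapshot to a remote host.

    With previous=None the whole snapshot is sent (full). With a previous
    snapshot name, only the changes between the two are sent (delta).
    """
    send = ["zfs", "send"]
    if previous is not None:
        send += ["-i", f"{dataset}@{previous}"]
    send.append(f"{dataset}@{snapshot}")
    receive = ["ssh", remote_host, "zfs", "receive", remote_dataset]
    return " | ".join(shlex.join(cmd) for cmd in (send, receive))


# Hypothetical names, for illustration only.
print(send_pipeline("tank/mongo", "snap1", "backup1", "backup/mongo"))
print(send_pipeline("tank/mongo", "snap2", "backup1", "backup/mongo",
                    previous="snap1"))
```

A delta can only be received if the remote dataset already holds the previous snapshot, which is why the full send comes first.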

Let’s say we take snapshots every five minutes. If a collection was accidentally dropped, or even just a few documents were deleted, we can mount the last snapshot taken before this event. If the event was discovered in less than five minutes (perhaps that’s unrealistic), we only need to replay less than five minutes of oplog!
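Choosing the snapshot to recover from is a small lookup: take the newest snapshot that is strictly older than the incident. The sketch below assumes you track snapshot creation times as Unix epoch seconds, which is an illustrative convention and not something ZFS requires.

```python
from bisect import bisect_left


def last_snapshot_before(snapshot_times, incident_time):
    """Return the newest snapshot time strictly before the incident, or None."""
    ordered = sorted(snapshot_times)
    i = bisect_left(ordered, incident_time)
    return ordered[i - 1] if i > 0 else None


def replay_window(snapshot_times, incident_time):
    """Seconds of oplog to replay from the chosen snapshot up to the incident."""
    base = last_snapshot_before(snapshot_times, incident_time)
    return None if base is None else incident_time - base


# Snapshots every five minutes (300 s), incident 50 s after the third one.
snapshots = [0, 300, 600, 900]
print(last_snapshot_before(snapshots, 650))  # 600
print(replay_window(snapshots, 650))         # 50
```

With five-minute snapshots, the replay window is always shorter than 300 seconds.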


To start a PITR, first clone the snapshot. Cloning the snapshot automatically mounts it, so we can then start a temporary mongod instance on the mounted directory.
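As a sketch of those two steps, the helper below builds the zfs clone command and the command for a temporary mongod. The dataset, clone name and port are hypothetical, and it assumes the clone inherits the default mountpoint, so a clone named tank/pitr is mounted at /tank/pitr.

```python
def pitr_commands(dataset, snapshot, clone, port):
    """Build the clone command and the temporary mongod command."""
    clone_cmd = ["zfs", "clone", f"{dataset}@{snapshot}", clone]
    # Assumption: the clone is mounted at /<clone name> (default mountpoint).
    mongod_cmd = ["mongod", "--dbpath", f"/{clone}", "--port", str(port)]
    return clone_cmd, mongod_cmd


# Hypothetical names; use a port that differs from the production instance.
clone_cmd, mongod_cmd = pitr_commands("tank/mongo", "full", "tank/pitr", 28021)
print(" ".join(clone_cmd))
print(" ".join(mongod_cmd))
```

Running the temporary instance on its own port keeps it clearly separate from production. Once the recovery is done, the clone can be removed with zfs destroy.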

Once mongod has started, I would like to find out the last oplog event it has completed.
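One way to look this up is to read the newest entry of local.oplog.rs in reverse natural order. The sketch below only builds a mongosh command for that query; the port is hypothetical, and the use of mongosh rather than the legacy mongo shell is an assumption.

```python
# Newest oplog entry first; project only the timestamp field.
LAST_OPLOG_ENTRY = (
    'db.getSiblingDB("local").oplog.rs'
    '.find({}, {ts: 1}).sort({$natural: -1}).limit(1).toArray()'
)


def last_oplog_command(port, shell="mongosh"):
    """Build a shell command that prints the newest oplog timestamp."""
    return [shell, "--port", str(port), "--quiet", "--eval", LAST_OPLOG_ENTRY]


# Hypothetical port of the temporary instance.
print(last_oplog_command(28021))
```

The ts field of the returned document holds the seconds and ordinal that the later oplog dump and replay steps need.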

We can use this timestamp to dump the oplog from the current production instance and replay it on our temporary instance.
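A hedged sketch of the dump step: build a mongodump command that selects only oplog entries newer than the snapshot's last timestamp. The port, timestamp and output directory are hypothetical, and the query uses Extended JSON, which assumes a recent mongodump (older releases expected the shell-style Timestamp() syntax instead).

```python
import json


def oplog_dump_command(port, seconds, ordinal, out_dir):
    """Build a mongodump command for oplog entries after (seconds, ordinal)."""
    query = {"ts": {"$gt": {"$timestamp": {"t": seconds, "i": ordinal}}}}
    return [
        "mongodump", "--port", str(port),
        "--db", "local", "--collection", "oplog.rs",
        "--query", json.dumps(query),
        "--out", out_dir,
    ]


# Hypothetical production port and snapshot timestamp.
print(oplog_dump_command(28020, 1700000000, 1, "dump"))
```

With these options mongodump writes the entries to dump/local/oplog.rs.bson.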

Assuming our bad incident occurred 30 seconds after this snapshot was taken, we can apply the oplog dump with mongorestore. Be aware that you would have to identify this point from your own oplog.
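The replay step can be sketched in two parts. mongorestore --oplogReplay looks for a file named oplog.bson at the top level of the directory it is given, so the dumped oplog.rs.bson is first copied into place under that name. The directory names, port and limit value below are hypothetical.

```python
import shutil
from pathlib import Path


def stage_oplog(dump_dir, replay_dir):
    """Copy <dump_dir>/local/oplog.rs.bson to <replay_dir>/oplog.bson."""
    source = Path(dump_dir) / "local" / "oplog.rs.bson"
    target_dir = Path(replay_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "oplog.bson"
    shutil.copyfile(source, target)
    return target


def oplog_replay_command(port, replay_dir, oplog_limit):
    """Build the mongorestore command that replays the staged oplog."""
    return [
        "mongorestore", "--port", str(port),
        "--oplogReplay", "--oplogLimit", oplog_limit,
        str(replay_dir),
    ]
```

For example, stage_oplog("dump", "replay") followed by oplog_replay_command(28021, "replay", "1700000031:0") prepares a replay against the temporary instance. Copying rather than moving the file keeps the original dump intact in case the replay has to be repeated with a different limit.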

Note that the oplogLimit value is 31 seconds after the snapshot's timestamp, not 30. mongorestore only applies oplog entries with a timestamp before the specified value, so to apply the full 30 seconds after the snapshot was taken, the limit has to be set one second later.
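The off-by-one is easy to get wrong, so here is the arithmetic as a tiny sketch. The snapshot timestamp is a hypothetical value, and the function name is illustrative.

```python
def oplog_limit(snapshot_seconds, replay_seconds):
    """Return an --oplogLimit value in <seconds>:<ordinal> form.

    The limit is exclusive, so one extra second is added to include every
    entry stamped up to snapshot_seconds + replay_seconds.
    """
    return f"{snapshot_seconds + replay_seconds + 1}:0"


# Snapshot at a hypothetical timestamp, incident 30 seconds later.
print(oplog_limit(1700000000, 30))  # 1700000031:0
```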

After applying 45 oplog events, we can see that additional documents have been added to the percona.session collection.


Because snapshots are immediately available and ZFS supports deltas, it is well suited to large datasets that would otherwise take other backup tools hours to complete.

Photo by Designecologist from Pexels
