We have successfully used ZFS for MySQL® backups, and MongoDB® is no different. Normally, backups are taken from a hidden secondary, either with mongodump, a WiredTiger hot backup, or filesystem snapshots. For the latter, we will use ZFS instead of LVM2 and discuss some of its other potential benefits.
Before taking a ZFS snapshot, it is important to run db.fsyncLock(). This blocks writes, ensuring a consistent on-disk copy of the data, and gives the server the time it needs to commit the journal to disk before the snapshot is taken.
My MongoDB instance below runs on a ZFS dataset, and we will take an initial snapshot.
```
revin@mongodb:~$ sudo zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
zfs-mongo        596M  9.04G    24K  /zfs-mongo
zfs-mongo/data   592M  9.04G   592M  /zfs-mongo/data
revin@mongodb:~$ mongo --port 28020 --eval 'db.serverCmdLineOpts().parsed.storage' --quiet
{
    "dbPath" : "/zfs-mongo/data/m40",
    "journal" : {
        "enabled" : true
    },
    "wiredTiger" : {
        "engineConfig" : {
            "cacheSizeGB" : 0.25
        }
    }
}
revin@mongodb:~$ mongo --port 28020 --eval 'db.fsyncLock()' --quiet
{
    "info" : "now locked against writes, use db.fsyncUnlock() to unlock",
    "lockCount" : NumberLong(1),
...
}
revin@mongodb:~$ sleep 0.6
revin@mongodb:~$ sudo zfs snapshot zfs-mongo/data@full
revin@mongodb:~$ mongo --port 28020 --eval 'db.fsyncUnlock()' --quiet
{
    "info" : "fsyncUnlock completed",
    "lockCount" : NumberLong(0),
...
}
```
Notice the sleep just before the snapshot is taken. This ensures that even with the maximum storage.journal.commitIntervalMs of 500ms, we allow enough time for the journal to be committed to disk. This is simply an extra layer of assurance and may not be necessary if you run a very low journal commit interval.
```
revin@mongodb:~$ sudo zfs list -t all
NAME                  USED  AVAIL  REFER  MOUNTPOINT
zfs-mongo             596M  9.04G    24K  /zfs-mongo
zfs-mongo/data        592M  9.04G   592M  /zfs-mongo/data
zfs-mongo/data@full   192K      -   592M  -
```
At this point, I have a snapshot I can use for a number of purposes.
Let’s say we take snapshots every five minutes. If a collection was accidentally dropped, or even if just a few documents were deleted, we can mount the last snapshot taken before the event. If the event was discovered in under five minutes (perhaps that’s unrealistic), we only need to replay less than five minutes of oplog!
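As a sketch, the lock–snapshot–unlock sequence could be wrapped in a small script and run from cron every five minutes. The dataset name, port, and snapshot naming scheme below are my assumptions, not something this setup prescribes:

```shell
#!/bin/sh
# Hypothetical five-minute snapshot job; dataset, port, and naming scheme
# are assumptions -- adapt them to your own layout before wiring to cron.
snapshot_mongo() {
    dataset=${1:-zfs-mongo/data}
    stamp=$(date -u +%Y%m%d-%H%M)
    mongo --port 28020 --quiet --eval 'db.fsyncLock()'    # block writes
    sleep 0.6    # cover the maximum 500ms journal commit interval
    sudo zfs snapshot "${dataset}@auto-${stamp}"
    mongo --port 28020 --quiet --eval 'db.fsyncUnlock()'  # resume writes
}
```

A crontab entry along the lines of `*/5 * * * * /usr/local/bin/zfs-mongo-snapshot.sh` would then cap the oplog you ever need to replay at five minutes.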
To start a PITR, first clone the snapshot. Cloning it, as below, automatically mounts it. We can then start a temporary mongod instance on the mounted directory.
```
revin@mongodb:~$ sudo zfs clone zfs-mongo/data@full zfs-mongo/data-clone
revin@mongodb:~$ sudo zfs list -t all
NAME                   USED  AVAIL  REFER  MOUNTPOINT
zfs-mongo              606M  9.04G    24K  /zfs-mongo
zfs-mongo/data         600M  9.04G   592M  /zfs-mongo/data
zfs-mongo/data@full   8.46M      -   592M  -
zfs-mongo/data-clone     1K  9.04G   592M  /zfs-mongo/data-clone

revin@mongodb:~$ ./mongodb-linux-x86_64-4.0.8/bin/mongod \
    --dbpath /zfs-mongo/data-clone/m40 \
    --port 28021 --oplogSize 200 --wiredTigerCacheSizeGB 0.25
```
Once mongod has started, I want to find the last oplog entry it has applied.
```
revin@mongodb:~$ mongo --port 28021 local --quiet \
> --eval 'db.oplog.rs.find({}, {ts: 1}).sort({ts: -1}).limit(1)'
{ "ts" : Timestamp(1555356271, 1) }
```
We can use this timestamp to dump the oplog from the current production instance and replay it on our temporary instance.
```
revin@mongodb:~$ mkdir ~/mongodump28020
revin@mongodb:~$ cd ~/mongodump28020
revin@mongodb:~/mongodump28020$ mongodump --port 28020 -d local -c oplog.rs \
> --query '{ts: {$gt: Timestamp(1555356271, 1)}}'
2019-04-16T23:57:50.708+0000    writing local.oplog.rs to
2019-04-16T23:57:52.723+0000    done dumping local.oplog.rs (186444 documents)
```
Assuming our bad incident occurred 30 seconds after the snapshot was taken, we can apply the oplog dump with mongorestore. Be aware that you’d have to identify the exact cutoff from your own oplog.
```
revin@mongodb:~/mongodump28020$ mv dump/local/oplog.rs.bson dump/oplog.bson
revin@mongodb:~/mongodump28020$ rm -rf dump/local
revin@mongodb:~/mongodump28020$ mongo --port 28021 percona --quiet --eval 'db.session.count()'
79767
revin@mongodb:~/mongodump28020$ mongorestore --port 28021 --dir=dump/ --oplogReplay \
> --oplogLimit 1555356302 -vvv
```
Note that the oplogLimit above is 31 seconds past the snapshot’s last timestamp. Since --oplogLimit is exclusive (entries at or after the limit are not applied), we pass the first second we do not want replayed, which gives us exactly the next 30 seconds of writes.
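The arithmetic behind that limit, spelled out (the 30-second window is just our assumed time of the incident):

```shell
SNAP_TS=1555356271   # last oplog timestamp captured in the snapshot
WINDOW=30            # seconds of post-snapshot writes we want back
# --oplogLimit is exclusive, so pass the first second we do NOT want:
echo $(( SNAP_TS + WINDOW + 1 ))   # -> 1555356302
```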
```
2019-04-17T00:06:46.410+0000    using --dir flag instead of arguments
2019-04-17T00:06:46.412+0000    checking options
2019-04-17T00:06:46.413+0000    dumping with object check disabled
2019-04-17T00:06:46.414+0000    will listen for SIGTERM, SIGINT, and SIGKILL
2019-04-17T00:06:46.418+0000    connected to node type: standalone
2019-04-17T00:06:46.418+0000    standalone server: setting write concern w to 1
2019-04-17T00:06:46.419+0000    using write concern: w='1', j=false, fsync=false, wtimeout=0
2019-04-17T00:06:46.420+0000    mongorestore target is a directory, not a file
2019-04-17T00:06:46.421+0000    preparing collections to restore from
2019-04-17T00:06:46.421+0000    using dump as dump root directory
2019-04-17T00:06:46.421+0000    found oplog.bson file to replay
2019-04-17T00:06:46.421+0000    enqueued collection '.oplog'
2019-04-17T00:06:46.421+0000    finalizing intent manager with multi-database longest task first prioritizer
2019-04-17T00:06:46.421+0000    restoring up to 4 collections in parallel
...
2019-04-17T00:06:46.421+0000    replaying oplog
2019-04-17T00:06:46.446+0000    timestamp 6680204450717499393 is not below limit of 6680204450717499392; ending oplog restoration
2019-04-17T00:06:46.446+0000    applied 45 ops
2019-04-17T00:06:46.446+0000    done
```
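The 19-digit numbers near the end of that log are BSON timestamps packed into a single 64-bit integer: the high 32 bits are the Unix seconds and the low 32 bits are the ordinal within that second. A quick way to decode them in the shell:

```shell
# Unpack a 64-bit BSON timestamp as printed by mongorestore -vvv.
decode_ts() {
    printf 'Timestamp(%d, %d)\n' $(( $1 >> 32 )) $(( $1 & 0xFFFFFFFF ))
}

decode_ts 6680204450717499392   # the --oplogLimit boundary -> Timestamp(1555356302, 0)
decode_ts 6680204450717499393   # first rejected entry      -> Timestamp(1555356302, 1)
```

This confirms replay stopped exactly at the first entry of second 1555356302, our cutoff.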
After applying 45 oplog events, we can see that additional documents have been added to the percona.session collection.
```
revin@mongodb:~/mongodump28020$ mongo --port 28021 percona --quiet --eval 'db.session.count()'
79792
```
Because snapshots are available almost instantly, and because ZFS supports incremental deltas between them, it is ideal for large datasets that other backup tools would otherwise take hours to cover.
—
Photo by Designecologist from Pexels