Mistakes can happen. If only we could go back in time to the very second before that mistake was made.
Act 1: The Disaster
Plain text version for those who cannot run the asciicast above:
akira@perc01:/data$ #OK, let's get this party started!
akira@perc01:/data$ # The frontend has been shut down for 20 mins so they can
akira@perc01:/data$ # update that part, and I can update the schema in the
akira@perc01:/data$ # backend simultaneously.
akira@perc01:/data$ #Easy-peasy ...
akira@perc01:/data$ date
Tue Jul  2 13:34:09 JST 2019
akira@perc01:/data$ #Just set my auth details. (NO PEEKING!)
akira@perc01:/data$ conn_args="--host localhost:27017 --username akira --password secret --authenticationDatabase admin"
akira@perc01:/data$ mongo ${conn_args} --quiet
testrs:PRIMARY> use payments
switched to db payments
testrs:PRIMARY> show collections
TheImportantCollection
testrs:PRIMARY> //Ah, there it is. Time to work!
testrs:PRIMARY> db.TheImportantCollection.count()
174662
testrs:PRIMARY> db.TheImportantCollection.findOne()
{
    "_id" : 0,
    "customer" : {
        "fn" : "Smith",
        "gn" : "Ken",
        "city" : "Georgevill",
        "street1" : "1 Wishful St.",
        "postcode" : "45031"
    },
    "order_ids" : [ ]
}
testrs:PRIMARY> //Ah, there it is. The "customer" object that has the
testrs:PRIMARY> //address fields in it. We're going to move those out.
testrs:PRIMARY> //Copy the whole collection, adding the new "addresses" array
testrs:PRIMARY> var counter = 0;
testrs:PRIMARY> db.TheImportantCollection.find().forEach(function(d) {
...   d["adresses"] = [ ];
...   db.TheImportantCollectionV2.insert(d);
...   counter += 1;
...   if (counter % 25000 == 0) { print(counter + " updates done"); }
... });
25000 updates done
50000 updates done
75000 updates done
100000 updates done
125000 updates done
150000 updates done
testrs:PRIMARY> //Cool. Let's look at the temp table
testrs:PRIMARY> db.TheImportantCollectionV2.findOne()
{
    "_id" : 0,
    "customer" : {
        "fn" : "Smith",
        "gn" : "Ken",
        "city" : "Georgevill",
        "street1" : "1 Wishful St.",
        "postcode" : "45031"
    },
    "order_ids" : [ ],
    "adresses" : [ ]
}
testrs:PRIMARY> //?AH!!
testrs:PRIMARY> //typo. I misspelled "addresses".
testrs:PRIMARY> //I'll just drop this and go again
testrs:PRIMARY> db.TheImportantCollectionV2.remove({})
WriteResult({ "nRemoved" : 174662 })
testrs:PRIMARY> //ooops. Why did I bother deleting the docs?
testrs:PRIMARY> //I need to *drop* the collection
testrs:PRIMARY> db.TheImportantCollection.drop()
true
testrs:PRIMARY> //!!!!
testrs:PRIMARY> //Wait!
testrs:PRIMARY> show collections
TheImportantCollectionV2
testrs:PRIMARY> //...
testrs:PRIMARY> //I've done a bad thing ....
testrs:PRIMARY> //Let me see
testrs:PRIMARY> //in the oplog
testrs:PRIMARY> use local
switched to db local
testrs:PRIMARY> db.oplog.rs.findOne({"o.drop": "TheImportantCollection"})
{
    "ts" : Timestamp(1562042272, 1),
    "t" : NumberLong(6),
    "h" : NumberLong("6726633412398410781"),
    "v" : 2,
    "op" : "c",
    "ns" : "payments.$cmd",
    "ui" : UUID("abc9c1f9-71c0-45ea-aeba-ea239b975a95"),
    "wall" : ISODate("2019-07-02T04:37:52.171Z"),
    "o" : {
        "drop" : "TheImportantCollection"
    }
}
testrs:PRIMARY> //AH. 1562042272, you are the worst unix epoch second of my
testrs:PRIMARY> // life.
testrs:PRIMARY>
Act 2: Time travel with a Snapshot restore + Oplog replay
Plain text version for those who cannot run the asciicast above:
akira@perc01:/data$ #OK, OK, this is bad. I dropped TheImportantCollection
akira@perc01:/data$ #Breathe. Breathe Akira.
akira@perc01:/data$ #Right! Backups!
akira@perc01:/data$ #I have backups!
akira@perc01:/data$ ls /backups/
20190624_2300  20190626_2300  20190628_2300
20190625_2300  20190627_2300  20190629_2300
akira@perc01:/data$ #OK, I have one from 23:00 JST ... which is a while ago.
akira@perc01:/data$ #I can use the latest backup, then roll forward from
akira@perc01:/data$ # there using this neat thing you can do with
akira@perc01:/data$ #  mongorestore (the standard mongo utils command)
akira@perc01:/data$ #You can replay a dumped oplog bson file
akira@perc01:/data$ # on a primary like it was receiving it as a secondary
akira@perc01:/data$ #Just as a secondary can catch up from a primary so
akira@perc01:/data$ # far as the oplog window of time goes, a primary can
akira@perc01:/data$ # be given an oplog history to replay, using this 'trick'
akira@perc01:/data$ #(Not really a trick, but let's call it that)
akira@perc01:/data$
akira@perc01:/data$ #
akira@perc01:/data$ #But, before doing ANYTHING with the backups,
akira@perc01:/data$ # get a full dump of the oplog of the *live* replicaset
akira@perc01:/data$ # first
akira@perc01:/data$ conn_args="--host localhost:27017 --username akira --password secret --authenticationDatabase admin"
akira@perc01:/data$ mongodump ${conn_args} -d local -c oplog.rs --out /data/oplog_dump_full
2019-07-02T13:50:02.713+0900    writing local.oplog.rs to
2019-07-02T13:50:03.635+0900    done dumping local.oplog.rs (825815 documents)
akira@perc01:/data$ #Oh wait.
akira@perc01:/data$ #We *do* need a trick
akira@perc01:/data$ #v3.6 and v4.0 added some system collections that cause
akira@perc01:/data$ # mongorestore to fail, no matter what we do.
akira@perc01:/data$ # This is just a 3.6 and 4.0 issue hopefully, but 4.2's
akira@perc01:/data$ #  behaviour is not known at this date.
akira@perc01:/data$ #I'll do the dump again, removing these two collections
akira@perc01:/data$ mongodump ${conn_args} -d local -c oplog.rs \
> --query '{"ns": {"$nin": ["config.system.sessions", "config.cache.collections"]}}' --out /data/oplog_dump_full
2019-07-02T13:52:08.841+0900    writing local.oplog.rs to
2019-07-02T13:52:10.010+0900    done dumping local.oplog.rs (825781 documents)
akira@perc01:/data$ #So that was Trick #1. Removing those 2 specific
akira@perc01:/data$ # config.* collections.
akira@perc01:/data$ #Now for Trick #2
akira@perc01:/data$ #mongodump puts the dumped oplog.rs.bson file in subdirectory "local"
akira@perc01:/data$ # like that is a whole DB to restore. But you don't do a restore of
akira@perc01:/data$ # local like any other DB, it doesn't work like that.
akira@perc01:/data$ #So we MUST get rid of the subdirectory structure and just keep the
akira@perc01:/data$ # single *.bson file
akira@perc01:/data$ ls -lR /data/oplog_dump_full/
/data/oplog_dump_full/:
total 146032
drwxr-xr-x 2 akira akira        57 Jul  2 13:50 local
-rw-r--r-- 1 akira akira 149534510 Jul  2 10:26 oplog.rs.bson

/data/oplog_dump_full/local:
total 233008
-rw-r--r-- 1 akira akira 238596091 Jul  2 13:52 oplog.rs.bson
-rw-r--r-- 1 akira akira       120 Jul  2 13:52 oplog.rs.metadata.json
akira@perc01:/data$ mv /data/oplog_dump_full/local/oplog.rs.bson /data/oplog_dump_full/
akira@perc01:/data$ rm -rf /data/oplog_dump_full/local
akira@perc01:/data$ ls -lR /data/oplog_dump_full/
/data/oplog_dump_full/:
total 233004
-rw-r--r-- 1 akira akira 238596091 Jul  2 13:52 oplog.rs.bson
akira@perc01:/data$ #OK.
akira@perc01:/data$ #Now let's look at this oplog. Does it go back as far as
akira@perc01:/data$ # the latest backup snapshot or more?
akira@perc01:/data$ ls /backups/ | tail -n 1
20190629_2300
akira@perc01:/data$ #By the way that is my JST timezone, not UTC
akira@perc01:/data$ #let's see ... check the bson file's first timestamp
akira@perc01:/data$ bsondump /data/oplog_dump_full/oplog.rs.bson 2>/dev/null | head -n 1
{"ts":{"$timestamp":{"t":1561727517,"i":1}},"h":{"$numberLong":"212971303912007811"},"v":2,"op":"n","ns":"","wall":{"$date":"2019-06-28T13:11:57.633Z"},"o":{"msg":"initiating set"}}
akira@perc01:/data$ #I see the epoch timestamp there: 1561727517
akira@perc01:/data$ date -d @1561727517
Fri Jun 28 22:11:57 JST 2019
akira@perc01:/data$ #Ah, good, that's before 20190629_2300
akira@perc01:/data$ #We can do an oplog replay
akira@perc01:/data$ #Just for sanity's sake let's look for that "drop"
akira@perc01:/data$ #  command that is the disaster we want to avoid replaying
akira@perc01:/data$ bsondump /data/oplog_dump_full/oplog.rs.bson 2>/dev/null | grep drop | grep '\bTheImportantCollection\b' | tail -n 1
{"ts":{"$timestamp":{"t":1562042272,"i":1}},"t":{"$numberLong":"6"},"h":{"$numberLong":"6726633412398410781"},"v":2,"op":"c","ns":"payments.$cmd","ui":{"$binary":"q8nB+XHARequuuojm5dalQ==","$type":"04"},"wall":{"$date":"2019-07-02T04:37:52.171Z"},"o":{"drop":"TheImportantCollection"}}
akira@perc01:/data$ #Let's see, it was 1562042272, the worst epoch second of my
akira@perc01:/data$ # life. Let's not go there again!
akira@perc01:/data$ #Time to shut the live replicaset down, restore a snapshot
akira@perc01:/data$ # backup from 20190629_2300
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
18119 mongod -f /data/n1/mongod.conf
18195 mongod -f /data/n2/mongod.conf
18225 mongod -f /data/n3/mongod.conf
akira@perc01:/data$ kill 18119 18195 18225
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
18119 mongod -f /data/n1/mongod.conf
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
18119 mongod -f /data/n1/mongod.conf
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
18119 mongod -f /data/n1/mongod.conf
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
akira@perc01:/data$ #OK, shutdown
akira@perc01:/data$ /data/dba_scripts/our_restore_script.sh
usage: /data/dba_scripts/our_restore_script.sh XXXXXX
Choose one of these subdirectory names from /backups/:
  20190624_2300
  20190625_2300
  20190626_2300
  20190627_2300
  20190628_2300
  20190629_2300
akira@perc01:/data$ /data/dba_scripts/our_restore_script.sh 20190629_2300
Stopping mongod nodes
Restoring backup 20190629_2300 to one node dbpath
Restarting
about to fork child process, waiting until server is ready for connections.
forked process: 21776
child process started successfully, parent exiting
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
21776 mongod -f /data/n1/mongod.conf
akira@perc01:/data$ #I'll start the secondaries too
akira@perc01:/data$ rm -rf /data/n2/data/*
akira@perc01:/data$ mongod -f /data/n2/mongod.conf
about to fork child process, waiting until server is ready for connections.
forked process: 21859
child process started successfully, parent exiting
akira@perc01:/data$ rm -rf /data/n3/data/*
akira@perc01:/data$ mongod -f /data/n3/mongod.conf
about to fork child process, waiting until server is ready for connections.
forked process: 21896
child process started successfully, parent exiting
akira@perc01:/data$ ps -C mongod -o pid,args
  PID COMMAND
21776 mongod -f /data/n1/mongod.conf
21859 mongod -f /data/n2/mongod.conf
21896 mongod -f /data/n3/mongod.conf
akira@perc01:/data$ #I'm going to check my important collection is there again
akira@perc01:/data$ mongo ${conn_args}
MongoDB shell version v4.0.10
connecting to: mongodb://localhost:27017/?authSource=admin&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("e5aa9b27-f26b-4c73-bdc1-bdaf494cf7ab") }
MongoDB server version: 4.0.10
testrs:PRIMARY> use payments
switched to db payments
testrs:PRIMARY> show collections
TheImportantCollection
testrs:PRIMARY> //YES
testrs:PRIMARY> db.TheImportantCollection.count()
174662
testrs:PRIMARY> db.TheImportantCollection.findOne()
{
    "_id" : 0,
    "customer" : {
        "fn" : "Smith",
        "gn" : "Ken",
        "city" : "Georgevill",
        "street1" : "1 Wishful St.",
        "postcode" : "45031"
    },
    "order_ids" : [ ]
}
testrs:PRIMARY> //Yes yes yes ... I live
testrs:PRIMARY>
bye
akira@perc01:/data$ #So the data is back ... but only some time way in the
akira@perc01:/data$ # past. I want to replay up until ...
akira@perc01:/data$ bad_drop_epoch_sec=1562042272
akira@perc01:/data$ #Trick 3: mongorestore always expects a directory name
akira@perc01:/data$ #We don't need any directories, but it's just hard-coded
akira@perc01:/data$ # to expect one. So let's make one. Can be anywhere.
akira@perc01:/data$ # Just not a subdirectory under the oplog dump location
akira@perc01:/data$ #  please, that might confuse it
akira@perc01:/data$ mkdir /tmp/fake_empty_dir
mkdir: cannot create directory ‘/tmp/fake_empty_dir’: File exists
akira@perc01:/data$ #Ah, I got it already.
akira@perc01:/data$ ls /tmp/fake_empty_dir
akira@perc01:/data$ mongorestore ${conn_args} \
>   --oplogReplay \
>   --oplogFile /data/oplog_dump_full/oplog.rs.bson \
>   --oplogLimit ${bad_drop_epoch_sec}:0 \
>   --stopOnError /tmp/fake_empty_dir
2019-07-02T14:04:35.742+0900    preparing collections to restore from
2019-07-02T14:04:35.742+0900    replaying oplog
2019-07-02T14:04:38.715+0900    oplog  5.47MB
2019-07-02T14:04:41.715+0900    oplog  11.0MB
2019-07-02T14:04:44.715+0900    oplog  16.6MB
2019-07-02T14:04:47.715+0900    oplog  22.2MB
2019-07-02T14:04:50.715+0900    oplog  27.6MB
2019-07-02T14:04:53.715+0900    oplog  32.8MB
2019-07-02T14:04:56.715+0900    oplog  37.9MB
2019-07-02T14:04:59.715+0900    oplog  43.0MB
2019-07-02T14:05:02.715+0900    oplog  48.3MB
2019-07-02T14:05:05.715+0900    oplog  53.9MB
2019-07-02T14:05:08.715+0900    oplog  59.5MB
2019-07-02T14:05:11.715+0900    oplog  65.1MB
2019-07-02T14:05:14.715+0900    oplog  70.2MB
2019-07-02T14:05:17.715+0900    oplog  75.0MB
2019-07-02T14:05:20.715+0900    oplog  79.6MB
2019-07-02T14:05:23.715+0900    oplog  84.1MB
2019-07-02T14:05:26.715+0900    oplog  88.5MB
2019-07-02T14:05:29.715+0900    oplog  93.0MB
2019-07-02T14:05:32.715+0900    oplog  97.6MB
2019-07-02T14:05:35.715+0900    oplog  101MB
2019-07-02T14:05:38.715+0900    oplog  104MB
2019-07-02T14:05:41.715+0900    oplog  107MB
2019-07-02T14:05:44.715+0900    oplog  110MB
2019-07-02T14:05:47.715+0900    oplog  113MB
2019-07-02T14:05:50.715+0900    oplog  115MB
2019-07-02T14:05:53.715+0900    oplog  118MB
2019-07-02T14:05:56.715+0900    oplog  123MB
2019-07-02T14:05:59.715+0900    oplog  128MB
2019-07-02T14:06:02.715+0900    oplog  133MB
2019-07-02T14:06:05.715+0900    oplog  138MB
2019-07-02T14:06:08.715+0900    oplog  142MB
2019-07-02T14:06:11.715+0900    oplog  146MB
2019-07-02T14:06:14.715+0900    oplog  151MB
2019-07-02T14:06:17.715+0900    oplog  156MB
2019-07-02T14:06:20.715+0900    oplog  161MB
2019-07-02T14:06:23.715+0900    oplog  166MB
2019-07-02T14:06:26.715+0900    oplog  171MB
2019-07-02T14:06:29.715+0900    oplog  176MB
2019-07-02T14:06:32.715+0900    oplog  181MB
2019-07-02T14:06:35.715+0900    oplog  186MB
2019-07-02T14:06:38.715+0900    oplog  192MB
2019-07-02T14:06:41.715+0900    oplog  197MB
2019-07-02T14:06:44.715+0900    oplog  201MB
2019-07-02T14:06:47.715+0900    oplog  204MB
2019-07-02T14:06:50.715+0900    oplog  206MB
2019-07-02T14:06:53.715+0900    oplog  209MB
2019-07-02T14:06:56.715+0900    oplog  211MB
2019-07-02T14:06:59.715+0900    oplog  213MB
2019-07-02T14:07:02.715+0900    oplog  216MB
2019-07-02T14:07:05.715+0900    oplog  218MB
2019-07-02T14:07:08.715+0900    oplog  220MB
2019-07-02T14:07:11.715+0900    oplog  223MB
2019-07-02T14:07:14.715+0900    oplog  225MB
2019-07-02T14:07:17.715+0900    oplog  227MB
2019-07-02T14:07:17.753+0900    oplog  227MB
2019-07-02T14:07:17.753+0900    done
akira@perc01:/data$ #Yay! I hope! Let's check
akira@perc01:/data$ mongo ${conn_args}
MongoDB shell version v4.0.10
connecting to: mongodb://localhost:27017/?authSource=admin&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("302f2c26-7416-4e18-bd02-1bd67626d062") }
MongoDB server version: 4.0.10
testrs:PRIMARY> use payments
switched to db payments
testrs:PRIMARY> show collections
TheImportantCollection
TheImportantCollectionV2
testrs:PRIMARY> //Yes! both there!
testrs:PRIMARY> db.TheImportantCollection.count()
174662
testrs:PRIMARY> //plus the 'V2' table I was working on when I made my
testrs:PRIMARY> // 'fat thumb' mistake
testrs:PRIMARY> //There we go, a point-in-time restore from a snapshot
testrs:PRIMARY> // backup + a mongorestore --oplogReplay --oplogFile
testrs:PRIMARY> // operation.
testrs:PRIMARY> //Hold on for one last trick (which I didn't have to use today)
testrs:PRIMARY> // Trick #4: ultimate permissions are sometimes needed.
testrs:PRIMARY> // The config.system.sessions and config.transactions(?)
testrs:PRIMARY> //  system collections are currently unreplayable (3.6, 4.0;
testrs:PRIMARY> //  4.2 TBD).
testrs:PRIMARY> // They are not the only system collections you can get stuck
testrs:PRIMARY> //  on, because system collections are mostly not covered by
testrs:PRIMARY> //  the "backup" and "restore" built-in roles.
testrs:PRIMARY> // E.g. if you are replaying updates to the admin.system.users
testrs:PRIMARY> //  collection, that will fail.
testrs:PRIMARY> // But if you make a *custom* role that grants "anyAction" on
testrs:PRIMARY> //  "anyResource" (see the docs), and grant that to your backup
testrs:PRIMARY> //  and restore user, those will succeed too.
testrs:PRIMARY> //good night
testrs:PRIMARY>
The ‘TLDR’
The oplog of the damaged replica set is your valuable, idempotent history, provided you have a backup recent enough to apply it to.
- Identify your disaster operation's timestamp value in the oplog.
- Before shutting the damaged replica set down, dump its oplog: mongodump connection-args --db local --collection oplog.rs
- (Necessary workaround #1) Add a --query '{"ns": {"$nin": ["config.system.sessions", "config.transactions", "config.transaction_coordinators"]}}' argument to exclude the transaction-related system collections from v3.6 and v4.0 (and maybe 4.2+ too) that can't be restored.
- (Necessary workaround #2) Get rid of the subdirectory structure mongodump makes and keep just the oplog.rs.bson file.
- (Necessary workaround #3) Make a fake, empty directory somewhere else too, to trick mongorestore later.
- Use bsondump oplog.rs.bson | head -n 1 to check that this oplog starts before the time of your last backup.
- Shut the damaged DB down.
- Restore the latest backup from before the disaster.
- (Possibly-required workaround #4) If the oplog updates other system collections, create a user-defined role that grants anyAction on anyResource and grant it to your user as well. (See the special section on system collections below.)
- Replay up to, but not including, the disaster second: mongorestore connection-args --oplogReplay --oplogFile oplog.rs.bson --oplogLimit disaster_epoch_sec:0 /tmp/fake_empty_directory
See the ‘Act 2’ video for the details.
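Condensed into a single shell sketch, the whole procedure looks roughly like the following. This is only an illustrative outline built from the example values used in the demo above (the connection arguments, paths, and the 1562042272 drop timestamp); substitute your own, and treat the snapshot-restore step as whatever your backup tooling provides.

# Example values from the demo above -- adjust for your environment
conn_args="--host localhost:27017 --username akira --password secret --authenticationDatabase admin"
bad_drop_epoch_sec=1562042272

# 1. Dump the live oplog first, excluding the unreplayable transaction-related system collections
mongodump ${conn_args} -d local -c oplog.rs \
  --query '{"ns": {"$nin": ["config.system.sessions", "config.transactions", "config.transaction_coordinators"]}}' \
  --out /data/oplog_dump_full

# 2. Keep only the bson file; drop the "local" subdirectory mongodump created
mv /data/oplog_dump_full/local/oplog.rs.bson /data/oplog_dump_full/
rm -rf /data/oplog_dump_full/local

# 3. Confirm the oplog reaches back before the last snapshot backup
bsondump /data/oplog_dump_full/oplog.rs.bson 2>/dev/null | head -n 1

# 4. Shut the replica set down and restore the latest pre-disaster snapshot
#    (site-specific; the demo uses /data/dba_scripts/our_restore_script.sh 20190629_2300)

# 5. Replay the oplog up to, but not including, the disaster second
mkdir -p /tmp/fake_empty_dir
mongorestore ${conn_args} --oplogReplay \
  --oplogFile /data/oplog_dump_full/oplog.rs.bson \
  --oplogLimit ${bad_drop_epoch_sec}:0 \
  --stopOnError /tmp/fake_empty_dir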
So how did that work?
If you're having the kind of disaster presented in this article, I assume you are already familiar with the mongodump and mongorestore tools and with MongoDB oplog idempotency. Taking that for granted, let's go to the next level of detail.
The applyOps command – Kinda secret; Actually public
In theory you could iterate over the oplog documents and write an application that runs an insert command for each "i" op, an update for each "u" op, various different commands for the "c" ops, and so on. The simpler way is to submit them as they are (well, almost exactly as they are) using the applyOps command, and this is what the mongorestore tool does.
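To make that concrete, here is a minimal mongo shell sketch of applying one oplog-format insert by hand. The namespace matches the demo above, but the document itself is invented for illustration; mongorestore effectively submits batches of the real oplog entries in this same shape.

use admin
// One "i" (insert) op in oplog document format, applied directly on the primary
db.adminCommand({
  "applyOps": [
    {
      "op": "i",
      "ns": "payments.TheImportantCollection",
      "o": { "_id": 999, "customer": { "fn": "Doe", "gn": "Jane" }, "order_ids": [ ] }
    }
  ]
})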
The permission to run applyOps is granted to the “restore” role for all non-system collections, and there is no ‘reject if a primary’ rule. So you can make a primary apply oplog docs like a secondary does.
N.b. for some system collections, the “restore” role is not enough. See the bottom section for more details.
It might seem a bit strange that users can have this privilege, but without it there would be no convenient way for dump-and-restore tools to guarantee consistency. "Consistency" here means that the restored data will be exactly as it was at a single point in time (the end of the dump) and will not contain earlier versions of documents from some midpoint of the dumping process.
Achieving that data consistency is why the --oplog option for mongodump was created, and why mongorestore has the matching --oplogReplay option. (Those two options should be on by default, in my opinion, but they are not.) The short oplog span captured during a normal dump is saved at <dump_directory>/oplog.bson, but the --oplogFile argument lets you choose any arbitrary path.
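For a routine consistent dump-and-restore (no disaster involved), that pairing looks roughly like this; the output path is just an example, and conn_args is the same connection string variable used earlier in the article.

# Dump everything, plus the short oplog span written while the dump was running
mongodump ${conn_args} --oplog --out /backups/mydump
# Restore the data, then replay that oplog span so the result matches the end of the dump
mongorestore ${conn_args} --oplogReplay /backups/mydump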
--oplogLimit
We could have limited the oplog docs during mongodump to only include those before the disaster time with a --query parameter such as the following:
mongodump ... --query '{"ts": {"$lt": new Timestamp(1560915610, 0)}}' ...
But --oplogLimit makes it easier. You can dump everything, but then use --oplogLimit <epoch_sec_value>[:<counter>] when you run mongorestore with the --oplogReplay argument.
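Applied to this article's scenario, with 1562042272 being the epoch second of the accidental drop found in Act 1, the replay command from Act 2 becomes:

mongorestore ${conn_args} --oplogReplay \
  --oplogFile /data/oplog_dump_full/oplog.rs.bson \
  --oplogLimit 1562042272:0 \
  --stopOnError /tmp/fake_empty_dir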
If you're getting confused about whether it's UTC or your server's timezone: it's UTC. All timestamps inside MongoDB are UTC when they represent 'wall clock' times, and for 'logical clocks' a timezone is not an applicable concept.
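A quick way to sanity-check this, using the drop timestamp from Act 1:

date -u -d @1562042272   # Tue Jul  2 04:37:52 UTC 2019 -- matches the "wall" field in the oplog entry
date -d @1562042272      # Tue Jul  2 13:37:52 JST 2019 -- the same instant in the server's local timezone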
When the oplog includes system collection updates
In the built-in roles documentation, tucked in after the usual and mostly fair warnings about why you should not grant users the most powerful internal roles, comes this extra note that tells you what you actually need to do to allow oplog-replay updates on all system collections too:
If you need access to all actions on all resources, for example to run applyOps commands … create a user-defined role that grants anyAction on anyResource and ensure that only the users who need access to these operations have this access.
Translation: if your oplog replay fails because it hit a system collection update that the "restore" role doesn't cover, upgrade your user so it can run with all the privileges that a secondary uses for oplog replication.
use admin
db.createRole({
  "role": "CustomAllPowersRole",
  "privileges": [
    { "resource": { "anyResource": true }, "actions": [ "anyAction" ] }
  ],
  "roles": [ ]
});
db.grantRolesToUser("<bk_and_restore_username>", [ "CustomAllPowersRole" ])

//For afterwards:
//use admin
//db.revokeRolesFromUser("<bk_and_restore_username>", [ "CustomAllPowersRole" ])
//db.dropRole("CustomAllPowersRole")
As an alternative to granting the role shown above, you could restart the mongod nodes with security disabled; in that mode, all operations work without access control restrictions.
It's not quite as simple as that, though, because the transaction-related system collections currently (v3.6, v4.0) throw a spanner in the works. I've found that explicitly excluding config.system.sessions and config.transactions during mongodump is the best way to avoid those updates. They are logically unnecessary in a restore anyway, because the sessions/transactions finished when the replica set was completely shut down.
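If you want to confirm that a filtered oplog dump really contains none of those namespaces before replaying it, a quick sanity check with bsondump (the same tool used in Act 2; the path is the example one from the demo) could be:

bsondump /data/oplog_dump_full/oplog.rs.bson 2>/dev/null \
  | grep -c -e '"config.system.sessions"' -e '"config.transactions"'
# A count of 0 means no entries for those namespaces remain in the dump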