In a previous blog post, Compaction in Percona Server for MongoDB (PSMDB), we discussed how compact works before version 4.4. In this post, we will see how compact works in PSMDB 6.0.
I recommend reading that post first to understand what compact does, how to check dataSize, and how to estimate how much space can be reclaimed.
Below is a collection named demo in the test database, from which we deleted a large amount of data. Note the document count before and after the delete:
```
rs1:PRIMARY> db.demo.count()
2578000
rs1:PRIMARY>
rs1:PRIMARY> db.demo.remove({age:{"$gte":35}})
WriteResult({ "nRemoved" : 1665610 })
rs1:PRIMARY> db.demo.count()
912390
rs1:PRIMARY>
```
Sometimes a large amount of data is deleted from a collection, and the collection does not reuse the freed space for new documents. WiredTiger keeps that space inside the collection's data file rather than shrinking the file. The space should therefore be returned to the operating system so that other databases or collections can use it. Use the command below to check how much space can be reclaimed in the demo collection:
```
rs1:PRIMARY> db.demo.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
306188288
rs1:PRIMARY>
```
Note: The value is reported in bytes. Here, 306188288 bytes is about 292 MiB.
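The command above checks one collection. To survey a whole database, you can collect the same statistic for every collection. The sketch below is a hypothetical helper: `reclaimableReport` is our own name, not a PSMDB function. It should work in the mongo shell or mongosh. It takes the database handle as a parameter, so it can also be exercised against a stub.

```javascript
// Hypothetical helper: report WiredTiger's "file bytes available for reuse"
// for every regular collection in a database, largest first.
// `database` is a shell DB handle such as `db` or `db.getSiblingDB("test")`.
function reclaimableReport(database) {
  const report = [];
  // Only regular collections; views have no storage of their own.
  const infos = database.getCollectionInfos({ type: "collection" });
  for (const info of infos) {
    if (info.name.startsWith("system.")) continue; // skip internal collections
    const wt = database.getCollection(info.name).stats().wiredTiger;
    // Non-WiredTiger engines (or missing stats) have no block-manager section.
    if (!wt || !wt["block-manager"]) continue;
    const bytes = wt["block-manager"]["file bytes available for reuse"];
    report.push({
      collection: info.name,
      bytes: bytes,
      mib: +(bytes / 1048576).toFixed(1), // approximate size in MiB
    });
  }
  // Best compaction candidates first.
  report.sort((a, b) => b.bytes - a.bytes);
  return report;
}
```

For example, `reclaimableReport(db.getSiblingDB("test"))` returns one entry per collection. You can then filter it, for instance keeping only entries above a threshold you choose, before deciding what to compact.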
Prerequisites
Once you have decided which collection to compact, keep the following in mind before running the compact command:
- Always take a full backup of the database first.
- The user running compact must have a role that grants the compact privilege action on the target collection.
- Run compact on secondary, hidden, or low-priority members first. Compact the primary last, after stepping it down.
- In a replica set, compact is not replicated, so it must be run on each member separately.
- In a sharded cluster, compact must be run on each member of every shard. It cannot be run through mongos.
- In PSMDB 4.4 and newer, compact blocks only metadata operations such as dropping a collection, dropping an index, or creating an index.
- Starting with PSMDB 5.0.12 and 6.0.2, a secondary running compact continues to replicate, and reads are permitted on it.
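The ordering rule in the list above can be derived from `rs.conf()` and `rs.status()`: hidden and low-priority members first, primary last. This is a hypothetical helper sketch. The tie-break by ascending priority is our assumption about what "low priority" means in practice. Arbiters and members that are not in SECONDARY or PRIMARY state are simply left out.

```javascript
// Hypothetical helper: order replica set members for a rolling compact.
// Hidden members first, then the other secondaries by ascending priority,
// and the primary last (step it down before compacting it).
function compactOrder(conf, status) {
  const stateByHost = {};
  for (const m of status.members) stateByHost[m.name] = m.stateStr;

  const secondaries = [];
  let primary = null;
  for (const m of conf.members) {
    const state = stateByHost[m.host];
    if (state === "PRIMARY") primary = m.host;
    else if (state === "SECONDARY") secondaries.push(m);
    // Arbiters and members in other states (RECOVERING, DOWN, ...) are skipped.
  }

  const priorityOf = (m) => (typeof m.priority === "number" ? m.priority : 1); // default priority is 1
  secondaries.sort((a, b) => {
    const hiddenRank = (a.hidden ? 0 : 1) - (b.hidden ? 0 : 1);
    if (hiddenRank !== 0) return hiddenRank; // hidden members first
    return priorityOf(a) - priorityOf(b);    // then lowest priority first
  });

  const order = secondaries.map((m) => m.host);
  if (primary) order.push(primary);
  return order;
}
```

`compactOrder(rs.conf(), rs.status())` returns host names in the suggested order. Connect to each member in turn, run compact there, and step the primary down with `rs.stepDown()` before compacting it.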
We used PSMDB 6.0.14 for this blog. We will run the compact command on the demo collection on a secondary node while the primary keeps inserting new documents into demo. Meanwhile, we will read from the same collection on the secondary while compact is running.
First, start a script on the primary that inserts sample data into the demo collection:
```
rs1:PRIMARY> db.version()
6.0.14-11
rs1:PRIMARY>
rs1:PRIMARY> Date()
Sun Jun 23 2024 15:41:01 GMT+0000 (UTC)
rs1:PRIMARY> for (var i = 1; i <= 1000000; i++) {
...   db.demo.insert(
...     {
...       name : "name"+i,
...       birthday: getRandomDate()
...     }
...   )
... }
```
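The loop above calls `getRandomDate()`, a helper whose definition the post does not show. Below is a minimal, hypothetical stand-in. It assumes a random birthday at midnight UTC, which matches the `ISODate` values visible in the oplog output later in this post, and an arbitrary 1940 to 2005 range. `insertSampleDocs` is likewise hypothetical: it batches the same documents with `insertMany()` instead of issuing one `insert()` per document.

```javascript
// Hypothetical stand-in for getRandomDate(); the original definition is not shown.
// Returns a random day between `from` and `to`, truncated to midnight UTC.
function getRandomDate(from, to) {
  const start = (from || new Date(Date.UTC(1940, 0, 1))).getTime();
  const end = (to || new Date(Date.UTC(2005, 11, 31))).getTime();
  const d = new Date(start + Math.random() * (end - start));
  // Drop the time of day so the value looks like ISODate("YYYY-MM-DDT00:00:00Z").
  return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
}

// Hypothetical batched loader: same documents as the loop above, sent in
// groups of `batchSize` with insertMany() to reduce round trips.
function insertSampleDocs(coll, total, batchSize) {
  let batch = [];
  for (let i = 1; i <= total; i++) {
    batch.push({ name: "name" + i, birthday: getRandomDate() });
    if (batch.length === batchSize) {
      coll.insertMany(batch);
      batch = [];
    }
  }
  if (batch.length > 0) coll.insertMany(batch); // flush the remainder
}
```

In a shell session on the primary, `insertSampleDocs(db.demo, 1000000, 1000)` would write the same number of documents as the loop above.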
Next, compact the collection on the secondary node:
```
rs1:SECONDARY> hostname()
ip-172-31-92-155.ec2.internal
rs1:SECONDARY>
rs1:SECONDARY> db.runCommand({compact: "demo" })
{
    "bytesFreed" : 507662336,
    "ok" : 1,
    "$clusterTime" : {
        "clusterTime" : Timestamp(1719157444, 154),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    },
    "operationTime" : Timestamp(1719157444, 152)
}
rs1:SECONDARY> db.demo.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
34000896
rs1:SECONDARY>
```
The bytesFreed field shows that compact reclaimed 507662336 bytes (about 484 MiB), and only 34000896 bytes remain available for reuse on this node. The secondary's log also records the start and end time of the compact command:
```
{"t":{"$date":"2024-06-23T15:43:50.693+00:00"},"s":"I",  "c":"COMMAND",  "id":20284,   "ctx":"conn1105993","msg":"Compact begin","attr":{"namespace":"test.demo"}}
{"t":{"$date":"2024-06-23T15:44:04.198+00:00"},"s":"I",  "c":"STORAGE",  "id":20286,   "ctx":"conn1105993","msg":"Compact end","attr":{"namespace":"test.demo","freedBytes":507662336}}
```
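Instead of reading the log by eye, you can match the begin/end pair programmatically. The sketch below is a hypothetical helper. It assumes the JSON structured log format used since 4.4 and pairs entries by connection (`ctx`) and namespace. The `msg` strings and `attr` fields it relies on are the ones visible in the two lines above.

```javascript
// Hypothetical helper: pair "Compact begin"/"Compact end" entries from
// mongod's JSON log and report duration and freed bytes per operation.
function compactDurations(logLines) {
  const begins = {};
  const results = [];
  for (const line of logLines) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch (e) {
      continue; // skip blank or non-JSON lines
    }
    const ns = entry.attr && entry.attr.namespace;
    if (!ns || !entry.t || !entry.t.$date) continue;
    const key = entry.ctx + " " + ns; // same connection and namespace
    const ts = Date.parse(entry.t.$date);
    if (entry.msg === "Compact begin") {
      begins[key] = ts;
    } else if (entry.msg === "Compact end" && key in begins) {
      results.push({ namespace: ns, durationMs: ts - begins[key], freedBytes: entry.attr.freedBytes });
      delete begins[key];
    }
  }
  return results;
}
```

Applied to the two lines above, it reports a duration of 13505 ms (about 13.5 seconds) and 507662336 freed bytes. From a shell, you can feed it recent log lines with `compactDurations(db.adminCommand({ getLog: "global" }).log)`. Keep in mind that `getLog` returns only the most recent entries held in memory.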
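While compact was running, writes from the primary kept replicating to the secondary. Below is output from the secondary's oplog (the oplog.rs collection in the local database). The wall times of these entries (15:44:01 UTC) fall between the Compact begin and Compact end log entries above: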
```
rs1:SECONDARY> db.oplog.rs.find({"op":"i","ns": "test.demo"}).sort({$natural:-1}).skip(29500).limit(2)
{ "op" : "i", "ns" : "test.demo", "ui" : UUID("cc364994-d259-464e-bf7e-a02d4746f910"), "o" : { "_id" : ObjectId("667842c1c078dd4e9674de7f"), "name" : "name144980", "birthday" : ISODate("1980-03-26T00:00:00Z") }, "o2" : { "_id" : ObjectId("667842c1c078dd4e9674de7f") }, "ts" : Timestamp(1719157441, 177), "t" : NumberLong(92), "v" : NumberLong(2), "wall" : ISODate("2024-06-23T15:44:01.218Z") }
{ "op" : "i", "ns" : "test.demo", "ui" : UUID("cc364994-d259-464e-bf7e-a02d4746f910"), "o" : { "_id" : ObjectId("667842c1c078dd4e9674de7e"), "name" : "name144979", "birthday" : ISODate("1951-02-03T00:00:00Z") }, "o2" : { "_id" : ObjectId("667842c1c078dd4e9674de7e") }, "ts" : Timestamp(1719157441, 176), "t" : NumberLong(92), "v" : NumberLong(2), "wall" : ISODate("2024-06-23T15:44:01.216Z") }
rs1:SECONDARY> hostname()
ip-172-31-92-155.ec2.internal
rs1:SECONDARY>
```
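Another way to confirm that the compacting secondary keeps up is to watch replication lag while compact runs. The hypothetical helper below works from an `rs.status()` document and computes how many seconds each secondary's `optimeDate` trails the primary's. The built-in `rs.printSecondaryReplicationInfo()` prints similar information in human-readable form.

```javascript
// Hypothetical helper: seconds by which each SECONDARY's last applied optime
// trails the PRIMARY's, computed from an rs.status() result.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return {}; // no primary visible: lag is undefined
  const lag = {};
  for (const m of status.members) {
    if (m.stateStr !== "SECONDARY") continue; // ignore arbiters, recovering members, etc.
    lag[m.name] = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
  }
  return lag;
}
```

Calling `replicationLagSeconds(rs.status())` a few times during compaction should show a small lag that does not keep growing if the secondary is replicating normally.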
We also read from the same collection on the secondary while compact was running, and the reads succeeded:
```
rs1:SECONDARY> Date()
Sun Jun 23 2024 15:42:11 GMT+0000 (UTC)
rs1:SECONDARY> db.demo.find({age:{"$lte":21}}).limit(2)
{ "_id" : ObjectId("65d83956967bd2f98e9b89f3"), "name" : "Aaron Allen", "age" : 21, "emails" : [ "[email protected]", "[email protected]", "[email protected]" ] }
{ "_id" : ObjectId("65d83929967bd2f98e93d51b"), "name" : "Aaron Armstrong", "age" : 21, "emails" : [ "[email protected]", "[email protected]", "[email protected]" ] }
rs1:SECONDARY>
rs1:SECONDARY>
rs1:SECONDARY> hostname()
ip-172-31-92-155.ec2.internal
rs1:SECONDARY>
```
You can monitor compaction in the mongod log or by running db.currentOp() from another shell session.
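To narrow `db.currentOp()` down to compaction, you can filter its `inprog` array. This is a hypothetical sketch. It assumes compact appears as an operation whose `command` document contains a `compact` field, which is how the command was issued above. Passing a filter document, such as `db.currentOp({ "command.compact": { $exists: true } })`, achieves the same on the server side.

```javascript
// Hypothetical helper: extract running compact operations from a db.currentOp() result.
function runningCompacts(currentOpResult) {
  return currentOpResult.inprog
    .filter((op) => op.command && op.command.compact !== undefined)
    .map((op) => ({
      opid: op.opid,                 // usable with db.killOp() if needed
      collection: op.command.compact,
      db: op.command.$db,
      secsRunning: op.secs_running,  // elapsed time so far
    }));
}
```

For example, `runningCompacts(db.currentOp())` lists each compact in progress on the node you are connected to.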
Once the collections are compacted, check the reclaimed space with the same command we used earlier to estimate how much space could be reclaimed. You can also check disk usage at the OS level, for example with df or du on the data directory.
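One way to make the before/after comparison repeatable is to capture a small snapshot of the collection's statistics, on the same node, before and after compact. The helpers below are hypothetical. `storageSize` is the storage allocated to the collection as reported by `stats()`. If writes continue during compaction, as in this demo, the difference mixes reclaimed space with newly written data, so treat it as approximate.

```javascript
// Hypothetical helper: the two size figures worth comparing around a compact.
function sizeSnapshot(coll) {
  const s = coll.stats();
  return {
    storageSize: s.storageSize, // allocated storage for the collection
    reusable: s.wiredTiger["block-manager"]["file bytes available for reuse"],
  };
}

// Bytes by which the collection's allocated storage shrank between snapshots.
function reclaimedBytes(before, after) {
  return before.storageSize - after.storageSize;
}
```

A typical sequence would be `const before = sizeSnapshot(db.demo)`, then the compact command, then `reclaimedBytes(before, sizeSnapshot(db.demo))`.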
Conclusion
Sometimes, when a large collection is compacted, the compact command returns "ok" immediately, but the collection's physical size stays the same. This happens when WiredTiger decides that the collection does not need to be compacted. In that case, run the compact command again until it releases the space.
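You can script the retry advice above. This hypothetical helper reruns compact on the node you are connected to until the command reports a non-zero `bytesFreed`. It stops early at a caller-chosen attempt limit, which is our assumption, added to avoid looping forever.

```javascript
// Hypothetical helper: rerun compact until it reports freed space or the
// attempt limit is reached. Returns the bytesFreed value from each attempt.
function compactUntilFreed(database, collName, maxAttempts) {
  const attempts = [];
  for (let i = 1; i <= maxAttempts; i++) {
    const res = database.runCommand({ compact: collName });
    if (res.ok !== 1) throw new Error("compact failed: " + JSON.stringify(res));
    const freed = res.bytesFreed || 0;
    attempts.push(freed);
    if (freed > 0) break; // space was released; no need to retry
  }
  return attempts;
}
```

For example, `compactUntilFreed(db, "demo", 3)` runs compact at most three times against the current database.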
Before PSMDB 4.4, compact blocked all read and write operations, so it was always advisable to run it in a scheduled maintenance window. Starting with PSMDB 4.4, you no longer need a dedicated maintenance window. Compaction still adds I/O load, though, so low-traffic periods remain a sensible choice.
Percona Server for MongoDB is an open source replacement for MongoDB Community Edition that combines all of the features and benefits of MongoDB Community Edition with enterprise-class features developed by Percona: LDAP Authentication and Authorization, Audit Logging, Kerberos Authentication, and hot backups. To learn more about the enterprise-grade features available in the vendor lock-in-free Percona Server for MongoDB, we recommend reading our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered.