In a previous blog post, Compaction in Percona Server for MongoDB (PSMDB), we discussed how compact worked before version 4.4. In this post, we will see how compact works on PSMDB 6.0.
I recommend reading the blog post linked above to understand what compact does, how to check dataSize, and how much space we can reclaim.
Below is a collection named demo in the test database, from which we have deleted data. Here is the document count before and after the delete:

    rs1:PRIMARY> db.demo.count()
    2578000
    rs1:PRIMARY>
    rs1:PRIMARY> db.demo.remove({age:{"$gte":35}})
    WriteResult({ "nRemoved" : 1665610 })
    rs1:PRIMARY> db.demo.count()
    912390
    rs1:PRIMARY>

When a large amount of data is deleted from a collection and the collection does not reuse the freed space for new documents, that space should be returned to the operating system so that other databases or collections can use it. Use the command below to check how much space can be reclaimed from the demo collection:

    rs1:PRIMARY> db.demo.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
    306188288
    rs1:PRIMARY>

Note: The output of the above command is in bytes.
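Because the value is reported in bytes, a small conversion helper can make it easier to read. The following is a minimal sketch: formatBytes is a hypothetical helper name, not a shell built-in, and it assumes binary units (1 MiB = 1,048,576 bytes).

```javascript
// Hypothetical helper (not part of the shell): render a byte count in binary units.
function formatBytes(bytes) {
  const units = ["B", "KiB", "MiB", "GiB", "TiB"];
  let value = bytes;
  let unit = 0;
  // Divide by 1024 until the value fits the current unit.
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit += 1;
  }
  return value.toFixed(2) + " " + units[unit];
}

// Value reported for the demo collection above.
const reclaimable = formatBytes(306188288); // "292.00 MiB"
```

In the shell, the same helper could be called as formatBytes(db.demo.stats().wiredTiger["block-manager"]["file bytes available for reuse"]).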
Once you have the details and have decided which collection you want to compact and reclaim space, please keep in mind the following before running the compact command:
We used PSMDB 6.0.14 for this post. We will run the compact command on the demo collection on a secondary node while a script on the primary writes new documents to the same collection, and we will read from the collection on the secondary while compact is running.
First, we will start a script to insert the sample data in the demo collection:

    rs1:PRIMARY> db.version()
    6.0.14-11
    rs1:PRIMARY>
    rs1:PRIMARY> Date()
    Sun Jun 23 2024 15:41:01 GMT+0000 (UTC)
    rs1:PRIMARY> for (var i = 1; i <= 1000000; i++) {
    ...   db.demo.insert(
    ...     {
    ...       name : "name"+i,
    ...       birthday: getRandomDate()
    ...     }
    ...   )
    ... }

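The insert loop calls getRandomDate(), whose definition is not shown in this post. The sketch below is a hypothetical stand-in, assuming the helper returns a random calendar date at midnight UTC (which matches the birthday values visible in the oplog output later on). The 1950 to 2000 year range is an assumption chosen purely for illustration.

```javascript
// Hypothetical stand-in for the getRandomDate() helper used by the insert loop.
// The original definition is not shown; the default year range is an assumption.
function getRandomDate(startYear = 1950, endYear = 2000) {
  const dayMs = 24 * 60 * 60 * 1000;
  const start = Date.UTC(startYear, 0, 1);   // first day of the range, midnight UTC
  const end = Date.UTC(endYear + 1, 0, 1);   // exclusive upper bound
  const days = Math.floor((end - start) / dayMs);
  const offset = Math.floor(Math.random() * days);
  // Whole-day offsets from a UTC midnight always land on a UTC midnight.
  return new Date(start + offset * dayMs);
}
```

Defining a function like this in the shell session before starting the loop would be enough for the script above to run.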
Now, we will compact the collection on the secondary node:

    rs1:SECONDARY> hostname()
    ip-172-31-92-155.ec2.internal
    rs1:SECONDARY>
    rs1:SECONDARY> db.runCommand({compact: "demo" })
    {
      "bytesFreed" : 507662336,
      "ok" : 1,
      "$clusterTime" : {
        "clusterTime" : Timestamp(1719157444, 154),
        "signature" : {
          "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
          "keyId" : NumberLong(0)
        }
      },
      "operationTime" : Timestamp(1719157444, 152)
    }
    rs1:SECONDARY> db.demo.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
    34000896
    rs1:SECONDARY>

Above, we can see that compact has reclaimed disk space. The logs on the secondary node also show the start and end times of the compact command:

    {"t":{"$date":"2024-06-23T15:43:50.693+00:00"},"s":"I", "c":"COMMAND", "id":20284, "ctx":"conn1105993","msg":"Compact begin","attr":{"namespace":"test.demo"}}
    {"t":{"$date":"2024-06-23T15:44:04.198+00:00"},"s":"I", "c":"STORAGE", "id":20286, "ctx":"conn1105993","msg":"Compact end","attr":{"namespace":"test.demo","freedBytes":507662336}}

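The two timestamps give the run time of compact. Since each log line is a JSON document, the difference can be computed directly. This is a minimal sketch; compactDurationMs is a hypothetical helper name, and the two strings are the log lines shown above.

```javascript
// Parse two structured mongod log lines and return the elapsed milliseconds.
function compactDurationMs(beginLine, endLine) {
  const begin = JSON.parse(beginLine);
  const end = JSON.parse(endLine);
  // The timestamp is stored as an ISO 8601 string under t.$date.
  return Date.parse(end.t.$date) - Date.parse(begin.t.$date);
}

const beginLine = '{"t":{"$date":"2024-06-23T15:43:50.693+00:00"},"s":"I", "c":"COMMAND", "id":20284, "ctx":"conn1105993","msg":"Compact begin","attr":{"namespace":"test.demo"}}';
const endLine = '{"t":{"$date":"2024-06-23T15:44:04.198+00:00"},"s":"I", "c":"STORAGE", "id":20286, "ctx":"conn1105993","msg":"Compact end","attr":{"namespace":"test.demo","freedBytes":507662336}}';

const elapsedMs = compactDurationMs(beginLine, endLine); // 13505, i.e. about 13.5 seconds
```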
While the compact command was running, data was still being replicated to the secondary node. Below is output from the oplog on the secondary node:

    rs1:SECONDARY> db.oplog.rs.find({"op":"i","ns": "test.demo"}).sort({$natural:-1}).skip(29500).limit(2)
    { "op" : "i", "ns" : "test.demo", "ui" : UUID("cc364994-d259-464e-bf7e-a02d4746f910"), "o" : { "_id" : ObjectId("667842c1c078dd4e9674de7f"), "name" : "name144980", "birthday" : ISODate("1980-03-26T00:00:00Z") }, "o2" : { "_id" : ObjectId("667842c1c078dd4e9674de7f") }, "ts" : Timestamp(1719157441, 177), "t" : NumberLong(92), "v" : NumberLong(2), "wall" : ISODate("2024-06-23T15:44:01.218Z") }
    { "op" : "i", "ns" : "test.demo", "ui" : UUID("cc364994-d259-464e-bf7e-a02d4746f910"), "o" : { "_id" : ObjectId("667842c1c078dd4e9674de7e"), "name" : "name144979", "birthday" : ISODate("1951-02-03T00:00:00Z") }, "o2" : { "_id" : ObjectId("667842c1c078dd4e9674de7e") }, "ts" : Timestamp(1719157441, 176), "t" : NumberLong(92), "v" : NumberLong(2), "wall" : ISODate("2024-06-23T15:44:01.216Z") }
    rs1:SECONDARY> hostname()
    ip-172-31-92-155.ec2.internal
    rs1:SECONDARY>

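One way to confirm that these inserts overlapped with compact is to compare each entry's wall time with the "Compact begin" and "Compact end" timestamps from the log. The following is a minimal sketch; entriesInWindow is a hypothetical helper, and the sample entries carry only the wall field from the output above.

```javascript
// Keep the oplog entries whose wall-clock time falls inside [begin, end].
// Accepts Date objects or ISO 8601 strings for all timestamps.
function entriesInWindow(entries, begin, end) {
  const from = new Date(begin).getTime();
  const to = new Date(end).getTime();
  return entries.filter(function (entry) {
    const wall = new Date(entry.wall).getTime();
    return wall >= from && wall <= to;
  });
}

// Wall times of the two oplog entries shown above.
const sampleEntries = [
  { wall: "2024-06-23T15:44:01.218Z" },
  { wall: "2024-06-23T15:44:01.216Z" }
];

// Compact begin and end times taken from the secondary's log.
const duringCompact = entriesInWindow(
  sampleEntries,
  "2024-06-23T15:43:50.693Z",
  "2024-06-23T15:44:04.198Z"
);
```

Both sample entries fall inside the window, so duringCompact contains two elements.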
We were also able to read from the collection on the secondary while compact was running:

    rs1:SECONDARY> Date()
    Sun Jun 23 2024 15:42:11 GMT+0000 (UTC)
    rs1:SECONDARY> db.demo.find({age:{"$lte":21}}).limit(2)
    { "_id" : ObjectId("65d83956967bd2f98e9b89f3"), "name" : "Aaron Allen", "age" : 21, "emails" : [ "[email protected]", "[email protected]", "[email protected]" ] }
    { "_id" : ObjectId("65d83929967bd2f98e93d51b"), "name" : "Aaron Armstrong", "age" : 21, "emails" : [ "[email protected]", "[email protected]", "[email protected]" ] }
    rs1:SECONDARY>
    rs1:SECONDARY>
    rs1:SECONDARY> hostname()
    ip-172-31-92-155.ec2.internal
    rs1:SECONDARY>

You can check the progress of compaction in the mongod logs or by running the db.currentOp() command in another shell instance.
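The db.currentOp() output lists every in-progress operation, so it helps to pick out only the compact entries. The sketch below assumes the usual shape of that output (an inprog array whose entries carry opid, ns, secs_running, and a command document); findCompactOps is a hypothetical helper name.

```javascript
// Return a short summary of the in-progress operations that are compact commands.
// `currentOpResult` is the document returned by db.currentOp().
function findCompactOps(currentOpResult) {
  const ops = currentOpResult.inprog || [];
  return ops
    .filter(function (op) {
      // A compact operation carries the collection name under command.compact.
      return op.command !== undefined && op.command !== null &&
        op.command.compact !== undefined;
    })
    .map(function (op) {
      return {
        opid: op.opid,
        ns: op.ns,
        collection: op.command.compact,
        secs_running: op.secs_running
      };
    });
}
```

In a second shell connected to the same node, this could be called as findCompactOps(db.currentOp()). An empty array would mean that no compact is running at that moment.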
Once the collection is compacted, check the reclaimed space with the same command we used earlier to see how much space could be reclaimed. You can also check disk usage at the OS level.
Sometimes, when a large collection is compacted, the compact command returns ok immediately, but the physical size of the collection remains unchanged. This happens when WiredTiger decides that the collection does not need to be compacted. To work around this, run the compact command again until it releases the space.
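The "run it again" advice can be wrapped in a small loop that stops once the reusable space is low enough or a retry limit is reached. This is only a sketch under stated assumptions: compactUntilReclaimed is a hypothetical helper, thresholdBytes and maxAttempts are illustrative parameters you would choose yourself, and db is the shell's database handle on the node being compacted.

```javascript
// Re-run compact until the reusable space drops to thresholdBytes or attempts run out.
// thresholdBytes and maxAttempts are illustrative parameters, not MongoDB settings.
function compactUntilReclaimed(db, collName, thresholdBytes, maxAttempts) {
  function reusableBytes() {
    const stats = db.getCollection(collName).stats();
    return stats.wiredTiger["block-manager"]["file bytes available for reuse"];
  }

  const attempts = [];
  for (let i = 1; i <= maxAttempts; i++) {
    if (reusableBytes() <= thresholdBytes) {
      break; // nothing worth reclaiming any more
    }
    const res = db.runCommand({ compact: collName });
    if (res.ok !== 1) {
      throw new Error("compact failed: " + JSON.stringify(res));
    }
    attempts.push({ attempt: i, bytesFreed: res.bytesFreed, remaining: reusableBytes() });
  }
  return attempts;
}
```

For example, compactUntilReclaimed(db, "demo", 50 * 1024 * 1024, 3) would try up to three times to bring the reusable space of the demo collection under 50 MiB and return one record per attempt.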
Before PSMDB 4.4, it was advisable to run compact in a scheduled maintenance window because the command blocked all read/write operations. Starting with PSMDB 4.4, you can plan to run it at any time.
Percona Server for MongoDB is an open source replacement for MongoDB Community Edition that combines all of the features and benefits of MongoDB Community Edition with enterprise-class features developed by Percona: LDAP Authentication and Authorization, Audit Logging, Kerberos Authentication, and hot backups. To learn more about the enterprise-grade features available in the vendor lock-in-free Percona Server for MongoDB, we recommend reading our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered.