MongoDB has always made it relatively easy to scale horizontally, but with version 8.0, the database takes a significant step forward. If you’re working with large datasets or high-throughput applications, some of the changes in this release will make your life a lot easier — and your architecture cleaner.
Let’s take a look at some of the most important improvements in MongoDB 8.0 you should leverage when it comes to scalability and sharding.
Look also at this article about the performance tests we did on MongoDB 8.0.
Resharding just got way faster
In previous versions of MongoDB, resharding a collection, for example, switching from sharding by userId to region, could be a long, resource-intensive process. Years ago, I wrote an article on the blog when resharding came to life. The article showed how expensive the process was. Take a look at that article for your reference: Resharding in MongoDB 5.0.
With 8.0, things are different: resharding is now significantly faster and uses about half the memory compared to earlier releases.
This makes it much more realistic to adjust your sharding strategy as your application evolves. You’re no longer locked into the decisions you made on day one; now, you can adapt without bringing your cluster to its knees.
The documentation and other articles state the improvement, but I’m curious by nature and would like to test it for myself.
I deployed two sharded clusters (by the way, using our Percona Operator for MongoDB to deploy as fast as I could). Both clusters have two shards, three nodes each (Primary/Secondary/Secondary), and the same resources in terms of CPU and memory for all nodes: 4 CPU and 16GB RAM. The first is running MongoDB 7.0.18, and the second is MongoDB 8.0.8.
In both clusters, I created the same people collection using a simple script. It is populated with random data. Here you can see a sample document:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
[direct: mongos] test> db.people.findOne() { _id: ObjectId('679816dd97d164797c6566fe'), name: 'sndew zblwcup', surname: 'fgpmb weppyjy', address: 'iufwpjypaq bugscagnyy', age: 68, registrationDate: ISODate('2018-04-11T02:16:43.695Z'), country: 'xxgxyfvvsrrkbmi', city: 'lxtgatqllkwabaq', gender: 'bwkcw', name2: 'eipih zgttwjs', surname2: 'livkn jokarrk', address2: 'krxthzjxlf hxohswjiyx', age2: 68, registrationDate2: ISODate('2022-10-24T08:28:23.985Z'), country2: 'pbvfwcdgonrnduu', city2: 'qxhosmdqqwsqxdv', gender2: 'cyzlw' } |
The people collection contains 3.5 GB of uncompressed data, and it is sharded using the field { name: 1 }. I’ll use the same collection for running the same resharding operation, converting the shard key to { email: 1 }.
Let’s see what happens from the performance perspective on both MongoDB 7.0 and 8.0
Let’s launch the resharding:
1 2 3 4 |
[direct: mongos] test> db.adminCommand({ ... reshardCollection: "test.people", ... key: {email:1} ... }) |
Here are the results.
MongoDB 7.0
The complete resharding of the collection took around three minutes.
In the following graphs, you can see the CPU usage up to 30% with spikes to 62%, and the disk IOPS consumed, not impressive, 2K at maximum..
MongoDB 8.0
On this version, the same resharding took a little less than one minute. It was three times faster; that’s impressive. The collection in this case was just a few GB, and the cluster was idle. Further tests need to be done with the largest collections, which have more workload on the cluster. Anyway, this result is really impressive. Resharding is definitively faster in 8.0.
The following graphs show the CPU and disk IOPS utilization. The kind of utilization is pretty much the same, but for less time.
Then, I did a few more tests, increasing the size of the collection to 100GB. The following graph compares the two versions.
Move or unshard collections when you need to
Two new commands, moveCollection and unshardCollection, give you a lot more flexibility in managing your data layout:
- Use moveCollection to move an unsharded collection to a different shard.
- Use unshardCollection if a collection no longer needs to be sharded.
With moveCollection, you can move a collection to another shard without needing to dump, reload, or manually reconfigure sharding.
1 2 3 4 |
db.adminCommand({ moveCollection: "<your_collection>", toShard: "<ID of the recipient shard>" }) |
Use the listShards command to retrieve the ID of the recipient shard.
When you have data that doesn’t require sharding anymore (because it’s rarely accessed), you can simplify it for example as follows:
1 2 3 |
db.adminCommand({ unshardCollection: "archive.activity_logs" }) |
That’s a clean, low-impact way to reorganize your data layout. The data is automatically moved onto a single shard, the primary shard for the database.
Config servers can now store data
Here’s a big architectural change: in MongoDB 8.0, config servers (which used to only store metadata) can now also hold application data. They effectively become “config shards.”
This is a way for smaller clusters to reduce infrastructure overhead: fewer nodes, simpler setups, and better resource utilization, ultimately resulting in a cheaper cluster.
For large clusters, this doesn’t seem to be a valuable option. It is better to have distinct roles in the cluster’s replica sets for better maintenance. Indeed, for a cluster with a lot of shards, the impact on the costs of having the config server deal with data is really minimal.
Example: Lightweight sharded cluster for internal tools
Imagine you’re building a small internal analytics dashboard for your company. You want to use sharding for future scalability, but don’t want to deploy and maintain a full set of dedicated shards. In MongoDB 8.0, you can run a 3-node config server replica set and store both metadata and your actual application collections on it — saving on hardware and complexity, while still benefiting from horizontal scaling.
To configure a dedicated config server to run as a config shard, run the transitionFromDedicatedConfigServer command.
To configure a config shard to run as a dedicated config server, run the transitionToDedicatedConfigServer command.
The following configures a config shard to run as a dedicated config server:
1 2 3 |
db.adminCommand( { transitionToDedicatedConfigServer: 1 } ) |
Better visibility into what’s happening
MongoDB 8.0 improves how you monitor and control sharded clusters:
- New commands like abortMoveCollection and enhanced versions of existing ones give you more control during migrations. With this, you can stop moving an unsharded collection.
- The serverStatus output now includes a shardingStatistics section, offering better insight into chunk activity, balancing behavior, and more.
Example: Diagnosing imbalanced chunks
Let’s say you’ve noticed increased latency for queries targeting a particular shard. With the new metrics in serverStatus.shardingStatistics, you can quickly check how many chunks are assigned to each shard and identify imbalances or other issues:
1 |
db.serverStatus().shardingStatistics |
The command can be executed on both mongos and shards.
Conclusions
MongoDB 8.0 doesn’t just add features; it makes scaling and managing a distributed database less painful and more flexible. Whether you’re already running a sharded setup or planning to scale in the near future, the improvements to resharding, collection mobility, config shard usage, and observability are well worth exploring.
MongoDB 8.0 provides impressively faster resharding, but remember that the task remains resource-intensive. For the test, I used a pretty small collection, but in a real case, a collection in the TB order of magnitude could take several hours or even days in the worst scenario to complete the process. CPU, disk, and memory utilization remain relevant, but fortunately for less time.
If you haven’t looked into Percona Server for MongoDB 8.0 yet, this might be the push you need.