In this blog post, we will discuss some of the Sharding improvements in the recent MongoDB 3.4 GA release.
First, let’s go over, at a simplified high level, what MongoDB sharding is.
The concept of “sharding” exists to allow MongoDB to scale to very large data sets that may exceed the available resources of a single node or replica set. When a MongoDB collection is sharding-enabled, its data is broken into ranges called “chunks.” These are intended to be evenly distributed across many nodes or replica sets (called “shards”). MongoDB computes the range of a given chunk based on a mandatory document key called a “shard key.” The shard key is used in all read and write queries to route a database request to the right shard.
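The routing idea can be sketched as follows. This is a simplified model, not MongoDB’s actual implementation; the chunk boundaries and shard names below are made up for illustration:

```python
# Simplified model of chunk routing: each chunk covers a half-open
# range [min, max) of shard key values and lives on one shard.
# These boundaries and shard names are illustrative, not real metadata.
chunks = [
    {"min": float("-inf"), "max": 100,          "shard": "shardA"},
    {"min": 100,           "max": 500,          "shard": "shardB"},
    {"min": 500,           "max": float("inf"), "shard": "shardC"},
]

def route(shard_key_value):
    """Return the shard owning the chunk that contains this key value."""
    for chunk in chunks:
        if chunk["min"] <= shard_key_value < chunk["max"]:
            return chunk["shard"]
    raise KeyError("no chunk covers this key")

print(route(42))    # -> shardA
print(route(100))   # -> shardB  (range minimums are inclusive)
print(route(9999))  # -> shardC
```

In the real system this lookup is performed by the mongos router against the cluster metadata, which is what makes an up-to-date copy of that metadata so important.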
The MongoDB ecosystem introduced additional architectural components to make this possible:

Under sharding, all client database traffic is directed to one or more mongos router process(es), which use the cluster metadata (the chunk ranges, the members of the cluster, and so on) to service requests while keeping the details of sharding transparent to the client driver. Without the cluster metadata, the routing process(es) do not know where the data lives, making the config servers a critical component of sharding. For this reason, at least three config servers are required for full fault tolerance.
To ensure chunks stay balanced among the cluster shards, a process named the “chunk balancer” (or simply “balancer”) runs periodically, moving chunks from shard to shard so that data is evenly distributed. When a chunk migrates, the balancer doesn’t actually move any data itself: it merely coordinates the transfer between the source and destination shards and updates the cluster metadata once the chunk has moved.
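The balancer’s coordinating role can be sketched with a toy model that picks the next migration from the most-loaded shard to the least-loaded one. Real MongoDB uses richer thresholds and chunk-selection rules; this only illustrates the decision the balancer makes:

```python
# Toy model of a balancing round: propose moving one chunk from the
# shard holding the most chunks to the shard holding the fewest.
# Thresholds and chunk selection are far richer in real MongoDB.
def next_migration(chunk_counts):
    """chunk_counts: dict of shard name -> number of chunks.
    Returns (source, destination) for the next chunk move, or None
    if the cluster is already balanced (difference <= 1)."""
    source = max(chunk_counts, key=chunk_counts.get)
    dest = min(chunk_counts, key=chunk_counts.get)
    if chunk_counts[source] - chunk_counts[dest] <= 1:
        return None  # balanced enough; nothing to do
    return (source, dest)

print(next_migration({"shardA": 10, "shardB": 4, "shardC": 7}))
# -> ('shardA', 'shardB')
print(next_migration({"shardA": 5, "shardB": 5, "shardC": 5}))
# -> None
```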

Before MongoDB 3.4, the chunk balancer would run on whichever mongos process could acquire a cluster-wide balancer lock first. From my perspective this was a poor architectural decision for these reasons:
Luckily, MongoDB 3.4 has come to check this problem (and many others) off of my holiday wish list!
In MongoDB 3.4, the chunk balancer was moved to the Primary config server, which addresses my concerns about the chunk balancer as follows:
Note: although we expect the balancer’s overhead to be negligible, keep in mind that this change adds a small amount of load to the config server Primary node.
See more about this change here: https://docs.mongodb.com/manual/release-notes/3.4/#balancer-on-config-server-primary
In MongoDB releases before 3.2, the set of cluster config servers received updates using a mode called Sync Cluster Connection Config (SCCC) to ensure all nodes received the same change. This essentially meant that any update to cluster metadata was sent N times (once per config server) from the mongos in a fan-out pattern. This is another legacy design choice that always confused me, considering MongoDB already has a facility for reliably replicating data: MongoDB Replication. Moreover, without multi-document transactions in MongoDB, there are failure scenarios where SCCC can leave the config servers inconsistent.
Luckily, MongoDB 3.2 introduced Replica-Set based config servers as an optional feature. This moved us away from the SCCC fan-out mode to traditional replication and write concerns for consistency. It brought many benefits: rebuilding a config server node became simpler, backups became more straightforward and flexible, and using a single, well-tested replication mechanism for metadata updates simplified the architecture.
MongoDB 3.4 requires Replica-Set based config servers and removes the SCCC mode entirely. This might require an upgrade step for some deployments, but I think the benefits outweigh the cost. For more details on how to upgrade from SCCC to Replica-Set based config servers, see this article.
Note: the balancer in MongoDB 3.4 always runs on the config server that is the ‘PRIMARY’ of the replica set.
As intended, MongoDB Sharded Clusters can get very big, with tens, hundreds or even thousands of shards. Historically, MongoDB’s balancer worked serially, meaning it could coordinate only one chunk migration at any given time within the cluster. On very large clusters, this imposed a serious throughput bottleneck on balancing: every chunk move had to wait in a serial queue.
In MongoDB 3.4, the chunk balancer can now perform several chunk moves in parallel, as long as each migration has a distinct source and destination shard. Given shards A, B, C and D, this means a migration from A -> B can now happen at the same time as a migration from C -> D, because the two migrations involve disjoint sets of shards. Of course, you need four or more shards to really see the benefit of this change.
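The parallelism constraint can be sketched as a greedy selection: a shard may participate in at most one migration per round, whether as source or destination. This is illustrative only, not MongoDB’s actual scheduler:

```python
# Sketch of the 3.4 parallel-migration constraint: migrations can run
# concurrently only if no shard appears in more than one of them.
# Greedy selection in priority order; not MongoDB's real scheduler.
def schedule_parallel(migrations):
    """migrations: list of (source, destination) pairs in priority
    order. Returns the subset that can run in the same round."""
    busy = set()   # shards already committed to a migration this round
    batch = []
    for src, dst in migrations:
        if src not in busy and dst not in busy:
            batch.append((src, dst))
            busy.update((src, dst))
    return batch

# A->B and C->D are disjoint and run together; A->C must wait,
# because shard A is already busy as a source.
print(schedule_parallel([("A", "B"), ("C", "D"), ("A", "C")]))
# -> [('A', 'B'), ('C', 'D')]
```

This is also why four or more shards are needed to benefit: with only two or three shards, no two migrations can have disjoint source/destination pairs.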
On large clusters, this change can significantly increase network bandwidth usage, since several balancing operations may now occur at once. Be sure to test your network capacity with this change.
See more about this change here: https://docs.mongodb.com/manual/release-notes/3.4/#faster-balancing
There were, of course, many other improvements to sharding and other areas in 3.4; these are just some of my personal highlights. We hope to cover more in the future.
For more information about what has changed in the new GA release, see: MongoDB 3.4 Release Notes. Also, please check out our beta release of Percona Server for MongoDB 3.4. This includes all the improvements in MongoDB 3.4 plus additional storage engines and features.