This article was written with the main purpose of showing you how to determine zones on a shard when using compound shard keys.
Defining Zones in Shards means pre-defining where certain chunks will be stored and amongst which set of particular shards they will be balanced according to the shard key definition.
MongoDB 4.4 brings the possibility to shard a collection and determine zones by compound keys, including mixing a hash key with non-hashed keys. Hashed keys may be placed in the prefix of the index (shard key) or not. Depending on the method chosen, different settings must be in compliance with the balancer and this article will also show you a couple of examples.
Even though it is not always required to create indexes in advance of running the “shardCollection” command, I am defining it to better illustrate the procedure
|
1 |
mongos> db.getSiblingDB("dbtest").colltest1.createIndex({_id:"hashed","location":1},{unique:false})<br>{<br> "raw" : {<br> "shard02/localhost:40003" : {<br> "createdCollectionAutomatically" : true,<br> "numIndexesBefore" : 1,<br> "numIndexesAfter" : 2,<br> "commitQuorum" : "votingMembers",<br> "ok" : 1<br> }<br> },<br> "ok" : 1,<br> "operationTime" : Timestamp(1624357456, 7),<br> "$clusterTime" : { ... }<br>} |
|
1 |
mongos> db.getSiblingDB("dbtest").colltest2.createIndex({"location":1,_id:"hashed"},{unique:false})<br>{<br> "raw" : {<br> "shard02/localhost:40003" : {<br> "createdCollectionAutomatically" : true,<br> "numIndexesBefore" : 1,<br> "numIndexesAfter" : 2,<br> "commitQuorum" : "votingMembers",<br> "ok" : 1<br> }<br> },<br> "ok" : 1,<br> "operationTime" : Timestamp(1624357534, 2),<br> "$clusterTime" : { ... }<br>} |
It is important to highlight a couple of important points about creating the indexes for the shard keys:
There are, basically, a couple of requirements to ensure that the collection will be in compliance with the balancer and the initial chunk distribution will be optimally performed.
The below example shows how to shard the collection colltest1.
|
1 |
mongos> sh.addShardToZone("shard01", "LATAM")<br>{<br> "ok" : 1,<br> "operationTime" : Timestamp(1624359227, 38),<br> "$clusterTime" : { ... }<br>}<br>mongos> sh.addShardToZone("shard02", "LATAM")<br>{<br> "ok" : 1,<br> "operationTime" : Timestamp(1624359231, 1),<br> "$clusterTime" : {... }<br>} |
If you check the output of the sh.status(), you will notice that tags were created and assigned to the shards according to the zones definition:
|
1 |
mongos> sh.status()<br>--- Sharding Status --- <br> sharding version: {<br> "_id" : 1,<br> "minCompatibleVersion" : 5,<br> "currentVersion" : 6,<br> "clusterId" : ObjectId("60d1be52be9f36000b01a9f0")<br> }<br> shards:<br> { "_id" : "shard01", "host" : "shard01/localhost:40002", "state" : 1, "tags" : [ "LATAM" ] }<br> { "_id" : "shard02", "host" : "shard02/localhost:40003", "state" : 1, "tags" : [ "LATAM" ] } <br><<the rest of status output was intentionally truncated to avoid unnecessary verbosity>> |
|
1 |
mongos> sh.updateZoneKeyRange("dbtest.colltest1",{ "_id" : MinKey, "location" : MinKey },{ _id:MaxKey,"location" : MaxKey },"LATAM");<br>{<br> "ok" : 1,<br> "operationTime" : Timestamp(1624361452, 1),<br> "$clusterTime" : { ... }<br>} |
|
1 |
mongos> sh.enableSharding("dbtest")<br>{<br> "ok" : 1,<br> "operationTime" : Timestamp(1624361047, 6),<br> "$clusterTime" : { ... }<br>}<br>mongos> sh.shardCollection("dbtest.colltest1",{_id:"hashed","location":1},false,{ presplitHashedZones: true })<br>{<br> "collectionsharded" : "dbtest.colltest1",<br> "collectionUUID" : UUID("30b2efff-2e2e-4c72-9a82-9593f227002f"),<br> "ok" : 1,<br> "operationTime" : Timestamp(1624361528, 31),<br> "$clusterTime" : { ... }<br>} |
In this example, the namespace dbtest.colltest1 will be evenly distributed according to the zone LATAM which will reach the shards shard01 and shard02. Looking at the sh.status() again, you will see that the initial chunks were created following the range defined above.
This example section will be a little bit more complicated to make the shard key compliant with the balancer for the initial chunk distribution.
The zones were defined differently for this example:
|
1 |
mongos> sh.status()<br>--- Sharding Status --- <br> sharding version: {<br> "_id" : 1,<br> "minCompatibleVersion" : 5,<br> "currentVersion" : 6,<br> "clusterId" : ObjectId("60d1be52be9f36000b01a9f0")<br> }<br> shards:<br> { "_id" : "shard01", "host" : "shard01/localhost:40002", "state" : 1, "tags" : [ "EU" ] }<br> { "_id" : "shard02", "host" : "shard02/localhost:40003", "state" : 1, "tags" : [ "LATAM" ] }<br> { "_id" : "shard03", "host" : "shard03/localhost:40004", "state" : 1, "tags" : [ "AMER" ] }<br> { "_id" : "shard04", "host" : "shard04/localhost:40005", "state" : 1, "tags" : [ "APAC" ] } |
|
1 |
mongos> sh.updateZoneKeyRange("dbtest2.colltest2",{ "location": "DC01", "_id" : MinKey },{ "location": "DC02", "_id" : MinKey },"LATAM");<br>{<br> "ok" : 1,<br> "operationTime" : Timestamp(1624364375, 1),<br> "$clusterTime" : { ... }<br>}<br>mongos> sh.updateZoneKeyRange("dbtest2.colltest2",{ "location": "DC02", "_id" : MinKey },{ "location": MaxKey, "_id" : MinKey },"EU");<br>{<br> "ok" : 1,<br> "operationTime" : Timestamp(1624364430, 1),<br> "$clusterTime" : { ... }<br>} |
|
1 |
mongos> sh.enableSharding("dbtest2")<br>{<br> "ok" : 1,<br> "operationTime" : Timestamp(1624366057, 5),<br> "$clusterTime" : { ... }<br>}<br>mongos> sh.shardCollection("dbtest2.colltest2",{"location":1,_id:"hashed"},false,{ presplitHashedZones: true })<br>{<br> "collectionsharded" : "dbtest2.colltest2",<br> "collectionUUID" : UUID("b8a64972-97df-4791-8019-9526d2f8d405"),<br> "ok" : 1,<br> "operationTime" : Timestamp(1624366085, 37),<br> "$clusterTime" : { ... }<br>}<br>mongos> |
The above example basically describes how to use a non-hashed field on the prefix of a shard key to ensure that certain values of that field will reach certain zones (determined as tags on shards) and the boundaries applied on the hashed field will ensure even distribution. In that case, all the docs of the namespace dbtest2.colltest2 with the minimum _id of location DC01 and the minimum _id of the location DC02, will be placed on the zone LATAM (shard02). And the next docs from the minimum _id of the location DC02 until the rest will be placed on the zone EU (shard 01)
It is very important to highlight that if the collection is not in compliance with the balancer, the migration of the chunks will never happen, though all the chunks will stay on the Primary Shard. It is possible to predict that situation right after enabling the sharding on the collection by looking at the output of the command sh.balancerCollectionStatus
|
1 |
mongos> sh.balancerCollectionStatus("dbtest.colltest1")<br>{<br> "balancerCompliant" : true,<br> "ok" : 1,<br> "operationTime" : Timestamp(1623691376, 1),<br> "$clusterTime" : { ... }<br>} |
If the balancerCompliant is true, means that the balancer will be able to split and migrate the chunks.
Defining the shard key is the most important step of deploying a healthy sharded cluster. Having a field on the shard key which contains a few distinct values would compromise the shard distribution, and as consequence, the performance. Hence, this is a great improvement coming along in MongoDB 4.4 which makes it possible to have keys with low cardinality defining the boundaries, yet still ensuring that the shard will rely on a hashed distribution based on a very selective key.
Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.
Resources
RELATED POSTS