Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Zone Based Sharding in MongoDB

June 13, 2018

Author

Dev Montiontactic

Insight for Developers

MongoDB

Security

Share this Post:

MongoDB shard zones In this blog post, we will discuss about how to use zone based sharding to deploy a sharded MongoDB cluster in a customized manner so that the queries and data will be redirected per geographical groupings. This feature of MongoDB is a part of its Data Center Awareness, that allows queries to be routed to particular MongoDB deployments considering physical locations or configurations of mongod instances.

Before moving on, let’s have an overview of this feature. You might already have some questions about zone based sharding. Was it recently introduced? If zone-based sharding is something we should use, then what about tag-aware sharding?

MongoDB supported tag-aware sharding from even the initial versions of MongoDB. This means tagging a range of shard keys values, associating that range with a shard, and redirecting operations to that specific tagged shard. This tag-aware sharding, since version 3.4, is referred to as ZONES. So, the only change is its name, and this is the reason sh.addShardTag(shard, tag) method is being used.

How it works

With the help of a shard key, MongoDB allows you to create zones of sharded data – also known as shard zones.

Each zone can be associated with one or more shards.

Similarly, a shard can associate with any number of non-conflicting zones.

MongoDB migrates chunks to the zone range in the selected shards.

MongoDB routes read and write to a particular zone range that resides in particular shards.

Useful for what kind of deployments/applications?

In cases where data needs to be routed to a particular shard due to some hardware configuration restrictions.

Zones can be useful if there is the need to isolate specific data to a particular shard. For example, in the case of GDPR compliance that requires businesses to protect data and privacy for an individual within the EU.

If an application is being used geographically and you want a query to route to the nearest shards for both reads and writes.

Let’s consider a Scenario

Consider the scenario of a school where students are experts in Biology, but most students are experts in Maths. So we have more data for the maths students compare to Biology students. In this example, deployment requires that Maths students data should route to the shard with the better configuration for a large amount of data. Both read and write will be served by specific shards. All the Biology students will be served by another shard. To implement this, we will add a tag to deploy the zones to the shards.

For this scenario we have an environment with:

DB: “school”

Collection: “students”

Fields: “sId”, “subject”, “marks” and so on..

Indexed Fields: “subject” and “sId”

We enable sharding:

sh.enableSharding("school")

1	sh.enableSharding("school")

And create a shardkey: “subject” and “sId”

sh.shardCollection("school.students", {subject: 1, sId: 1});

1	sh.shardCollection("school.students", {subject: 1, sId: 1});

We have two shards in our test environment

shards:

{  "_id" : "shard0000",  "host" : "127.0.0.1:27001",  "state" : 1 }
{  "_id" : "shard0001",  "host" : "127.0.0.1:27002",  "state" : 1 }

1 2	{ "_id" : "shard0000", "host" : "127.0.0.1:27001", "state" : 1 } { "_id" : "shard0001", "host" : "127.0.0.1:27002", "state" : 1 }

Zone Deployment

1) Disable balancer

To prevent migration of the chunks across the cluster, disable the balancer for the “students” collection:

mongos> sh.disableBalancing("school.students")
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

1 2	mongos> sh.disableBalancing("school.students") WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Before proceeding further make sure the balancer is not running. It is not a mandatory process, but it is always a good practice to make sure no migration of chunks takes place while configuring zones

mongos> sh.isBalancerRunning()
false

1 2	mongos> sh.isBalancerRunning() false

2) Add shard to the zone

A zone can be associated with a particular shard in the form of a tag, using the sh.addShardTag(), so a tag will be added to each shard. Here we are considering two zones so the tags “MATHS” and “BIOLOGY” need to be added.

mongos> sh.addShardTag( "shard0000" , "MATHS");
{ "ok" : 1 }
mongos> sh.addShardTag( "shard0001" , "BIOLOGY");
{ "ok" : 1 }

mongos> sh.addShardTag( "shard0000" , "MATHS");

{ "ok" : 1 }

mongos> sh.addShardTag( "shard0001" , "BIOLOGY");

{ "ok" : 1 }

We can see zones are assigned in the form of tags as required against each shard.

mongos> sh.status()
shards:
{  "_id" : "shard0000",  "host" : "127.0.0.1:27001",  "state" : 1,  "tags" : [ "MATHS" ] }
{  "_id" : "shard0001",  "host" : "127.0.0.1:27002",  "state" : 1,  "tags" : [ "BIOLOGY" ] }

mongos> sh.status()

shards:

{ "_id" : "shard0000", "host" : "127.0.0.1:27001", "state" : 1, "tags" : [ "MATHS" ] }

{ "_id" : "shard0001", "host" : "127.0.0.1:27002", "state" : 1, "tags" : [ "BIOLOGY" ] }

3) Define ranges for each zone

Each zone covers one or more ranges of shard key values. Note: each range a zone covers is always inclusive of its lower boundary and exclusive of its upper boundary.

mongos> sh.addTagRange(
    "school.students",
    { "subject" : "maths", "sId" : MinKey},
    { "subject" : "maths", "sId" : MaxKey},
    "MATHS"
    )
{ "ok" : 1 }
mongos> sh.addTagRange(
    "school.students",
    { "subject" : "biology", "sId" : MinKey},
    { "subject" : "biology", "sId" : MaxKey},
    "BIOLOGY"
    )
{ "ok" : 1 }

mongos> sh.addTagRange(

"school.students",

{ "subject" : "maths", "sId" : MinKey},

{ "subject" : "maths", "sId" : MaxKey},

"MATHS"

)

{ "ok" : 1 }

mongos> sh.addTagRange(

"school.students",

{ "subject" : "biology", "sId" : MinKey},

{ "subject" : "biology", "sId" : MaxKey},

"BIOLOGY"

)

{ "ok" : 1 }

4) Enable balancer

Now enable the balancer so the chunks will migrate across the shards as per the requirement and all the read and write queries will be routed to the particular shards.

mongos> sh.enableBalancing("school.students")
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
mongos> sh.isBalancerRunning()
true

mongos> sh.enableBalancing("school.students")

WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

mongos> sh.isBalancerRunning()

true

Let’s check how documents get routed as per the tags:

We have inserted 6 documents, 4 documents with “subject”:”maths” and 2 documents with “subject”:”biology”

mongos> db.students.find({"subject":"maths"}).count()
4
mongos> db.students.find({"subject":"biology"}).count()
2

mongos> db.students.find({"subject":"maths"}).count()

mongos> db.students.find({"subject":"biology"}).count()

Checking the shard distribution for the students collection:

mongos> db.students.getShardDistribution()
Shard shard0000 at 127.0.0.1:27003
data : 236B docs : 4 chunks : 4
estimated data per chunk : 59B
estimated docs per chunk : 1

Shard shard0001 at 127.0.0.1:27004
data : 122B docs : 2 chunks : 1
estimated data per chunk : 122B
estimated docs per chunk : 2

mongos> db.students.getShardDistribution()

Shard shard0000 at 127.0.0.1:27003

data : 236B docs : 4 chunks : 4

estimated data per chunk : 59B

estimated docs per chunk : 1

Shard shard0001 at 127.0.0.1:27004

data : 122B docs : 2 chunks : 1

estimated data per chunk : 122B

estimated docs per chunk : 2

So in this test case, all the queries for the students collection have routed as per the tag used, with four documents inserted into shard0000 and two documents inserted to shard0001.

Any queries related to MATHS will route to shard0000 and queries related to BIOLOGY will route to shard0001, hence the load will be distributed as per the configuration of the shard, keeping the database performance optimized.

Sharding MongoDB using zones is a great feature provided by MongoDB. With the help of zones, data can be isolated to the specific shards. Or if we have any kind of hardware or configuration restrictions to the shards, it is a possible solution for routing the operations.

0 0 votes

Article Rating

5 Comments

Oldest

Newest Most Voted

Manoj George

8 years ago

Nice article. can the zone be used with Hash shard key. If so, could you please an example or some link to that

Aayushi Mangal

7 years ago

Thanks Manoj, yes, zones could be used with Hashed shard key, but with the hashed shard keys, upper and lower bound values will represent the hashed value for the field instead of actual value, and this might leads to broadcast operations and thus unexpected behaviour.
Please refer more details on zones with hashed shard keys from here:
https://docs.mongodb.com/manual/core/zone-sharding/#hashed-shard-keys-and-zones
https://docs.mongodb.com/manual/core/hashed-sharding/#hashed-sharding-shard-key

Chris Scott

7 years ago

Can you have a default Zone? If you have a large number of clients and you want to separate one client “Amazon” onto its own zone, (Zone A). All the other clients default to a different zone, (Zone B). I do not want to have to define all the other clients to Zone B as every day more clients are being added and there are hundreds of them.