In this blog post, we will walk through the internals of the election process in MongoDB®, following on from a previous post on the internals of the replica set. You can read Part 1 here.
For this post, I refer to the same configuration we discussed before.
Elections: As the term suggests, in MongoDB there is a freedom to “vote”: the individual nodes of the cluster vote to select the primary member for that replica set.
Why Elections? MongoDB maintains high availability through this process.
```
settings: {
    chainingAllowed: true,
    heartbeatIntervalMillis: 2000,
    heartbeatTimeoutSecs: 10,
    electionTimeoutMillis: 10000,
    catchUpTimeoutMillis: 60000,
    getLastErrorModes: {},
    getLastErrorDefaults: { w: 1, wtimeout: 0 },
    replicaSetId: ObjectId('5ba8ed10d4fddccfedeb7492')
}
```
From the mongo shell, the value of electionTimeoutMillis can be found in the replica set configuration:
```
rplint:SECONDARY> rs.conf()
{
    "_id" : "rplint",
    "version" : 3,
    "protocolVersion" : NumberLong(1),
    "members" : [
        {
            "_id" : 0,
            "host" : "m103:25001",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 1,
            "host" : "192.168.103.100:25002",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 2,
            "host" : "192.168.103.100:25003",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 10000,
        "catchUpTimeoutMillis" : 60000,
        "getLastErrorModes" : { },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        },
        "replicaSetId" : ObjectId("5c20ff87272eff3a5e28573f")
    }
}
```
More precisely, the value of electionTimeoutMillis can be read directly:
```
rplint:SECONDARY> rs.conf().settings.electionTimeoutMillis
10000
```
2. If the priority of the existing primary node is overtaken by another node, for example during planned maintenance driven by replica set configuration settings. The priority of a member node can be changed as explained here.
The priority of all three members can be seen from the replica set configuration like this:
```
rplint:SECONDARY> rs.conf().members[0].priority
1
rplint:SECONDARY> rs.conf().members[2].priority
1
rplint:SECONDARY> rs.conf().members[1].priority
1
```
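To illustrate the priority takeover scenario, a member's priority could be raised with rs.reconfig() from the mongo shell. This is a hypothetical configuration fragment (the member index and the priority value 2 are examples, not from this cluster); reconfiguration must be run on the primary, and raising a secondary's priority above the primary's can itself trigger an election:

```
rplint:PRIMARY> cfg = rs.conf()
rplint:PRIMARY> cfg.members[1].priority = 2
rplint:PRIMARY> rs.reconfig(cfg)
```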
Before the real election, the node runs a dry election. Dry election? Yes: the node first conducts a dry run in its current term to see whether it could be elected, and only if it wins the dry run does the actual election begin. Here’s how:
The primary (port 25002) transitions to secondary after receiving the rs.stepDown() command.
```
2019-01-03T03:05:29.972+0000 I COMMAND  [conn124] Attempting to step down in response to replSetStepDown command
2019-01-03T03:05:29.976+0000 I REPL     [conn124] transition to SECONDARY
driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.15" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "14.04" } }
2019-01-03T03:05:40.874+0000 I REPL     [ReplicationExecutor] Member m103:25001 is now in state PRIMARY
2019-01-03T03:05:41.459+0000 I REPL     [rsBackgroundSync] sync source candidate: m103:25001
2019-01-03T03:05:41.459+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to m103:25001
2019-01-03T03:05:41.460+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to m103:25001, took 1ms (1 connections now open to m103:25001)
2019-01-03T03:05:41.461+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to m103:25001
2019-01-03T03:05:41.462+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to m103:25001, took 1ms (2 connections now open to m103:25001)
```
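The step-down above was triggered from the mongo shell on the primary. rs.stepDown() asks the primary to become a secondary and not to seek re-election for the given number of seconds (60 by default). This is a shell fragment that requires a live replica set, for example:

```
rplint:PRIMARY> rs.stepDown(60)
```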
The candidate node (port 25001) sees no primary, starts a dry election, and succeeds:
```
2019-01-03T03:05:31.498+0000 I REPL     [rsBackgroundSync] could not find member to sync from
2019-01-03T03:05:36.493+0000 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update to 192.168.103.100:25002: InvalidSyncSource: Sync source was cleared. Was 192.168.103.100:25002
2019-01-03T03:05:39.390+0000 I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
2019-01-03T03:05:39.390+0000 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected. current term: 35
2019-01-03T03:05:39.391+0000 I REPL     [ReplicationExecutor] VoteRequester(term 35 dry run) received a yes vote from 192.168.103.100:25002; response message: { term: 35, voteGranted: true, reason: "", ok: 1.0 }
```
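The voteGranted field in the response above is the voter's decision. As a rough illustration of how that decision hinges on the term, here is a hypothetical, simplified Raft-style voter sketch (not MongoDB source code): it rejects candidates with a stale term and grants at most one vote per term.

```javascript
// Hypothetical, simplified Raft-style voter -- not MongoDB source code.
class Voter {
  constructor(term) {
    this.term = term;     // highest election term this member has seen
    this.votedFor = null; // who we voted for in the current term, if anyone
  }

  handleVoteRequest(candidateId, candidateTerm) {
    // A candidate with a stale (older) term is rejected outright.
    if (candidateTerm < this.term) {
      return { term: this.term, voteGranted: false, reason: "stale term" };
    }
    // A newer term resets this member's vote.
    if (candidateTerm > this.term) {
      this.term = candidateTerm;
      this.votedFor = null;
    }
    // Grant at most one vote per term.
    if (this.votedFor === null || this.votedFor === candidateId) {
      this.votedFor = candidateId;
      return { term: this.term, voteGranted: true, reason: "" };
    }
    return { term: this.term, voteGranted: false, reason: "already voted" };
  }
}

const voter = new Voter(35);
console.log(voter.handleVoteRequest("m103:25001", 36).voteGranted);            // true
console.log(voter.handleVoteRequest("192.168.103.100:25003", 36).voteGranted); // false (already voted in term 36)
```

This mirrors why, in the log, the vote for term 35 comes back with voteGranted: true and an empty reason.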
The dry election succeeds, and the term is incremented by 1 (here from 35 to 36). The node then wins the real election, transitions to primary, and enters catch-up mode.
```
2019-01-03T03:05:39.391+0000 I REPL     [ReplicationExecutor] dry election run succeeded, running for election in term 36
2019-01-03T03:05:39.394+0000 I REPL     [ReplicationExecutor] VoteRequester(term 36) received a yes vote from 192.168.103.100:25003; response message: { term: 36, voteGranted: true, reason: "", ok: 1.0 }
2019-01-03T03:05:39.395+0000 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 36
2019-01-03T03:05:39.395+0000 I REPL     [ReplicationExecutor] transition to PRIMARY
2019-01-03T03:05:39.395+0000 I REPL     [ReplicationExecutor] Entering primary catch-up mode.
```
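The two-phase flow visible in these logs, a dry run in the current term followed by a real election in an incremented term, can be sketched as follows. This is a hypothetical simplification (requestVote and the majority arithmetic here are illustrative, not the server's actual implementation):

```javascript
// Hypothetical, simplified sketch of the dry-run-then-real-election flow.
// Not MongoDB source code.

// A peer says "yes" if the candidate's term is at least as new as its own.
function requestVote(peer, term) {
  return { voteGranted: term >= peer.term };
}

function runElection(candidate, peers) {
  // Majority of all voting members (the candidate plus its peers).
  const majority = Math.floor((peers.length + 1) / 2) + 1;

  // Phase 1: dry run in the *current* term. No state is changed, so a
  // candidacy that cannot win does not needlessly bump the term.
  const dryVotes = 1 + peers.filter(
    (p) => requestVote(p, candidate.term).voteGranted
  ).length;
  if (dryVotes < majority) {
    return { elected: false, term: candidate.term };
  }

  // Phase 2: the dry run succeeded, so increment the term and run
  // the real election in the new term.
  candidate.term += 1;
  const realVotes = 1 + peers.filter(
    (p) => requestVote(p, candidate.term).voteGranted
  ).length;
  return { elected: realVotes >= majority, term: candidate.term };
}

// Mirrors the logs above: dry run in term 35, real election won in term 36.
console.log(runElection({ term: 35 }, [{ term: 35 }, { term: 35 }]));
// { elected: true, term: 36 }
```

The design benefit of the dry run is visible in phase 1: a node that cannot gather a majority never increments the term, so it does not disrupt an otherwise healthy primary.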
The other nodes also receive information about the new primary:
```
2019-01-03T03:05:31.498+0000 I REPL     [rsBackgroundSync] could not find member to sync from
2019-01-03T03:05:36.493+0000 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update to 192.168.103.100:25002: InvalidSyncSource: Sync source was cleared. Was 192.168.103.100:25002
2019-01-03T03:05:41.499+0000 I REPL     [ReplicationExecutor] Member m103:25001 is now in state PRIMARY
```
This is how MongoDB maintains high availability: when the existing primary fails, a new primary is elected from the remaining members of the replica set.