In this blog post, we are going to implement sharding in a Valkey setup. Sharding is a built-in feature and is enabled by turning on clustering in the Valkey configuration.
Sharding, in general, distributes and scales application writes over multiple nodes, and Valkey works the same way. Valkey uses hash slots, which automatically distribute keys across the different master nodes.
Let’s start with some practical stuff.
Our environment consists of six Valkey nodes.
172.31.46.16
172.31.58.227
172.31.21.83
172.31.49.135
172.31.56.12
172.31.53.110
Below is our basic Valkey configuration file [/etc/valkey/valkey.conf] required for clustering/sharding.
bind 172.31.46.16   ## Needs to be changed on each node
requirepass valkey
cluster-enabled yes
cluster-config-file nodes.conf
Here, the “nodes.conf” file is generated automatically and is updated with the cluster node information whenever required.
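After editing the configuration (with the correct bind address on each of the six nodes), the service must be restarted everywhere before the cluster is created. Assuming the service unit is named valkey, the same unit we stop later in this post:

shell> systemctl restart valkey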
Cluster creation
The below command will create three shards, each containing one master and one slave.
[root@ip-172-31-46-16 valkey]# valkey-cli -h 172.31.46.16 -p 6379 -a valkey --cluster create 172.31.46.16:6379 172.31.58.227:6379 172.31.21.83:6379 172.31.49.135:6379 172.31.56.12:6379 172.31.53.110:6379 --cluster-replicas 1
Output:
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 172.31.56.12:6379 to 172.31.46.16:6379
Adding replica 172.31.53.110:6379 to 172.31.58.227:6379
Adding replica 172.31.49.135:6379 to 172.31.21.83:6379
M: d2f9e40ce2a2a0c7abeaa2fd64f32957ba209208 172.31.46.16:6379
   slots:[0-5460] (5461 slots) master
M: 5ea5b997fbf61d207a0abb75d955196f0299b969 172.31.58.227:6379
   slots:[5461-10922] (5462 slots) master
M: 8887bfffd356dc38dff3b0083533a9f31299e3e3 172.31.21.83:6379
   slots:[10923-16383] (5461 slots) master
S: 56e5e038bc96e5e88466ca4abd35f2f7343e01cc 172.31.49.135:6379
   replicates 8887bfffd356dc38dff3b0083533a9f31299e3e3
S: 4c79bc242d40e005398227f08af8b024c1c80c2d 172.31.56.12:6379
   replicates d2f9e40ce2a2a0c7abeaa2fd64f32957ba209208
S: 24f369d9ea6a58f1611ed60ae7b3e49644db7b86 172.31.53.110:6379
   replicates 5ea5b997fbf61d207a0abb75d955196f0299b969
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
>>> Performing Cluster Check (using node 172.31.46.16:6379)
M: d2f9e40ce2a2a0c7abeaa2fd64f32957ba209208 172.31.46.16:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: 5ea5b997fbf61d207a0abb75d955196f0299b969 172.31.58.227:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 4c79bc242d40e005398227f08af8b024c1c80c2d 172.31.56.12:6379
   slots: (0 slots) slave
   replicates d2f9e40ce2a2a0c7abeaa2fd64f32957ba209208
S: 56e5e038bc96e5e88466ca4abd35f2f7343e01cc 172.31.49.135:6379
   slots: (0 slots) slave
   replicates 8887bfffd356dc38dff3b0083533a9f31299e3e3
S: 24f369d9ea6a58f1611ed60ae7b3e49644db7b86 172.31.53.110:6379
   slots: (0 slots) slave
   replicates 5ea5b997fbf61d207a0abb75d955196f0299b969
M: 8887bfffd356dc38dff3b0083533a9f31299e3e3 172.31.21.83:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
That’s it. The sharded environment is ready now.
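If we want to re-verify the slot assignment and coverage at any later point, valkey-cli also carries the cluster check subcommand inherited from redis-cli (assuming it behaves the same in your valkey-cli build); it can be pointed at any node of the cluster:

shell> valkey-cli -a valkey --cluster check 172.31.46.16:6379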
Below, we can see the slot distribution for each shard, with every slave pointing to its respective master.
[root@ip-172-31-46-16 valkey]# valkey-cli -h 172.31.46.16 -p 6379 -a valkey CLUSTER NODES |
Output:
d2f9e40ce2a2a0c7abeaa2fd64f32957ba209208 172.31.46.16:6379@16379 myself,master - 0 1716003862000 1 connected 0-5460
5ea5b997fbf61d207a0abb75d955196f0299b969 172.31.58.227:6379@16379 master - 0 1716003867000 2 connected 5461-10922
4c79bc242d40e005398227f08af8b024c1c80c2d 172.31.56.12:6379@16379 slave d2f9e40ce2a2a0c7abeaa2fd64f32957ba209208 0 1716003866000 1 connected
56e5e038bc96e5e88466ca4abd35f2f7343e01cc 172.31.49.135:6379@16379 slave 8887bfffd356dc38dff3b0083533a9f31299e3e3 0 1716003867610 3 connected
24f369d9ea6a58f1611ed60ae7b3e49644db7b86 172.31.53.110:6379@16379 slave 5ea5b997fbf61d207a0abb75d955196f0299b969 0 1716003866604 2 connected
8887bfffd356dc38dff3b0083533a9f31299e3e3 172.31.21.83:6379@16379 master - 0 1716003865599 3 connected 10923-16383
For a production environment, we can better control failure and timeout handling with the additional cluster settings below.
cluster-node-timeout
cluster-migration-barrier
cluster-slave-validity-factor
cluster-require-full-coverage
cluster-allow-reads-when-down
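As a hedged illustration only, these settings could look like the sketch below in valkey.conf. The values shown are the usual defaults; verify them against your Valkey version and tune them for your workload.

cluster-node-timeout 15000          ## ms a node may be unreachable before it is considered failing
cluster-migration-barrier 1         ## minimum replicas a master must keep before donating one to an orphaned master
cluster-slave-validity-factor 10    ## how stale a replica may be and still attempt a failover
cluster-require-full-coverage yes   ## stop serving queries if any hash slot is uncovered
cluster-allow-reads-when-down no    ## whether a node keeps serving reads while the cluster is marked down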
Write activity in the cluster
We can interact with the nodes in cluster mode with the “-c” option:
[root@ip-172-31-46-16 valkey]# valkey-cli -h 172.31.46.16 -p 6379 -a valkey -c |
We can then perform writes. Keys are automatically distributed across the nodes based on a hash of the key taken modulo the 16,384 slots (e.g., CRC16(key) % 16384 = slot 3245, which currently lives on server X).
172.31.46.16:6379> SET key1 val1
-> Redirected to slot [9189] located at 172.31.58.227:6379
OK
172.31.58.227:6379> SET key2 val2
-> Redirected to slot [4998] located at 172.31.46.16:6379
OK
172.31.46.16:6379> SET key3 val3
OK
In a similar fashion, we can read the keys back as well.
172.31.46.16:6379> GET key1
-> Redirected to slot [9189] located at 172.31.58.227:6379
"val1"
172.31.58.227:6379> GET key2
-> Redirected to slot [4998] located at 172.31.46.16:6379
"val2"
172.31.46.16:6379> GET key3
"val3"
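If we only want to know which slot a key maps to, without reading or writing it, the CLUSTER KEYSLOT command returns the slot number. The values below match the redirects seen above:

172.31.46.16:6379> CLUSTER KEYSLOT key1
(integer) 9189
172.31.46.16:6379> CLUSTER KEYSLOT key2
(integer) 4998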
Adding new nodes
If we want to add more nodes to our cluster, we can do so with the “--cluster add-node” option, as shown below. There is no requirement that each shard has both a master and a slave, and masters can have multiple slaves.
Add new slave or DR nodes:
shell> valkey-cli -a <password> --cluster add-node NewNodeIP:6379 ExistingNodeIP:6379 --cluster-slave
Here, the first address is the new slave node, and the second is an existing node used to reach the cluster (typically the intended master). To attach the replica to a specific master, pass that master's ID with the “--cluster-master-id” option, as shown below.
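For example, a sketch of attaching a new replica to a specific master. The new node's address here (172.31.40.10) is a placeholder, and the node ID is the one belonging to master 172.31.46.16 in the output above:

shell> valkey-cli -a valkey --cluster add-node 172.31.40.10:6379 172.31.46.16:6379 --cluster-slave --cluster-master-id d2f9e40ce2a2a0c7abeaa2fd64f32957ba209208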
Add new Master nodes:
shell> valkey-cli -a <password> --cluster add-node NewNodeIP:6379 ExistingNodeIP:6379 -c
Here, the first address is the new master node, and the second can be any live node of the cluster, used as a reference to reach it.
Cluster resharding
Resharding involves moving hash slots from one shard to another. It also helps balance the hash slot distribution after adding a new master node.
Let’s walk through the scenario of adding a new master node [172.31.56.37], configured as below.
bind 172.31.56.37
requirepass valkey
cluster-enabled yes
cluster-config-file nodes.conf
Here, we add the new master node, using one of the existing nodes as the connection source.
shell> valkey-cli -a valkey --cluster add-node 172.31.56.37:6379 172.31.46.16:6379 -c |
Output:
>>> Adding node 172.31.56.37:6379 to cluster 172.31.46.16:6379
>>> Performing Cluster Check (using node 172.31.46.16:6379)
S: d2f9e40ce2a2a0c7abeaa2fd64f32957ba209208 172.31.46.16:6379
   slots: (0 slots) slave
   replicates 4c79bc242d40e005398227f08af8b024c1c80c2d
M: 5ea5b997fbf61d207a0abb75d955196f0299b969 172.31.58.227:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 24f369d9ea6a58f1611ed60ae7b3e49644db7b86 172.31.53.110:6379
   slots: (0 slots) slave
   replicates 5ea5b997fbf61d207a0abb75d955196f0299b969
M: 8887bfffd356dc38dff3b0083533a9f31299e3e3 172.31.21.83:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
M: 4c79bc242d40e005398227f08af8b024c1c80c2d 172.31.56.12:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 56e5e038bc96e5e88466ca4abd35f2f7343e01cc 172.31.49.135:6379
   slots: (0 slots) slave
   replicates 8887bfffd356dc38dff3b0083533a9f31299e3e3
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Getting functions from cluster
>>> Send FUNCTION LIST to 172.31.56.37:6379 to verify there is no functions in it
>>> Send FUNCTION RESTORE to 172.31.56.37:6379
>>> Send CLUSTER MEET to node 172.31.56.37:6379 to make it join the cluster.
[OK] New node added correctly.
The node [172.31.56.37] was added successfully. However, if we check the status of this new master, we can see it doesn’t own any hash slots yet. Rebalancing is not automatic in Valkey/Redis sharding, so we have to rely on a manual resharding approach.
shell> valkey-cli -h 172.31.46.16 -a valkey CLUSTER NODES |
Output:
5ea5b997fbf61d207a0abb75d955196f0299b969 172.31.58.227:6379@16379 master - 0 1716005916705 2 connected 5461-10922
24f369d9ea6a58f1611ed60ae7b3e49644db7b86 172.31.53.110:6379@16379 slave 5ea5b997fbf61d207a0abb75d955196f0299b969 0 1716005915701 2 connected
8887bfffd356dc38dff3b0083533a9f31299e3e3 172.31.21.83:6379@16379 master - 0 1716005915000 3 connected 10923-16383
4c79bc242d40e005398227f08af8b024c1c80c2d 172.31.56.12:6379@16379 master - 0 1716005917710 7 connected 0-5460
d2f9e40ce2a2a0c7abeaa2fd64f32957ba209208 172.31.46.16:6379@16379 myself,slave 4c79bc242d40e005398227f08af8b024c1c80c2d 0 1716005917000 7 connected
56e5e038bc96e5e88466ca4abd35f2f7343e01cc 172.31.49.135:6379@16379 slave 8887bfffd356dc38dff3b0083533a9f31299e3e3 0 1716005916000 3 connected
c5f766b7f6a39f601c433cf65a4bc76d6a98bb16 172.31.56.37:6379@16379 master - 0 1716005913693 0 connected
Each line of the output above consists of the following fields (an annotated example follows the list):
- Node ID
- IP:Port@Cluster-bus-port (client port and cluster bus port)
- Flags: master, slave, myself, fail, etc.
- If the node is a replica, the node ID of its master (otherwise “-”)
- Timestamp of the last pending PING still waiting for a reply (0 if none is pending)
- Timestamp of the last PONG received
- Configuration epoch of the node
- Link state (connected/disconnected)
- Hash slots served
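As an illustration, here is one of the master lines from the output above, split field by field (the annotations are ours):

8887bfffd356dc38dff3b0083533a9f31299e3e3   <- node ID
172.31.21.83:6379@16379                    <- IP : client port @ cluster bus port
master                                     <- flags
-                                          <- master node ID (only set for replicas)
0                                          <- last pending PING sent (ms epoch, 0 = none pending)
1716005915000                              <- last PONG received (ms epoch)
3                                          <- configuration epoch
connected                                  <- link state
10923-16383                                <- hash slots served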
Let’s do resharding now.
- Connect to any of the existing nodes and then follow the interactive prompts.
shell> valkey-cli -a valkey --cluster reshard 172.31.46.16:6379 |
>>> Performing Cluster Check (using node 172.31.46.16:6379)
S: d2f9e40ce2a2a0c7abeaa2fd64f32957ba209208 172.31.46.16:6379
   slots: (0 slots) slave
   replicates 4c79bc242d40e005398227f08af8b024c1c80c2d
M: 5ea5b997fbf61d207a0abb75d955196f0299b969 172.31.58.227:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 24f369d9ea6a58f1611ed60ae7b3e49644db7b86 172.31.53.110:6379
   slots: (0 slots) slave
   replicates 5ea5b997fbf61d207a0abb75d955196f0299b969
M: 8887bfffd356dc38dff3b0083533a9f31299e3e3 172.31.21.83:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
M: 4c79bc242d40e005398227f08af8b024c1c80c2d 172.31.56.12:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 56e5e038bc96e5e88466ca4abd35f2f7343e01cc 172.31.49.135:6379
   slots: (0 slots) slave
   replicates 8887bfffd356dc38dff3b0083533a9f31299e3e3
M: c5f766b7f6a39f601c433cf65a4bc76d6a98bb16 172.31.56.37:6379
   slots: (0 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 1000
What is the receiving node ID? c5f766b7f6a39f601c433cf65a4bc76d6a98bb16
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: all
Ready to move 1000 slots.
  Source nodes:
    M: 5ea5b997fbf61d207a0abb75d955196f0299b969 172.31.58.227:6379
       slots:[5461-10922] (5462 slots) master
       1 additional replica(s)
    M: 8887bfffd356dc38dff3b0083533a9f31299e3e3 172.31.21.83:6379
       slots:[10923-16383] (5461 slots) master
       1 additional replica(s)
    M: 4c79bc242d40e005398227f08af8b024c1c80c2d 172.31.56.12:6379
       slots:[0-5460] (5461 slots) master
       1 additional replica(s)
  Destination node:
    M: c5f766b7f6a39f601c433cf65a4bc76d6a98bb16 172.31.56.37:6379
       slots: (0 slots) master
  Resharding plan:
    Moving slot 5461 from 5ea5b997fbf61d207a0abb75d955196f0299b969
    Moving slot 5462 from 5ea5b997fbf61d207a0abb75d955196f0299b969
    Moving slot 5463 from 5ea5b997fbf61d207a0abb75d955196f0299b969
    Moving slot 5464 from 5ea5b997fbf61d207a0abb75d955196f0299b969
    Moving slot 5465 from 5ea5b997fbf61d207a0abb75d955196f0299b969
    Moving slot 5466 from 5ea5b997fbf61d207a0abb75d955196f0299b969
    Moving slot 5467 from 5ea5b997fbf61d207a0abb75d955196f0299b969
    Moving slot 5468 from 5ea5b997fbf61d207a0abb75d955196f0299b969
...
Moving slot 329 from 172.31.56.12:6379 to 172.31.56.37:6379:
Moving slot 330 from 172.31.56.12:6379 to 172.31.56.37:6379:
Moving slot 331 from 172.31.56.12:6379 to 172.31.56.37:6379:
Moving slot 332 from 172.31.56.12:6379 to 172.31.56.37:6379:
- Now, if we check the status again, we can see that the 1,000 slots have been moved from the existing masters to the new master [172.31.56.37].
shell> valkey-cli -h 172.31.46.16 -a valkey cluster nodes |
Output:
5ea5b997fbf61d207a0abb75d955196f0299b969 172.31.58.227:6379@16379 master - 0 1716007319957 2 connected 5795-10922
24f369d9ea6a58f1611ed60ae7b3e49644db7b86 172.31.53.110:6379@16379 slave 5ea5b997fbf61d207a0abb75d955196f0299b969 0 1716007319000 2 connected
8887bfffd356dc38dff3b0083533a9f31299e3e3 172.31.21.83:6379@16379 master - 0 1716007320962 3 connected 11256-16383
4c79bc242d40e005398227f08af8b024c1c80c2d 172.31.56.12:6379@16379 master - 0 1716007318951 7 connected 333-5460
d2f9e40ce2a2a0c7abeaa2fd64f32957ba209208 172.31.46.16:6379@16379 myself,slave 4c79bc242d40e005398227f08af8b024c1c80c2d 0 1716007318000 7 connected
56e5e038bc96e5e88466ca4abd35f2f7343e01cc 172.31.49.135:6379@16379 slave 8887bfffd356dc38dff3b0083533a9f31299e3e3 0 1716007317946 3 connected
c5f766b7f6a39f601c433cf65a4bc76d6a98bb16 172.31.56.37:6379@16379 master - 0 1716007315938 8 connected 0-332 5461-5794 10923-11255
We can also move/redistribute the slots non-interactively, as shown below.
shell> valkey-cli -a <password> --cluster reshard <host>:<port> --cluster-from <node-id> --cluster-to <node-id> --cluster-slots <number of slots> --cluster-yes
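Relatedly, valkey-cli also inherits the cluster rebalance subcommand from redis-cli (assuming your build behaves the same). With --cluster-use-empty-masters it also pushes slots to masters that currently own none, which is an alternative to a manual reshard after adding a node:

shell> valkey-cli -a valkey --cluster rebalance 172.31.46.16:6379 --cluster-use-empty-masters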
Failover in sharding/clustering
If one of the masters in a shard goes down, its slave will be promoted to master automatically by the cluster. We don’t need any Sentinel service for the failover; in cluster mode, the cluster itself takes care of the failure.
Let’s see an example here.
We have the below master [172.31.46.16], which has one connected slave [172.31.56.12].
Master [172.31.46.16]:
# Replication
role:master
connected_slaves:1
slave0:ip=172.31.56.12,port=6379,state=online,offset=1028,lag=1
master_failover_state:no-failover
master_replid:e3c6e6d6605fd40b8081fbb138a1daf6b0e439a2
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1028
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:1027
Now, let’s stop the service on this master node [172.31.46.16].
shell> systemctl stop valkey |
Based on the node timeout settings, the slave [172.31.56.12] is then promoted to master.
172.31.56.12:6379> info Replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:ef2361a180edd0c9f00f09dbac25f31f28d8f792
master_replid2:e3c6e6d6605fd40b8081fbb138a1daf6b0e439a2
master_repl_offset:1084
second_repl_offset:1085
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:1083
If we start the service on the old master [172.31.46.16] again, it is automatically added back as a slave of [172.31.56.12].
# Replication
role:slave
master_host:172.31.56.12
master_port:6379
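If we then wanted to restore the original roles, a coordinated manual switchover can be triggered from the old master (now a replica) with the CLUSTER FAILOVER command. A sketch, assuming the cluster is healthy:

172.31.46.16:6379> CLUSTER FAILOVER
OK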
Summary
In this post, we covered the basics of a sharding/clustering setup in a Valkey/Redis environment, including the built-in master failover capability. We also saw how keys and slots are distributed across the topology, and how to migrate slots (reshard) when adding a new master node. For a production environment, we should run at least three master nodes per cluster to cover the hash slots, and each master should have at least one slave for a highly available deployment.