In the previous guide, a robust Primary-Replica topology for Valkey was established. Read scaling is now active, and a hot copy of the data is securely stored on a second node.
But there is a catch. If a primary node crashes, the replica will remain faithful and wait for instructions. It will not automatically take over the responsibilities of the primary. Applications will start throwing write errors until an administrator manually logs in and reconfigures the replica to become the new primary.
To achieve true High Availability (HA) and ensure continuous uptime without manual intervention, Valkey Sentinel is required.
Valkey Sentinel is a distributed system designed to monitor Valkey instances, detect failures, and automatically handle failover.
When Sentinel detects that a primary node is unresponsive, it performs the following tasks: it confirms the failure with its peer Sentinels, elects a leader Sentinel to coordinate the failover, promotes the most suitable replica to primary, reconfigures the remaining replicas to replicate from the new primary, and demotes the old primary to a replica once it comes back online.
Sentinel is a distributed system, meaning multiple Sentinel processes must run and agree on a node’s failure before taking action. This agreement is called a quorum.
To prevent a “split-brain” scenario (where a network partition causes two nodes to both assume they are the primary), at least three Sentinel instances must be deployed.
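The quorum/majority arithmetic behind the "at least three" rule can be sketched in a few lines. This is an illustrative helper, not part of Valkey: `failover_possible` is a hypothetical name, and it models the two thresholds Sentinel applies (the configured quorum to declare the primary down, and a majority of all Sentinels to authorize the failover).

```python
# Hypothetical helper modeling Sentinel's two thresholds:
# - quorum: Sentinels that must agree the primary is down (triggers failover)
# - majority: Sentinels that must be reachable to authorize the failover
def failover_possible(total_sentinels: int, reachable: int, quorum: int) -> bool:
    majority = total_sentinels // 2 + 1
    return reachable >= quorum and reachable >= majority

# With 3 Sentinels and quorum 2, losing one Sentinel still permits failover:
print(failover_possible(3, 2, 2))   # True
# With only 2 Sentinels, losing one leaves no majority, so no failover:
print(failover_possible(2, 1, 2))   # False
```

This is why two Sentinels are not enough: a single failure (or a network partition) leaves the survivor unable to form a majority, so no failover can ever be authorized.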
For this guide, the environment consists of three dedicated database nodes (ArunValkeyPrimary, ArunValkeyReplica, and ArunValkeyReplica2). Each node will run both the Valkey database service and the Valkey Sentinel service.
The primary node is healthy and running as the master, with two replicas connected and actively syncing.
```
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli -a amma@123
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.31.37.55,port=6379,state=online,offset=98214,lag=1
slave1:ip=172.31.39.58,port=6379,state=online,offset=98214,lag=1
master_failover_state:no-failover
master_replid:629656a198b7290bf6492e470b449ad1ced509e0
master_replid2:30977276632877f46ad12fcc2bbc2c5191c67c0c
master_repl_offset:98214
second_repl_offset:1643
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1643
repl_backlog_histlen:96572
127.0.0.1:6379>
```
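If you script health checks around this output, the `INFO` format (one `key:value` pair per line, with `#` section headers) is easy to parse. A minimal sketch, assuming you capture the raw text from `valkey-cli INFO replication`; `parse_info` is a hypothetical helper name, not a Valkey API:

```python
# Hypothetical parser for "INFO replication" output (key:value lines),
# useful for automated health checks around the transcript shown above.
def parse_info(raw: str) -> dict:
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers like "# Replication"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields

sample = """# Replication
role:master
connected_slaves:2
slave0:ip=172.31.37.55,port=6379,state=online,offset=98214,lag=1
"""
info = parse_info(sample)
print(info["role"], info["connected_slaves"])  # master 2
```

A monitoring script could alert whenever `role` flips or `connected_slaves` drops below the expected count.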
Sentinel runs as a separate process from the main Valkey database, using its own configuration file and listening on port 26379 by default.
The Sentinel configuration file (typically /etc/valkey/sentinel.conf) must be created or edited on all three nodes (ArunValkeyPrimary, ArunValkeyReplica, and ArunValkeyReplica2).
Open the file and add the following core directives:
```
port 26379

# Format: sentinel monitor <cluster-name> <primary-ip> <primary-port> <quorum>
sentinel monitor mymaster 172.31.32.27 6379 2

# The primary password set in the previous setup
sentinel auth-user mymaster default
sentinel auth-pass mymaster amma@123

# How many milliseconds the primary must be unreachable before Sentinel considers it down
sentinel down-after-milliseconds mymaster 5000

# How long to wait before trying another failover if the first one fails
sentinel failover-timeout mymaster 10000
```
Understanding the monitor line: mymaster is the logical name used to refer to this replication group in all Sentinel commands, 172.31.32.27 6379 is the address of the current primary, and 2 is the quorum, the number of Sentinels that must agree the primary is unreachable before a failover is triggered.
Sentinel needs the ability to rewrite its own configuration file. When a failover happens, Sentinel updates sentinel.conf with the new primary’s IP address and the current state of the cluster.
Ensure the valkey user has write permissions to the file on all three nodes:
```
sudo chown valkey:valkey /etc/valkey/sentinel.conf
```
Start the Sentinel service on all three nodes. Depending on the Linux distribution and the Valkey installation method, this is usually done via systemctl:
```
root@ArunValkeyPrimary:/home/ubuntu# sudo systemctl enable valkey-sentinel
Synchronizing state of valkey-sentinel.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable valkey-sentinel
root@ArunValkeyPrimary:/home/ubuntu# sudo systemctl start valkey-sentinel
root@ArunValkeyPrimary:/home/ubuntu#

root@ArunValkeyReplica:/home/ubuntu# sudo systemctl enable valkey-sentinel
Synchronizing state of valkey-sentinel.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable valkey-sentinel
root@ArunValkeyReplica:/home/ubuntu# sudo systemctl start valkey-sentinel
root@ArunValkeyReplica:/home/ubuntu#

root@ArunValkeyReplica2:/home/ubuntu# sudo systemctl enable valkey-sentinel
Synchronizing state of valkey-sentinel.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable valkey-sentinel
root@ArunValkeyReplica2:/home/ubuntu# sudo systemctl start valkey-sentinel
root@ArunValkeyReplica2:/home/ubuntu#
```
Check if the Sentinels are successfully communicating with each other and monitoring the database. Log into any node and use the Valkey CLI to connect to the Sentinel port (26379):
```
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli -p 26379
AUTH failed: ERR AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.31.32.27:6379,slaves=2,sentinels=3
127.0.0.1:26379>
```
Look closely at the master0 line at the bottom. This confirms everything is functioning correctly: the monitored group (name=mymaster) is healthy (status=ok), the primary is at the expected address (172.31.32.27:6379), both replicas are attached (slaves=2), and all three Sentinels have discovered each other (sentinels=3).
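Because the master0 line packs everything into comma-separated key=value pairs, it is also convenient for automated checks. A minimal sketch; `parse_master_line` is a hypothetical helper name, not a Valkey command:

```python
# Hypothetical helper that splits a Sentinel "master0" status line
# (name=...,status=...,address=...,slaves=...,sentinels=...) into fields.
def parse_master_line(line: str) -> dict:
    return dict(pair.split("=", 1) for pair in line.split(","))

line = "name=mymaster,status=ok,address=172.31.32.27:6379,slaves=2,sentinels=3"
status = parse_master_line(line)
print(status["status"], status["address"])  # ok 172.31.32.27:6379
```

After a failover, only the `address` field of this line changes, which makes it a handy single source of truth for "who is the primary right now".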
To further validate that all Sentinel nodes are actively communicating and healthy, we can query the list of Sentinel peers and inspect their status:
```
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli -p 26379 SENTINEL SENTINELS mymaster | grep -E -A 1 '^ip$|^flags$|^last-ok-ping-reply$|^down-after-milliseconds$'
ip
172.31.37.55
--
flags
sentinel
--
last-ok-ping-reply
65
--
down-after-milliseconds
5000
--
ip
172.31.39.58
--
flags
sentinel
--
last-ok-ping-reply
65
--
down-after-milliseconds
5000
root@ArunValkeyPrimary:/home/ubuntu#
```
Low last-ok-ping-reply values (milliseconds since the last successful ping reply) indicate healthy and responsive communication between Sentinel nodes.
The best way to trust an HA setup is to break it intentionally. We will simulate a crash by stopping the database service on the primary node, verifying the failover, and then manually failing back to our original primary.
On ArunValkeyPrimary (172.31.32.27), stop the Valkey database service (do not stop Sentinel, just the database):
```
root@ArunValkeyPrimary:/home/ubuntu# sudo systemctl stop valkey
root@ArunValkeyPrimary:/home/ubuntu#
```
Wait for about 5 to 10 seconds to allow the down-after-milliseconds threshold to pass and the Sentinels to complete the election process. Instead of checking the logs, you can query the Sentinel information directly to confirm the failover has occurred and find out which node was promoted.
On ArunValkeyReplica, connect to the Sentinel port (26379) and run the INFO sentinel command:
```
root@ArunValkeyReplica:/home/ubuntu# valkey-cli -p 26379
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.31.37.55:6379,slaves=2,sentinels=3
127.0.0.1:26379>
```
Look at the master0 line at the bottom. It shows that the status is ok and the primary address is now 172.31.37.55:6379.
Now, connect to that newly promoted node (172.31.37.55) on the standard database port to verify the promotion from the database’s perspective:
```
root@ArunValkeyReplica:/home/ubuntu# valkey-cli -a amma@123
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:1
slave0:ip=172.31.39.58,port=6379,state=online,offset=574633,lag=0
master_failover_state:no-failover
master_replid:b93b82982616a59a2304a799e548d7398ee15732
master_replid2:43ea3aeca4846f06c3c6dd11174e9bfd7ac7fabf
master_repl_offset:574633
second_repl_offset:475110
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:450256
repl_backlog_histlen:124378
127.0.0.1:6379>
```
Notice that the role has changed from slave to master, and it now shows 1 connected slave (the other surviving replica, 172.31.39.58).
When the Valkey service on ArunValkeyPrimary is eventually restarted, Sentinel will automatically detect it, reconfigure it as a read-only replica, and point it to the newly promoted primary to catch up on missed data.
```
root@ArunValkeyPrimary:/home/ubuntu# sudo systemctl start valkey
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli -p 26379 INFO sentinel
AUTH failed: ERR AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.31.37.55:6379,slaves=2,sentinels=3
```
Check the database replication status on the old primary to see it is now acting as a replica:
```
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli INFO replication
# Replication
role:slave
master_host:172.31.37.55
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_read_repl_offset:614120
slave_repl_offset:614120
slave_priority:1
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:b93b82982616a59a2304a799e548d7398ee15732
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:614120
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:607384
repl_backlog_histlen:6737
root@ArunValkeyPrimary:/home/ubuntu#
```
If you want ArunValkeyPrimary to reclaim its throne as the primary node, you can trigger a manual failover. First, configure it to have a high priority for elections, then issue the failover command to Sentinel:
```
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli CONFIG SET replica-priority 1
OK
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli CONFIG REWRITE
OK
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli -p 26379 SENTINEL FAILOVER mymaster
AUTH failed: ERR AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?
OK
```
(Note: the AUTH failed warning appears because the CLI attempted to authenticate against a Sentinel instance that has no password configured for the default user; the final OK confirms the failover command was accepted.)
Check Sentinel one last time to confirm ArunValkeyPrimary (172.31.32.27) is back in charge:
```
root@ArunValkeyPrimary:/home/ubuntu# valkey-cli -p 26379 INFO sentinel
AUTH failed: ERR AUTH <password> called without any password configured for the default user. Are you sure your configuration is correct?
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.31.32.27:6379,slaves=2,sentinels=3
root@ArunValkeyPrimary:/home/ubuntu#
```
Wrapping Up
By combining replication with Sentinel, a single cache becomes a highly available, self-healing data cluster. If hardware fails or network hiccups occur, Sentinel automatically handles the reshuffling. Furthermore, as demonstrated, system administrators still retain full control to manually shuffle roles during planned maintenance or load balancing.