Achieving High Availability with Valkey Sentinel

April 24, 2026
Author: Arunjith Aravindan

In the previous guide, we established a robust Primary-Replica topology for Valkey. Read scaling is now in place, and a hot copy of the data lives on a second node.

But there is a catch. If a primary node crashes, the replica will remain faithful and wait for instructions. It will not automatically take over the responsibilities of the primary. Applications will start throwing write errors until an administrator manually logs in and reconfigures the replica to become the new primary.

To achieve true High Availability (HA) and ensure continuous uptime without manual intervention, Valkey Sentinel is required.

What is Valkey Sentinel?

Valkey Sentinel is a distributed system designed to monitor Valkey instances, detect failures, and automatically handle failover.

When Sentinel detects that a primary node is unresponsive, it performs the following tasks:

  1. Monitoring: It continuously checks whether primary and replica nodes are functioning as expected.
  2. Notification: It can notify system administrators or another computer program via an API that something is wrong.
  3. Automatic Failover: It promotes a healthy replica to the new primary and reconfigures the other replicas to sync with it.
  4. Configuration Provider: It acts as a source of truth for clients. Applications can connect to Sentinel to ask for the current primary’s address. If a failover occurs, Sentinel reports the new address.
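As a concrete illustration of the Configuration Provider role, a client can ask any Sentinel for the current primary's address using the name under which the primary group is registered (this guide registers it as mymaster later on):

```shell
# Ask a Sentinel (default port 26379) where the current primary is
valkey-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
```

If a failover has occurred, the same command simply returns the new primary's IP and port, so clients never need hard-coded addresses.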

The Rule of Three (Quorum)

Sentinel is a distributed system, meaning multiple Sentinel processes must run and agree on a node’s failure before taking action. This agreement is called a quorum.

To prevent a “split-brain” scenario (where a network partition causes two nodes to both assume they are the primary), at least three Sentinel instances must be deployed.

For this guide, the environment consists of three dedicated database nodes. Each node will run both the Valkey database service and the Valkey Sentinel service:

  • ArunValkeyPrimary (Primary + Sentinel): 172.31.32.27
  • ArunValkeyReplica (Replica 1 + Sentinel): 172.31.37.55
  • ArunValkeyReplica2 (Replica 2 + Sentinel): 172.31.39.58

The primary node is healthy and running as the master, with two replicas connected and actively syncing.

Step 1: Create the Sentinel Configuration File

Sentinel runs as a separate process from the main Valkey database, using its own configuration file and listening on port 26379 by default.

The Sentinel configuration file (typically /etc/valkey/sentinel.conf) must be created or edited on all three nodes (ArunValkeyPrimary, ArunValkeyReplica, and ArunValkeyReplica2).

Open the file and add the following core directives:
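A minimal sentinel.conf for this topology might look like the following. The down-after-milliseconds value matches the 5000 ms threshold referenced later in this guide; the failover-timeout and parallel-syncs values are illustrative defaults, not values taken from this setup:

```conf
# Sentinel listens on its own port, separate from the database
port 26379

# Monitor the primary at 172.31.32.27:6379 under the name "mymaster";
# 2 of the 3 Sentinels must agree it is down before failover starts
sentinel monitor mymaster 172.31.32.27 6379 2

# Declare the primary subjectively down after 5000 ms of silence
sentinel down-after-milliseconds mymaster 5000

# Illustrative values: abort a stalled failover after 60 s, and
# resync replicas to the new primary one at a time
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
```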

Understanding the monitor line:

  • mymaster is the arbitrary name given to this monitored primary group.
  • 172.31.32.27 6379 points to the current primary node (ArunValkeyPrimary). (Sentinels will automatically discover both replicas by querying the primary, so the replica IPs do not need to be listed).
  • 2 is the quorum. This means at least 2 out of the 3 Sentinels must agree the primary is down to initiate a failover.

Step 2: Ensure Proper Permissions

Sentinel needs the ability to rewrite its own configuration file. When a failover happens, Sentinel updates sentinel.conf with the new primary’s IP address and the current state of the cluster.

Ensure the valkey user has write permissions to the file on all three nodes:
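Assuming the service runs as the valkey user and the file lives at /etc/valkey/sentinel.conf (adjust the user, group, and path for your distribution), the following sketch, run on each node, grants the needed access:

```shell
# Let the valkey user rewrite its own config after a failover
sudo chown valkey:valkey /etc/valkey/sentinel.conf
sudo chmod 640 /etc/valkey/sentinel.conf

# Config rewrites are done via a temp file in the same directory,
# so the parent directory must be writable by the valkey user too
sudo chown valkey:valkey /etc/valkey
```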

Step 3: Start the Sentinel Services

Start the Sentinel service on all three nodes. Depending on the Linux distribution and the Valkey installation method, this is usually done via systemctl:
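On systemd-based distributions the unit is typically named valkey-sentinel, though the exact name depends on the package. Starting it on each node might look like:

```shell
# Enable at boot and start immediately
sudo systemctl enable --now valkey-sentinel

# Confirm the process is up
sudo systemctl status valkey-sentinel --no-pager
```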

Step 4: Verify the Sentinel Cluster

Check if the Sentinels are successfully communicating with each other and monitoring the database. Log into any node and use the Valkey CLI to connect to the Sentinel port (26379):
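The INFO sentinel command summarizes what Sentinel sees. The sample output below is a trimmed illustration of the shape to expect with this topology, not a verbatim capture:

```shell
valkey-cli -p 26379 INFO sentinel

# Expected shape (trimmed):
# # Sentinel
# sentinel_masters:1
# master0:name=mymaster,status=ok,address=172.31.32.27:6379,slaves=2,sentinels=3
```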

Look closely at the master0 line at the bottom. This confirms everything is functioning correctly:

  • status=ok: The primary (ArunValkeyPrimary) is healthy.
  • slaves=2: Sentinel found both ArunValkeyReplica and ArunValkeyReplica2.
  • sentinels=3: All three Sentinel instances have discovered each other and formed a quorum.

Additional Verification: Sentinel Peer Health

To further validate that all Sentinel nodes are actively communicating and healthy, we can query the list of Sentinel peers and inspect their status:
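One way to do this is the SENTINEL sentinels command, which lists every other Sentinel process tracking the named primary. The annotated output below is an illustrative sketch of the fields discussed next:

```shell
valkey-cli -p 26379 SENTINEL sentinels mymaster

# Each peer appears as a block of key/value pairs, e.g.:
#  "ip"                  -> "172.31.37.55"
#  "flags"               -> "sentinel"
#  "last-ok-ping-reply"  -> "248"
#  "down-after-milliseconds" -> "5000"
```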

What this means:

  • ip → Lists the other Sentinel nodes in the cluster
  • flags=sentinel → Confirms these are active Sentinel peers
  • last-ok-ping-reply → Indicates the last successful heartbeat response (in milliseconds)
  • down-after-milliseconds → The failure threshold (5000 ms in this setup)

Lower values here indicate healthy and responsive communication between Sentinel nodes.

Step 5: The Chaos Test (Triggering a Failover)

The best way to trust an HA setup is to break it intentionally. We will simulate a crash by killing the primary node, verifying the failover, and then manually failing back to our original primary.

1. Kill the Primary

On ArunValkeyPrimary (172.31.32.27), stop the Valkey database service (do not stop Sentinel, just the database):
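Assuming the database unit is named valkey (adjust for your package), stopping only the database looks like:

```shell
# Stop the database; the Sentinel process stays up so it can
# observe the failure and vote in the election
sudo systemctl stop valkey
```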

2. Verify the Failover via Sentinel

Wait for about 5 to 10 seconds to allow the down-after-milliseconds threshold to pass and the Sentinels to complete the election process. Instead of checking the logs, you can query the Sentinel information directly to confirm the failover has occurred and find out which node was promoted.

On ArunValkeyReplica, connect to the Sentinel port (26379) and run the INFO sentinel command:
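The command and the rough shape of the post-failover output (illustrative, trimmed):

```shell
valkey-cli -p 26379 INFO sentinel

# The master0 line should now point at the promoted replica, e.g.:
# master0:name=mymaster,status=ok,address=172.31.37.55:6379,slaves=2,sentinels=3
```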

Look at the master0 line at the bottom. It shows that the status is ok and the primary address is now 172.31.37.55:6379.

3. Verify the Failover via the Database

Now, connect to that newly promoted node (172.31.37.55) on the standard database port to verify the promotion from the database’s perspective:
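A quick INFO replication check does this; the annotated lines below show the fields to look for (add -a with your password if requirepass is set):

```shell
valkey-cli -h 172.31.37.55 -p 6379 INFO replication

# Key lines to look for:
# role:master
# connected_slaves:1
# slave0:ip=172.31.39.58,port=6379,state=online,...
```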

Notice that the role has changed from slave to master, and it now shows 1 connected slave (the other surviving replica, 172.31.39.58).

4. Restarting the Old Primary

When the Valkey service on ArunValkeyPrimary is eventually restarted, Sentinel will automatically detect it, reconfigure it as a read-only replica, and point it to the newly promoted primary to catch up on missed data.

Check the database replication status on the old primary to see it is now acting as a replica:
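The same INFO replication check on the old primary should now report a replica role, roughly like this (illustrative output):

```shell
valkey-cli -h 172.31.32.27 -p 6379 INFO replication

# Expected shape:
# role:slave
# master_host:172.31.37.55
# master_link_status:up
```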

5. Executing a Manual Failback

If you want ArunValkeyPrimary to reclaim its throne as the primary node, you can trigger a manual failover. First, configure it to have a high priority for elections, then issue the failover command to Sentinel:
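Assuming replica-priority is the lever used here (lower numeric values are preferred during elections, so a very low value makes the node the favored candidate), the two steps might look like:

```shell
# 1. Make the old primary the preferred election candidate
valkey-cli -h 172.31.32.27 -p 6379 CONFIG SET replica-priority 1

# 2. Ask any Sentinel to start a failover for the monitored group
valkey-cli -p 26379 SENTINEL FAILOVER mymaster
```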

(Note: The AUTH failed warnings simply indicate the CLI attempted to pass a default auth to a Sentinel instance that might not require it or is configured differently, but the OK confirms the command successfully executed.)

Check Sentinel one last time to confirm ArunValkeyPrimary (172.31.32.27) is back in charge:
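Querying the primary's address from any Sentinel should now return the original node:

```shell
valkey-cli -p 26379 SENTINEL get-master-addr-by-name mymaster

# Expected:
# 1) "172.31.32.27"
# 2) "6379"
```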

Wrapping Up

By combining replication with Sentinel, a single cache becomes a highly available, self-healing data cluster. If hardware fails or network hiccups occur, Sentinel automatically handles the reshuffling. Furthermore, as demonstrated, system administrators still retain full control to manually shuffle roles during planned maintenance or load balancing.


© 2026 Percona All Rights Reserved