In most database systems, such as MySQL, PostgreSQL, and MongoDB, replication of some kind is used to create a highly available architecture. Valkey/Redis is no different in this regard. Replication is native functionality in Valkey, supporting multiple replicas and even chains of replication.
To clear up any confusion, understand that Valkey replication is a different concept compared to Valkey clustering.
Replication is a simple, asynchronous process in which writes on a single master are replayed on N connected replicas.
Clustering is a more complex architecture in which the data is fragmented/sharded across multiple Valkey servers, operating together as a single domain. Replication can and should be part of your clustered architecture, but it can also be used without clustering.
In this post, we will set up a basic one-master, two-replica configuration used for splitting reads and writes. To keep the post short and sweet, we will discuss failover using Valkey Sentinel in a later post.
Environment and configurations
In a typical production environment, each Valkey process would be installed on its own server. For this post, Docker will be used to simplify things.
The working directory is laid out as follows:
$ ls -1
certs/
- ca-key.pem
- ca.pem
- client-cert.pem
- client-key.pem
- server-cert.pem
- server-key.pem
dataM1/
- valkeyM1.conf
dataR1/
- valkeyR1.conf
dataR2/
- valkeyR2.conf
In each data directory, hard links to the certificates were created, allowing all three Valkey containers to use the same keys. Example:
$ cd dataM1/
$ for i in ../certs/*.pem; do ln $i; done
# Repeat for dataR1, dataR2
The configuration file for the master node, valkeyM1, is as follows:
$ cat valkeyM1.conf
# Enable TLS connections
port 0
tls-port 6379
tls-cert-file /data/server-cert.pem
tls-key-file /data/server-key.pem
tls-ca-cert-file /data/ca.pem
tls-replication yes
tls-auth-clients optional

# Keep this much change delta in memory so that a replica
# which went offline for a short period of time does not
# need to complete a full sync, but can instead do a partial
# delta transfer.
repl-backlog-size 30mb

# The master will not accept writes unless there are this
# many connected replicas, all with lag less than max-lag
min-replicas-to-write 1
min-replicas-max-lag 10

# Various user accounts
user monitor on +@admin >monitorpass
user root on +@all >superr00tpass
user app on +@all -KEYS -SCAN -CONFIG >appsecret
user repl on +PSYNC +SYNC +REPLCONF >replpassw0rd#

# Don't use more than 50MB of memory for Valkey
# If we run out of memory, start dropping keys according
# to those least-frequently-used
maxmemory 50mb
maxmemory-policy allkeys-lfu

# Utilize append only files for disk persistence
appendonly yes
appendfilename "appendonly.aof"
appenddirname "appenddir"
The configuration file for the replicas is nearly identical, minus the user accounts and a few other master-only parameters. The notable replica-specific settings are:
# Authentication to the master
masteruser repl
masterauth replpassw0rd#

# Don't allow writing to the replica
replica-read-only yes

# Stream the RDB snapshot from master; don't write it to disk
repl-diskless-sync yes

# If something happens to the replication stream, or replica is unable
# to process a local AOF, shut down replica to prevent any data drift
propagation-error-behavior panic-on-replicas
Now that we have our configuration files and TLS certificates ready, we can start launching containers.
Setting up the master
$ docker run -d --name valkeyM1 -v ${PWD}/dataM1:/data valkey/valkey:7.2.5 /data/valkeyM1.conf

$ docker inspect --format='{{.NetworkSettings.IPAddress}}' valkeyM1
172.17.0.6

$ docker exec -it valkeyM1 valkey-cli --tls --cacert /data/ca.pem
127.0.0.1:6379> SET foo bar
(error) NOREPLICAS Not enough good replicas to write.
127.0.0.1:6379> ROLE
1) "master"
2) (integer) 0
3) (empty array)
We launched our container, named valkeyM1, and specified our configuration file, located in the mounted volume path. We then grabbed the Docker container's IP address for later use and attempted to write a key-value pair. This produced an error because no replicas are connected, and we specified in the config that at least one replica is required. The ROLE command is a simple status check: it tells us we are currently a 'master', the replication offset is zero bytes (since there have been no writes yet), and the empty array indicates no replicas are connected.
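The rejected write is a direct result of the 'min-replicas-to-write' setting from our config. If you want to confirm the live values, an illustrative session might look like this:

$ docker exec -it valkeyM1 valkey-cli --tls --cacert /data/ca.pem
127.0.0.1:6379> CONFIG GET min-replicas-to-write
1) "min-replicas-to-write"
2) "1"
127.0.0.1:6379> CONFIG GET min-replicas-max-lag
1) "min-replicas-max-lag"
2) "10"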
Since we cannot do anything until at least one replica is connected, let’s set one up.
Setting up replicas
$ docker run -d --name valkeyR1 -v ${PWD}/dataR1:/data valkey/valkey:7.2.5 /data/valkeyR1.conf
Note above that we did not specify the REPLICAOF directive in the config file for this replica, valkeyR1. As such, this Valkey server starts up as an independent master.
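A quick ROLE check at this point (before attaching it to M1) would mirror what we saw on M1: a master with a zero offset and no replicas.

$ docker exec -it valkeyR1 valkey-cli --tls --cacert /data/ca.pem ROLE
1) "master"
2) (integer) 0
3) (empty array)

Let's connect this server to our M1 above.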
$ docker exec -it valkeyR1 valkey-cli --tls --cacert /data/ca.pem
127.0.0.1:6379> REPLICAOF 172.17.0.6 6379
OK
If we look at the log for valkeyM1, we can see the synchronization attempt:
* Replica 172.17.0.7:6379 asks for synchronization
* Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'b00d0d93b0b2b5472781a1f826238188466be26e', my replication IDs are '0be3cb60d0f8c3cfc173d45fb69aedb68f8bbbda' and '0000000000000000000000000000000000000000')
* Delay next BGSAVE for diskless SYNC
* Starting BGSAVE for SYNC with target: replicas sockets
* Background RDB transfer started by pid 59
* Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
* Diskless rdb transfer, done reading from pipe, 1 replicas still up.
* Background RDB transfer terminated with success
* Streamed RDB transfer with replica 172.17.0.7:6379 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
* Synchronization with replica 172.17.0.7:6379 succeeded
Notice that there are two replication IDs on the master. These are referred to as the ‘current ID’ and the ‘old ID’. The old ID is used by replicas which get promoted to master in the event of failover.
When a replica takes over as master during a failover event, it changes its 'current ID' to a newly generated ID, indicating the start of a new dataset/topology. Its 'old ID' becomes the failed master's ID. This allows other replicas, which have reconnected to this replica as the new master, to continue receiving events from the failed master's stream. Once those replicas have received all of the old events, they switch to the new ID of the newly promoted master. This is an automatic process.
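Both IDs are visible on the master in the output of INFO replication, where they appear as 'master_replid' and 'master_replid2'. An abbreviated, illustrative excerpt (since no failover has occurred yet, the second ID is still all zeros):

$ docker exec -it valkeyM1 valkey-cli --tls --cacert /data/ca.pem INFO replication
# Replication
role:master
connected_slaves:1
...
master_replid:0be3cb60d0f8c3cfc173d45fb69aedb68f8bbbda
master_replid2:0000000000000000000000000000000000000000
...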
The next several lines of the log output above show the creation of the RDB snapshot and the streaming of a copy to the replica. Recall that in our config of R1, we specified diskless sync.
Now that we have at least one replica connected to our master, we can write to it:
$ docker exec -it valkeyM1 valkey-cli --tls --cacert /data/ca.pem SET foo bar
OK
Additional replicas
We can now start an additional replica. Two changes have been made to R2's config: we added 'REPLICAOF 172.17.0.6 6379' and changed 'repl-diskless-sync' to 'no'.
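In valkeyR2.conf, those two additions would look something like this (the rest of the file mirrors R1's config):

# Join the master at 172.17.0.6 as a replica at startup
replicaof 172.17.0.6 6379

# Disable diskless sync for this node
repl-diskless-sync no

With that in place, we can launch the container and watch its log: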
$ docker run -d --name valkeyR2 -v ${PWD}/dataR2:/data valkey/valkey:7.2.5 /data/valkeyR2.conf
...
* Connecting to MASTER 172.17.0.6:6379
* MASTER <-> REPLICA sync started
* Non blocking connect for SYNC fired the event.
* Master replied to PING, replication can continue...
* Partial resynchronization not possible (no cached master)
* Full resync from master: 0be3cb60d0f8c3cfc173d45fb69aedb68f8bbbda:2482
* MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
* MASTER <-> REPLICA sync: Flushing old data
* MASTER <-> REPLICA sync: Loading DB in memory
* Loading RDB produced by valkey version 7.2.5
* RDB age 0 seconds
* RDB memory usage when created 1.18 Mb
* Done loading RDB, keys loaded: 1, keys expired: 0.
* MASTER <-> REPLICA sync: Finished with success
Because the 'REPLICAOF' directive was in the configuration file, R2 immediately connected to M1 and began syncing the RDB file. The RDB file was initially written to R2's disk and, after the transfer completed, loaded into R2's memory.
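We can confirm that the key written earlier made it to R2:

$ docker exec -it valkeyR2 valkey-cli --tls --cacert /data/ca.pem GET foo
"bar"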
Looking at the master's ROLE, we can now see two connected replicas, both in sync with the master's replication offset (2832):
$ docker exec -it valkeyM1 valkey-cli --tls --cacert /data/ca.pem ROLE
1) "master"
2) (integer) 2832
3) 1) 1) "172.17.0.7"
      2) "6379"
      3) "2832"
   2) 1) "172.17.0.8"
      2) "6379"
      3) "2832"
Utilizing replicas and failover
In order to utilize R1 and R2 for reads, your application must be configured to open separate connections to these instances and execute commands against them. Valkey does not have a single endpoint for routing connections; you must handle the read/write split yourself.
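From the CLI, the split looks like this (the key name here is purely illustrative):

# Writes go to the master (valkeyM1)
$ docker exec -it valkeyM1 valkey-cli --tls --cacert /data/ca.pem SET greeting hello
OK

# Reads can be sent to either replica (valkeyR1 or valkeyR2)
$ docker exec -it valkeyR2 valkey-cli --tls --cacert /data/ca.pem GET greeting
"hello"

An application would do the same thing with separate client connections, sending writes to 172.17.0.6 and distributing reads across 172.17.0.7 and 172.17.0.8. Because replication is asynchronous, a read issued immediately after a write may briefly miss it.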
Also, the above architecture has no failover capabilities. If valkeyM1 goes offline, R1 and R2 will continue to function as readers, potentially returning stale data.
$ docker exec -it valkeyR1 valkey-cli --tls --cacert /data/ca.pem
127.0.0.1:6379> ROLE
1) "slave"
2) "172.17.0.6"
3) (integer) 6379
4) "connect"
5) (integer) -1
127.0.0.1:6379> GET foo
"bar"
Notice the 'connect' state and the offset of '-1' in the ROLE output, indicating that the replica is not currently connected to a master. You can also see the successful read of the 'foo' key.
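Writes to a replica, on the other hand, are rejected because we set 'replica-read-only yes'. In the same session on R1 (the value here is illustrative), you would see something like:

127.0.0.1:6379> SET foo baz
(error) READONLY You can't write against a read only replica.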
If you do not want replicas to serve stale data when the connection to the master is lost, you can set ‘replica-serve-stale-data no’, as shown here:
127.0.0.1:6379> CONFIG SET replica-serve-stale-data no
OK
127.0.0.1:6379> GET foo
(error) MASTERDOWN Link with MASTER is down and replica-serve-stale-data is set to 'no'.
The replicas will continue attempting to reconnect to the master every second.
In order to have R1 or R2 automatically take over as the new master, Valkey Sentinel must be used. Setting up Sentinel is beyond the scope of this post, but it will be covered in a future post.
In the meantime, you can run 'REPLICAOF NO ONE' on R1 and create the replication user on R1, then tell R2 to become a replica of R1 with 'REPLICAOF':
$ docker exec -it valkeyR1 valkey-cli --tls --cacert /data/ca.pem
127.0.0.1:6379> ACL SETUSER repl on +PSYNC +SYNC +REPLCONF >replpassw0rd#
OK
127.0.0.1:6379> REPLICAOF NO ONE
OK

$ docker exec -it valkeyR2 valkey-cli --tls --cacert /data/ca.pem
127.0.0.1:6379> REPLICAOF 172.17.0.7 6379
OK
R1 will generate a new 'current ID' and set its secondary ID to the failed M1's ID. R2 will request a partial resync from R1 using M1's original ID. After the backlog is synced, R2 will switch to R1's new ID.
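If everything worked, R1 now reports itself as a master with R2 attached. Illustrative output (the replication offset shown here is hypothetical and will differ in your environment):

$ docker exec -it valkeyR1 valkey-cli --tls --cacert /data/ca.pem ROLE
1) "master"
2) (integer) 3015
3) 1) 1) "172.17.0.8"
      2) "6379"
      3) "3015"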
Conclusion
In this post, we learned how simple it is to configure asynchronous replication in Valkey in order to create replicas for simple read/write load balancing and to get us started on our journey toward high availability.