Having a standby cluster ensures maximum data availability and provides a disaster recovery solution. In this blog post, we will cover how to set up a standby cluster using streaming replication and how to create an ad-hoc standby cluster that uses a remote pgBackRest repository. The source and destination clusters can be deployed in different namespaces, regions, or data centers, with no hard dependencies between them.
Let’s dive into each of these processes, starting with the streaming replication approach.
1) Below is the leader/primary cluster, which is already set up and running.
shell> kubectl get pods -n postgres-operator
NAME                                           READY   STATUS      RESTARTS        AGE
cluster1-backup-wffk-9lbcf                     0/1     Completed   0               2d22h
cluster1-instance1-wltm-0                      4/4     Running     1 (6h39m ago)   22h
cluster1-pgbouncer-556659fb94-szvjt            2/2     Running     0               3d21h
cluster1-repo-host-0                           2/2     Running     0               2d22h
percona-postgresql-operator-6746bff4c7-729z5   1/1     Running     3 (11h ago)     3d21h
For the standby to connect to the leader/primary, we need to expose the service in the following section of the [cr.yaml] file.
image: docker.io/percona/percona-postgresql-operator:2.7.0-ppg17.5.2-postgres
imagePullPolicy: Always
postgresVersion: 17
# port: 5432

expose:
#  annotations:
#    my-annotation: value1
#  labels:
#    my-label: value2
  type: ClusterIP
shell> kubectl apply -f cr.yaml -n postgres-operator
shell> kubectl get services -n postgres-operator
NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
cluster1-ha          ClusterIP   10.43.101.40    <none>        5432/TCP   2d15h
cluster1-ha-config   ClusterIP   None            <none>        <none>     2d15h
cluster1-pgbouncer   ClusterIP   10.43.149.182   <none>        5432/TCP   2d15h
cluster1-pods        ClusterIP   None            <none>        <none>     2d15h
cluster1-primary     ClusterIP   None            <none>        5432/TCP   2d15h
cluster1-replicas    ClusterIP   10.43.85.169    <none>        5432/TCP   2d15h
The endpoint below, built from the service name and namespace, will be used later in the standby cluster configuration.
<service-name>.<namespace>.svc.cluster.local
E.g.,
cluster1-ha.postgres-operator.svc.cluster.local
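As a quick sanity check (assuming the standby namespace and its database pod already exist, and that the pod ships the standard PostgreSQL client tools in its PATH), we can verify that this endpoint is reachable across namespaces before going further:

shell> kubectl exec -it cluster1-instance1-ft6m-0 -n postgres-operator2 -- \
         pg_isready -h cluster1-ha.postgres-operator.svc.cluster.local -p 5432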
2) Next, we need to make sure we have copied all the certificates from the leader/primary cluster and deployed them on the standby cluster, which we set up under a different namespace [postgres-operator2].
shell> kubectl get secret cluster1-cluster-ca-cert -n postgres-operator -o yaml > backup-cluster1-cluster-ca-cert.yaml
shell> kubectl get secret cluster1-cluster-cert -n postgres-operator -o yaml > backup-cluster1-cluster-cert.yaml
shell> kubectl get secret cluster1-replication-cert -n postgres-operator -o yaml > backup-cluster1-replication-cert.yaml
Delete the old certificates from the new/standby setup after taking a backup of them (if required).
shell> kubectl get secret cluster1-cluster-ca-cert -n postgres-operator2 -o yaml > backup-cluster1-cluster-ca-cert.yaml
shell> kubectl get secret cluster1-cluster-cert -n postgres-operator2 -o yaml > backup-cluster1-cluster-cert.yaml
shell> kubectl get secret cluster1-replication-cert -n postgres-operator2 -o yaml > backup-cluster1-replication-cert.yaml
…
shell> kubectl delete secret cluster1-cluster-ca-cert -n postgres-operator2
shell> kubectl delete secret cluster1-cluster-cert -n postgres-operator2
shell> kubectl delete secret cluster1-replication-cert -n postgres-operator2
Before applying the new secrets, make sure to change the namespace to [postgres-operator2] in the exported manifests, as per the new cluster.
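One possible way to do this (a minimal sketch using sed; adjust the file and namespace names to your environment) is to rewrite the namespace field in the exported manifests in place:

shell> sed -i 's/namespace: postgres-operator$/namespace: postgres-operator2/' \
         backup-cluster1-cluster-ca-cert.yaml \
         backup-cluster1-cluster-cert.yaml \
         backup-cluster1-replication-cert.yaml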
shell> kubectl apply -f backup-cluster1-cluster-ca-cert.yaml -n postgres-operator2
shell> kubectl apply -f backup-cluster1-cluster-cert.yaml -n postgres-operator2
shell> kubectl apply -f backup-cluster1-replication-cert.yaml -n postgres-operator2
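A quick check to confirm the secrets landed in the standby namespace:

shell> kubectl get secrets -n postgres-operator2 | grep cluster1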
3) If we rename the certificates to anything different, we need to update the standby [cr.yaml] file accordingly and re-apply the changes there.
secrets:
#  customRootCATLSSecret:
#    name: cluster1-ca-cert
#    items:
#      - key: "tls.crt"
#        path: "root.crt"
#      - key: "tls.key"
#        path: "root.key"
  customTLSSecret:
    name: cluster1-cert
  customReplicationTLSSecret:
    name: replication1-cert
Additionally, we need to enable the standby option and add the leader endpoint details in the standby [cr.yaml].
standby:
  enabled: true
  host: cluster1-ha.postgres-operator.svc.cluster.local
4) Finally, we can apply the modified changes.
shell> kubectl apply -f deploy/cr.yaml -n postgres-operator2
Also, make sure to delete the pod and the associated PVC in case the changes are not reflected.
shell> kubectl delete pvc cluster1-instance1-ft6m-pgdata -n postgres-operator2
shell> kubectl delete pod cluster1-instance1-ft6m-0 -n postgres-operator2
5) Verify the changes on the standby.
Primary/Leader:
shell> kubectl exec -it cluster1-instance1-wltm-0 -n postgres-operator -- sh
hello=# \dt
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | h1   | table | postgres
(1 row)
Stand-by:
shell> kubectl exec -it cluster1-instance1-ft6m-0 -n postgres-operator2 -- sh
hello=# \dt
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | h1   | table | postgres
(1 row)
sh-5.1$ patronictl list
+ Cluster: cluster1-ha (7569663519331602522) --------------------------------+----------------+---------------------+----+-----------+
| Member                    | Host                                    | Role           | State               | TL | Lag in MB |
+---------------------------+-----------------------------------------+----------------+---------------------+----+-----------+
| cluster1-instance1-ft6m-0 | cluster1-instance1-ft6m-0.cluster1-pods | Standby Leader | in archive recovery |  6 |           |
+---------------------------+-----------------------------------------+----------------+---------------------+----+-----------+
sh-5.1$
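As a cross-check from the primary side, the standby leader should also appear as a streaming client in pg_stat_replication once it connects over the exposed service (this assumes streaming from the leader is active rather than pure archive replay; the exact values returned will differ per environment):

shell> kubectl exec -it cluster1-instance1-wltm-0 -n postgres-operator -- \
         psql -c "select application_name, client_addr, state, sync_state from pg_stat_replication;"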
Now let's look at the second approach: an ad-hoc standby cluster built from a remote pgBackRest repository.
1) Consider the below standby cluster.
kubectl get pods -n postgres-operator2
NAME                                           READY   STATUS    RESTARTS   AGE
cluster1-instance1-ft6m-0                      4/4     Running   0          36h
cluster1-pgbouncer-556659fb94-qk2ng            2/2     Running   0          2d15h
cluster1-repo-host-0                           2/2     Running   0          2d15h
percona-postgresql-operator-6746bff4c7-w7l9h   1/1     Running   0          3d11h
2) Next, we need to set up our bucket/S3 credentials in a secret file.
shell> cat <<EOF | base64 -b 0
[global]
repo1-s3-key=minioadmin
repo1-s3-key-secret=minioadmin
EOF
Output:
W2dsb2JhbF0KcmVwbzEtczMta2V5PW1pbmlvYWRtaW4KcmVwbzEtczMta2V5LXNlY3JldD1taW5pb2FkbWluCg==
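Optionally, we can decode the string to confirm it carries the intended pgBackRest configuration:

shell> echo 'W2dsb2JhbF0KcmVwbzEtczMta2V5PW1pbmlvYWRtaW4KcmVwbzEtczMta2V5LXNlY3JldD1taW5pb2FkbWluCg==' | base64 -d
[global]
repo1-s3-key=minioadmin
repo1-s3-key-secret=minioadmin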
shell> cat cluster1-pgbackrest-secrets.yaml

apiVersion: v1
kind: Secret
metadata:
  name: cluster1-pgbackrest-secrets
type: Opaque
data:
  s3.conf: W2dsb2JhbF0KcmVwbzEtczMta2V5PW1pbmlvYWRtaW4KcmVwbzEtczMta2V5LXNlY3JldD1taW5pb2FkbWluCg==
…
shell> kubectl apply -f deploy/cluster1-pgbackrest-secrets.yaml -n postgres-operator2
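Alternatively (a small sketch, assuming a local s3.conf file containing the same [global] section as above), kubectl can generate the whole secret manifest for us and avoid the manual base64 step:

shell> kubectl create secret generic cluster1-pgbackrest-secrets \
         --from-file=s3.conf=./s3.conf -n postgres-operator2 \
         --dry-run=client -o yaml > cluster1-pgbackrest-secrets.yaml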
Note: For configuring other storage types (GCS, Azure Blob Storage, etc.), please refer to the manual: https://docs.percona.com/percona-operator-for-postgresql/2.0/backups-storage.html#__tabbed_1_3
3) Once the secret is deployed, we need to add the remote bucket/endpoint details, along with the above secret [cluster1-pgbackrest-secrets], in the pgBackRest backup section of the [cr.yaml] file. The backups stored in the remote S3 repository are initiated by the primary cluster node, which runs a similar pgBackRest configuration.
backups:
#  trackLatestRestorableTime: true
  pgbackrest:
#    metadata:
#      labels:
    image: docker.io/percona/percona-pgbackrest:2.55.0
#    initContainer:
#      image: docker.io/percona/percona-postgresql-operator:2.7.0
#      resources:
#        limits:
#          cpu: 2.0
#          memory: 4Gi
#        requests:
#          cpu: 1.0
#          memory: 3Gi
#    containerSecurityContext:
#      runAsUser: 1001
#      runAsGroup: 1001
#      runAsNonRoot: true
#      privileged: false
#      allowPrivilegeEscalation: false
#      readOnlyRootFilesystem: true
#      capabilities:
#        add:
#        - NET_ADMIN
#        - SYS_TIME
#        drop:
#        - ALL
#      seccompProfile:
#        type: Localhost
#        localhostProfile: localhost/profile.json
#      procMount: Default
#      seLinuxOptions:
#        type: spc_t
#        level: s0:c123,c456
#    containers:
#      pgbackrest:
#        resources:
#          limits:
#            cpu: 200m
#            memory: 128Mi
#          requests:
#            cpu: 150m
#            memory: 120Mi
#      pgbackrestConfig:
#        resources:
#          limits:
#            cpu: 200m
#            memory: 128Mi
#          requests:
#            cpu: 150m
#            memory: 120Mi
#
    configuration:
      - secret:
          name: cluster1-pgbackrest-secrets
#    jobs:
#      restartPolicy: OnFailure
#      backoffLimit: 2
#      priorityClassName: high-priority
#      ttlSecondsAfterFinished: 60
#      resources:
#        limits:
#          cpu: 200m
#          memory: 128Mi
#        requests:
#          cpu: 150m
#          memory: 120Mi
#      tolerations:
#      - effect: NoSchedule
#        key: role
#        operator: Equal
#        value: connection-poolers
#
#      securityContext:
#        fsGroup: 1001
#        runAsUser: 1001
#        runAsNonRoot: true
#        fsGroupChangePolicy: "OnRootMismatch"
#        runAsGroup: 1001
#        seLinuxOptions:
#          type: spc_t
#          level: s0:c123,c456
#        seccompProfile:
#          type: Localhost
#          localhostProfile: localhost/profile.json
#        supplementalGroups:
#        - 1001
#        sysctls:
#        - name: net.ipv4.tcp_keepalive_time
#          value: "600"
#        - name: net.ipv4.tcp_keepalive_intvl
#          value: "60"
#
    global:
#      repo1-retention-full: "14"
#      repo1-retention-full-type: time
      repo1-path: /pgbackrest/postgres-operator/cluster1/repo1
#      repo1-cipher-type: aes-256-cbc
      repo1-s3-uri-style: path
      repo1-s3-verify-tls: 'n'
#      repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
#      repo3-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo3
#      repo4-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo4
    repoHost:
#      resources:
#        limits:
#          cpu: 200m
#          memory: 128Mi
#        requests:
#          cpu: 150m
#          memory: 120Mi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/data: pgbackrest
              topologyKey: kubernetes.io/hostname
#      tolerations:
#      - effect: NoSchedule
#        key: role
#        operator: Equal
#        value: connection-poolers
#      priorityClassName: high-priority
#
#      topologySpreadConstraints:
#      - maxSkew: 1
#        topologyKey: my-node-label
#        whenUnsatisfiable: ScheduleAnyway
#        labelSelector:
#          matchLabels:
#            postgres-operator.crunchydata.com/pgbackrest: ""
#
#      securityContext:
#        fsGroup: 1001
#        runAsUser: 1001
#        runAsNonRoot: true
#        fsGroupChangePolicy: "OnRootMismatch"
#        runAsGroup: 1001
#        seLinuxOptions:
#          type: spc_t
#          level: s0:c123,c456
#        seccompProfile:
#          type: Localhost
#          localhostProfile: localhost/profile.json
#        supplementalGroups:
#        - 1001
#        sysctls:
#        - name: net.ipv4.tcp_keepalive_time
#          value: "600"
#        - name: net.ipv4.tcp_keepalive_intvl
#          value: "60"
#
    manual:
      repoName: repo1
      options:
        - --type=full
#    initialDelaySeconds: 120
    repos:
#    - name: repo1
#      schedules:
#        full: "0 0 * * 6"
#        differential: "0 1 * * 1-6"
#        incremental: "0 1 * * 1-6"
#      volume:
#        volumeClaimSpec:
#          storageClassName: standard
#          accessModes:
#          - ReadWriteOnce
#          resources:
#            requests:
#              storage: 1Gi
    - name: repo1
      s3:
        bucket: "ajtest"
        endpoint: "https://host.k3d.internal:9000"
        region: "us-east-1"
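As noted above, the backups that populate this shared repository are taken from the primary cluster. With operator version 2.x, an on-demand backup is normally requested through a PerconaPGBackup resource; a minimal sketch (the resource name backup1 and the file name are illustrative), applied in the primary cluster's namespace, could look like this:

shell> cat backup1.yaml
apiVersion: pgv2.percona.com/v2
kind: PerconaPGBackup
metadata:
  name: backup1
spec:
  pgCluster: cluster1
  repoName: repo1
  options:
    - --type=full

shell> kubectl apply -f backup1.yaml -n postgres-operator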
4) Also, enable standby mode and specify the target repo name in the [cr.yaml] file.
standby:
  enabled: true
  repoName: repo1
Finally, we can apply the modifications.
shell> kubectl apply -f deploy/cr.yaml -n postgres-operator2
5) Verifying data synchronization.
The existing pgBackRest backups will now be listed on the standby side as well.
shell> kubectl exec -it cluster1-repo-host-0 -n postgres-operator -- sh
sh-5.1$ pgbackrest info
stanza: db
    status: ok
    cipher: none

    db (current)
        wal archive min/max (17): 00000002000000000000000B/000000060000000000000022

        full backup: 20251107-164421F
            timestamp start/stop: 2025-11-07 16:44:21+00 / 2025-11-07 16:44:24+00
            wal start/stop: 00000002000000000000000C / 00000002000000000000000C
            database size: 30.7MB, database backup size: 30.7MB
            repo1: backup set size: 4MB, backup size: 4MB

        full backup: 20251107-165613F
            timestamp start/stop: 2025-11-07 16:56:13+00 / 2025-11-07 16:56:17+00
            wal start/stop: 000000020000000000000013 / 000000020000000000000013
            database size: 38.3MB, database backup size: 38.3MB
            repo1: backup set size: 5MB, backup size: 5MB

        full backup: 20251111-070032F
            timestamp start/stop: 2025-11-11 07:00:32+00 / 2025-11-11 07:00:35+00
            wal start/stop: 000000060000000000000025 / 000000060000000000000026
            database size: 38.8MB, database backup size: 38.8MB
            repo1: backup set size: 5.1MB, backup size: 5.1MB
Further, if we access the standby database, the synced data will be reflected there.
shell> kubectl exec -it cluster1-instance1-ft6m-0 -n postgres-operator2 -- sh
sh-5.1$ psql
psql (17.5 - Percona Server for PostgreSQL 17.5.2)
Type "help" for help.

postgres=# \c hello
You are now connected to database "hello" as user "postgres".
hello=# \dt
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | h1   | table | postgres
(1 row)
If the changes are not reflected, try removing the old pod/PVC.
shell> kubectl delete pod <pod_name> -n <namespace>
shell> kubectl delete pvc <pvc_name> -n <namespace>
The procedures discussed above outline two ways to deploy a new standalone/standby cluster from the source primary cluster in a Kubernetes/Percona Operator-based environment. They also provide the flexibility to serve either purpose: maintaining a continuous data stream, or building a one-time cluster with the exact same data set.