Having a stand-by cluster ensures maximum data availability and provides a disaster recovery solution. In this blog post, we will cover how to set up a stand-by cluster using streaming replication and how to create an ad-hoc/stand-by cluster that uses a remote pgBackRest repository. The source and destination clusters can be deployed in different namespaces, regions, or data centers, with no hard dependency between them.
Let’s take a deep dive into each of the processes below.
Building a stand-by cluster using streaming replication
1) Below is the leader/primary cluster, which is already set up and running.
shell> kubectl get pods -n postgres-operator
NAME                                           READY   STATUS      RESTARTS        AGE
cluster1-backup-wffk-9lbcf                     0/1     Completed   0               2d22h
cluster1-instance1-wltm-0                      4/4     Running     1 (6h39m ago)   22h
cluster1-pgbouncer-556659fb94-szvjt            2/2     Running     0               3d21h
cluster1-repo-host-0                           2/2     Running     0               2d22h
percona-postgresql-operator-6746bff4c7-729z5   1/1     Running     3 (11h ago)     3d21h
In order for the stand-by to connect to the leader/primary, we need to expose the service in the below section of the [cr.yaml] file.
image: docker.io/percona/percona-postgresql-operator:2.7.0-ppg17.5.2-postgres
imagePullPolicy: Always
postgresVersion: 17
#  port: 5432
expose:
#  annotations:
#    my-annotation: value1
#  labels:
#    my-label: value2
  type: ClusterIP
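Note that a ClusterIP service is only reachable from within the same Kubernetes cluster. If the stand-by runs in a different cluster, region, or data center, the primary would instead need to be exposed externally, for example as a LoadBalancer. A minimal sketch (the right service type depends on your network setup):

expose:
  type: LoadBalancer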
shell> kubectl apply -f cr.yaml -n postgres-operator
shell> kubectl get services -n postgres-operator
NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
cluster1-ha          ClusterIP   10.43.101.40    <none>        5432/TCP   2d15h
cluster1-ha-config   ClusterIP   None            <none>        <none>     2d15h
cluster1-pgbouncer   ClusterIP   10.43.149.182   <none>        5432/TCP   2d15h
cluster1-pods        ClusterIP   None            <none>        <none>     2d15h
cluster1-primary     ClusterIP   None            <none>        5432/TCP   2d15h
cluster1-replicas    ClusterIP   10.43.85.169    <none>        5432/TCP   2d15h
The endpoint details below will be used later in the stand-by cluster configuration.
<service-name>.<namespace>.svc.cluster.local
E.g.,
cluster1-ha.postgres-operator.svc.cluster.local
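To confirm that this endpoint is resolvable from the stand-by namespace, a quick DNS check can be run from a throwaway pod. A sketch (the pod name dns-test is arbitrary):

shell> kubectl run dns-test -n postgres-operator2 --rm -it --image=busybox --restart=Never -- nslookup cluster1-ha.postgres-operator.svc.cluster.local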
2) Next, we need to make sure we have copied all the certificates from the leader/primary cluster and deployed them on the stand-by cluster, which we set up under a different namespace [postgres-operator2].
shell> kubectl get secret cluster1-cluster-ca-cert -n postgres-operator -o yaml > backup-cluster1-cluster-ca-cert.yaml
shell> kubectl get secret cluster1-cluster-cert -n postgres-operator -o yaml > backup-cluster1-cluster-cert.yaml
shell> kubectl get secret cluster1-replication-cert -n postgres-operator -o yaml > backup-cluster1-replication-cert.yaml
On the new/stand-by setup, take a backup of the existing certificates (if required) and then delete them.
shell> kubectl get secret cluster1-cluster-ca-cert -n postgres-operator2 -o yaml > backup-cluster1-cluster-ca-cert.yaml
shell> kubectl get secret cluster1-cluster-cert -n postgres-operator2 -o yaml > backup-cluster1-cluster-cert.yaml
shell> kubectl get secret cluster1-replication-cert -n postgres-operator2 -o yaml > backup-cluster1-replication-cert.yaml
…
shell> kubectl delete secret cluster1-cluster-ca-cert -n postgres-operator2
shell> kubectl delete secret cluster1-cluster-cert -n postgres-operator2
shell> kubectl delete secret cluster1-replication-cert -n postgres-operator2
Before applying the new secrets, make sure to change the namespace in the backed-up manifests to [postgres-operator2], as per the new cluster (see the sketch below).
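For example, a minimal sketch using GNU sed to rewrite the namespace field in the backed-up manifests (cluster-specific metadata such as resourceVersion, uid, creationTimestamp, and ownerReferences should also be stripped before applying):

shell> sed -i 's/namespace: postgres-operator$/namespace: postgres-operator2/' backup-cluster1-cluster-ca-cert.yaml backup-cluster1-cluster-cert.yaml backup-cluster1-replication-cert.yaml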
shell> kubectl apply -f backup-cluster1-cluster-ca-cert.yaml -n postgres-operator2
shell> kubectl apply -f backup-cluster1-cluster-cert.yaml -n postgres-operator2
shell> kubectl apply -f backup-cluster1-replication-cert.yaml -n postgres-operator2
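To double-check that the certificate material was copied intact, the secret contents in both namespaces can be compared. A sketch (both commands should print the same hash):

shell> kubectl get secret cluster1-cluster-ca-cert -n postgres-operator -o jsonpath='{.data}' | sha256sum
shell> kubectl get secret cluster1-cluster-ca-cert -n postgres-operator2 -o jsonpath='{.data}' | sha256sum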
3) If we rename the certificates to anything different, we need to make the corresponding changes in the stand-by [cr.yaml] file and re-apply them there.
secrets:
#  customRootCATLSSecret:
#    name: cluster1-ca-cert
#    items:
#      - key: "tls.crt"
#        path: "root.crt"
#      - key: "tls.key"
#        path: "root.key"
  customTLSSecret:
    name: cluster1-cert
  customReplicationTLSSecret:
    name: replication1-cert
Additionally, we need to enable the stand-by option and add the leader endpoint details in the stand-by [cr.yaml].
standby:
  enabled: true
  host: cluster1-ha.postgres-operator.svc.cluster.local
4) Finally, we can deploy the modified changes.
shell> kubectl apply -f deploy/cr.yaml -n postgres-operator2
Also, make sure to delete the pod and the associated PVC in case the changes are not reflected.
shell> kubectl delete pvc cluster1-instance1-ft6m-pgdata -n postgres-operator2
shell> kubectl delete pod cluster1-instance1-ft6m-0 -n postgres-operator2
5) Verify the changes on stand-by.
Primary/Leader:
shell> kubectl exec -it cluster1-instance1-wltm-0 -n postgres-operator -- sh
hello=# \dt
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | h1   | table | postgres
(1 rows)
Stand-by:
shell> kubectl exec -it cluster1-instance1-ft6m-0 -n postgres-operator2 -- sh
hello=# \dt
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | h1   | table | postgres
(1 rows)
sh-5.1$ patronictl list
+ Cluster: cluster1-ha (7569663519331602522) -----------------------------+----------------+---------------------+----+-----------+
| Member                    | Host                                    | Role           | State               | TL | Lag in MB |
+---------------------------+-----------------------------------------+----------------+---------------------+----+-----------+
| cluster1-instance1-ft6m-0 | cluster1-instance1-ft6m-0.cluster1-pods | Standby Leader | in archive recovery |  6 |           |
+---------------------------+-----------------------------------------+----------------+---------------------+----+-----------+
sh-5.1$
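On the primary side, the streaming connection from the stand-by leader should also show up in pg_stat_replication. A sketch (this assumes kubectl exec lands in the database container of the instance pod):

shell> kubectl exec -it cluster1-instance1-wltm-0 -n postgres-operator -- psql -c "SELECT application_name, client_addr, state, replay_lsn FROM pg_stat_replication;"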
Building a stand-by/ad-hoc cluster using a pgBackRest repository
1) Consider the stand-by cluster below.
kubectl get pods -n postgres-operator2
NAME                                           READY   STATUS    RESTARTS   AGE
cluster1-instance1-ft6m-0                      4/4     Running   0          36h
cluster1-pgbouncer-556659fb94-qk2ng            2/2     Running   0          2d15h
cluster1-repo-host-0                           2/2     Running   0          2d15h
percona-postgresql-operator-6746bff4c7-w7l9h   1/1     Running   0          3d11h
2) Next, we need to set up our bucket/S3 credentials in a secret file.
shell> cat <<EOF | base64 -b 0
[global]
repo1-s3-key=minioadmin
repo1-s3-key-secret=minioadmin
EOF
Output:
W2dsb2JhbF0KcmVwbzEtczMta2V5PW1pbmlvYWRtaW4KcmVwbzEtczMta2V5LXNlY3JldD1taW5pb2FkbWluCg==
shell> cat cluster1-pgbackrest-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: cluster1-pgbackrest-secrets
type: Opaque
data:
  s3.conf: W2dsb2JhbF0KcmVwbzEtczMta2V5PW1pbmlvYWRtaW4KcmVwbzEtczMta2V5LXNlY3JldD1taW5pb2FkbWluCg==
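Alternatively, the same manifest can be generated without manual base64 encoding by letting kubectl do the encoding. A sketch, assuming the pgBackRest configuration was saved to a local s3.conf file:

shell> kubectl create secret generic cluster1-pgbackrest-secrets \
          --from-file=s3.conf=s3.conf -n postgres-operator2 \
          --dry-run=client -o yaml > cluster1-pgbackrest-secrets.yaml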
…
shell> kubectl apply -f deploy/cluster1-pgbackrest-secrets.yaml -n postgres-operator2
Note – For configuring other storage types (GCS, Azure Blob Storage, etc.), please refer to the manual – https://docs.percona.com/percona-operator-for-postgresql/2.0/backups-storage.html#__tabbed_1_3
3) Once the secret file is deployed, we need to add the remote bucket/endpoint details, along with the above secret [cluster1-pgbackrest-secrets], in the pgBackRest backup section of the [cr.yaml] file. Backups stored in the remote S3 repository are initiated by the main primary cluster node, which uses a similar pgBackRest configuration.
backups:
#  trackLatestRestorableTime: true
  pgbackrest:
#    metadata:
#      labels:
    image: docker.io/percona/percona-pgbackrest:2.55.0
#    initContainer:
#      image: docker.io/percona/percona-postgresql-operator:2.7.0
#      resources:
#        limits:
#          cpu: 2.0
#          memory: 4Gi
#        requests:
#          cpu: 1.0
#          memory: 3Gi
#      containerSecurityContext:
#        runAsUser: 1001
#        runAsGroup: 1001
#        runAsNonRoot: true
#        privileged: false
#        allowPrivilegeEscalation: false
#        readOnlyRootFilesystem: true
#        capabilities:
#          add:
#            - NET_ADMIN
#            - SYS_TIME
#          drop:
#            - ALL
#        seccompProfile:
#          type: Localhost
#          localhostProfile: localhost/profile.json
#        procMount: Default
#        seLinuxOptions:
#          type: spc_t
#          level: s0:c123,c456
#    containers:
#      pgbackrest:
#        resources:
#          limits:
#            cpu: 200m
#            memory: 128Mi
#          requests:
#            cpu: 150m
#            memory: 120Mi
#      pgbackrestConfig:
#        resources:
#          limits:
#            cpu: 200m
#            memory: 128Mi
#          requests:
#            cpu: 150m
#            memory: 120Mi
#
    configuration:
      - secret:
          name: cluster1-pgbackrest-secrets
#    jobs:
#      restartPolicy: OnFailure
#      backoffLimit: 2
#      priorityClassName: high-priority
#      ttlSecondsAfterFinished: 60
#      resources:
#        limits:
#          cpu: 200m
#          memory: 128Mi
#        requests:
#          cpu: 150m
#          memory: 120Mi
#      tolerations:
#        - effect: NoSchedule
#          key: role
#          operator: Equal
#          value: connection-poolers
#
#      securityContext:
#        fsGroup: 1001
#        runAsUser: 1001
#        runAsNonRoot: true
#        fsGroupChangePolicy: "OnRootMismatch"
#        runAsGroup: 1001
#        seLinuxOptions:
#          type: spc_t
#          level: s0:c123,c456
#        seccompProfile:
#          type: Localhost
#          localhostProfile: localhost/profile.json
#        supplementalGroups:
#          - 1001
#        sysctls:
#          - name: net.ipv4.tcp_keepalive_time
#            value: "600"
#          - name: net.ipv4.tcp_keepalive_intvl
#            value: "60"
#
    global:
#      repo1-retention-full: "14"
#      repo1-retention-full-type: time
      repo1-path: /pgbackrest/postgres-operator/cluster1/repo1
#      repo1-cipher-type: aes-256-cbc
      repo1-s3-uri-style: path
      repo1-s3-verify-tls: 'n'
#      repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
#      repo3-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo3
#      repo4-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo4
    repoHost:
#      resources:
#        limits:
#          cpu: 200m
#          memory: 128Mi
#        requests:
#          cpu: 150m
#          memory: 120Mi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/data: pgbackrest
              topologyKey: kubernetes.io/hostname
#      tolerations:
#        - effect: NoSchedule
#          key: role
#          operator: Equal
#          value: connection-poolers
#      priorityClassName: high-priority
#
#      topologySpreadConstraints:
#        - maxSkew: 1
#          topologyKey: my-node-label
#          whenUnsatisfiable: ScheduleAnyway
#          labelSelector:
#            matchLabels:
#              postgres-operator.crunchydata.com/pgbackrest: ""
#
#      securityContext:
#        fsGroup: 1001
#        runAsUser: 1001
#        runAsNonRoot: true
#        fsGroupChangePolicy: "OnRootMismatch"
#        runAsGroup: 1001
#        seLinuxOptions:
#          type: spc_t
#          level: s0:c123,c456
#        seccompProfile:
#          type: Localhost
#          localhostProfile: localhost/profile.json
#        supplementalGroups:
#          - 1001
#        sysctls:
#          - name: net.ipv4.tcp_keepalive_time
#            value: "600"
#          - name: net.ipv4.tcp_keepalive_intvl
#            value: "60"
#
    manual:
      repoName: repo1
      options:
        - --type=full
#    initialDelaySeconds: 120
    repos:
#    - name: repo1
#      schedules:
#        full: "0 0 * * 6"
#        differential: "0 1 * * 1-6"
#        incremental: "0 1 * * 1-6"
#      volume:
#        volumeClaimSpec:
#          storageClassName: standard
#          accessModes:
#            - ReadWriteOnce
#          resources:
#            requests:
#              storage: 1Gi
    - name: repo1
      s3:
        bucket: "ajtest"
        endpoint: "https://host.k3d.internal:9000"
        region: "us-east-1"
4) Also, enable the stand-by option and specify the target repo name in the [cr.yaml] file.
standby:
  enabled: true
  repoName: repo1
Finally, we can apply the modifications.
shell> kubectl apply -f deploy/cr.yaml -n postgres-operator2
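Before checking the data, it may help to confirm that the stand-by cluster object itself came up healthy. A sketch (this assumes the perconapgclusters CRD registers the pg short name):

shell> kubectl get pg -n postgres-operator2
shell> kubectl get pods -n postgres-operator2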
5) Verifying data synchronization.
The existing pgBackRest backups will now be listed on the stand-by side as well.
shell> kubectl exec -it cluster1-repo-host-0 -n postgres-operator -- sh
sh-5.1$ pgbackrest info
stanza: db
    status: ok
    cipher: none

    db (current)
        wal archive min/max (17): 00000002000000000000000B/000000060000000000000022

        full backup: 20251107-164421F
            timestamp start/stop: 2025-11-07 16:44:21+00 / 2025-11-07 16:44:24+00
            wal start/stop: 00000002000000000000000C / 00000002000000000000000C
            database size: 30.7MB, database backup size: 30.7MB
            repo1: backup set size: 4MB, backup size: 4MB

        full backup: 20251107-165613F
            timestamp start/stop: 2025-11-07 16:56:13+00 / 2025-11-07 16:56:17+00
            wal start/stop: 000000020000000000000013 / 000000020000000000000013
            database size: 38.3MB, database backup size: 38.3MB
            repo1: backup set size: 5MB, backup size: 5MB

        full backup: 20251111-070032F
            timestamp start/stop: 2025-11-11 07:00:32+00 / 2025-11-11 07:00:35+00
            wal start/stop: 000000060000000000000025 / 000000060000000000000026
            database size: 38.8MB, database backup size: 38.8MB
            repo1: backup set size: 5.1MB, backup size: 5.1MB
Further, if we access the stand-by database, the synced data will be reflected there.
shell> kubectl exec -it cluster1-instance1-ft6m-0 -n postgres-operator2 -- sh
sh-5.1$ psql
psql (17.5 - Percona Server for PostgreSQL 17.5.2)
Type "help" for help.

postgres=# \c hello
You are now connected to database "hello" as user "postgres".
hello=# \dt
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 public | h1   | table | postgres
(1 rows)
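As an additional sanity check, the stand-by instance should report that it is running in recovery mode; pg_is_in_recovery() returns t on a standby:

hello=# SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)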
If the changes are not reflected, try removing the old pod/PVC.
shell> kubectl delete pod <pod_name> -n <namespace>
shell> kubectl delete pvc <pvc_name> -n <namespace>
Summary:
The procedures above outline two ways to deploy a new standalone/stand-by cluster from a source primary cluster in a Kubernetes/Percona Operator-based environment. This provides the flexibility to either maintain a continuous data stream or simply build a one-time cluster with the exact same data set.