Patroni is a Python-based template for managing high availability PostgreSQL clusters. Originally a fork of the Governor project by Compose, Patroni has evolved significantly with many new features and active community development. It supports integration with various Distributed Configuration Stores (DCS) like etcd, Consul, and ZooKeeper, and provides simple setup and robust failover management.
This blog explores the detailed process of ensuring High Availability (HA) and Disaster Recovery (DR) for a PostgreSQL cluster using Patroni. It focuses on performing a switchover from a primary Patroni cluster to its standby counterpart.
Prerequisites:
The following setup details are used throughout the switchover discussion:
- DC (cluster11) and DR (cluster12) sites
- Three nodes on DC: one leader and two replicas
- Two nodes on DR: one standby leader and one replica
Process:
The blog post has been divided into three sections.
Patroni DC-DR setup using anydbver
anydbver is a quick deployment tool written and maintained by a former Percona engineer. Using Ansible, it can configure and deploy temporary MySQL, Percona Server for MySQL, PostgreSQL, and MongoDB environments. The tool is designed for setting up testing or demo environments, not production-ready deployments, and it builds the entire environment on a single server.
The command below deploys a DC-DR setup using Docker containers, provided Docker is installed.
anydbver deploy \
  ppg:16 patroni:cluster=cluster11 \
  node1 ppg:16,master=node0 patroni:master=node0,cluster=cluster11 \
  node2 ppg:16,master=node0 patroni:master=node0,cluster=cluster11 \
  node3 ppg:16 patroni:standby=node0,cluster=cluster12 \
  node4 ppg:16,master=node3 patroni:master=node3,cluster=cluster12
The output of the cluster deployment using anydbver will look like below.
2025/05/29 20:23:41 PLAY RECAP *********************************************************************
2025/05/29 20:23:41 anydbver-node0 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
2025/05/29 20:23:41 anydbver-node1 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
2025/05/29 20:23:41 anydbver-node2 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
2025/05/29 20:23:41 anydbver-node3 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
2025/05/29 20:23:41 anydbver-node4 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
Once anydbver completes its deployment, you can see the primary and standby clusters as below.
A primary cluster in Patroni refers to a complete, functional environment, often consisting of:
- One or more PostgreSQL nodes running under Patroni, with one primary and one or more replicas.
- It can be referred to as a read-write cluster.
- A DCS (Distributed Configuration Store) node or cluster (etcd, Consul, ZooKeeper).
- Shared storage or backup mechanisms (e.g., pgBackRest, S3) can be used.
- Replicas inside the primary cluster follow the primary (leader).
- Patroni uses the DCS to elect a leader based on node health and replication status (a quick way to inspect the DCS keys is shown after this list).
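As a quick illustration of the DCS keys Patroni maintains, the commands below read the leader lock for cluster11. This is a minimal sketch that assumes etcd with the v3 API is the DCS and that Patroni uses its default /service namespace; adjust the endpoint and namespace to your environment.

# List the keys Patroni keeps for cluster11 in etcd (default namespace /service)
etcdctl get --prefix --keys-only /service/cluster11/
# Show which member currently holds the leader lock
etcdctl get /service/cluster11/leader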
Primary cluster status
patronictl -c /etc/patroni/cluster11-0.yml list
+ Cluster: cluster11 (7509880627210762172) -----------+----+-----------+
| Member           | Host       | Role    | State     | TL | Lag in MB |
+------------------+------------+---------+-----------+----+-----------+
| cluster11-0      | 172.19.0.2 | Leader  | running   |  1 |           |
| cluster1111530-1 | 172.19.0.3 | Replica | streaming |  1 |         0 |
| cluster1129801-1 | 172.19.0.4 | Replica | streaming |  1 |         0 |
+------------------+------------+---------+-----------+----+-----------+
The standby cluster can be described as follows:
- A standby cluster is designed to address catastrophic events, such as major power outages or the complete loss of a data center due to network failure.
- A Patroni standby cluster is not part of the high-availability (HA) setup itself but rather a critical component of a disaster recovery (DR) strategy.
- It enables a smooth and controlled failover of PostgreSQL workloads to the standby environment, helping minimize downtime and data loss during a disaster.
- It is a separate Patroni cluster configured to replicate from another cluster (the primary cluster); a configuration sketch follows this list.
- It can use streaming replication, WAL segment recovery from the primary cluster's archive, or both.
- It is not part of the primary's quorum or DCS.
- It can be promoted independently in disaster recovery (DR) scenarios.
- It is a kind of "disaster recovery site" or "geo-replication" setup.
- It remains a read-only cluster until it is promoted.
- Replicas inside the standby cluster use cascading replication, as they follow the standby leader.
- The standby leader holds and updates a leader lock in the DCS. If the leader lock expires, the cascading replicas perform an election to choose a new standby leader.
- The standby cluster is not displayed in the patronictl list or patronictl topology output of the primary cluster.
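For context, the snippet below is a minimal sketch of how a standby cluster is typically declared in the Patroni YAML at bootstrap time (see the Patroni standby cluster documentation under Further reading). In this environment anydbver generates the configuration automatically; the host value is the primary cluster's leader (172.19.0.2), and the restore_command mirrors the archive settings shown later in this post.

# Illustrative excerpt of a standby-site patroni.yml; not the exact file anydbver generates
bootstrap:
  dcs:
    standby_cluster:
      host: 172.19.0.2                                    # leader of the primary cluster (DC site)
      port: 5432
      create_replica_methods:                             # how to build the standby leader's data directory
        - basebackup
      restore_command: cp /home/postgres/archived/%f %p   # optional WAL-based (archive) recovery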
Standby cluster status:
patronictl -c /etc/patroni/cluster1217542-1.yml list
+ Cluster: cluster12 (7509880523941320635) ------+---------------------+----+-----------+
| Member           | Host       | Role           | State               | TL | Lag in MB |
+------------------+------------+----------------+---------------------+----+-----------+
| cluster12-0      | 172.19.0.5 | Standby Leader | in archive recovery |  1 |           |
| cluster1217542-1 | 172.19.0.6 | Replica        | streaming           |  1 |         0 |
+------------------+------------+----------------+---------------------+----+-----------+
Demotion of primary cluster
Demoting the primary cluster allows you to transition operations to the DR site during planned maintenance activities.
If the primary cluster is completely down and unreachable because of a network failure, you can’t demote it; instead, you can directly promote the standby cluster as a new primary cluster.
Here, we should consider the following points before demoting the primary cluster and promoting the standby cluster.
- Take appropriate application downtime.
- Verify the timeline of each node in both clusters; the timeline should be the same on all nodes.
- Monitor the replication lag between DC and DR, and let the lag clear before performing the switchover (example checks are shown after this list).
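The checks below are one way to verify the timeline and replication lag before the switchover. This is a sketch that reuses the node IPs from the setup above and assumes the postgres user can connect from where you run it; adjust hosts and authentication to your environment.

# Current timeline on a node (run against each node; the values should match across DC and DR)
psql -h 172.19.0.2 -U postgres -c "SELECT timeline_id FROM pg_control_checkpoint();"

# On the DC leader, check how far each downstream (including the DR standby leader, 172.19.0.5) has replayed
psql -h 172.19.0.2 -U postgres -c "SELECT client_addr, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes FROM pg_stat_replication;"

# patronictl also reports TL and Lag in MB per member
patronictl -c /etc/patroni/cluster11-0.yml list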
After deciding on a switchover, you can dynamically update the Patroni configuration by adding three lines to the primary cluster configuration, as detailed below. The host IP (172.19.0.5) must be the address of the standby leader node on the DR site.
patronictl -c /etc/patroni/cluster11-0.yml edit-config cluster11
---
+++
@@ -16,3 +16,6 @@
     use_slots: true
 retry_timeout: 10
 ttl: 30
+standby_cluster:
+  host: 172.19.0.5
+  port: 5432
Another approach is to use the Patroni API for this task.
curl -s -XPATCH -d '{"standby_cluster":{"host":"172.19.0.5","port":"5432"}}' http://localhost:8008/config | jq .
{
  "ttl": 30,
  "loop_wait": 10,
  "retry_timeout": 10,
  "maximum_lag_on_failover": 1048576,
  "postgresql": {
    "use_pg_rewind": true,
    "use_slots": true,
    "parameters": {
      "wal_level": "replica",
      "hot_standby": "on",
      "max_wal_senders": 10,
      "max_replication_slots": 10,
      "wal_log_hints": "on",
      "archive_mode": "on",
      "archive_timeout": "600s",
      "archive_command": "cp -f %p /home/postgres/archived/%f"
    },
    "recovery_conf": {
      "restore_command": "cp /home/postgres/archived/%f %p"
    }
  },
  "standby_cluster": {
    "host": "172.19.0.5",
    "port": "5432"
  }
}
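As an optional sanity check, assuming the Patroni REST API is listening on its default port 8008, you can read the dynamic configuration back and confirm the new section is present:

# Should print the standby_cluster section added above (host 172.19.0.5, port 5432)
curl -s http://localhost:8008/config | jq .standby_cluster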
Whichever method you use, save the change and check the status of the Patroni cluster. The existing leader of the primary cluster should now be demoted to a standby leader.
[root@node0 /]# patronictl -c /etc/patroni/cluster11-0.yml list
+ Cluster: cluster11 (7509880627210762172) ------+---------------------+----+-----------+
| Member           | Host       | Role           | State               | TL | Lag in MB |
+------------------+------------+----------------+---------------------+----+-----------+
| cluster11-0      | 172.19.0.2 | Standby Leader | in archive recovery |  1 |           |
| cluster1111530-1 | 172.19.0.3 | Replica        | streaming           |  1 |         0 |
| cluster1129801-1 | 172.19.0.4 | Replica        | streaming           |  1 |         0 |
+------------------+------------+----------------+---------------------+----+-----------+
Promotion of standby cluster
Before promoting the standby cluster, ensure the primary cluster has been demoted to prevent a split-brain scenario. To promote the standby cluster, remove the standby_cluster section from its configuration, as given below.
patronictl -c /etc/patroni/cluster12-0.yml edit-config cluster12
---
+++
@@ -15,7 +15,4 @@
     use_pg_rewind: true
     use_slots: true
 retry_timeout: 10
-standby_cluster:
-  host: 172.19.0.2
-  port: 5432
 ttl: 30
Another approach is to use the Patroni API for this task.
curl -s -XPATCH -d '{"standby_cluster":null}' http://localhost:8008/config | jq .
{
  "ttl": 30,
  "loop_wait": 10,
  "retry_timeout": 10,
  "maximum_lag_on_failover": 1048576,
  "postgresql": {
    "use_pg_rewind": true,
    "use_slots": true,
    "parameters": {
      "wal_level": "replica",
      "hot_standby": "on",
      "max_wal_senders": 10,
      "max_replication_slots": 10,
      "wal_log_hints": "on",
      "archive_mode": "on",
      "archive_timeout": "600s",
      "archive_command": "cp -f %p /home/postgres/archived/%f"
    },
    "recovery_conf": {
      "restore_command": "cp /home/postgres/archived/%f %p"
    }
  }
}
After removing the standby_cluster section, you can verify the cluster status. Node 172.19.0.5 has been promoted to the new leader. Once the leader has switched to the DR site, you need to point the application to the DR leader node.
[root@node3 /]# patronictl -c /etc/patroni/cluster12-0.yml list
+ Cluster: cluster12 (7509880523941320635) -----------+----+-----------+
| Member           | Host       | Role    | State     | TL | Lag in MB |
+------------------+------------+---------+-----------+----+-----------+
| cluster12-0      | 172.19.0.5 | Leader  | running   |  2 |           |
| cluster1217542-1 | 172.19.0.6 | Replica | streaming |  2 |         0 |
+------------------+------------+---------+-----------+----+-----------+
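One way to repoint the application, sketched below, is a libpq multi-host connection string that lists the DR nodes and requests a read-write session, so connections land on the new leader (172.19.0.5). The user and database names here are placeholders; many production setups achieve the same result with HAProxy, a virtual IP, or DNS instead.

# Hypothetical application DSN after the switchover (app_user and appdb are illustrative names)
psql "postgresql://app_user@172.19.0.5:5432,172.19.0.6:5432/appdb?target_session_attrs=read-write"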
I hope this blog helps you grasp the DC to DR switchover concept using Patroni.
Our PostgreSQL training course, tailored for eager learners, is an instructor-led program that covers all the abovementioned topics and includes hands-on labs and expert guidance. Find more details and register at percona.com/training.
Further reading
- https://patroni.readthedocs.io/en/latest/standby_cluster.html
- Patroni versions can offer vastly different features. Unfortunately, some variations exist based on the installation method, such as different Linux distributions or installation from Python repositories.