Patroni is a Python-based template for managing high availability PostgreSQL clusters. Originally a fork of the Governor project by Compose, Patroni has evolved significantly with many new features and active community development. It supports integration with various Distributed Configuration Stores (DCS) like etcd, Consul, and ZooKeeper, and provides simple setup and robust failover management.
This blog explores the detailed process of ensuring High Availability (HA) and Disaster Recovery (DR) for a PostgreSQL cluster using Patroni. It focuses on performing a switchover from a primary Patroni cluster to its standby counterpart.
The following setup details provide context for the switchover walkthrough.
AnyDBver is a quick-deployment tool written and maintained by a former Percona engineer. Using Ansible, it configures and deploys temporary MySQL, Percona Server for MySQL, PostgreSQL, and MongoDB environments. The tool is designed for setting up testing and demo environments, not production-ready deployments, and it builds the entire environment on a single server.
The command below deploys a DC-DR setup in Docker containers, provided Docker is already installed.
```shell
anydbver deploy \
  ppg:16 patroni_cluster=cluster11 \
  node1 ppg:16,master=node0 patroni_master=node0,cluster=cluster11 \
  node2 ppg:16,master=node0 patroni_master=node0,cluster=cluster11 \
  node3 ppg:16 patroni_standby=node0,cluster=cluster12 \
  node4 ppg:16,master=node3 patroni_master=node3,cluster=cluster12
```
The output of the cluster deployment will look like the following.
```shell
2025/05/29 20:23:41 PLAY RECAP *********************************************************************
2025/05/29 20:23:41 anydbver-node0 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
2025/05/29 20:23:41 anydbver-node1 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
2025/05/29 20:23:41 anydbver-node2 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
2025/05/29 20:23:41 anydbver-node3 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
2025/05/29 20:23:41 anydbver-node4 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
```
Once anydbver completes the deployment, you can inspect the primary and standby clusters as shown below.
A primary cluster in Patroni is a complete, functional environment: a leader node that accepts reads and writes, plus replicas streaming from it.

Primary cluster status:
```shell
patronictl -c /etc/patroni/cluster11-0.yml list
+ Cluster: cluster11 (7509880627210762172) -----------+----+-----------+
| Member           | Host       | Role    | State     | TL | Lag in MB |
+------------------+------------+---------+-----------+----+-----------+
| cluster11-0      | 172.19.0.2 | Leader  | running   |  1 |           |
| cluster1111530-1 | 172.19.0.3 | Replica | streaming |  1 |         0 |
| cluster1129801-1 | 172.19.0.4 | Replica | streaming |  1 |         0 |
+------------------+------------+---------+-----------+----+-----------+
```
The standby cluster mirrors the primary: its standby leader replicates from the primary cluster's leader and cascades the changes to its own replicas.

Standby cluster status:
```shell
patronictl -c /etc/patroni/cluster1217542-1.yml list
+ Cluster: cluster12 (7509880523941320635) ------+---------------------+----+-----------+
| Member           | Host       | Role           | State               | TL | Lag in MB |
+------------------+------------+----------------+---------------------+----+-----------+
| cluster12-0      | 172.19.0.5 | Standby Leader | in archive recovery |  1 |           |
| cluster1217542-1 | 172.19.0.6 | Replica        | streaming           |  1 |         0 |
+------------------+------------+----------------+---------------------+----+-----------+
```
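Besides patronictl, both clusters can be inspected over Patroni's REST API (port 8008 by default); the `/cluster` endpoint returns the topology as JSON. A sketch, assuming the API is reachable on the local node and `jq` is installed:

```shell
# Summarize each member's name, role, and state from the Patroni REST API.
curl -s http://localhost:8008/cluster \
  | jq '.members[] | {name, role, state, lag}'
```

This is handy for monitoring scripts, since the JSON is easier to parse than the patronictl table output.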
Demoting the primary cluster is how you transition service to the other site during planned maintenance activities.
If the primary cluster is completely down and unreachable because of a network failure, you can’t demote it; instead, you can directly promote the standby cluster as a new primary cluster.
Before demoting the primary cluster and promoting the standby cluster, make sure replication is healthy: all replicas should be streaming with zero lag, and the standby leader must be able to reach the primary.
After deciding on a switchover, you can dynamically update the primary cluster's Patroni configuration by adding the three lines shown below. The host IP (172.19.0.5) must be the address of the standby leader node at the DR site.
```shell
patronictl -c /etc/patroni/cluster11-0.yml edit-config cluster11
---
+++
@@ -16,3 +16,6 @@
     use_slots: true
 retry_timeout: 10
 ttl: 30
+standby_cluster:
+  host: 172.19.0.5
+  port: 5432
```
Another approach is to use the Patroni API for this task.
```shell
curl -s -XPATCH -d '{"standby_cluster":{"host":"172.19.0.5","port":"5432"}}' http://localhost:8008/config | jq .
{
  "ttl": 30,
  "loop_wait": 10,
  "retry_timeout": 10,
  "maximum_lag_on_failover": 1048576,
  "postgresql": {
    "use_pg_rewind": true,
    "use_slots": true,
    "parameters": {
      "wal_level": "replica",
      "hot_standby": "on",
      "max_wal_senders": 10,
      "max_replication_slots": 10,
      "wal_log_hints": "on",
      "archive_mode": "on",
      "archive_timeout": "600s",
      "archive_command": "cp -f %p /home/postgres/archived/%f"
    },
    "recovery_conf": {
      "restore_command": "cp /home/postgres/archived/%f %p"
    }
  },
  "standby_cluster": {
    "host": "172.19.0.5",
    "port": "5432"
  }
}
```
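Whichever method you use, you can confirm the change landed in the DCS by reading the configuration back and extracting just the new section (a sketch; assumes the REST API on localhost:8008 and `jq`):

```shell
# Read the live dynamic configuration back and show only the standby_cluster section.
curl -s http://localhost:8008/config | jq '.standby_cluster'
```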
After editing the config file, save it and check the status of the Patroni cluster. It should demote the existing leader of the primary cluster to a standby leader.
```shell
[root@node0 /]# patronictl -c /etc/patroni/cluster11-0.yml list
+ Cluster: cluster11 (7509880627210762172) ------+---------------------+----+-----------+
| Member           | Host       | Role           | State               | TL | Lag in MB |
+------------------+------------+----------------+---------------------+----+-----------+
| cluster11-0      | 172.19.0.2 | Standby Leader | in archive recovery |  1 |           |
| cluster1111530-1 | 172.19.0.3 | Replica        | streaming           |  1 |         0 |
| cluster1129801-1 | 172.19.0.4 | Replica        | streaming           |  1 |         0 |
+------------------+------------+----------------+---------------------+----+-----------+
```
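Patroni's health-check endpoints offer a scriptable way to confirm the demotion: a node acting as a standby leader answers HTTP 200 on `/standby-leader`, while `/leader` should no longer return 200. A sketch, using the demoted node's address from this setup:

```shell
# 200 here confirms the node is now a standby leader.
curl -s -o /dev/null -w '%{http_code}\n' http://172.19.0.2:8008/standby-leader
# This should now return a non-200 code, since the node no longer holds the leader role.
curl -s -o /dev/null -w '%{http_code}\n' http://172.19.0.2:8008/leader
```

These endpoints are also what load balancers such as HAProxy typically use to route traffic to the right node.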
Before promoting the standby cluster, ensure the primary cluster has been demoted to prevent a split-brain scenario. Then remove the standby_cluster section from the standby cluster's configuration, as shown below.
```shell
patronictl -c /etc/patroni/cluster12-0.yml edit-config cluster12
---
+++
@@ -15,7 +15,4 @@
     use_pg_rewind: true
     use_slots: true
 retry_timeout: 10
-standby_cluster:
-  host: 172.19.0.2
-  port: 5432
 ttl: 30
```
Another approach is to use the Patroni API for this task.
```shell
curl -s -XPATCH -d '{"standby_cluster":null}' http://localhost:8008/config | jq .
{
  "ttl": 30,
  "loop_wait": 10,
  "retry_timeout": 10,
  "maximum_lag_on_failover": 1048576,
  "postgresql": {
    "use_pg_rewind": true,
    "use_slots": true,
    "parameters": {
      "wal_level": "replica",
      "hot_standby": "on",
      "max_wal_senders": 10,
      "max_replication_slots": 10,
      "wal_log_hints": "on",
      "archive_mode": "on",
      "archive_timeout": "600s",
      "archive_command": "cp -f %p /home/postgres/archived/%f"
    },
    "recovery_conf": {
      "restore_command": "cp /home/postgres/archived/%f %p"
    }
  }
}
```
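As a quick scripted check that the section is really gone from the DCS, read the configuration back; `jq 'has("standby_cluster")'` should print `false` (a sketch; assumes the REST API on localhost:8008 and `jq`):

```shell
# Prints "false" once the standby_cluster section has been removed.
curl -s http://localhost:8008/config | jq 'has("standby_cluster")'
```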
After removing the standby_cluster section, you can verify the cluster status. Node 172.19.0.5 has been promoted to the new leader. Once the leader has switched to the DR site, you need to point the application to the DR leader node.
```shell
[root@node3 /]# patronictl -c /etc/patroni/cluster12-0.yml list
+ Cluster: cluster12 (7509880523941320635) -----------+----+-----------+
| Member           | Host       | Role    | State     | TL | Lag in MB |
+------------------+------------+---------+-----------+----+-----------+
| cluster12-0      | 172.19.0.5 | Leader  | running   |  2 |           |
| cluster1217542-1 | 172.19.0.6 | Replica | streaming |  2 |         0 |
+------------------+------------+---------+-----------+----+-----------+
```
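One way to spare the application an explicit repoint is a libpq multi-host connection string (available since PostgreSQL 10): with `target_session_attrs=read-write`, libpq tries the listed hosts in order and settles on whichever one currently accepts writes. A sketch using the two leader addresses from this demo:

```
postgresql://172.19.0.2:5432,172.19.0.5:5432/postgres?target_session_attrs=read-write
```

Note that this only works for libpq-based clients; other drivers may need an equivalent feature or a load balancer in front of the clusters.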
I hope this blog helps you grasp the DC-to-DR switchover concept using Patroni.
Our PostgreSQL training course, tailored for eager learners, is an instructor-led program that covers all of the topics mentioned above and includes hands-on labs and expert guidance. Find more details and register at percona.com/training.