Patroni is a Python-based template for managing high availability PostgreSQL clusters. Originally a fork of the Governor project by Compose, Patroni has evolved significantly with many new features and active community development. It supports integration with various Distributed Configuration Stores (DCS) like etcd, Consul, and ZooKeeper, and provides simple setup and robust failover management.
This blog explores the detailed process of ensuring High Availability (HA) and Disaster Recovery (DR) for a PostgreSQL cluster using Patroni. It focuses on performing a switchover from a primary Patroni cluster to its standby counterpart.
Prerequisites:
The following setup details are used throughout the switchover discussion:
- DC (cluster11) and DR (cluster12) sites
- Three nodes on DC: one leader and two replicas
- Two nodes on DR: one standby leader and one replica
Process:
The blog post has been divided into three sections.
Patroni DC-DR setup using anydbver
anydbver is a quick deployment tool written and maintained by a former Percona engineer. Using Ansible, it can configure and deploy temporary MySQL, Percona Server for MySQL, PostgreSQL, and MongoDB environments. The tool is designed for setting up testing or demo environments, not production-ready deployments, and it builds the entire environment on a single server.
The command below deploys a DC-DR setup using Docker containers, provided Docker is installed.
anydbver deploy \
  ppg:16 patroni:cluster=cluster11 \
  node1 ppg:16,master=node0 patroni:master=node0,cluster=cluster11 \
  node2 ppg:16,master=node0 patroni:master=node0,cluster=cluster11 \
  node3 ppg:16 patroni:standby=node0,cluster=cluster12 \
  node4 ppg:16,master=node3 patroni:master=node3,cluster=cluster12
The output of the cluster deployment using anydbver will look like below.
2025/05/29 20:23:41 PLAY RECAP *********************************************************************
2025/05/29 20:23:41 anydbver-node0 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
2025/05/29 20:23:41 anydbver-node1 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
2025/05/29 20:23:41 anydbver-node2 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
2025/05/29 20:23:41 anydbver-node3 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
2025/05/29 20:23:41 anydbver-node4 : ok=39 changed=21 unreachable=0 failed=0 skipped=65 rescued=0 ignored=0
Once anydbver completes its deployment, you can see the primary and standby clusters as below.
A primary cluster in Patroni refers to a complete, functional environment, often consisting of:
- One or more PostgreSQL nodes running under Patroni, with one primary and one or more replicas.
- It can be referred to as a read-write cluster.
- A DCS (Distributed Configuration Store) node or cluster (etcd, Consul, ZooKeeper).
- Shared storage or backup mechanisms (e.g., pgBackRest, S3) can be used.
- Replicas inside the primary cluster follow the primary (leader).
- Patroni uses the DCS to elect a leader based on node health and replication status (a quick way to inspect the DCS keys is shown after this list).
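As a quick illustration of the DCS keys Patroni maintains, the commands below read the leader lock for cluster11. This is a minimal sketch that assumes etcd with the v3 API is the DCS and that Patroni uses its default /service namespace; adjust the endpoint and namespace to your environment.

# List the keys Patroni keeps for cluster11 in etcd (default namespace /service)
etcdctl get --prefix --keys-only /service/cluster11/
# Show which member currently holds the leader lock
etcdctl get /service/cluster11/leader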
Primary cluster status
patronictl -c /etc/patroni/cluster11-0.yml list
+ Cluster: cluster11 (7509880627210762172) -----------+----+-----------+
| Member           | Host       | Role    | State     | TL | Lag in MB |
+------------------+------------+---------+-----------+----+-----------+
| cluster11-0      | 172.19.0.2 | Leader  | running   |  1 |           |
| cluster1111530-1 | 172.19.0.3 | Replica | streaming |  1 |         0 |
| cluster1129801-1 | 172.19.0.4 | Replica | streaming |  1 |         0 |
+------------------+------------+---------+-----------+----+-----------+
The standby cluster can be described as follows:
- A standby cluster is designed to address catastrophic events, such as major power outages or the complete loss of a data center due to network failure.
- A Patroni standby cluster is not part of the high-availability (HA) setup itself but rather a critical component of a disaster recovery (DR) strategy.
- It enables a smooth and controlled failover of PostgreSQL workloads to the standby environment, helping minimize downtime and data loss during a disaster.
- It is a separate Patroni cluster configured to replicate from another cluster (the primary cluster); a configuration sketch follows this list.
- It can use streaming replication, WAL segment recovery from the primary cluster's archive, or both.
- It is not part of the primary's quorum or DCS.
- It can be promoted independently in disaster recovery (DR) scenarios.
- It is a kind of "disaster recovery site" or "geo-replication" setup.
- It remains a read-only cluster until it is promoted.
- Replicas inside the standby cluster use cascading replication, as they follow the standby leader.
- The standby leader holds and updates a leader lock in the DCS. If the leader lock expires, the cascading replicas perform an election to choose a new standby leader.
- The standby cluster is not displayed in the patronictl list or patronictl topology output of the primary cluster.
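For context, the snippet below is a minimal sketch of how a standby cluster is typically declared in the Patroni YAML at bootstrap time (see the Patroni standby cluster documentation under Further reading). In this environment anydbver generates the configuration automatically; the host value is the primary cluster's leader (172.19.0.2), and the restore_command mirrors the archive settings shown later in this post.

# Illustrative excerpt of a standby-site patroni.yml; not the exact file anydbver generates
bootstrap:
  dcs:
    standby_cluster:
      host: 172.19.0.2                                    # leader of the primary cluster (DC site)
      port: 5432
      create_replica_methods:                             # how to build the standby leader's data directory
        - basebackup
      restore_command: cp /home/postgres/archived/%f %p   # optional WAL-based (archive) recovery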
Standby cluster status:
patronictl -c /etc/patroni/cluster1217542-1.yml list
+ Cluster: cluster12 (7509880523941320635) ------+---------------------+----+-----------+
| Member           | Host       | Role           | State               | TL | Lag in MB |
+------------------+------------+----------------+---------------------+----+-----------+
| cluster12-0      | 172.19.0.5 | Standby Leader | in archive recovery |  1 |           |
| cluster1217542-1 | 172.19.0.6 | Replica        | streaming           |  1 |         0 |
+------------------+------------+----------------+---------------------+----+-----------+
Demotion of primary cluster
Demoting the primary cluster allows you to transition operations to the DR site during planned maintenance activities.
If the primary cluster is completely down and unreachable because of a network failure, you can’t demote it; instead, you can directly promote the standby cluster as a new primary cluster.
Here, we should consider the following points before demoting the primary cluster and promoting the standby cluster.
- Take appropriate application downtime.
- Verify the timeline of each node in both clusters; the timeline should be the same on all nodes.
- Monitor the replication lag between DC and DR, and let the lag clear before performing the switchover (example checks are shown after this list).
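The checks below are one way to verify the timeline and replication lag before the switchover. This is a sketch that reuses the node IPs from the setup above and assumes the postgres user can connect from where you run it; adjust hosts and authentication to your environment.

# Current timeline on a node (run against each node; the values should match across DC and DR)
psql -h 172.19.0.2 -U postgres -c "SELECT timeline_id FROM pg_control_checkpoint();"

# On the DC leader, check how far each downstream (including the DR standby leader, 172.19.0.5) has replayed
psql -h 172.19.0.2 -U postgres -c "SELECT client_addr, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes FROM pg_stat_replication;"

# patronictl also reports TL and Lag in MB per member
patronictl -c /etc/patroni/cluster11-0.yml list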
After deciding on a switchover, you can dynamically update the Patroni configuration by adding three lines to the primary cluster configuration, as detailed below. The host IP (172.19.0.5) must be the address of the standby leader node on the DR site.
patronictl -c /etc/patroni/cluster11-0.yml edit-config cluster11
---
+++
@@ -16,3 +16,6 @@
     use_slots: true
 retry_timeout: 10
 ttl: 30
+standby_cluster:
+  host: 172.19.0.5
+  port: 5432
Another approach is to use the Patroni API for this task.
curl -s -XPATCH -d '{"standby_cluster":{"host":"172.19.0.5","port":"5432"}}' http://localhost:8008/config | jq .
{
  "ttl": 30,
  "loop_wait": 10,
  "retry_timeout": 10,
  "maximum_lag_on_failover": 1048576,
  "postgresql": {
    "use_pg_rewind": true,
    "use_slots": true,
    "parameters": {
      "wal_level": "replica",
      "hot_standby": "on",
      "max_wal_senders": 10,
      "max_replication_slots": 10,
      "wal_log_hints": "on",
      "archive_mode": "on",
      "archive_timeout": "600s",
      "archive_command": "cp -f %p /home/postgres/archived/%f"
    },
    "recovery_conf": {
      "restore_command": "cp /home/postgres/archived/%f %p"
    }
  },
  "standby_cluster": {
    "host": "172.19.0.5",
    "port": "5432"
  }
}
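As an optional sanity check, assuming the Patroni REST API is listening on its default port 8008, you can read the dynamic configuration back and confirm the new section is present:

# Should print the standby_cluster section added above (host 172.19.0.5, port 5432)
curl -s http://localhost:8008/config | jq .standby_cluster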
Whichever method you use, save the change and check the status of the Patroni cluster. The existing leader of the primary cluster should now be demoted to a standby leader.
[root@node0 /]# patronictl -c /etc/patroni/cluster11-0.yml list
+ Cluster: cluster11 (7509880627210762172) ------+---------------------+----+-----------+
| Member           | Host       | Role           | State               | TL | Lag in MB |
+------------------+------------+----------------+---------------------+----+-----------+
| cluster11-0      | 172.19.0.2 | Standby Leader | in archive recovery |  1 |           |
| cluster1111530-1 | 172.19.0.3 | Replica        | streaming           |  1 |         0 |
| cluster1129801-1 | 172.19.0.4 | Replica        | streaming           |  1 |         0 |
+------------------+------------+----------------+---------------------+----+-----------+
Promotion of standby cluster
Before promoting the standby cluster, ensure the primary cluster has been demoted to prevent a split-brain scenario. To promote the standby cluster, remove the standby_cluster section from its configuration, as given below.
patronictl -c /etc/patroni/cluster12-0.yml edit-config cluster12
---
+++
@@ -15,7 +15,4 @@
     use_pg_rewind: true
     use_slots: true
 retry_timeout: 10
-standby_cluster:
-  host: 172.19.0.2
-  port: 5432
 ttl: 30
Another approach is to use the Patroni API for this task.
curl -s -XPATCH -d '{"standby_cluster":null}' http://localhost:8008/config | jq .
{
  "ttl": 30,
  "loop_wait": 10,
  "retry_timeout": 10,
  "maximum_lag_on_failover": 1048576,
  "postgresql": {
    "use_pg_rewind": true,
    "use_slots": true,
    "parameters": {
      "wal_level": "replica",
      "hot_standby": "on",
      "max_wal_senders": 10,
      "max_replication_slots": 10,
      "wal_log_hints": "on",
      "archive_mode": "on",
      "archive_timeout": "600s",
      "archive_command": "cp -f %p /home/postgres/archived/%f"
    },
    "recovery_conf": {
      "restore_command": "cp /home/postgres/archived/%f %p"
    }
  }
}
After removing the standby_cluster section, you can verify the cluster status. Node 172.19.0.5 has been promoted to the new leader. Once the leader has switched to the DR site, you need to point the application to the DR leader node.
[root@node3 /]# patronictl -c /etc/patroni/cluster12-0.yml list
+ Cluster: cluster12 (7509880523941320635) -----------+----+-----------+
| Member           | Host       | Role    | State     | TL | Lag in MB |
+------------------+------------+---------+-----------+----+-----------+
| cluster12-0      | 172.19.0.5 | Leader  | running   |  2 |           |
| cluster1217542-1 | 172.19.0.6 | Replica | streaming |  2 |         0 |
+------------------+------------+---------+-----------+----+-----------+
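One way to repoint the application, sketched below, is a libpq multi-host connection string that lists the DR nodes and requests a read-write session, so connections land on the new leader (172.19.0.5). The user and database names here are placeholders; many production setups achieve the same result with HAProxy, a virtual IP, or DNS instead.

# Hypothetical application DSN after the switchover (app_user and appdb are illustrative names)
psql "postgresql://app_user@172.19.0.5:5432,172.19.0.6:5432/appdb?target_session_attrs=read-write"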
I hope this blog helps you grasp the DC to DR switchover concept using Patroni.
Our PostgreSQL training course, tailored for eager learners, is an instructor-led program that covers all the abovementioned topics and includes hands-on labs and expert guidance. Find more details and register at percona.com/training.
Further reading
- https://patroni.readthedocs.io/en/latest/standby_cluster.html
- Patroni versions can offer vastly different features. Unfortunately, some variations exist based on the installation method, such as different Linux distributions or installation from Python repositories.