Unlike a standard multi-node Postgres replication cluster managed by Patroni, where all failovers within a datacentre are executed automatically, inter-datacentre failovers are not: when, for instance, a standby datacentre must take over from a failed primary, manual intervention is required.
The following describes the mechanisms required to perform such a procedure when the need arises.
Two mechanisms are described herein, both of which administer the cluster's DCS (Distributed Configuration Store):
This is the base command permitting changes to the DCS.
patronictl -c /etc/patroni/config.yml edit-config --help
Usage: patronictl edit-config [OPTIONS] [CLUSTER_NAME]
Edit cluster configuration
Options:
--group INTEGER Citus group
-q, --quiet Do not show changes
-s, --set TEXT Set specific configuration value. Can be specified multiple
times
-p, --pg TEXT Set specific PostgreSQL parameter value. Shorthand for -s
postgresql.parameters. Can be specified multiple times
--apply TEXT Apply configuration from file. Use - for stdin.
--replace TEXT Apply configuration from file, replacing existing
configuration. Use - for stdin.
--force Do not ask for confirmation at any point
--help Show this message and exit.
Patroni’s REST API is integral to Patroni’s leader election process and to operations such as failovers, switchovers, reinitializations, restarts, and reloads. It can also be employed by load balancers such as HAProxy for HTTP health checks, as well as for monitoring.
Typically, the CLI tools curl and jq are used when performing operations with the REST API.
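As a quick illustration, the /cluster endpoint returns the cluster topology as JSON, which jq can reduce to just the fields of interest. The echo below stands in for a live response so the pipeline is self-contained; against a real cluster you would run the curl command shown in the comment.

```shell
# Reduce a /cluster response to "member: role" lines.
# Live equivalent: curl -s http://localhost:8008/cluster | jq -r '.members[] | "\(.name): \(.role)"'
echo '{"members":[{"name":"fradb1v","role":"leader"},{"name":"fradb2v","role":"replica"}]}' \
  | jq -r '.members[] | "\(.name): \(.role)"'
# prints:
# fradb1v: leader
# fradb2v: replica
```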
For the purposes of this documentation, assume the following: two datacentres, each comprising a three-node replication cluster. Frankfurt is currently configured as the PRIMARY datacentre and Amsterdam as the STANDBY.
+ Cluster: Frankfurt (7546601820334116441) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+---------+----------------+---------+-----------+----+-----------+
| fradb1v | 10.108.134.168 | Leader | running | 2 | |
| fradb2v | 10.108.134.113 | Replica | streaming | 2 | 0 |
| fradb3v | 10.108.134.150 | Replica | streaming | 2 | 0 |
+---------+----------------+---------+-----------+----+-----------+
+ Cluster: Amsterdam (7546601820334116441) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+---------+----------------+----------------+-----------+----+-----------+
| amsdb1v | 10.108.134.211 | Standby Leader | streaming | 2 | |
| amsdb2v | 10.108.134.235 | Replica | streaming | 2 | 0 |
| amsdb3v | 10.108.134.29 | Replica | streaming | 2 | 0 |
+---------+----------------+----------------+-----------+----+-----------+
Two steps are required to promote the standby leader:
The only difference between these two variants is that one is interactive and the other is not.
Variation A
# interactive; asks before committing
patronictl -c /etc/patroni/config.yml edit-config --set standby_cluster=null
Variation B
# forced execution
patronictl -c /etc/patroni/config.yml edit-config --set standby_cluster=null --force
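Either way, the result can be verified afterwards: the root REST endpoint of each node reports its role, which on the promoted node should no longer be a standby role. (Depending on the Patroni version the string may be "primary" or "master".) The echo below is a sample response; live, you would query the promoted node as shown in the comment, where amsdb1v is the host from the example topology.

```shell
# After promotion, the former standby leader reports a primary role.
# Live equivalent: curl -s http://amsdb1v:8008/ | jq -r .role
echo '{"state":"running","role":"primary"}' | jq -r .role
# prints: primary
```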
This next step can be considered optional: a replication slot is explicitly created in anticipation that a new standby datacentre, i.e. the failed primary datacentre, is to be provisioned.
patronictl -c /etc/patroni/config.yml edit-config \
  --set slots.standby_cluster.type=physical --force
Attention: this and the previous command can be combined into a single invocation; they were split into two commands for documentation purposes only.
The benefit of this method is that it can be executed from any host that can reach the Patroni REST API (port 8008 by default) on any member of the cluster.
The host is promoted: recovery completes and the instance becomes read-write.
curl -s -XPATCH -d '{"standby_cluster":null}' http://localhost:8008/config | jq .
As per the previously demonstrated example, a new slot is created on the new leader for the purposes of replicating to a standby leader.
curl -s -XPATCH -d \
  '{
     "slots":{"standby_cluster":{"type": "physical"}}
   }' http://localhost:8008/config | jq .
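The change can be confirmed by reading the configuration back with a plain GET against /config and picking out the slot definition with jq. The echo below is a sample of the relevant part of the /config document; the live command is shown in the comment.

```shell
# Confirm the physical slot definition is present in the DCS configuration.
# Live equivalent: curl -s http://localhost:8008/config | jq -r '.slots.standby_cluster.type'
echo '{"slots":{"standby_cluster":{"type":"physical"}}}' \
  | jq -r '.slots.standby_cluster.type'
# prints: physical
```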
The following instructions are similar to the previous ones. Execute the following against the Patroni cluster “Frankfurt”, i.e. the former primary, which now becomes the standby datacentre.
patronictl -c /etc/patroni/config.yml edit-config \
  --set standby_cluster.host='amsdb1v,amsdb2v,amsdb3v' \
  --set standby_cluster.port=5432 \
  --set standby_cluster.primary_slot_name=standby_cluster \
  --set standby_cluster.create_replica_methods=basebackup \
  --force
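The resulting standby configuration can be checked by reading /config back and extracting the standby_cluster section. As before, the echo is a sample of the relevant fragment of the /config document; live, you would run the curl command in the comment against any Frankfurt member.

```shell
# Extract the standby connection details from a (sample) /config document.
# Live equivalent: curl -s http://fradb1v:8008/config | jq .standby_cluster
echo '{"standby_cluster":{"host":"amsdb1v,amsdb2v,amsdb3v","port":5432}}' \
  | jq -r '"\(.standby_cluster.host) port \(.standby_cluster.port)"'
# prints: amsdb1v,amsdb2v,amsdb3v port 5432
```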
Unless there’s a compelling reason for the slot to remain on the newly promoted leader, it should be removed.
patronictl -c /etc/patroni/config.yml edit-config --set slots=null --force
Attention: attempting to remove the slot using the PostgreSQL function pg_drop_replication_slot will fail, because Patroni will simply recreate it.
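To confirm the removal, read /config back; once the slots key is gone, jq's // (alternative) operator can turn the resulting null into an explicit marker. The echo below is a sample /config fragment without a slots key; the live command is in the comment.

```shell
# Report "absent" once the slots key has been removed from the configuration.
# Live equivalent: curl -s http://localhost:8008/config | jq -r '.slots // "absent"'
echo '{"postgresql":{"parameters":{}}}' | jq -r '.slots // "absent"'
# prints: absent
```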
curl -s -XPATCH -d \
  '{
     "standby_cluster": {
       "host": "amsdb1v,amsdb2v,amsdb3v",
       "port": 5432,
       "primary_slot_name": "standby_cluster",
       "create_replica_methods": [
         "basebackup"
       ]
     }
   }' http://localhost:8008/config | jq .
curl -s -XPATCH -d '{"slots":null}' http://localhost:8008/config | jq .
Finally: here’s a previous blog on Patroni disaster recovery which also explains the basic thinking behind the technology.