Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Demonstrating pg_rewind Using Linux Containers in PostgreSQL

April 22, 2021

Author

Share this Post:

These scripts demonstrate the command line interface, CLI, “pg_rewind” re-provisioning a formerly deprecated PRIMARY host, pg1, into a viable STANDBY. Although the scripts have been written expressly for a custom-made set of two Linux containers using CENTOS7, it has been written in a form that is hopefully clear and concise enough that they can be of use to you and can be customized to suit your particular needs.

There are five scripts:

- 01.setup_lxc_environment.sh: Create two Linux containers from a custom made image

- 02.setup_replication.sh: Setup a 2 node async replication cluster

- 03.switchover.sh: Invoke a switchover i.e. stop pg1 and promote pg2

- 04.reprovision_pg1.sh: Here’s where the magic takes place

- 05.validate.sh: A simple test validating replication works correctly

Script 1: 01.setup_lxc_environment.sh

This script starts by destroying preexisting containers and rebuilding them from an already preexisting image that was custom-made, with postgres already installed. Although not necessary, the IP addresses for each container were hard-coded. Both containers have the postgres 12 binaries installed, with an already initialized data cluster.

#!/bin/bash
set -e

echo "=== stage 1 ==="

for u in pg1 pg2
do
    set +e
    lxc stop $u 2>/dev/null
    lxc rm $u
    lxc launch template-CENTOS7-pg $u
    set -e
done

echo "=== stage 2 ==="

for u in pg1 pg2
do
    lxc exec $u bash <<_eof_
        hostnamectl set-hostname $u
        rm -f /var/lib/pgsql/12/data/log/*
_eof_
done

for u in pg1 pg2
do
    lxc stop $u
done

echo "=== stage 3 ==="

for u in pg1 pg2
do
    lxc network attach lxdbr0 $u eth0
done

echo "=== stage 4 ==="

echo "sleeping ..." && sleep 2s
lxc config device set pg1 eth0 ipv4.address 10.231.38.111
lxc config device set pg2 eth0 ipv4.address 10.231.38.112

for u in pg1 pg2
do
    lxc start $u
done

echo "sleeping ..." && sleep 8s
lxc ls -cn4 pg

echo "DONE"

#!/bin/bash

set -e

echo "=== stage 1 ==="

for u in pg1 pg2

set +e

lxc stop $u 2>/dev/null

lxc rm $u

lxc launch template-CENTOS7-pg $u

set -e

done

echo "=== stage 2 ==="

for u in pg1 pg2

lxc exec $u bash <<_eof_

hostnamectl set-hostname $u

rm -f /var/lib/pgsql/12/data/log/*

_eof_

done

for u in pg1 pg2

lxc stop $u

done

echo "=== stage 3 ==="

for u in pg1 pg2

lxc network attach lxdbr0 $u eth0

done

echo "=== stage 4 ==="

echo "sleeping ..." && sleep 2s

lxc config device set pg1 eth0 ipv4.address 10.231.38.111

lxc config device set pg2 eth0 ipv4.address 10.231.38.112

for u in pg1 pg2

lxc start $u

done

echo "sleeping ..." && sleep 8s

lxc ls -cn4 pg

echo "DONE"

Script 2: 02.setup_replication.sh

The two postgres configuration files, pg_hba.conf and postgresql.conf, are configured for remote access with verbose logging providing plenty of information and is set for replication service.

While all the runtime parameters are important, pay special attention to these three runtime parameters:

- hot_standby_feedback: for replication service

- wal_keep_segments: you want to hold onto enough wal segments for the standby, just in case…

- wal_log_hints: pg_rewind requires this parameter otherwise it won’t work

The CLI pg_receivewal is useful in a script because it can test if a slot has already been created, thus pg_basebackup simply looks for the slot and is not required to create one.

#!/bin/bash
set -e

echo "=== stage 1 ==="

lxc exec pg1 bash <<_eof_

su - postgres <<_eof1_

source /etc/profile.d/sh.local

echo "
# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            md5
host    all             all             0.0.0.0/0               md5
# IPv6 local connections:
host    all             all             ::1/128                 md5
host    all             all             ::0/0                   md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            md5
host    replication     all             0.0.0.0/0               md5
host    replication     all             ::1/128                 md5
host    replication     all             ::0/0                   md5
" > /var/lib/pgsql/12/data/pg_hba.conf

echo "
archive_command='/bin/true'
archive_mode='on'
autovacuum_freeze_max_age=200000000
autovacuum_multixact_freeze_max_age=400000000
deadlock_timeout='1s'
hot_standby_feedback='on'
idle_in_transaction_session_timeout=0
listen_addresses='*'
lock_timeout=0
log_autovacuum_min_duration=0
log_checkpoints='on'
log_connections='on'
log_disconnections='on'
log_duration='on'
log_filename='postgresql-%a.log'
logging_collector='on'
log_hostname='on'
log_line_prefix=' app=%a user=%u error=%e line=%l vtxid=%v txid=%x tstamp=%m [%p] '
log_lock_waits='on'
log_min_duration_statement=0
log_replication_commands='on'
log_rotation_size=0
log_statement='ddl'
log_temp_files=0
log_truncate_on_rotation='on'
max_standby_archive_delay='30s'
max_standby_streaming_delay='30s'
old_snapshot_threshold=-1
statement_timeout=0
track_activity_query_size=2048
track_commit_timestamp='on'
track_functions='pl'
track_io_timing='on'
vacuum_defer_cleanup_age=0
wal_keep_segments=60
wal_log_hints='on'
wal_receiver_timeout='60s'
wal_sender_timeout='60s'
" > /var/lib/pgsql/12/data/postgresql.auto.conf
_eof1_
systemctl start postgresql-12
_eof_

echo "=== stage 2 ==="

lxc exec pg2 bash <<_eof_
su - postgres <<_eof1_
    rm -rf /var/lib/pgsql/12/data/

    /usr/pgsql-12/bin/pg_receivewal 
        --slot=pg2 
        --create-slot 
        --if-not-exists 
        -d 'host=pg1 user=postgres password=postgres'

    /usr/pgsql-12/bin/pg_basebackup 
        -d 'host=pg1 user=postgres dbname=postgres' 
        -D /var/lib/pgsql/12/data/ 
        -l initial_backup_pg2 
        -c fast 
        -R -P 
        -S pg2

    rm -f /var/lib/pgsql/12/data/log/*.log
_eof1_
systemctl start postgresql-12 && sleep 2s
_eof_

#!/bin/bash

set -e

echo "=== stage 1 ==="

lxc exec pg1 bash <<_eof_

su - postgres <<_eof1_

source /etc/profile.d/sh.local

echo "

# TYPE DATABASE USER ADDRESS METHOD

# "local" is for Unix domain socket connections only

local all all trust

# IPv4 local connections:

host all all 127.0.0.1/32 md5

host all all 0.0.0.0/0 md5

# IPv6 local connections:

host all all ::1/128 md5

host all all ::0/0 md5

# Allow replication connections from localhost, by a user with the

# replication privilege.

local replication all trust

host replication all 127.0.0.1/32 md5

host replication all 0.0.0.0/0 md5

host replication all ::1/128 md5

host replication all ::0/0 md5

" > /var/lib/pgsql/12/data/pg_hba.conf

echo "

archive_command='/bin/true'

archive_mode='on'

autovacuum_freeze_max_age=200000000

autovacuum_multixact_freeze_max_age=400000000

deadlock_timeout='1s'

hot_standby_feedback='on'

idle_in_transaction_session_timeout=0

listen_addresses='*'

lock_timeout=0

log_autovacuum_min_duration=0

log_checkpoints='on'

log_connections='on'

log_disconnections='on'

log_duration='on'

log_filename='postgresql-%a.log'

logging_collector='on'

log_hostname='on'

log_line_prefix=' app=%a user=%u error=%e line=%l vtxid=%v txid=%x tstamp=%m [%p] '

log_lock_waits='on'

log_min_duration_statement=0

log_replication_commands='on'

log_rotation_size=0

log_statement='ddl'

log_temp_files=0

log_truncate_on_rotation='on'

max_standby_archive_delay='30s'

max_standby_streaming_delay='30s'

old_snapshot_threshold=-1

statement_timeout=0

track_activity_query_size=2048

track_commit_timestamp='on'

track_functions='pl'

track_io_timing='on'

vacuum_defer_cleanup_age=0

wal_keep_segments=60

wal_log_hints='on'

wal_receiver_timeout='60s'

wal_sender_timeout='60s'

" > /var/lib/pgsql/12/data/postgresql.auto.conf

_eof1_

systemctl start postgresql-12

_eof_

echo "=== stage 2 ==="

lxc exec pg2 bash <<_eof_

su - postgres <<_eof1_

rm -rf /var/lib/pgsql/12/data/

/usr/pgsql-12/bin/pg_receivewal

--slot=pg2

--create-slot

--if-not-exists

-d 'host=pg1 user=postgres password=postgres'

/usr/pgsql-12/bin/pg_basebackup

-d 'host=pg1 user=postgres dbname=postgres'

-D /var/lib/pgsql/12/data/

-l initial_backup_pg2

-c fast

-R -P

-S pg2

rm -f /var/lib/pgsql/12/data/log/*.log

_eof1_

systemctl start postgresql-12 && sleep 2s

_eof_

Script 03.switchover.sh

It’s more accurate to call this a switchover than a failover. Notice that a new WAL is generated on pg1 before it’s stopped.

Promoting the STANDBY, pg2, is extremely easy in postgres these days. Just log in as the superuser and execute the SQL “select pg_promote()”.

The final step to this script creates and populates a table with some records.

#!/bin/bash
set -e

echo "=== stage 1 ==="

lxc exec pg1 bash <<_eof_

psql 'host=pg1 user=postgres password=postgres' <<_eof1_
    checkpoint;
    select * from pg_walfile_name(pg_switch_wal());
_eof1_

systemctl stop postgresql-12 && sleep 2s
_eof_

echo "=== stage 2 ==="

lxc exec pg2 bash <<_eof_

psql 'host=pg2 user=postgres password=postgres' <<_eof1_
    select pg_promote();
    drop database if exists db01;
    create database db01;
   c 'host=pg2 user=postgres password=postgres dbname=db01'

    select *,'hello world'::text as comments into t1
        from (select * from generate_series(1,1E6))t;
_eof1_
_eof_

echo "Host pg1 is shutdown and host pg2 is promoted as a read-write node"
echo "DONE"

#!/bin/bash

set -e

echo "=== stage 1 ==="

lxc exec pg1 bash <<_eof_

psql 'host=pg1 user=postgres password=postgres' <<_eof1_

checkpoint;

select * from pg_walfile_name(pg_switch_wal());

_eof1_

systemctl stop postgresql-12 && sleep 2s

_eof_

echo "=== stage 2 ==="

lxc exec pg2 bash <<_eof_

psql 'host=pg2 user=postgres password=postgres' <<_eof1_

select pg_promote();

drop database if exists db01;

create database db01;

c 'host=pg2 user=postgres password=postgres dbname=db01'

select *,'hello world'::text as comments into t1

from (select * from generate_series(1,1E6))t;

_eof1_

_eof_

echo "Host pg1 is shutdown and host pg2 is promoted as a read-write node"

echo "DONE"

Script 04.reprovision_pg1.sh

And here’s where the magic takes place.

Just before rewinding pg1 begins, a slot is explicitly added to PRIMARY host pg2. Although in this particular instance it isn’t really needed to make the replication work, since runtime parameter “wal_keep_segments”, which no longer exists in postgres version 13, is configured to retain 60 WAL segments, more than enough WALS for this demonstration.

As soon as pg_rewind completes, host pg1 needs some final tuning preparing it for STANDBY service:

- Configuration file postgresql.auto.conf is updated

- Creating file standby.signal informs postgres that this host is meant to replicated from another one.

#!/bin/bash
set -e

lxc exec pg1 bash <<_eof_
/usr/pgsql-12/bin/pg_receivewal 
    --create-slot 
    --if-not-exists 
    --slot pg1 
    -d 'host=pg2 user=postgres password=postgres'

su - postgres <<_eof1_
/usr/pgsql-12/bin/pg_rewind 
    --target-pgdata /var/lib/pgsql/12/data 
    --source-server='user=postgres password=postgres host=pg2'

echo "
hot_standby = 'on'
primary_conninfo = 'user=postgres password=postgres host=pg2 port=5432'
primary_slot_name = 'pg1'
recovery_target_timeline = 'latest'
" >> /var/lib/pgsql/12/data/postgresql.auto.conf

touch /var/lib/pgsql/12/data/standby.signal
_eof1_
systemctl start postgresql-12 && sleep 2s
_eof_

echo "DONE: Host pg1 is now a viable READ-ONLY STANDBY to the PRIMARY host pg2"

#!/bin/bash

set -e

lxc exec pg1 bash <<_eof_

/usr/pgsql-12/bin/pg_receivewal

--create-slot

--if-not-exists

--slot pg1

-d 'host=pg2 user=postgres password=postgres'

su - postgres <<_eof1_

/usr/pgsql-12/bin/pg_rewind

--target-pgdata /var/lib/pgsql/12/data

--source-server='user=postgres password=postgres host=pg2'

echo "

hot_standby = 'on'

primary_conninfo = 'user=postgres password=postgres host=pg2 port=5432'

primary_slot_name = 'pg1'

recovery_target_timeline = 'latest'

" >> /var/lib/pgsql/12/data/postgresql.auto.conf

touch /var/lib/pgsql/12/data/standby.signal

_eof1_

systemctl start postgresql-12 && sleep 2s

_eof_

echo "DONE: Host pg1 is now a viable READ-ONLY STANDBY to the PRIMARY host pg2"

Script 05.validate.sh

Validation tests a number of conditions and includes:

- Is pg1 a read-only standby?

- Can we see the table that was created after pg2 was promoted?

- Do we see the database that was created after pg1 was re-provisioned proving that replication is active?

#!/bin/bash
set -e

lxc exec pg1 bash <<_eof_

psql 'host=pg2 user=postgres password=postgres' <<_eof1_
    qecho ---HOST PG2, READ-WRITE PRIMARY---
    drop database if exists db_test;
    create database db_test;
    select pg_is_in_recovery from pg_is_in_recovery(),pg_sleep(3);
    l
_eof1_

psql 'host=pg1 user=postgres password=postgres dbname=db01' <<_eof1_
    qecho ---HOST PG1, READ-ONLY STANDBY---
    select * from pg_is_in_recovery();
    select count(*) from t1;
    l
_eof1_
_eof_

echo "DONE"

#!/bin/bash

set -e

lxc exec pg1 bash <<_eof_

psql 'host=pg2 user=postgres password=postgres' <<_eof1_

qecho ---HOST PG2, READ-WRITE PRIMARY---

drop database if exists db_test;

create database db_test;

select pg_is_in_recovery from pg_is_in_recovery(),pg_sleep(3);

_eof1_

psql 'host=pg1 user=postgres password=postgres dbname=db01' <<_eof1_

qecho ---HOST PG1, READ-ONLY STANDBY---

select * from pg_is_in_recovery();

select count(*) from t1;

_eof1_

_eof_

echo "DONE"

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!