Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

How pgBackRest is Addressing Slow PostgreSQL WAL Archiving Using Asynchronous Feature

December 1, 2020

Author

Share this Post:

In one of my previous blog posts, Why PostgreSQL WAL Archival is Slow, I tried to explain three of the major design limitations of PostgreSQL’s WAL archiver which is not so great for a database with high WAL generation. In this post, I want to discuss how pgBackRest is addressing one of the problems (cause number two in the previous post) using its Asynchronous WAL archiving feature.

Let me explain the problem in a bit more detail here.

The PostgreSQL archiver process has an internal function pgarch_archiveXlog() which calls the system() system call which executes the archive_command which will be a shell command/script.

So the execution is one-after-another without any parallelism or batching. Each execution starts up a separate process.

Let’s look at a simple WAL archive using Unix/Linux cp command:

postgres 2639 1551 0 08:49 ? 00:00:00 postgres: stampede: archiver archiving 0000000700000000000000AC
postgres 2954 2639 1 09:37 ? 00:00:00 cp -f pg_wal/0000000700000000000000AC /home/postgres/archived/0000000700000000000000AC

1 2	postgres 2639 1551 0 08:49 ? 00:00:00 postgres: stampede: archiver archiving 0000000700000000000000AC postgres 2954 2639 1 09:37 ? 00:00:00 cp -f pg_wal/0000000700000000000000AC /home/postgres/archived/0000000700000000000000AC

Archiver process (PID 2639) started a cp process (PID 2954). Then it starts another process for the next WAL segment (PID 2957).

postgres 2639 1551 0 08:49 ? 00:00:00 postgres: stampede: archiver archiving 0000000700000000000000AD
postgres 2957 2639 0 09:37 ? 00:00:00 cp -f pg_wal/0000000700000000000000AD /home/postgres/archived/0000000700000000000000AD

1 2	postgres 2639 1551 0 08:49 ? 00:00:00 postgres: stampede: archiver archiving 0000000700000000000000AD postgres 2957 2639 0 09:37 ? 00:00:00 cp -f pg_wal/0000000700000000000000AD /home/postgres/archived/0000000700000000000000AD

Then another one for the next WAL segment:

postgres 2639 1551 0 08:49 ? 00:00:00 postgres: stampede: archiver archiving 0000000700000000000000AE
postgres 2960 2639 0 09:38 ? 00:00:00 cp -f pg_wal/0000000700000000000000AE /home/postgres/archived/0000000700000000000000AE

1 2	postgres 2639 1551 0 08:49 ? 00:00:00 postgres: stampede: archiver archiving 0000000700000000000000AE postgres 2960 2639 0 09:38 ? 00:00:00 cp -f pg_wal/0000000700000000000000AE /home/postgres/archived/0000000700000000000000AE

We can argue that starting up a cp command can be light.

But what if we need to use scp, where the ssh key exchanges and authentication negotiations need to happen? The startup time can easily go higher than the time for transferring a 16MB file. And what if we want to take the WAL to Cloud storage like Amazon S3 or Azure Storage? The startup time/cost can go much higher than just transferring the small file.

Typically Database Administrator’s worry is not about the overhead. but the WAL archiving falling behind the WAL generation which intern affects the space utilization, availability of the database, and reliability of backups, point-in-time recoverability, etc.

Even a single-threaded pgbench load can show us the limitations of WAL archiving and how things can fall apart.

Here is a sample test result of direct WAL push to S3 bucket with archive_command=/usr/bin/aws s3 cp %p s3://pg-jobin

postgres=# select pg_walfile_name(pg_current_wal_lsn()),last_archived_wal,last_failed_wal,
('x'||substring(pg_walfile_name(pg_current_wal_lsn()),9,8))::bit(32)::int*256 +
('x'||substring(pg_walfile_name(pg_current_wal_lsn()),17))::bit(32)::int -
('x'||substring(last_archived_wal,9,8))::bit(32)::int*256 -
('x'||substring(last_archived_wal,17))::bit(32)::int
as diff from pg_stat_archiver;
pg_walfile_name | last_archived_wal | last_failed_wal | diff
--------------------------+--------------------------+-----------------+------
000000010000002F0000003F | 000000010000002E00000029 | | 278
(1 row)

postgres=# select pg_walfile_name(pg_current_wal_lsn()),last_archived_wal,last_failed_wal,

('x'||substring(pg_walfile_name(pg_current_wal_lsn()),9,8))::bit(32)::int*256 +

('x'||substring(pg_walfile_name(pg_current_wal_lsn()),17))::bit(32)::int -

('x'||substring(last_archived_wal,9,8))::bit(32)::int*256 -

('x'||substring(last_archived_wal,17))::bit(32)::int

as diff from pg_stat_archiver;

pg_walfile_name | last_archived_wal | last_failed_wal | diff

--------------------------+--------------------------+-----------------+------

000000010000002F0000003F | 000000010000002E00000029 | | 278

(1 row)

In a matter of a couple of minutes, the archiver started lagging by 278 WAL segments.

With pgBackRest, the compression of WAL segments does help to reduce the impact. But still, the gap is significant.

pg_walfile_name | last_archived_wal | last_failed_wal | diff
--------------------------+--------------------------+-----------------+------
000000010000002D000000C6 | 000000010000002D00000020 | | 166

pg_walfile_name | last_archived_wal | last_failed_wal | diff

--------------------------+--------------------------+-----------------+------

000000010000002D000000C6 | 000000010000002D00000020 | | 166

** The archive_command used in this case is pgbackrest --stanza=pg0app archive-push %p

If the cloud bucket access is logged/audited, it can reveal the activities by WAL archive push.

In a typical test environment, the best case for WAL push to S3 was taking around 500 milliseconds for each WAL segment files. (Sometimes, I saw slower performance which takes up to 1400 ms). So effectively, one or two WAL segments can be archived per second. But this won’t be sufficient for a very high transaction system.

pgBackRest and Asynchronous WAL Archive

pgBackRest addresses the above-mentioned limitation by converting the archive_command which is inherently a sequential and synchronous operation into a parallel and asynchronous operation. Additionally, it is able to reuse the connection.

The asynchronous operation works using a spool directory for exchanging the status. The spool-path must be configured if asynchronous archiving enabled. Spool-path is used for storing data for the asynchronous archive-push and archive-get command. The asynchronous archive-push command writes acknowledgments into the spool path when it has successfully stored WAL in the archive (and errors on failure) so the foreground process can quickly notify PostgreSQL.

Parameters for the Asynchronous and Parallel Operation

archive-async

This is the parameter to enable the asynchronous archive-push and archive-get commands. By default, it is disabled (n). Setting this like archive-async=y enables the asynchronous archive mode.

archive-push-queue-max

This is a protection mechanism. If the archive process becomes slow, too many WAL segments can get accumulated on the system. This can lead to filling up the disk and database outage. DBAs are always looking for a solution that can protect their database and there are a lot of visits to blog posts like PostgreSQL WAL Retention and Clean Up: pg_archivecleanup for the same reason.

This parameter allows us to specify the maximum size of the PostgreSQL archive to be queued up. If this maximum limit is reached, pgBackRest will just drop the WALs and report back to PostgreSQL that the archiving was successful. Cheating! But, it saves the database from a difficult situation. Obviously, the point-in-time recovery won’t be possible as we are losing WAL segments. So, a fresh full backup is required if this happens.

archive-get-queue-max

This parameter is not important for the WAL archive push, but for restoring (get). It can use the spool-path location for speeding up the restore of WAL to PostgreSQL.

process-max

This parameter specifies the parallelism and these many parallel processes will be executing. Each process will perform compression and transfer to make the command run faster. This parameter can be specified for each of the backup commands as such:

[global:archive-push]
process-max=4

[global:archive-get]
process-max=2

[global:archive-push]

process-max=4

[global:archive-get]

process-max=2

Configuration for Asynchronous WAL Archive

pgBackRest’s backup configuration for asynchronous and parallel backup is just a matter of adding the above-mentioned parameter values to the configuration file. Here is a sample configuration file.

My test configuration file finally looks as follows:

[global]
repo1-path=/home/jobin.augustine/bkup/pgbackrest
repo1-retention-full=5
repo1-s3-bucket=pg-jobin
repo1-s3-endpoint=s3.us-east-1.amazonaws.com
repo1-s3-key=AKIAxxxxxxxxxxxxx
repo1-s3-key-secret=D+yDDxxxxxxxxxxxxxxxxxxxxxxxxxxxx
repo1-s3-region=us-east-1
repo1-s3-verify-ssl=n
repo1-type=s3
archive-async=y
spool-path=/var/spool/pgbackrest

[global:archive-get]
process-max=2

[global:archive-push]
process-max=4

[mypg]
pg1-path=/home/jobin.augustine/pg12pri
pg1-port=5432

[global]

repo1-path=/home/jobin.augustine/bkup/pgbackrest

repo1-retention-full=5

repo1-s3-bucket=pg-jobin

repo1-s3-endpoint=s3.us-east-1.amazonaws.com

repo1-s3-key=AKIAxxxxxxxxxxxxx

repo1-s3-key-secret=D+yDDxxxxxxxxxxxxxxxxxxxxxxxxxxxx

repo1-s3-region=us-east-1

repo1-s3-verify-ssl=n

repo1-type=s3

archive-async=y

spool-path=/var/spool/pgbackrest

[global:archive-get]

process-max=2

[global:archive-push]

process-max=4

[mypg]

pg1-path=/home/jobin.augustine/pg12pri

pg1-port=5432

As explained by archive-async, spool-path, process-max are specified.

On testing with the same load which originally created a gap up to 278, I couldn’t see any considerable gap anymore. The maximum I could detect is a gap of 10, and the gap got closed almost instantaneously once the load decreased. Many times, the time is taken for WAL push was just 2 milliseconds at the PostgreSQL level.

INFO: archive-push command end: completed successfully (2ms)
INFO: archive-push command end: completed successfully (1ms)
INFO: archive-push command end: completed successfully (2ms)
INFO: archive-push command end: completed successfully (2ms)
INFO: archive-push command end: completed successfully (705ms)
INFO: archive-push command end: completed successfully (1ms)
INFO: archive-push command end: completed successfully (2ms)
INFO: archive-push command end: completed successfully (2ms)

INFO: archive-push command end: completed successfully (2ms)

INFO: archive-push command end: completed successfully (1ms)

INFO: archive-push command end: completed successfully (2ms)

INFO: archive-push command end: completed successfully (705ms)

INFO: archive-push command end: completed successfully (1ms)

INFO: archive-push command end: completed successfully (2ms)

This is because the WAL is already pushed by pgBackRest processes and it just acknowledges back to PostgreSQL about the success.

How it Works

If we look at the backend process, the work of pushing the WAL segment is actually performed by another 4+1 process. See the PIDs 8930,8931,8932,8934 and 8935:

postgres  8928 21747  0 12:03 ?        00:00:00 pgbackrest --stanza=mypg --log-level-stderr=info archive-push pg_wal/000000010000003200000075
postgres  8930     1  0 12:03 ?        00:00:00 pgbackrest --log-level-console=off --log-level-stderr=off --stanza=mypg archive-push:async /home/jobin.augustine/pg12pri/pg_wal
postgres  8931  8930  0 12:03 ?        00:00:00 pgbackrest --host-id=1 --log-level-console=off --log-level-file=off --log-level-stderr=error --process=1 --remote-type=repo --stanza=mypg archive-push:local
postgres  8932  8930  0 12:03 ?        00:00:00 pgbackrest --host-id=1 --log-level-console=off --log-level-file=off --log-level-stderr=error --process=2 --remote-type=repo --stanza=mypg archive-push:local
postgres  8934  8930  0 12:03 ?        00:00:00 pgbackrest --host-id=1 --log-level-console=off --log-level-file=off --log-level-stderr=error --process=3 --remote-type=repo --stanza=mypg archive-push:local
postgres  8935  8930  0 12:03 ?        00:00:00 pgbackrest --host-id=1 --log-level-console=off --log-level-file=off --log-level-stderr=error --process=4 --remote-type=repo --stanza=mypg archive-push:local

postgres 8928 21747 0 12:03 ? 00:00:00 pgbackrest --stanza=mypg --log-level-stderr=info archive-push pg_wal/000000010000003200000075

postgres 8930 1 0 12:03 ? 00:00:00 pgbackrest --log-level-console=off --log-level-stderr=off --stanza=mypg archive-push:async /home/jobin.augustine/pg12pri/pg_wal

postgres 8931 8930 0 12:03 ? 00:00:00 pgbackrest --host-id=1 --log-level-console=off --log-level-file=off --log-level-stderr=error --process=1 --remote-type=repo --stanza=mypg archive-push:local

postgres 8932 8930 0 12:03 ? 00:00:00 pgbackrest --host-id=1 --log-level-console=off --log-level-file=off --log-level-stderr=error --process=2 --remote-type=repo --stanza=mypg archive-push:local

postgres 8934 8930 0 12:03 ? 00:00:00 pgbackrest --host-id=1 --log-level-console=off --log-level-file=off --log-level-stderr=error --process=3 --remote-type=repo --stanza=mypg archive-push:local

postgres 8935 8930 0 12:03 ? 00:00:00 pgbackrest --host-id=1 --log-level-console=off --log-level-file=off --log-level-stderr=error --process=4 --remote-type=repo --stanza=mypg archive-push:local

They are independent of the process (PID: 8928) created by the archive_command of PostgreSQL’s archiver Process (PID 21747). This process created by PostgreSQL mainly works as an intermediary.

pgBackRest creates directories and files in this spool directory in spool-path for processing and coordination with asynchronous jobs.

$ ls -R
.:
archive

./archive:
mypg

./archive/mypg:
out

./archive/mypg/out:
000000010000000000000040.ok  000000010000000000000043.ok  000000010000000000000046.ok
000000010000000000000041.ok  000000010000000000000044.ok  000000010000000000000047.ok
000000010000000000000042.ok  000000010000000000000045.ok  000000010000000000000048.ok

$ ls -R

References

1. https://pgbackrest.org/configuration.html

1. https://github.com/pgbackrest/pgbackrest (source code)

1. https://pgstef.github.io/2019/03/26/pgbackrest_archiving_tricks.html

Our white paper “Why Choose PostgreSQL?” looks at the features and benefits of PostgreSQL and presents some practical usage examples. We also examine how PostgreSQL can be useful for companies looking to migrate from Oracle.

Download PDF

0 0 votes

Article Rating

4 Comments

Oldest

Newest Most Voted

David Steele

5 years ago

This an excellent article — clearly written and very accurate.

I do have one small correction to suggest: An .ok file indicates that it is already taken for processing should really be An .ok file indicates that it is already *been* processed. Jobs do not need to be “claimed” because the async process holds a lock the entire time it is running so another async process won’t start and try to run the same jobs.

Author

Jobin Augustine

5 years ago

Reply to David Steele

Thank you @David Steele,
Its great to see your valuable comment on my blog post. Edited as per your suggestion.
I hope that makes it more clear for a reader.
Thank you.

Markus

5 years ago

Cool. 2 Questions.

Is there any signal (log entry, …) when pgbackrest deletes the archive-push-queue when it reached archive-push-queue-max?

Would a single pgbackrest server be fast enough to serve around 200 postgres instances that way?

Author

Jobin Augustine

5 years ago

Hi Markus,
Yes, pgBackrest spits out warning message so that there will be log entry in PostgreSQL log when pgbackrest deletes the archive-push-queue when it reached archive-push-queue-max

Regarding,
>>Would a single pgbackrest server be fast enough to serve around 200 postgres instances that way?
A dedicated pgBackRest backup server will be constrained by the resources available to it like CPU, IO and Network bandwidths. But quantifying in terms of number of PostgreSQL instances will be difficult because environments vary a lot in terms of it size and transaction volume.