PostgreSQL logical replication adoption is becoming more popular as significant advances continue to expand its range of capabilities.
While quite a few blogs have described features, there seems to be a lack of simple and straightforward advice on restoring stalled replication.
This blog demonstrates an extremely powerful approach to resolving replication problems using the Log Sequence Number, LSN.
For the purposes of this blog we’ll confine ourselves to INSERT operations that fail to logically replicate from the PUBLICATION to the SUBSCRIBED table. The technique described herein demonstrates the manipulation of these records by “skipping” them via function calls using the LSN. Once understood, this method can be used for more complex scenarios.
Stalled logical replication can occur under any number of circumstances:
In a manner of speaking, a logical replication stall is basically the poster child for Murphy’s Law.
The following SQL statements can detect logical decoding failures. Keep in mind that most modern PostgreSQL monitoring solutions already use one or more of these metrics:
PUBLICATION Side
select * from pg_get_replication_slots();
select * from pg_publication_tables;
select * from pg_stat_activity;
select * from pg_stat_replication;
select * from pg_publication;
select * from pg_publication_tables;
SUBSCRIPTION Side
select * from pg_subscription;
select * from pg_stat_subscription;
Recognizing the significance of the replication slot and its current state is crucial:
A single, inconsistent record in just one table is sufficient to stall replication for an entire collection of subscribed tables.
The LSN points to specific locations within the WAL stream. This cluster-wide identity plays a fundamental role in replication, recovery, and data consistency by tracking all DML and DDL operations over the database’s lifetime.
LSNs are used in logical replication by the subscriber by tracking how far they have processed the WAL stream. When properly synchronized, the LSN guarantees that all changes to the PUBLICATION are consistent with the PUBLICATION
In the case of stalled replication, Postgres functions can use LSNs as arguments in order to restore logical replication.
By adding a constraint to the subscribed table, a replication stall is caused when inserting multiple records in the published table with the same value that violates the unique constraint in the subscribed table.
In this example, resolution is achieved by “skipping” the conflicted record(s).
PG1: PUBLICATION, table t1
PG2: SUBSCRIPTION, table t1 (a UNIQUE constraint is added on column “c2”)
PG1
PG2

PG1: Add conflicting records
PG2: logical replication is stalled due to UNIQUE constraint on column “c2”
PG2: Conflicting LSN flagged in postgres logs
![]()
The function pg_logical_slot_peek_binary_changes is used to query the state of replication.
Function pg_replication_slot_advance advances the current position of the replication slot after the oldest blocking INSERT, permitting replication to resume.
PG1: Slot activity is FALSE in order for this query to work

PG1: Executing this query identifies the blocking records both by transaction ID and LSN.



Attention: Note that in a typical production environment, unlike this simple scenario, both logically correct and stalling records ordered by LSN will result in mixed-order records. A nuanced approach to skipping records will, therefore, be required.
Querying table t1 in both publication and subscription demonstrates that replication has been restored, but the problem records, while still present in PG1, have not been replicated to PG2.
