A Nightmare No DBA Wants

One day, I had a nightmare, the kind every DBA fears.
My manager called me and said,
“Hey… Mr. Clever. We don’t need you on the team anymore.”

I panicked.
“What did I do wrong?!”

He replied,
“You tell me. Think about your ‘great’ work yesterday.”

Trying to defend myself, I said proudly:
“Sir, one node was down. PostgreSQL service was inactive, so I started it… and the node came back up!
Pretty smart, right?”

He sighed, paused for dramatic effect, and said,
“Get well soon.”
Then he hung up.

I woke up instantly, my heart racing.
Not because of the boss,
But because the real horror hit me:

We are running a Patroni cluster…
…and I started PostgreSQL manually in my dream.
That’s not “clever work.”
That’s a database-level crime, punishable by
“Why-is-my-cluster-broken-now?”
and
“Why-are-my-replicas-on-a-different-timeline?”
and possibly
“Why-is-ETCD-looking-at-me-like-that?”

Thankfully, it was only a nightmare.

This blog aims to help prevent anyone from living that nightmare. It clearly explains what happened, why it’s dangerous, and how to avoid it.

Understanding Patroni’s Role in the Cluster

Patroni is responsible for managing:

  • Startup and shutdown of PostgreSQL
  • Replication configuration
  • Leader election and failover/fencing decisions
  • WAL (Write-Ahead Logging) timeline coordination
  • Recovery decisions and cluster consistency
  • Coordinating cluster state with the DCS (e.g., ETCD)
  • Preventing divergence and corruption
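For orientation, here is a minimal sketch of a patroni.yml that wires these responsibilities together. The cluster name, hostnames, paths, and credentials below are placeholder assumptions, not values from any real deployment:

```yaml
scope: demo-cluster        # cluster name stored in the DCS
name: node1                # this member's name

restapi:
  listen: 0.0.0.0:8008
  connect_address: node1:8008

etcd3:
  hosts: node1:2379,node2:2379,node3:2379   # the DCS Patroni coordinates through

bootstrap:
  dcs:
    postgresql:
      use_pg_rewind: true          # let Patroni repair diverged nodes automatically
      parameters:
        wal_log_hints: "on"        # pg_rewind prerequisite (see below)

postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
  data_dir: /var/lib/pgsql/15/data
  bin_dir: /usr/pgsql-15/bin
  authentication:
    superuser:
      username: postgres
      password: secret             # placeholder only
```

Because Patroni owns all of the responsibilities above, it is the only process that should ever start, stop, or reconfigure PostgreSQL on a managed node.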

What Actually Happens When PostgreSQL Starts Outside Patroni 

In a Patroni-based high-availability (HA) cluster, PostgreSQL should never be started manually, whether through the postgres service unit or with pg_ctl.

If PostgreSQL is started manually while Patroni is down, the node slips out of Patroni's control and begins operating as a standalone instance.

However, this does not immediately create a conflicting timeline.

  • A conflicting (divergent) timeline only occurs if a failover or switchover happens while the node is standalone.
  • If the manually started node was previously a replica, and no failover occurred, it may still continue replicating from the primary but it is invisible to Patroni and cannot participate in coordinated HA.
  • If the manually started node was the former leader, and Patroni elects a new leader elsewhere, only then does a true timeline divergence occur.
  • Standalone replica + no failover → replication may continue, no new timeline
  • Standalone leader + failover occurs → timeline divergence, requiring pg_rewind or a full reinit

Takeaway: A manually started primary is the most dangerous scenario; it can immediately lead to a divergent timeline if a new leader is elected. A manually started replica is still dangerous, but divergence occurs only if a leader change happens while it runs standalone.
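To make the rule concrete, here is the difference in shell terms. The unit names, cluster name, and config path are assumptions for a typical PostgreSQL 15 install; adjust them for your environment:

```shell
# WRONG in a Patroni cluster: PostgreSQL starts behind Patroni's back
sudo systemctl start postgresql-15.service
sudo -u postgres /usr/pgsql-15/bin/pg_ctl -D /var/lib/pgsql/15/data start

# RIGHT: let Patroni own the PostgreSQL lifecycle
sudo systemctl start patroni

# RIGHT: controlled restart of a single member, coordinated through the DCS
patronictl -c /etc/patroni.yml restart demo-cluster node1
```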

Before diving deeper, let me share a test that demonstrates these outcomes.

Detailed Test Case:

Capture Baseline State

This confirms cluster health: replicas have received and replayed WAL up to the leader’s current LSN and all nodes are on the same timeline.
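A baseline capture along these lines does the job (the config path is an assumption; run the psql queries on the node indicated in each comment):

```shell
# Cluster overview: member roles, timeline (TL), and replication lag
patronictl -c /etc/patroni.yml list

# On the leader: current write position
sudo -u postgres psql -c "SELECT pg_current_wal_lsn();"

# On each replica: how far WAL has been received and replayed
sudo -u postgres psql -c "SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();"

# On any node: the timeline this instance is on
sudo -u postgres psql -c "SELECT timeline_id FROM pg_control_checkpoint();"
```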

Crash Test & Failure Simulation:
Now, simulate a crash on node1:
1] Kill Patroni on node1 (the current leader)
2] Start PostgreSQL manually:

3] Node1 is now a standalone primary on timeline 4.
4] Meanwhile, Patroni promotes node3 → timeline 5.
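Steps 1] and 2] above can be sketched as follows (binary and data-directory paths are placeholders for a PostgreSQL 15 install):

```shell
# Step 1: kill Patroni on node1 so it can no longer supervise PostgreSQL
sudo systemctl kill -s SIGKILL patroni

# Step 2: the mistake under test -- start PostgreSQL manually, outside Patroni
sudo -u postgres /usr/pgsql-15/bin/pg_ctl -D /var/lib/pgsql/15/data start
```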
   
Patroni now shows the changed cluster state:

The output above shows that a timeline divergence has occurred and that node1 is missing from the cluster because its Patroni service is down: instead of restarting Patroni, I started the PostgreSQL service directly. node3 is now the leader and node2 is its replica, both on the same timeline, while node1 is no longer part of the Patroni cluster at all.

Divergence Demonstration

Writes on standalone node1

Now perform transactions on the standalone node1:
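For example (the crash_patroni table name matches the one pg_rewind discards later in this test):

```shell
# On standalone node1: these writes land on a timeline nobody else follows
sudo -u postgres psql -c "CREATE TABLE crash_patroni (id int, note text);"
sudo -u postgres psql -c "INSERT INTO crash_patroni VALUES (1, 'written while standalone');"
```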

Meanwhile, on the new leader (node3): 

Node2 replicates Node3’s changes normally.
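On node3, a parallel write illustrates the healthy side of the split (this is the crash_new_leader table referenced later):

```shell
# On the new leader node3: these writes are on the cluster's real timeline
# and replicate to node2 as usual
sudo -u postgres psql -c "CREATE TABLE crash_new_leader (id int, note text);"
sudo -u postgres psql -c "INSERT INTO crash_new_leader VALUES (1, 'written on timeline 5');"
```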

Patroni listing on node3:

Rejoining node1 (start Patroni)

When Patroni is restarted on node1, it detects the divergence and runs pg_rewind to synchronize the datadir with the new leader:
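Bringing node1 back under management looks roughly like this (config path assumed):

```shell
# On node1: hand control back to Patroni; it detects the divergence
# and runs pg_rewind against the new leader automatically
sudo systemctl start patroni

# node1 should reappear in the member list as a Replica
patronictl -c /etc/patroni.yml list
```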

Node1 is rewound and joins as a replica.

Now check the table created earlier on node1:

The table created on node3 (crash_new_leader) is now present on node1:
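A quick check on the rewound node1 confirms both sides of the story:

```shell
# Present: replicated from node3 after the rewind
sudo -u postgres psql -c "SELECT * FROM crash_new_leader;"

# Gone: the standalone-era table was erased by pg_rewind
sudo -u postgres psql -c "SELECT * FROM crash_patroni;"   # ERROR: relation does not exist
```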

What happened? When it restarted, Patroni detected that:

  • Node1 was on timeline 4
  • The cluster progressed to timeline 5 (via promotion of node3)
  • Divergence occurred

Patroni then invoked pg_rewind. As a result:

  • Node1’s data directory was rewritten to match the new leader (node3)
  • The table crash_patroni created while node1 was standalone was lost (data loss)
  • Node1 successfully rejoined as a replica on timeline 5
  • Writes performed during standalone mode were erased

pg_rewind: prerequisites and failure modes

pg_rewind is the preferred method to repair a diverged node because it:

  • Resets the diverging timeline
  • Reconstructs the datadir to match the new leader
  • Rejoins the node without requiring a full base backup

Important prerequisites:

  • wal_log_hints = on OR data checksums must be enabled.
    Patroni typically ensures wal_log_hints = on for you, but always verify the setting, especially on manually built clusters or upgraded nodes.
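Checking the prerequisite takes one query per setting; run these as a superuser on each node:

```shell
# At least one of these must be enabled for pg_rewind to work
sudo -u postgres psql -c "SHOW wal_log_hints;"    # expect 'on'
sudo -u postgres psql -c "SHOW data_checksums;"   # or 'on' here instead
```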

If pg_rewind is disabled or misconfigured, fails due to missing WAL segments or a catalog mismatch, or crashes unexpectedly, then reinitialization (a full base backup + restore) is the only recovery option.

Key Risks and Considerations

1. Patroni must be the sole controller

        Manual PostgreSQL start = Rogue instance.

2. Standalone writes WILL be lost

      Every operation performed in standalone mode disappears once pg_rewind resynchronizes the node.

3. Split-brain–like symptoms

       Even if not a true split-brain, symptoms include:

  • Diverged timelines
  • Orphaned tables
  • Inconsistent LSNs
  • Confusing recovery decisions

4. Operational confusion

        Different nodes may disagree on whether a table exists.

Precautions: practical steps

Mask the OS service
Prevent accidental systemctl start postgresql-XX.service by masking the unit:
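For example, on a PostgreSQL 15 install (the unit name is an assumption; adjust for your version):

```shell
# A masked unit cannot be started, even by accident
sudo systemctl mask postgresql-15.service

# Any later manual attempt now fails loudly instead of forking a rogue primary:
sudo systemctl start postgresql-15.service
# -> Failed to start postgresql-15.service: Unit postgresql-15.service is masked.
```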

Monitor both Patroni and PostgreSQL services
Create alerts for:

  • Unexpected postgresql-XX.service = active while patroni.service is down
  • Missing Patroni heartbeat
  • Timeline divergence detection
  • Replicas not receiving WAL
  • Unexpected PID or process-name mismatches
  • pg_rewind events in Patroni logs

Validate pg_rewind prerequisites
Confirm wal_log_hints = on or data checksums are enabled before taking cluster nodes into production.

Check ETCD compatibility
Ensure your ETCD major version is compatible with your Patroni release; older distributions may contain legacy ETCD packages.

Use port-based health checks
Use netstat or ss checks for port 5432 rather than process-name matching, which varies across heterogeneous OS families.
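A minimal port-based check might look like this (5432 assumed as the listen port):

```shell
# True if something is listening on 5432, regardless of process name
if ss -ltn 2>/dev/null | grep -q ':5432 '; then
    echo "postgres port is up"
else
    echo "postgres port is down"
fi
```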

Backup when in doubt
If you suspect standalone writes occurred before restarting Patroni, take a filesystem-level backup of the datadir if possible — this can help forensic investigation.

                                                 

Early detection is essential to prevent data loss.

Summary

Starting PostgreSQL manually in a Patroni-managed HA cluster is not a harmless mistake — it is a data-consistency risk. It can cause:

  • Timeline divergence
  • Split-brain–like behavior
  • Lost transactions and DDL
  • Cluster confusion and administrative overhead
  • Forced pg_rewind or full reinitialization

Follow the precautions above, ensure pg_rewind prerequisites, monitor both Patroni and PostgreSQL, and always start PostgreSQL via Patroni. Do that, and you’ll greatly reduce the risk of waking up to a DBA nightmare.

 

If you’re using a Patroni cluster, always start PostgreSQL through the Patroni service. Your database will appreciate it. Let’s keep everything running smoothly!

 
