Transaction ID wraparound occurs when the VACUUM process cannot keep up with database activity and the PostgreSQL service is forced to shut down.
In more technical terms, transaction ID wraparound occurs when the semantics of Multi-Version Concurrency Control, or MVCC, fail and the number of unique transaction IDs reaches its maximum, which is about two billion.
This situation occurs when the VACUUM process, managed either by autovacuum workers or by user interaction through manual vacuuming, does not keep up with DML operations.
Transaction ID wraparound can be caused by a combination of one or more of the following circumstances:
Transaction ID wraparound can cause a spontaneous shutdown of the Postgres database server in order to protect the integrity of the data.
PostgreSQL tracks transactions using unique IDs. Every so often, that number approaches the upper limit that can be registered. For example, 200 million transactions is the default threshold used by autovacuum_freeze_max_age. If the number of unique transaction IDs reaches the maximum transaction limit, known as TXID wraparound, Postgres will force a shutdown in order to protect the data.
Here is how it works:
If the autovacuum daemon is falling behind across the entire data cluster, review your monitoring solution to identify trends in these metrics:
Mitigation steps include:
pg_stat_activity and look for a query string containing PREVENTING TRANSACTION ID WRAPAROUND. This is actually a normal message, but you should not see autovacuum running solely for the purpose of mitigating wraparound.Here is an example error message that you can find in the Postgres logs when threatened by shutdown due to wraparound:
|
1 2 3 |
# When fewer than 10 million transactions remain before shutdown. WARNING: database "mydb" must be vacuumed within 177009986 transactions HINT: To avoid a database shutdown, execute a database-wide VACUUM in "mydb". |
Here is a set of queries that will help you determine if wraparound is a risk:
|
1 2 3 4 5 6 7 8 |
-- Database query for transaction age per database -- and as a percentage of maximum permitted transactions. SELECT datname, age(datfrozenxid), (age(datfrozenxid)::numeric / 1000000000 * 100)::numeric(4, 2) AS "% WRAPAROUND RISK" FROM pg_database ORDER BY 2 DESC; |
|
1 2 3 4 5 6 7 8 9 10 |
-- Database query for transaction age per table. SELECT c.oid::regclass AS table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) AS "TXID age", (greatest(age(c.relfrozenxid), age(t.relfrozenxid))::numeric / 1000000000 * 100)::numeric(4, 2) AS "% WRAPAROUND RISK" FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ('r', 'm') ORDER BY 2 DESC; |
|
1 2 3 4 5 6 7 8 9 |
-- Other vacuum runtime parameters of interest, -- returning TXID age. SELECT name, setting FROM pg_settings WHERE name ~ 'vacuum' AND name ~ '_age$' ORDER BY 1 ASC; |
First and foremost, make certain all tables are regularly vacuumed. A correctly configured autovacuum process takes care of this before it becomes an issue. Otherwise, you will need to consider a manual VACUUM strategy.
The following are suggestions, since each situation is highly subjective.
If you have the time, run the following invocation of vacuumdb. The value for option -j can vary from a couple of workers to a value equal to the number of CPUs on the host. The option -a processes each database in alphabetical order.
|
1 |
vacuumdb -F -a -z -j 10 -v |
Consider a bash script targeting individual databases if one database is more urgent than another:
|
1 |
vacuumdb -z -j 10 -v <mydatabase> |
The following is the set of actions to take when transaction wraparound is imminent. Remember, you are in a race against time.
You must vacuum the entire data cluster before the remaining available transaction ID drops to 1 million transactions.
vacuumdb.vacuumdb against individual databases and, if necessary, individual tables.-a.Here is a pair of example scripts that you can use as a starting point when developing your own mitigation protocol.
vacuumdb and vacuums one table per invocation.The secret sauce is xargs, which enables one to use as many CPUs as reasonably possible. The following pair of bash scripts invoke vacuumdb against a series of tables. Of course, there is more than one way to do this.
Script one generates a list of tables in a selected database and calls script two, which executes VACUUM on each of those tables individually.
Script one (go1_highspeed_vacuum.sh)
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
#!/bin/bash # # Invocation: # ./go1_highspeed_vacuum.sh # ######################################################## # Edit as required. export CPU=4 export PAGER=less export PGUSER=postgres export PGPASSWORD=mypassword export PGDATABASE=db01 export PGOPTIONS='-c statement_timeout=0' ######################################################## SQL1=" WITH a AS ( SELECT c.oid::regclass AS table_name, greatest(age(c.relfrozenxid), age(t.relfrozenxid)) FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid WHERE c.relkind IN ('r', 'm') ORDER BY 2 DESC ) SELECT table_name FROM a; " LIST="$(echo "${SQL1}" | psql -t)" # The -P option sets the number of CPUs to use simultaneously. xargs -t -n 1 -P "${CPU}" ./go2_highspeed_vacuum.sh "${PGDATABASE}" <<< "${LIST}" echo "$(date): DONE" |
Script two (go2_highspeed_vacuum.sh)
|
1 2 3 4 5 6 7 8 9 10 11 12 |
#!/bin/bash ######################################################## # Edit as required. export PAGER=less export PGUSER=postgres export PGPASSWORD=mypassword export PGOPTIONS='-c statement_timeout=0' export DB="$1" ######################################################## vacuumdb --verbose "${DB}" > "${DB}.log" 2>&1 |
vacuumdb against databases in reverse alphabetical order to avoid clashing with autovacuum worker processes, which vacuum in forward alphabetical order.pg_stat_activity.One recovers from a forced shutdown due to transaction ID wraparound by performing a cluster-wide vacuum in single-user mode.
Log in to the host and, as the Unix user postgres, execute an invocation similar to this:
|
1 2 |
# It is understood that environment variable PGDATA points to the data cluster. postgres --single -D "$PGDATA" postgres <<< 'VACUUM ANALYZE;' |
I suggest scripting the vacuum process because you will need to log in to each database to perform the VACUUM.
Generate and edit a list of all databases:
|
1 2 3 |
postgres --single -D "$PGDATA" postgres <<< 'SELECT datname FROM pg_database;' \ | grep '"' \ | cut -d '"' -f 2 > list_db |
Here is an example using the aforementioned list_db file:
|
1 2 3 4 |
# It is understood that the database list has been edited before invoking this code snippet. for u in $(cat list_db); do postgres --single -D "$PGDATA" "$u" <<< 'VACUUM ANALYZE;' done |
TXID wraparound is one of the scariest scenarios that can occur. Thankfully, this is an extremely rare incident and only occurs when systems are either under extremely heavy load or have been neglected.
Do not get caught.
Remember: the best DBA is the one who is never noticed.
Resources
RELATED POSTS
Is there any reason to associate the TOAST table age to the main table? A manual vacuum of the main table will also vacuum the TOAST table but not the reverse. if just trying to vacuum the oldest tables, is there any reason to not just vacuum the ones that are the oldest even if it’s a TOAST table? Vacuuming the main table may be wasting resources if only the TOAST table has an old relfrozenxid.
I recently found myself facing the wraparound situation and grabbed a list of oldest tables and started vacuum threads. When I looked at the list I realized they were all TOAST tables. Still trying to figure out why autovacuum was leaving those alone but there didn’t seem to be any problem manually vacuuming the TOAST directly and it progressed faster.
I have not yet gotten to play with PostgreSQL 13+ and I see there are plenty of changes in this area.
Thanks for the article!
Hi,
> Is there any reason to associate the TOAST table age to the main table?
Hi, I hope I can answer in a manner that will satisfy your question.
Creating something like a column of type text or bytea in a regular table automatically creates the TOASTED table as the column at some point becomes a reference pointer to the values which are in the toasted table and not in the table i.e. there’s a relationship between the two.
Because vacuum cannot operate on TOAST tables alone, as they are not relations, one has no choice but to vacuum the table itself that the toast is related. At least that’s the current architecture of postgres.
Alternatively you can always vacuum the column itself in the table.
Hope this helps.
Thanks for the reply. I’m not sure that I understand the premise. Autovacuum does process TOAST tables independently as seen in pg_stat_activity and pg_stat_progress_vacuum and I can run a manual vacuum on the TOAST table as mentioned earlier. pg_class.relfrozenxid of the main table and its associated entry for the TOAST appear to be maintained separately.
Until recently, I thought of the TOAST as an adjunct to the main table where all management was through the main table. I’m trying to clarify some DO’s and DON’Ts.
Thanks