This blog post was originally published in January 2023 and was updated in March 2025.
In PostgreSQL, the VACUUM command is a critical maintenance task essential to optimize database performance and reclaim wasted disk space. Using the PostgreSQL VACUUM command involves removing deleted or outdated row versions (“dead tuples”) from tables and indexes, and optionally updating statistics used by the query planner. This process is necessary because of PostgreSQL’s concurrency control mechanism and prevents the accumulation of bloat, which consumes disk space and can significantly slow down queries. Regular vacuuming is vital for a healthy, performant PostgreSQL database.
Ensure your databases are performing their best — today and tomorrow — with proactive database optimization and query tuning. Book a database assessment
Understanding MVCC: Why PostgreSQL VACUUM is necessary
To maintain data consistency during concurrent operations, PostgreSQL employs Multi-Version Concurrency Control (MVCC). MVCC allows multiple transactions to access data simultaneously without blocking each other, ensuring consistent reads. This is achieved by storing multiple versions of each row.
When a row is inserted, it gets a transaction ID (xmin) indicating when it became visible. When a row is UPDATEd or DELETEd, PostgreSQL doesn’t immediately remove the old row version. Instead, it marks the old version as “dead” by setting its expiration transaction ID (xmax) and inserts a new version (for UPDATE).
Transactions use these xmin and xmax values against their transaction “snapshot” to determine which row versions are visible to them. This ensures a consistent view but means old, “dead” row versions accumulate within the table files. These dead tuples consume space and can slow down scans. This is where the VACUUM command comes in.
What does the PostgreSQL VACUUM command do?
The primary purpose of the PostgreSQL VACUUM command is database maintenance to counteract the effects of MVCC. Its main functions are:
-
Reclaiming Space: It marks space occupied by dead tuples as free and available for reuse by new row versions within the same table. Note that standard VACUUM often doesn’t return space to the operating system immediately.
-
Updating Visibility Information: It updates internal data structures (like the visibility map) to speed up future index scans.
-
Updating Statistics: Optionally (with ANALYZE), it updates table statistics used by the query planner to choose efficient execution plans.
-
Preventing Transaction ID Wraparound: It performs essential cleanup to prevent the 32-bit transaction ID counter from wrapping around and causing data corruption issues (discussed later).
Periodically running VACUUM is essential to keep the database running smoothly.
PostgreSQL VACUUM variations and options
Standard VACUUM
This is the regular form (VACUUM table_name; or just VACUUM; for all tables).
-
Removes dead tuples, making space reusable within the table.
-
Does not require an exclusive lock (reads and writes can continue, though there might be some performance impact).
-
Generally does not shrink the table file size significantly or return space to the OS.
VACUUM Full
The VACUUM FULL command performs a more aggressive cleanup.
-
Rewrites the entire table to a new disk file without any dead space, effectively returning unused space to the operating system.
-
Requires an exclusive lock on the table, blocking all reads and writes until it completes.
-
Can be very slow and resource-intensive, especially on large tables.
-
Use case: Should be used sparingly, typically only when significant space reclamation is needed and downtime is acceptable. Often, regular VACUUM and Autovacuum are sufficient.
VACUUM with ANALYZE
The ANALYZE operation collects statistics about table contents (data distribution, distinct values, etc.) used by the query planner. While VACUUM reclaims space, ANALYZE updates these vital statistics.
-
Running VACUUM ANALYZE table_name; performs both operations together efficiently.
-
It is generally recommended to run ANALYZE regularly, either combined with VACUUM or separately, especially after significant data changes. Accurate statistics are critical for optimizing database performance.
Verbose option
Adding the VERBOSE option (e.g., VACUUM (VERBOSE, ANALYZE) table_name;) provides detailed output about the vacuuming process, showing which tables/indexes are being processed and statistics about dead tuples removed. This is useful for monitoring and debugging. (Note: This relates to command output, not a separate vacuum type.)
Automating maintenance with Autovacuum
Manually running VACUUM can be tedious. PostgreSQL provides the Autovacuum daemon, a background process that automatically runs VACUUM and ANALYZE operations when certain thresholds are met.
Autovacuum is enabled by default and is crucial for maintaining database health without constant manual intervention. Its behavior is controlled by several parameters in postgresql.conf:
- autovacuum: This setting enables or disables the autovacuum background process. By default, autovacuum is enabled.
- autovacuum_vacuum_threshold: This setting determines the minimum number of dead rows that must be present in a table before it is vacuumed. The default value is 50.
- autovacuum_analyze_threshold: This setting determines the minimum number of live rows that must be present in a table before it is analyzed. The default value is 50.
- autovacuum_vacuum_scale_factor: This setting is a multiplier that determines how many dead rows are needed to trigger a vacuum based on the table size. The default value is 0.2.
- autovacuum_analyze_scale_factor: This setting is a multiplier that determines how many live rows are needed to trigger an analyze based on the size of the table. The default value is 0.1.
- autovacuum_vacuum_cost_delay: This setting determines the time (in milliseconds) the autovacuum will wait before starting a vacuum operation. The default value is 20.
- autovacuum_vacuum_cost_limit: This setting determines the maximum number of rows that can be vacuumed in a single vacuum operation. The default value is 200.
Here is an example of configuring some of the vacuum parameters in the postgresql.conf file:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
autovacuum = on autovacuum_vacuum_threshold = 100 autovacuum_analyze_threshold = 100 autovacuum_vacuum_scale_factor = 0.5 autovacuum_analyze_scale_factor = 0.2 autovacuum_vacuum_cost_delay = 50 autovacuum_vacuum_cost_limit = 500 |
In this example, autovacuum is enabled, and the thresholds for vacuum and analyze are set to 100. The scale factors for vacuum and analyze are set to 0.5 and 0.2, respectively, which means that a vacuum will be triggered when there are 50 dead rows per 1,000 live rows in the table (0.5 x 100), and an analyze will be triggered when there are 20 live rows per 1,000 rows in the table (0.2 x 100). The vacuum cost delay is set to 50 milliseconds, and the vacuum cost limit is set to 500 rows, which means that the autovacuum will wait 50 milliseconds before starting a vacuum operation and will vacuum a maximum of 500 rows at a time.
It is important to configure these settings to ensure that vacuum and analyze properly are running effectively and not causing too much load on the database. It is also a good idea to monitor the activity of the autovacuum and manually vacuum tables that are not adequately maintained by the autovacuum.
Autovacuum can also be configured on a per-table basis using the autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit parameters in the table’s storage parameters. These parameters control how aggressively autovacuum vacuums the table, with a lower cost delay causing the vacuum to run more frequently and a higher cost limit allowing more rows to be vacuumed at once.
Tuning Autovacuum:
-
Default settings are conservative. For busy databases or tables with high update/delete rates, you may need to lower thresholds (autovacuum_vacuum_threshold, autovacuum_vacuum_scale_factor) to make autovacuum run more frequently on specific tables (using ALTER TABLE … SET (…)).
-
Conversely, on very large, mostly static tables, you might increase thresholds to prevent unnecessary runs.
-
Adjusting cost delay/limit helps balance vacuum speed vs. impact on foreground operations.
-
Monitoring autovacuum activity using system views (pg_stat_activity, pg_stat_user_tables) is essential for effective tuning.
Leveraging Parallel VACUUM for large tables
Parallel vacuum is a feature in PostgreSQL that allows the vacuum process to be run concurrently on multiple cores or processors for the same table, improving the performance of the vacuum operation. This can be especially useful for vacuuming large tables, as it allows the vacuum process to use multiple CPUs to scan and process the table in parallel.
The parallel vacuum was introduced in PostgreSQL 13 as an experimental feature and made generally available in PostgreSQL 14. To use a parallel vacuum, you need to set the “max_parallel_workers_per_gather” parameter in the postgresql.conf configuration file to a value greater than one. To enable parallel processing, specify the “PARALLEL” option when running a vacuum or analyze command.
For example, to run a parallel vacuum on a table named “foo_table”, you can use the following command:
1 |
VACUUM (PARALLEL, ANALYZE) foo_table; |
You can also specify the “PARALLEL” option when running a vacuum or analyze command on an entire schema or database:
1 |
VACUUM (PARALLEL, ANALYZE) schema_name.*; VACUUM (PARALLEL, ANALYZE); |
Note: Keep in mind that parallel vacuum can increase the load on the database server, as it requires multiple CPU cores to be used simultaneously. You should carefully monitor the performance of your PostgreSQL database when using a parallel vacuum and adjust the “max_parallel_workers_per_gather” parameter as needed to find the optimal setting for your workload.
PostgreSQL Performance Tuning: Optimizing Database Parameters for Maximum Efficiency
Preventing transaction ID wraparound with VACUUM
PostgreSQL uses a 32-bit transaction ID (TXID). When this counter reaches its maximum value (approx. 4 billion), it “wraps around” and starts reusing old TXIDs. Transaction ID wraparound is a serious condition. If VACUUM hasn’t processed old rows sufficiently, reused TXIDs could make very old data suddenly appear to be “in the future,” potentially leading to data corruption or making rows invisible.
-
Prevention: Regular VACUUM processing (especially by Autovacuum) is the primary defense. It freezes old row versions, marking them as visible to all transactions, regardless of TXID comparisons.
-
Monitoring: Parameters like autovacuum_freeze_max_age trigger more aggressive autovacuums on tables approaching the wraparound limit. Monitor age(datfrozenxid) for databases and age(relfrozenxid) for tables.
Monitoring VACUUM activity and statistics
To view the vacuum history for all tables in the current schema:
This query retrieves vacuum statistics for all tables in the “public” schema in the current database. This query can help monitor the status of the vacuum process and identify tables that may need to be vacuumed or analyzed. For example, if a table has many dead rows or has not been vacuumed or analyzed recently. In that case, it may be worth running a manual vacuum or analyzing operations to improve the performance of the database. Download our free e-book, “The 6 Common Causes of Poor Database Performance,” to see the most common database performance issues and get expert insights on how to fix them.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 |
SELECT n.nspname as schema_name, c.relname as table_name, c.reltuples as row_count, c.relpages as page_count, s.n_dead_tup as dead_row_count, s.last_vacuum, s.last_autovacuum, s.last_analyze, s.last_autoanalyze FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace LEFT JOIN pg_stat_user_tables s ON s.relid = c.oid WHERE c.relkind = 'r' AND n.nspname = 'public'; -[ RECORD 1 ]----+------------------------------ schema_name | public table_name | pgbench_accounts row_count | 9.999965e+06 page_count | 163935 dead_row_count | 41705 last_vacuum | 2022-12-25 16:00:36.231734+00 last_autovacuum | last_analyze | 2022-12-25 16:00:18.90299+00 last_autoanalyze | -[ RECORD 2 ]----+------------------------------ schema_name | public table_name | pgbench_branches row_count | 100 page_count | 1 dead_row_count | 41 last_vacuum | 2022-12-25 16:00:44.722317+00 last_autovacuum | last_analyze | 2022-12-25 16:00:16.254529+00 last_autoanalyze | 2022-12-25 16:01:45.957663+00 |
To view the list of tables that have been modified since the last vacuum:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
SELECT n.nspname as schema_name, c.relname as table_or_index_name, c.relkind as table_or_index, c.reltuples as row_count, s.last_vacuum, s.last_autovacuum, s.last_analyze, s.last_autoanalyze FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace LEFT JOIN pg_stat_user_tables s ON s.relid = c.oid WHERE (c.relkind = 'r' or c.relkind = 'i') AND (s.last_vacuum < s.last_autovacuum OR s.last_vacuum < s.last_analyze); |
To view the list of tables and indexes that have a high number of dead rows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 |
SELECT n.nspname as schema_name, c.relname as table_name, c.reltuples as row_count, s.n_dead_tup as dead_row_count FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace LEFT JOIN pg_stat_user_tables s ON s.relid = c.oid WHERE c.relkind = 'r' AND s.n_dead_tup > 0; SELECT n.nspname as schema_name, c.relname as index_name, s.n_dead_tup as dead_row_count FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace LEFT JOIN pg_stat_user_tables s ON s.relid = c.oid WHERE (c.relkind = ''r'' or c.relkind = ''i'') AND (s.last_vacuum < s.last_autovacuum OR s.last_vacuum < s.last_analyze)') AS t( schema_name text, table_or_index_name text, table_or_index char(1), row_count bigint, last_vacuum timestamp, last_autovacuum timestamp, last_analyze timestamp, last_autoanalyze timestamp ); |
You can write the above query for all the databases using dblink. This will give information about all the databases.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
SELECT * FROM dblink('host=<host> port=<port> dbname=<database> user=<username> password=<password>', 'SELECT n.nspname as schema_name, c.relname as table_or_index_name, c.relkind as table_or_index, c.reltuples as row_count, s.last_vacuum, s.last_autovacuum, s.last_analyze, s.last_autoanalyze FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace LEFT JOIN pg_stat_user_tables s ON s.relid = c.oid WHERE (c.relkind = ''r'' or c.relkind = ''i'') AND (s.last_vacuum < s.last_autovacuum OR s.last_vacuum < s.last_analyze)') AS t( schema_name text, table_or_index_name text, table_or_index char(1), row_count bigint, last_vacuum timestamp, last_autovacuum timestamp, last_analyze timestamp, last_autoanalyze timestamp ); |
Note: You need to replace the placeholders <host>, <port>, <database>, <username>, and <password> with the actual values for your server.
Conclusion
The PostgreSQL VACUUM command, in its various forms, is not just optional maintenance; it’s fundamental to sustained database performance, efficient space usage, and data integrity (preventing transaction wraparound). Understanding MVCC explains why vacuuming is needed to clean up dead tuples. While manual VACUUM and ANALYZE are possible, leveraging and properly tuning Autovacuum is the recommended approach for most workloads. Monitoring vacuum activity and statistics allows administrators to ensure this critical process runs effectively, keeping the PostgreSQL database optimized and healthy.
See why running open source PostgreSQL in-house demands more time, expertise, and resources than most teams expect — and what it means for IT and the business.
PostgreSQL in the Enterprise: The Real Cost of Going DIY
FAQs
What is vacuum in PostgreSQL?
The term “vacuum” refers to a database maintenance procedure that reclaims storage space and optimizes database performance. Whenever data is inserted, updated, or deleted in a PostgreSQL database, it can generate “dead tuples” – rows that have become obsolete or are inaccessible.
The VACUUM operation in PostgreSQL identifies and eliminates these dead tuples, freeing up disk space to utilize for future operations.
What is the difference between vacuum full and vacuum freeze in PostgreSQL?
Both the VACUUM FULL and VACUUM FREEZE operations are both used in PostgreSQL for maintenance purposes, but they serve different functions.
The VACUUM FULL operation reclaims storage space by physically rewriting the entire table, relocating live tuples to fresh data pages, and discarding obsolete tuples. This can be useful when a significant amount of data has been updated and/or deleted, which results in fragmented space within the table. This operation effectively recaptures the space, but it comes with a cost; longer execution time and exclusive locks on the table, impacting concurrency.
VACUUM FREEZE addresses the issue of transaction ID wraparound. In PostgreSQL, every transaction is associated with a unique transaction ID, represented by a 32-bit number. As transactions progress, the transaction ID incrementally increases, and when it reaches its maximum value, it wraps around to zero, causing issues with transaction visibility and data integrity. The FREEZE operation marks all tuples in a table as frozen, meaning their transaction IDs are set to the maximum value, preventing wraparound and ensuring these tuples are marked as old and not subject to removal during VACUUM operations.
What is the vacuum threshold in PostgreSQL?
The “autovacuum_vacuum_threshold” parameter in PostgreSQL determines the minimum number of updated or deleted tuples required in a table before the autovacuum process is triggered. The default is 50 tuples, meaning that if 50 or more tuples are modified in a table, autovacuum will be triggered. However, you can adjust this value in the PostgreSQL configuration file (postgresql.conf) or by changing table storage parameters.
What is the difference between vacuum and vacuum full?
VACUUM: The VACUUM command analyzes and reclaims storage space in a PostgreSQL database. When rows in a table are deleted or updated, the space they once occupied is not immediately freed up. Rather, it becomes marked as available for future use. This command does not need access to the table, enabling other transactions to read from and write to the table concurrently.
VACUUM FULL: This is a more forceful form of vacuuming, reclaiming space by physically relocating the data within the table, ensuring the removal of all defunct rows, and compacting the storage space. It can take longer to complete than regular vacuuming and requires exclusive access to the table, making it inaccessible for reading or writing by other transactions while the command is in progress.
The primary distinction between VACUUM and VACUUM FULL lies in the level of aggressiveness used to free up storage space. VACUUM is a lightweight and faster operation, marking the space as available for reuse, while VACUUM FULL physically rearranges the data and compresses the storage, resulting in a more thorough cleanup.
Good article. It refers to a query one can use to check for tables at risk of xid wraparound, but I don’t see the query I think the author intended to include. Feel free to delete this comment from me if you fix the missing query.
he vacuum cost delay is set to 50 milliseconds, and the vacuum cost limit is set to 500 rows, which means that the autovacuum will wait 50 milliseconds before starting a vacuum operation and will vacuum a maximum of 500 rows at a time.
— I believe 500 here is pages not 500 rows ?