Tuning Autovacuum in PostgreSQL and Autovacuum Internals

The performance of a PostgreSQL database can be compromised by dead tuples since they continue to occupy space and can lead to bloat. We provided an introduction to VACUUM and bloat in an earlier blog post. Now, though, it’s time to look at autovacuum for postgres, and the internals you to know to maintain a high-performance PostgreSQL database needed by demanding applications.

What is autovacuum?

Autovacuum is one of the background utility processes that starts automatically when you start PostgreSQL. As you see in the following log, the postmaster (parent PostgreSQL process) with pid 2862 has started the autovacuum launcher process with pid 2868. To start autovacuum, you must have the parameter autovacuum set to ON. In fact, you should not set it to OFF in a production system unless you are 100% sure about what you are doing and its implications.

Why is autovacuum needed? 

We need VACUUM to remove dead tuples so that the space occupied by dead tuples can be re-used by the table for future inserts/updates. To know more about dead tuples and bloat, please read our previous blog post. We also need ANALYZE on the table that updates the table statistics, so that the optimizer can choose optimal execution plans for an SQL statement. It is the autovacuum in postgres that is responsible for performing both vacuum and analyze on tables.

There exists another background process in postgres called Stats Collector that tracks the usage and activity information. The information collected by this process is used by autovacuum launcher to identify the list of candidate tables for autovacuum. PostgreSQL identifies the tables needing vacuum or analyze automatically, but only when autovacuum is enabled. This ensures that postgres heals itself and stops the database from developing more bloat/fragmentation.

Parameters needed to enable autovacuum in PostgreSQL are :

track_counts  is used by the stats collector. Without that in place, autovacuum cannot access the candidate tables.

Logging autovacuum

Eventually, you may want to log the tables on which autovacuum spends more time. In that case, set the parameter log_autovacuum_min_duration to a value (defaults to milliseconds), so that any autovacuum that runs for more than this value is logged to the PostgreSQL log file. This may help tune your table level autovacuum settings appropriately.

Here is an example log of autovacuum vacuum and analyze

When does PostgreSQL run autovacuum on a table? 

As discussed earlier, autovacuum in postgres refers to both automatic VACUUM and ANALYZE and not just VACUUM. An automatic vacuum or analyze runs on a table depending on the following mathematic equations.

The formula for calculating the effective table level autovacuum threshold is :

With the equation above, it is clear that if the actual number of dead tuples in a table exceeds this effective threshold, due to updates and deletes, that table becomes a candidate for autovacuum vacuum.

The above equation says that any table with a total number of inserts/deletes/updates exceeding this threshold—since last analyze—is eligible for an autovacuum analyze.

Let’s understand these parameters in detail.

  • autovacuum_vacuum_scale_factor Or autovacuum_analyze_scale_factor : Fraction of the table records that will be added to the formula. For example, a value of 0.2 equals to 20% of the table records.
  • autovacuum_vacuum_threshold Or autovacuum_analyze_threshold : Minimum number of obsolete records or dml’s needed to trigger an autovacuum.

Let’s consider a table: percona.employee with 1000 records and the following autovacuum parameters.

Using the above mentioned mathematical formulae as reference,

Tuning Autovacuum in PostgreSQL

We need to understand that these are global settings. These settings are applicable to all the databases in the instance. This means, regardless of the table size, if the above formula is reached, a table is eligible for autovacuum vacuum or analyze.

Is this a problem?

Consider a table with ten records versus a table with a million records. Even though the table with a million records may be involved in transactions far more often, the frequency at which a vacuum or an analyze runs automatically could be greater for the table with just ten records.

Consequently, PostgreSQL allows you to configure individual table level autovacuum settings that bypass global settings.

The above setting runs autovacuum vacuum on the table scott.employee only once there is more than 100 obsolete records.

How do we identify the tables that need their autovacuum settings tuned? 

In order to tune autovacuum for tables individually, you must know the number of inserts/deletes/updates on a table for an interval. You can also view the postgres catalog view : pg_stat_user_tables to get that information.

As observed in the above log, taking a snapshot of this data for a certain interval should help you understand the frequency of DMLs on each table. In turn, this should help you with tuning your autovacuum settings for individual tables.

How many autovacuum processes can run at a time? 

There cannot be more than autovacuum_max_workers number of autovacuum processes running at a time, across the instance/cluster that may contain more than one database. Autovacuum launcher background process starts a worker process for a table that needs a vacuum or an analyze. If there are four databases with autovacuum_max_workers set to 3, then, the 4th database has to wait until one of the existing worker process gets free.

Before starting the next autovacuum, it waits for autovacuum_naptime, the default is 1 min on most of the versions. If you have three databases, the next autovacuum waits for 60/3 seconds. So, the wait time before starting next autovacuum is always (autovacuum_naptime/N) where N is the total number of databases in the instance.

Does increasing autovacuum_max_workers alone increase the number of autovacuum processes that can run in parallel?
NO. This is explained better in the next few lines.

Is VACUUM IO intensive? 

Autovacuum can be considered as a cleanup. As discussed earlier, we have 1 worker process per table. Autovacuum reads 8KB (default block_size) pages of a table from disk and modifies/writes to the pages containing dead tuples. This involves both read and write IO. Thus, this could be an IO intensive operation, when there is an autovacuum running on a huge table with many dead tuples, during a peak transaction time. To avoid this issue, we have a few parameters that are set to minimize the impact on IO due to vacuum.

The following are the parameters used to tune autovacuum IO

  • autovacuum_vacuum_cost_limit : total cost limit autovacuum could reach (combined by all autovacuum jobs).
  • autovacuum_vacuum_cost_delay : autovacuum will sleep for these many milliseconds when a cleanup reaching autovacuum_vacuum_cost_limit cost is done.
  • vacuum_cost_page_hit : Cost of reading a page that is already in shared buffers and doesn’t need a disk read.
  • vacuum_cost_page_miss : Cost of fetching a page that is not in shared buffers.
  • vacuum_cost_page_dirty : Cost of writing to each page when dead tuples are found in it.