Tuning Autovacuum in PostgreSQL and Autovacuum Internals



The performance of a PostgreSQL database can be compromised by dead tuples, since they continue to occupy space and can lead to bloat. We provided an introduction to VACUUM and bloat in an earlier blog post. Now, though, it’s time to look at autovacuum for postgres, and the internals you need to know to maintain the high-performance PostgreSQL database that demanding applications require.

What is autovacuum?

Autovacuum is one of the background utility processes that starts automatically when you start PostgreSQL. As you see in the following log, the postmaster (parent PostgreSQL process) with pid 2862 has started the autovacuum launcher process with pid 2868. To start autovacuum, you must have the parameter autovacuum set to ON. In fact, you should not set it to OFF in a production system unless you are 100% sure about what you are doing and its implications.

Why is autovacuum needed?

We need VACUUM to remove dead tuples, so that the space they occupy can be re-used by the table for future inserts/updates. To know more about dead tuples and bloat, please read our previous blog post. We also need ANALYZE on the table to update the table statistics, so that the optimizer can choose an optimal execution plan for a SQL statement. It is autovacuum in postgres that is responsible for performing both vacuum and analyze on tables.

There exists another background process in postgres called Stats Collector that tracks the usage and activity information. The information collected by this process is used by autovacuum launcher to identify the list of candidate tables for autovacuum. PostgreSQL identifies the tables needing vacuum or analyze automatically, but only when autovacuum is enabled. This ensures that postgres heals itself and stops the database from developing more bloat/fragmentation.

Two parameters are needed to enable autovacuum in PostgreSQL:

  • autovacuum : must be set to on (the default).
  • track_counts : must be on, as it is used by the stats collector. Without it in place, the autovacuum launcher cannot identify the candidate tables.
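As a minimal sketch (both settings already default to on), the relevant postgresql.conf entries are:

```
autovacuum = on
track_counts = on
```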

Logging autovacuum

Eventually, you may want to log the tables on which autovacuum spends more time. In that case, set the parameter log_autovacuum_min_duration to a value in milliseconds (the default unit), so that any autovacuum run taking longer than this value is logged to the PostgreSQL log file. This may help you tune your table-level autovacuum settings appropriately.

Here is an example log of autovacuum vacuum and analyze

When does PostgreSQL run autovacuum on a table?

As discussed earlier, autovacuum in postgres refers to both automatic VACUUM and ANALYZE, not just VACUUM. Whether an automatic vacuum or analyze runs on a table depends on the following formulas.

The formula for calculating the effective table-level autovacuum vacuum threshold is:

autovacuum vacuum threshold = autovacuum_vacuum_threshold + (autovacuum_vacuum_scale_factor * number of tuples)

With the equation above, it is clear that once the actual number of dead tuples in a table, accumulated through updates and deletes, exceeds this effective threshold, that table becomes a candidate for autovacuum vacuum.

Similarly, the formula for the effective autovacuum analyze threshold is:

autovacuum analyze threshold = autovacuum_analyze_threshold + (autovacuum_analyze_scale_factor * number of tuples)

Any table whose total number of inserts/deletes/updates since the last analyze exceeds this threshold is eligible for an autovacuum analyze.

Let’s understand these parameters in detail.

  • autovacuum_vacuum_scale_factor or autovacuum_analyze_scale_factor : fraction of the table records that will be added to the formula. For example, a value of 0.2 means 20% of the table records.
  • autovacuum_vacuum_threshold or autovacuum_analyze_threshold : minimum number of obsolete records or DMLs needed to trigger an autovacuum.

Let’s consider a table: percona.employee with 1000 records and the following autovacuum parameters.

Using the above-mentioned formulae as reference, we can compute the effective thresholds for this table.
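As an illustration (this sketch is not from the original post), here is a small Python snippet computing the effective thresholds for a 1000-record table. The parameter values are PostgreSQL's stock defaults, an assumption, since the original post's table-level settings are not reproduced here:

```python
# Effective autovacuum thresholds, per the formulas above.
# Defaults assumed: autovacuum_vacuum_threshold = 50,
# autovacuum_vacuum_scale_factor = 0.2,
# autovacuum_analyze_threshold = 50, autovacuum_analyze_scale_factor = 0.1.

def vacuum_threshold(reltuples, threshold=50, scale_factor=0.2):
    """autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples"""
    return threshold + scale_factor * reltuples

def analyze_threshold(reltuples, threshold=50, scale_factor=0.1):
    """autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples"""
    return threshold + scale_factor * reltuples

# percona.employee has 1000 records:
print(vacuum_threshold(1000))   # 250.0 dead tuples make it a VACUUM candidate
print(analyze_threshold(1000))  # 150.0 changed tuples make it an ANALYZE candidate
```

So, under these defaults, the table becomes a vacuum candidate after 250 dead tuples and an analyze candidate after 150 inserts/updates/deletes.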

Tuning Autovacuum in PostgreSQL

We need to understand that these are global settings, applicable to all the databases in the instance. This means that, regardless of table size, once the threshold computed by the above formula is exceeded, a table becomes eligible for autovacuum vacuum or analyze.

Is this a problem?

Consider a table with ten records versus a table with a million records. Even though the table with a million records may be involved in transactions far more often, the frequency at which a vacuum or an analyze runs automatically could be greater for the table with just ten records.

Consequently, PostgreSQL allows you to configure individual table-level autovacuum settings that override the global settings.
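The snippet from the original post is not reproduced here; as a sketch, such a table-level override is applied via storage parameters, with the scale factor zeroed out so that only the fixed threshold applies:

```sql
ALTER TABLE scott.employee
  SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 100);
```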

The above setting runs autovacuum vacuum on the table scott.employee only once there are more than 100 obsolete records.

How do we identify the tables that need their autovacuum settings tuned?

In order to tune autovacuum for tables individually, you must know the number of inserts/deletes/updates on a table over an interval. You can get that information from the postgres catalog view pg_stat_user_tables.
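For example, a query along these lines (the columns are part of the actual pg_stat_user_tables view) shows per-table DML counts and dead-tuple estimates:

```sql
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del,
       n_live_tup, n_dead_tup,
       last_autovacuum, last_autoanalyze
FROM   pg_stat_user_tables
ORDER  BY (n_tup_ins + n_tup_upd + n_tup_del) DESC;
```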

Taking a snapshot of this data at regular intervals should help you understand the frequency of DMLs on each table. In turn, this should help you tune autovacuum settings for individual tables.

How many autovacuum processes can run at a time?

There cannot be more than autovacuum_max_workers autovacuum processes running at a time across the instance/cluster, which may contain more than one database. The autovacuum launcher background process starts a worker process for a table that needs a vacuum or an analyze. If there are four databases and autovacuum_max_workers is set to 3, then the fourth database has to wait until one of the existing worker processes becomes free.

Before starting the next autovacuum worker, the launcher waits for autovacuum_naptime, which defaults to 1 minute on most versions. If you have three databases, the next autovacuum starts after 60/3 seconds. So, the wait time before starting the next autovacuum is always (autovacuum_naptime/N), where N is the total number of databases in the instance.
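A quick sketch of that arithmetic:

```python
# Approximate interval between autovacuum worker launches, per the text
# above: autovacuum_naptime divided by the number of databases.

def launch_interval(naptime_seconds, n_databases):
    return naptime_seconds / n_databases

print(launch_interval(60, 3))  # 20.0 -> a worker starts every 20 seconds
```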

Does increasing autovacuum_max_workers alone increase the number of autovacuum processes that can run in parallel?
No. This is explained in the next few lines.

Is VACUUM IO intensive? 

Autovacuum can be considered a cleanup process. As discussed earlier, there is one worker process per table. Autovacuum reads 8 KB (the default block_size) pages of a table from disk and modifies/writes the pages containing dead tuples. This involves both read and write IO, so it can be an IO-intensive operation when an autovacuum runs on a huge table with many dead tuples during peak transaction time. To avoid this issue, a few parameters exist to minimize the IO impact of vacuum.

The following are the parameters used to tune autovacuum IO

  • autovacuum_vacuum_cost_limit : total cost autovacuum is allowed to accumulate (shared by all autovacuum jobs) before it must sleep.
  • autovacuum_vacuum_cost_delay : the number of milliseconds autovacuum sleeps each time the accumulated cleanup cost reaches autovacuum_vacuum_cost_limit.
  • vacuum_cost_page_hit : Cost of reading a page that is already in shared buffers and doesn’t need a disk read.
  • vacuum_cost_page_miss : Cost of fetching a page that is not in shared buffers.
  • vacuum_cost_page_dirty : Cost of writing to each page when dead tuples are found in it.

Consider autovacuum VACUUM running on the table percona.employee.

Let’s imagine what can happen in 1 second. (1 second = 1000 milliseconds)

In a best-case scenario where read latency is 0 milliseconds, autovacuum can wake up and go to sleep 50 times (1000 ms / 20 ms), because the delay between wake-ups is 20 milliseconds (autovacuum_vacuum_cost_delay).

Since the cost associated with reading a page already in shared_buffers is 1, 200 pages can be read in every wake-up, and 50 * 200 pages in 50 wake-ups.

If all the pages with dead tuples are found in shared buffers, with an autovacuum_vacuum_cost_delay of 20 ms, then autovacuum can read ((200 / vacuum_cost_page_hit) * 8) KB in each round, after which it waits for autovacuum_vacuum_cost_delay amount of time.

Thus, at the most, an autovacuum can read : 50 * 200 * 8 KB = 78.13 MB per second (if blocks are already found in shared_buffers), considering the block_size as 8192 bytes.

If the blocks are not in shared buffers and need to be fetched from disk, an autovacuum can read : 50 * ((200 / vacuum_cost_page_miss) * 8) KB = 7.81 MB per second.

All the information we have seen above is for read IO.

Now, in order to delete dead tuples from a page/block, the cost of a write operation is : vacuum_cost_page_dirty, set to 20 by default.

At the most, an autovacuum can write/dirty : 50 * ((200 / vacuum_cost_page_dirty) * 8) KB = 3.9 MB per second.
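To make the arithmetic above concrete, here is a Python sketch reproducing all three numbers. It assumes the defaults used in the post: autovacuum_vacuum_cost_limit = 200, autovacuum_vacuum_cost_delay = 20 ms, vacuum_cost_page_hit = 1, vacuum_cost_page_miss = 10, vacuum_cost_page_dirty = 20, and an 8 KB block size:

```python
# Worst-case autovacuum IO throughput per second, per the cost model above.

COST_LIMIT = 200   # autovacuum_vacuum_cost_limit
DELAY_MS = 20      # autovacuum_vacuum_cost_delay (default at the time)
BLOCK_KB = 8       # block_size / 1024

PAGE_HIT = 1       # vacuum_cost_page_hit
PAGE_MISS = 10     # vacuum_cost_page_miss
PAGE_DIRTY = 20    # vacuum_cost_page_dirty

rounds_per_second = 1000 // DELAY_MS  # 50 wake-ups per second

def mb_per_second(page_cost):
    pages_per_round = COST_LIMIT / page_cost  # pages processed before sleeping
    return rounds_per_second * pages_per_round * BLOCK_KB / 1024

print(mb_per_second(PAGE_HIT))    # 78.125 MB/s read from shared_buffers
print(mb_per_second(PAGE_MISS))   # 7.8125 MB/s read from disk
print(mb_per_second(PAGE_DIRTY))  # 3.90625 MB/s dirtied/written
```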

Generally, this cost limit is divided equally among all the autovacuum_max_workers autovacuum processes running in the instance. So, increasing autovacuum_max_workers alone may actually slow down the currently running autovacuum workers, while increasing autovacuum_vacuum_cost_limit may cause IO bottlenecks. An important point to note is that this behaviour can be overridden by setting the storage parameters of individual tables, which subsequently ignore the global settings.
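As a sketch of such a per-table override (the values here are purely illustrative), the cost settings can be applied as storage parameters:

```sql
ALTER TABLE percona.employee
  SET (autovacuum_vacuum_cost_limit = 1000, autovacuum_vacuum_cost_delay = 10);
```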

Thus, on a busy OLTP database, always have a strategy to run manual VACUUM on the tables that are frequently hit with DMLs, during a low-activity window. You may run as many parallel vacuum jobs as the server can afford when you run them manually, after setting the relevant vacuum cost parameters. For this reason, a scheduled manual vacuum job is always recommended alongside finely tuned autovacuum settings.



Comments (14)

  • Shaik

    Great

    August 11, 2018 at 10:05 pm
  • Peter Zaitsev

    The overhead in terms of additional IO resource usage is clear. Couple of other questions

    – Does autovacuum use the cache (and hence can cause cache pollution) or does it use some different IO path?

    – What about locking? Are there any locks set by autovacuum which can impact parallel queries beyond competition for IO resources?

    August 12, 2018 at 8:14 pm
    • Andrey

      Autovacuum acquires AccessExclusiveLock or ShareUpdateExclusiveLock on a vacuumed relation, according to the vacuum options.

      August 13, 2018 at 7:32 am
    • Jobin Augustine

      Regarding the IO overhead, PostgreSQL is just delaying the I/O.
      A few databases move old snapshots/records to an undo area as part of the transaction, causing extra I/O within the transaction. But PostgreSQL takes the approach of “leave it there; don’t move/copy it anywhere”, so PostgreSQL saves some IO there. That saving is not permanent, though: ultimately it needs to do the garbage collection, which is called vacuum.

      – Does autovacuum use the cache (and hence can cause cache pollution) or does it use some different IO path?
      Yes, it can cause cache pollution just like any garbage collector. However, the autovacuum algorithms are improved continuously from version to version, and newer versions are able to avoid scanning pages wherever possible. The community effort is to bring it down to scanning only those pages which are really modified. There is a high chance that such modified pages are already in the cache.

      – What about locking? Are there any locks set by autovacuum which can impact parallel queries beyond competition for IO resources?
      Autovacuum won’t acquire locks which can interfere with other sessions doing SELECT/INSERT/UPDATE/DELETE. The lightweight “cleanup lock” it acquires can potentially affect DDLs.
      However, autovacuum is designed to be gentle on the system: it has mechanisms to cancel itself and to skip blocks or even an entire table.
      Running a newer version of PostgreSQL is very important to get the best of it.

      Having said that, autovacuum is far from perfect. More efforts are underway to improve it further and address the edge cases.

      August 13, 2018 at 8:25 am
      • Ujang Jaenudin

        “Yes, it can cause cache pollution just like any garbage collector.”
        I heard there is a “ring buffer” for special purposes, i.e. vacuuming and also scanning big tables larger than 1/4 of shared buffers; could you please elaborate?

        September 3, 2018 at 8:26 pm
        • Jobin Augustine

          I think that is a related topic, but an extension to what we discussed in this blog. Yes, you are correct: there are separate ring buffers to help autovacuum workers. Let us consider what happens in a common workload.
          Autovacuum is not an aggressive backend process; it scans through tables from the candidate list one by one, so the pages required by autovacuum could be quite different from what the current workload demands. There is a chance of getting the pages from the current cache. However, the pages dirtied by autovacuum may not be needed again.
          To reduce the overall impact of autovacuum on shared buffers, the separate ring buffer is maintained. There is a good chance that vacuum will almost always dirty pages once and never again, and flushing dirty pages back to the OS cache ASAP helps avoid a read-modify-write cycle if the page didn’t leave the OS cache.
          Vacuum-related ring buffers are not currently tunable by parameters, but we can modify the source code and build PostgreSQL; we can expect improved vacuum performance by doing so on servers with several GBs of RAM.
          Source file : /backend/storage/buffer/freelist.c
          Function : GetAccessStrategy(BufferAccessStrategyType btype)
          However, there is a counter-argument from the community that if we optimize autovacuum to make it run faster, it may take away more IO bandwidth and cache. Still, many community members reported overall better performance in their workloads.
          The takeaway is that the same size won’t fit every workload. Do modifications with a good amount of testing.
          When we work with PostgreSQL, we always have to keep in mind that we generally depend on the OS-level page cache to a good extent. The general recommendation is to give 1/4 of RAM to shared buffers and the rest to the OS page cache. Any process accessing any memory page will have a side effect on the cache.

          September 4, 2018 at 10:53 am
      • Ian Warford (@iwarford)

        >Autovacuum won’t acquire locks
        >which can interfere with other
        >sessions doing SELECT/INSERT/
        >UPDATE/DELETE. The lightweight
        >“cleanup lock” it acquires can
        >potentially can affect DDLs.

        Another question related to this. What are the long-term consequences from vacuum/autovacuum being unable to acquire that cleanup lock? Our experience showed that it was a brief exclusive lock on the table required to free up the space. However, on large busy tables, this wasn’t possible for autovacuum to ever get. Has this been your experience?

        November 8, 2018 at 12:06 pm
  • raghavendra

    What is the role of “autovacuum_vacuum_cost_limit”?

    August 13, 2018 at 5:56 am
    • Jobin Augustine

      This is a mechanism to prevent autovacuum from consuming excessive resources (I/O) on the server. By specifying this parameter, we tell the autovacuum workers to take a pause when the accumulated cost reaches a specific value.

      August 13, 2018 at 8:39 am
  • Henry

    Does autovacuum run its modifications as a transaction? What happens if autovacuum fails in the middle of cleaning dead tuples (like during a power outage or something)?

    October 17, 2018 at 10:11 pm
    • Andrey

      All VACUUM modifications are logged into WAL, block by block, and will be taken care of by crash recovery after a crash.

      October 17, 2018 at 10:17 pm
  • Henry

    Thanks. I tried to add a stack dump to each XLogInsert call and checked whether any originate from autovacuum, but failed to find any instance to prove that autovacuum calls are logged. Is it possible to show the code path that dumps to WAL? Thanks.

    October 17, 2018 at 10:37 pm
    • Andrey

      I do not understand your task in general, but you can use the WAL record XLOG_BTREE_VACUUM as strong proof – it is the log of index cleanup during vacuum.

      October 17, 2018 at 11:00 pm
  • raviteja

    We are facing an issue with dead tuples: they are not getting deleted even when vacuum runs, and there is no lock on that table.

    November 29, 2018 at 8:44 am
