Understanding of Bloat and VACUUM in PostgreSQL

PostgreSQL’s implementation of MVCC (Multi-Version Concurrency Control) is different and special when compared with other RDBMSs. MVCC in PostgreSQL controls which tuples are visible to which transactions through versioning.

What is versioning in PostgreSQL?

Let’s consider the case of an Oracle or a MySQL database. What happens when you perform a DELETE or an UPDATE of a row? An UNDO record is maintained in a global UNDO segment. This UNDO segment contains the past image of the row, which helps the database provide read consistency. For example, if an old transaction depends on a row that got deleted, the row may still be visible to it because the past image is still maintained in the UNDO. If you are an Oracle DBA reading this blog post, you may quickly recall the error ORA-01555: snapshot too old. This error means that your undo_retention may be too small, or your UNDO segment may not be large enough to retain all the past images (versions) needed by existing or old transactions.

You may not have to worry about that with PostgreSQL.

Then how does PostgreSQL manage UNDO?

In simple terms, PostgreSQL maintains both the past image and the latest image of a row in the table itself. In other words, UNDO is maintained within each table, and this is done through versioning. This suggests that every row of a PostgreSQL table has a version number, and that is correct. To understand how these versions are maintained within each table, you need to understand the hidden columns of a table in PostgreSQL (especially xmin).
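The idea of keeping "UNDO inside the table" can be sketched with a small Python model. This is a simplified illustration under our own assumptions, not PostgreSQL’s storage code. The RowVersion and ToyTable classes and the txids 100 and 101 are hypothetical. Each version records xmin (the inserting transaction) and xmax (the transaction that deleted or updated it). An UPDATE does not overwrite anything. It stamps the old version’s xmax and appends a new version.

```python
from dataclasses import dataclass, field


@dataclass
class RowVersion:
    """One physical row version in a hypothetical toy table."""
    data: dict
    xmin: int      # txid of the transaction that inserted this version
    xmax: int = 0  # txid of the transaction that deleted/updated it; 0 = none


@dataclass
class ToyTable:
    """Hypothetical table that keeps old and new row versions side by side."""
    versions: list = field(default_factory=list)

    def insert(self, txid: int, data: dict) -> int:
        self.versions.append(RowVersion(dict(data), xmin=txid))
        return len(self.versions) - 1  # position of the new version

    def update(self, txid: int, pos: int, changes: dict) -> int:
        old = self.versions[pos]
        old.xmax = txid  # the old image stays in the table, only marked
        return self.insert(txid, {**old.data, **changes})  # new image


table = ToyTable()
p = table.insert(100, {"id": 1, "name": "alice"})
table.update(101, p, {"name": "bob"})
for v in table.versions:
    # the first line shows the past image, the second the latest image
    print(v.xmin, v.xmax, v.data)
```

Both images end up in the same table. This is also why a table can grow (bloat) even when its visible row count stays the same.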

Understanding the Hidden Columns of a Table

When you describe a table, you see only the columns you have added, as shown in the following log.

However, if you look at all the columns of the table in pg_attribute, you will see several hidden columns, as shown in the following log.

Let’s understand a few of these hidden columns in detail.

tableoid : Contains the OID of the table that contains this row. Used by queries that select from inheritance hierarchies.
More details on table inheritance can be found here : https://www.postgresql.org/docs/10/static/ddl-inherit.html

xmin : The transaction ID (xid) of the inserting transaction for this row version. Upon update, a new row version is inserted. Let’s look at the following log to understand xmin better.

As you see in the above log, the transaction ID was 646 for the command => select txid_current(). So the INSERT statement that immediately followed got transaction ID 647 (assuming no other transaction was assigned an ID in between), and the record was assigned an xmin of 647. This means that transactions that were already running before txid 647 (that is, with a txid less than 647) cannot see the row inserted by txid 647.
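The xmin rule can be expressed as a simplified visibility check in Python. This is an assumption-laden model: the Snapshot class and xmin_visible function are hypothetical and only loosely follow the shape of txid_current_snapshot(), which returns an xmin, an xmax and a list of in-progress txids. PostgreSQL’s real check also handles hint bits, subtransactions and the current transaction’s own rows, all omitted here. A row whose xmin is 647 is visible to a snapshot only if 647 had already been assigned, was no longer running, and committed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical, simplified snapshot."""
    xmax: int               # first txid not yet assigned when the snapshot was taken
    in_progress: frozenset  # txids still running when the snapshot was taken


def xmin_visible(tuple_xmin: int, snap: Snapshot, committed: set) -> bool:
    """Toy rule: is a tuple inserted by tuple_xmin visible to this snapshot?"""
    if tuple_xmin >= snap.xmax:
        return False  # the inserter started after the snapshot was taken
    if tuple_xmin in snap.in_progress:
        return False  # the inserter was still running at snapshot time
    return tuple_xmin in committed  # rows from aborted inserts are never visible


committed = {646, 647}  # txid_current() ran as 646, the INSERT as 647
before = Snapshot(xmax=647, in_progress=frozenset())  # taken before 647 existed
after = Snapshot(xmax=648, in_progress=frozenset())   # taken after 647 committed
print(xmin_visible(647, before, committed))  # False
print(xmin_visible(647, after, committed))   # True
```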

With the above example, you should now understand that every tuple has an xmin that is assigned the txid that inserted it.

Note: the behavior may change depending on the isolation level you choose; this will be discussed in a later blog post.

xmax : This value is 0 if the row version has not been deleted. Before the DELETE is committed, the xmax of the row version changes to the ID of the transaction that issued the DELETE. Let’s observe the following log to understand that better.

On Terminal A : We open a transaction and delete a row without committing it.

On Terminal B : Observe the xmax values before and after the delete (that has not been committed).

As you see in the above logs, the xmax value changed to the ID of the transaction that issued the delete. If you issue a ROLLBACK, or if the transaction gets aborted, xmax remains at the ID of the transaction that tried to DELETE the row (655 in this case).
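Why a non-zero xmax does not automatically hide a row can be shown with another simplified Python model. As before, this is a sketch under our own assumptions: RowVersion, committed_before and is_visible are hypothetical, the inserting txid 650 and snapshot boundary 656 are made up, and only 655 comes from the log above. A non-zero xmax hides the row only if the deleting transaction committed before the snapshot was taken. That is why the row stays visible while 655 is open and also after a ROLLBACK, even though xmax still says 655.

```python
from dataclasses import dataclass


@dataclass
class RowVersion:
    xmin: int      # inserting txid
    xmax: int = 0  # deleting txid; 0 means nobody tried to delete it


def committed_before(txid, snapshot_xmax, in_progress, committed):
    """Toy rule: txid's effects count for a snapshot if it committed before it."""
    return txid < snapshot_xmax and txid not in in_progress and txid in committed


def is_visible(row, snapshot_xmax, in_progress, committed):
    if not committed_before(row.xmin, snapshot_xmax, in_progress, committed):
        return False  # the insert itself is not visible (yet, or ever)
    if row.xmax == 0:
        return True   # never deleted
    # xmax hides the row only when the deleter committed before the snapshot
    return not committed_before(row.xmax, snapshot_xmax, in_progress, committed)


row = RowVersion(xmin=650, xmax=655)  # Terminal A (txid 655) issued DELETE
print(is_visible(row, 656, {655}, {650}))       # True: 655 still open
print(is_visible(row, 656, set(), {650}))       # True: 655 rolled back
print(is_visible(row, 656, set(), {650, 655}))  # False: 655 committed
```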

Now that we understand the hidden columns xmin and xmax, let’s observe what happens after a DELETE or an UPDATE in PostgreSQL. As discussed earlier, these hidden columns show that PostgreSQL maintains multiple versions of rows within each table. Let’s look at the following example to understand this better.

We’ll insert 10 records into the table scott.employee.

Now, let’s DELETE 5 records from the table.

Now, when you check the count after the DELETE, you will not see the records that have been deleted. To see row versions that exist in the table but are not visible, we can use an extension called pageinspect. The pageinspect module provides functions that allow you to inspect the contents of database pages at a low level, which is useful for debugging purposes. Let’s create this extension to see the older row versions that have been deleted.
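The gap between what count(*) reports and what is physically stored can be illustrated with a last toy Python model. It assumes hypothetical txids 700 (the INSERT of 10 rows) and 701 (the DELETE of 5 rows) and a snapshot taken after both committed. Real visibility checks also consult the commit log and hint bits, which are left out. In a real table, pageinspect’s heap_page_items() is what exposes the equivalent dead tuples.

```python
from dataclasses import dataclass


@dataclass
class RowVersion:
    emp_id: int
    xmin: int      # inserting txid
    xmax: int = 0  # deleting txid; 0 = not deleted


committed = set()

# Hypothetical txid 700 inserts 10 rows into the toy heap.
heap = [RowVersion(emp_id=i, xmin=700) for i in range(1, 11)]
committed.add(700)

# Hypothetical txid 701 deletes 5 rows: only xmax is stamped.
for row in heap:
    if row.emp_id <= 5:
        row.xmax = 701  # the tuple physically stays in the heap
committed.add(701)


def live(row):
    """Toy visibility for a snapshot taken after 700 and 701 committed."""
    deleted = row.xmax != 0 and row.xmax in committed
    return row.xmin in committed and not deleted


visible = [r for r in heap if live(r)]
dead = [r for r in heap if not live(r)]
print("count(*) sees:", len(visible))     # 5
print("tuples still stored:", len(heap))  # 10
print("dead tuples:", len(dead))          # 5
```

The 5 dead versions still occupy space in the heap. This leftover space is the bloat that VACUUM is responsible for cleaning up.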