BRIN Index for PostgreSQL: Don’t Forget the Benefits

PostgreSQLBRIN Index was introduced in PostgreSQL 9.5, but many users postponed the usage of it in their design and development just because it was “new”. But now we understand that it has stood the test-of-time! It is time to reconsider BRIN if you have not done it yet. I often see users who forget there is a provision to select the type of Index by specifying USING clause when creating an index.

BRIN Index is a revolutionary idea in indexing first proposed by PostgreSQL contributor Alvaro Herrera. BRIN stands for “Block Range INdex”. A block range is a group of pages adjacent to each other, where summary information about all those pages is stored in Index.  For example, Datatypes like integers – dates where sort order is linear – can be stored as min and max value in the range. Other database systems including Oracle announced similar features later. BRIN index often gives similar gains as Partitioning a table.

BRIN usage will return all the tuples in all the pages in the particular range. So the index is lossy and extra work is needed to further filter out records. So while one might say that is not good, there are a few advantages.

  1. Since only summary information about a range of pages is stored, BRIN indexes are usually very small compared to B-Tree indexes. So if we want to squeeze the working set of data to shared_buffer, this is a great help.
  2. Lossiness of BRIN can be controlled by specifying pages per range (discussed in a later section)
  3. Offloads the summarization work to vacuum or autovacuum. So the overhead of index maintenance on transaction / DML operation is minimal.

Putting BRIN into a test

Let’s take a simple example to examine the benefits of BRIN index by creating a simple table.

Now let’s Insert some data into this table.

Please note that values in id column and date columns keep increasing for new records, which is common for transaction records. This is an important property which BRIN make use off.

A query at this stage may have to do a full scan of the table and it will be quite expensive.

As we can see above, PostgreSQL employs 2 workers and does the scan in parallel, where it takes 1766.454 ms. In a nutshell, it is heavy on the system and takes a good amount of time.

As usual, our tendency is to create an index on the filtering column. (B-Tree by default)

Now let’s see how the previous SELECT query works:

Obviously, B-Tree is lossless, and it can give a tremendous boost to the performance and efficiency of SELECT queries, especially when Index is freshly created. But a B-Tree index has following side effects:

  1. There will be a reasonable penalty for DMLs
  2. Index size can go big and consume a good amount of shared_buffers
  3. The random page seeks, especially when index pages are not cached, can become expensive.

It will be interesting to check the Index to table size ratio of this fresh index.

So we have B-Tree index 171MB for a 1.3GB table. So Index to table size ratio is 0.13, for this is a fresh index.

But this index-table ratio can keep deteriorating over time as index undergoes continuous updates. Index ratio crossing 0.5 is common in many production environments. As the ratio becomes bad, the efficiency of the index goes bad and it starts occupying more shared buffers.

Things get much worse if the table grows to a bigger size (hundreds of GB or TB) as Index also grows in the same ratio. The impact of B-Tree index on DMLs is heavy – sometimes I measured up to 30% overhead especially for bulk loads.

Now let’s see what happens if we replace a B-Tree index with a BRIN index.

The very first observation is that the impact of Index on the same bulk insert operation is measured to be 3% which is within my noise/error limits. 30% overhead Vs 3% overhead.

If we consider the size of the newly created BRIN index:

As we can see it is just 64kB! 171MB of B-Tree index Vs 64 kb of BRIN index.

So far BRIN wins my heart. Now its time to look at how much query performance improvement it can bring in.