Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

BRIN Index for PostgreSQL: Don’t Forget the Benefits

July 16, 2019

Author

Jobin Augustine

PostgreSQL

Share this Post:

BRIN Index was introduced in PostgreSQL 9.5, but many users postponed the usage of it in their design and development just because it was “new”. But now we understand that it has stood the test-of-time! It is time to reconsider BRIN if you have not done it yet. I often see users who forget there is a provision to select the type of Index by specifying USING clause when creating an index.

BRIN Index is a revolutionary idea in indexing first proposed by PostgreSQL contributor Alvaro Herrera. BRIN stands for “Block Range INdex”. A block range is a group of pages adjacent to each other, where summary information about all those pages is stored in Index. For example, Datatypes like integers – dates where sort order is linear – can be stored as min and max value in the range. Other database systems including Oracle announced similar features later. BRIN index often gives similar gains as Partitioning a table.

BRIN usage will return all the tuples in all the pages in the particular range. So the index is lossy and extra work is needed to further filter out records. So while one might say that is not good, there are a few advantages.

1. Since only summary information about a range of pages is stored, BRIN indexes are usually very small compared to B-Tree indexes. So if we want to squeeze the working set of data to shared_buffer, this is a great help.

1. Lossiness of BRIN can be controlled by specifying pages per range (discussed in a later section)

1. Offloads the summarization work to vacuum or autovacuum. So the overhead of index maintenance on transaction / DML operation is minimal.

Putting BRIN into a test

Let’s take a simple example to examine the benefits of BRIN index by creating a simple table.

postgres=# CREATE TABLE testtab (id int NOT NULL PRIMARY KEY,date TIMESTAMP NOT NULL, level INTEGER, msg TEXT);
CREATE TABLE

1 2	postgres=# CREATE TABLE testtab (id int NOT NULL PRIMARY KEY,date TIMESTAMP NOT NULL, level INTEGER, msg TEXT); CREATE TABLE

Now let’s Insert some data into this table.

postgres=# INSERT INTO testtab (id, date, level, msg) SELECT g, CURRENT_TIMESTAMP + ( g || 'minute' ) :: interval, random() * 6, md5(g::text) FROM generate_series(1,8000000) as g;
INSERT 0 8000000

1 2	postgres=# INSERT INTO testtab (id, date, level, msg) SELECT g, CURRENT_TIMESTAMP + ( g \|\| 'minute' ) :: interval, random() * 6, md5(g::text) FROM generate_series(1,8000000) as g; INSERT 0 8000000

Please note that values in id column and date columns keep increasing for new records, which is common for transaction records. This is an important property which BRIN make use off.

A query at this stage may have to do a full scan of the table and it will be quite expensive.

postgres=# explain analyze select * from public.testtab where date between '2019-08-08 14:40:47.974791' and '2019-08-08 14:50:47.974791';
                                                                          QUERY PLAN                                                                         
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather  (cost=1000.00..133476.10 rows=11 width=49) (actual time=31.835..1766.409 rows=11 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  -> Parallel Seq Scan on testtab  (cost=0.00..132475.00 rows=5 width=49) (actual time=1162.029..1739.713 rows=4 loops=3)
        Filter: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))
        Rows Removed by Filter: 2666663
Planning Time: 0.296 ms
Execution Time: 1766.454 ms
(8 rows)

postgres=# explain analyze select * from public.testtab where date between '2019-08-08 14:40:47.974791' and '2019-08-08 14:50:47.974791';

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------

Gather (cost=1000.00..133476.10 rows=11 width=49) (actual time=31.835..1766.409 rows=11 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Parallel Seq Scan on testtab (cost=0.00..132475.00 rows=5 width=49) (actual time=1162.029..1739.713 rows=4 loops=3)

Filter: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))

Rows Removed by Filter: 2666663

Planning Time: 0.296 ms

Execution Time: 1766.454 ms

(8 rows)

As we can see above, PostgreSQL employs 2 workers and does the scan in parallel, where it takes 1766.454 ms. In a nutshell, it is heavy on the system and takes a good amount of time.

As usual, our tendency is to create an index on the filtering column. (B-Tree by default)

postgres=# create index testtab_date_idx  on testtab(date);
CREATE INDEX

1 2	postgres=# create index testtab_date_idx on testtab(date); CREATE INDEX

Now let’s see how the previous SELECT query works:

postgres=# explain analyze select * from public.testtab where date between '2019-08-08 14:40:47.974791' and '2019-08-08 14:50:47.974791';
                                                                        QUERY PLAN                                                                        
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using testtab_date_idx on testtab  (cost=0.43..8.65 rows=11 width=49) (actual time=1.088..1.090 rows=11 loops=1)
  Index Cond: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))
Planning Time: 22.225 ms
Execution Time: 2.657 ms
(4 rows)

postgres=# explain analyze select * from public.testtab where date between '2019-08-08 14:40:47.974791' and '2019-08-08 14:50:47.974791';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------

Index Scan using testtab_date_idx on testtab (cost=0.43..8.65 rows=11 width=49) (actual time=1.088..1.090 rows=11 loops=1)

Index Cond: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))

Planning Time: 22.225 ms

Execution Time: 2.657 ms

(4 rows)

Obviously, B-Tree is lossless, and it can give a tremendous boost to the performance and efficiency of SELECT queries, especially when Index is freshly created. But a B-Tree index has following side effects:

1. There will be a reasonable penalty for DMLs

1. Index size can go big and consume a good amount of shared_buffers

1. The random page seeks, especially when index pages are not cached, can become expensive.

It will be interesting to check the Index to table size ratio of this fresh index.

postgres=# di+ testtab_date_idx;
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+------------------+-------+----------+---------+--------+-------------
public | testtab_date_idx | index | postgres | testtab | 171 MB |
(1 row)

postgres=# di+ testtab_date_idx;

List of relations

--------+------------------+-------+----------+---------+--------+-------------

(1 row)

So we have B-Tree index 171MB for a 1.3GB table. So Index to table size ratio is 0.13, for this is a fresh index.

But this index-table ratio can keep deteriorating over time as index undergoes continuous updates. Index ratio crossing 0.5 is common in many production environments. As the ratio becomes bad, the efficiency of the index goes bad and it starts occupying more shared buffers.

Things get much worse if the table grows to a bigger size (hundreds of GB or TB) as Index also grows in the same ratio. The impact of B-Tree index on DMLs is heavy – sometimes I measured up to 30% overhead especially for bulk loads.

Now let’s see what happens if we replace a B-Tree index with a BRIN index.

postgres=# create index testtab_date_brin_idx  on testtab using brin (date);
CREATE INDEX

1 2	postgres=# create index testtab_date_brin_idx on testtab using brin (date); CREATE INDEX

The very first observation is that the impact of Index on the same bulk insert operation is measured to be 3% which is within my noise/error limits. 30% overhead Vs 3% overhead.

If we consider the size of the newly created BRIN index:

postgres=# di+ testtab_date_brin_idx;
                                List of relations
Schema |         Name          | Type  |  Owner   |  Table  | Size  | Description
--------+-----------------------+-------+----------+---------+-------+-------------
public | testtab_date_brin_idx | index | postgres | testtab | 64 kB |
(1 row)

postgres=# di+ testtab_date_brin_idx;

List of relations

--------+-----------------------+-------+----------+---------+-------+-------------

(1 row)

As we can see it is just 64kB! 171MB of B-Tree index Vs 64 kb of BRIN index.

So far BRIN wins my heart. Now its time to look at how much query performance improvement it can bring in.

postgres=# explain analyze select * from public.testtab where date between '2019-08-08 14:40:47.974791' and '2019-08-08 14:50:47.974791';
                                                                            QUERY PLAN                                                                           
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on testtab  (cost=20.03..33406.84 rows=11 width=49) (actual time=62.762..87.471 rows=11 loops=1)
  Recheck Cond: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))
  Rows Removed by Index Recheck: 12405
  Heap Blocks: lossy=128
  -> Bitmap Index Scan on testtab_date_brin_idx  (cost=0.00..20.03 rows=12403 width=0) (actual time=1.498..1.498 rows=1280 loops=1)
        Index Cond: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))
Planning Time: 0.272 ms
Execution Time: 87.703 ms
(8 rows)

postgres=# explain analyze select * from public.testtab where date between '2019-08-08 14:40:47.974791' and '2019-08-08 14:50:47.974791';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on testtab (cost=20.03..33406.84 rows=11 width=49) (actual time=62.762..87.471 rows=11 loops=1)

Recheck Cond: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))

Rows Removed by Index Recheck: 12405

Heap Blocks: lossy=128

-> Bitmap Index Scan on testtab_date_brin_idx (cost=0.00..20.03 rows=12403 width=0) (actual time=1.498..1.498 rows=1280 loops=1)

Index Cond: ((date >= '2019-08-08 14:40:47.974791'::timestamp without time zone) AND (date <= '2019-08-08 14:50:47.974791'::timestamp without time zone))

Planning Time: 0.272 ms

Execution Time: 87.703 ms

(8 rows)

As I expected, it is not that efficient as a fully cached B-Tree index. However, performance improvement from 1766.454 ms to 87.703 ms means approximately 20 times better! That’s with a single worker consuming fewer resources. With just a 64kb Overhead, the cost-benefit of BRIN is very positive.

Storage and maintenance

As we saw the BRIN index uses much less space compared to normal B-Tree index, but the lossy index can reduce the effectiveness of Index. Luckily this can be adjusted using the storage parameter pages_per_range. BRIN stores entries for a range of pages in the corresponding table. The larger the range of pages, the smaller the index, and it gets lossier.

create index testtab_date_brin_idx on testtab using brin (date) with (pages_per_range = 32);

1	create index testtab_date_brin_idx on testtab using brin (date) with (pages_per_range = 32);

As we already discussed, part of the index maintenance during DML is offloaded to vacuum. In fact, there are 2 cases.

1. Pages in the table which are already summarized are updated

1. New pages which are not part of the summary.

In the first case, summary in BRIN is updated straight away along with DML. But if new pages are not summarized already, it will be done by VACUUM or AUTOVACUUM.

There are 2 functions provided for this. A high-level, function call which can be executed against the index like:

postgres=# select brin_summarize_new_values('testtab_date_brin_idx'::regclass);
brin_summarize_new_values
---------------------------
2577
(1 row)

postgres=# select brin_summarize_new_values('testtab_date_brin_idx'::regclass);

brin_summarize_new_values

---------------------------

2577

(1 row)

This summarizes all ranges that are not currently summarized. The return value indicates the number of new page range summaries that were inserted into the index. This function operates on the top of a lower-level function brin_summarize_range which accepts the range of pages.

postgres=# SELECT brin_summarize_range('testtab_date_brin_idx', 10);
brin_summarize_range
----------------------
0
(1 row)

postgres=# SELECT brin_summarize_range('testtab_date_brin_idx', 10);

brin_summarize_range

----------------------

(1 row)

A key point to note is that auto-summarization is off by default, which we can enable by storage parameter auto summarize:

postgres=# alter index testtab_date_brin_idx set (pages_per_range = 64,autosummarize = on);
ALTER INDEX

1 2	postgres=# alter index testtab_date_brin_idx set (pages_per_range = 64,autosummarize = on); ALTER INDEX

In this case, automatic summarization will be executed by autovacuum as insertions occur.

Limitation:

BRIN indexes are efficient if the ordering of the key values follows the organization of blocks in the storage layer. In the simplest case, this could require the physical ordering of the table, which is often the creation order of the rows within it, to match the key’s order. Keys on generated sequence numbers or created data are best candidates for BRIN index.

Discuss on Hacker News

0 0 votes

Article Rating

9 Comments

Oldest

Newest Most Voted

Radu

6 years ago

Can you include some tests that consider updates? Last time I’ve checked BRIN indexes were showing great performance for append only tables but any update activity killed performance.

Author

Jobin Augustine

6 years ago

Reply to Radu

Thank you for your valuable comment.
I shall publish benchmark results on the same soon.

Rahmat Ihsan

6 years ago

Is there a feature in mysql similar to this?

Robert

6 years ago

> it is meaningful to have only one BRIN index on a table

I think that’s going a bit far. There are plenty of cases where columns have strong correlation with each other, in which case BRIN indexes on either should work pretty well together, no?

And I suspect there are many cases where, even with less-correlated data, the performance can still be an improvement over a sequential scan.

Author

Jobin Augustine

6 years ago

Reply to Robert

I agree with you Robert, that was a bit stretched statement. if there is good correlation among values sitting in adjacent pages, Nothing should be stopping us from using BRIN. I am removing that statement.
Thank you for your valuable comment.

Spindel

6 years ago

In our usecases we found that adding a Btree index on the columns, then performing a CLUSTER using the Btree index, and finally replacing it with a BRIN index was the way to get long-term storage and performance efficiency using Brin indexes.

Brin can have horrible perfomance if the on-disk layout is adversial, so it’s use-case is quite limited as that, and since it cannot be used as a index for CLUSTER.

A typical example is time-series data:
id, timestamp, [value, value…]

Where queries often are of the form id=X timestamp between X, y but where the natural insert-order of things will cause data to be fragmented for such queries, making a BRIN index in “natural order” horribly inefficient. doing a one-time “index-cluster-index” operation can then lay it out so the BRIN index is instead incredibly efficient.

It needs to be taken in account when picking indexes.