First, a big thanks for the kind responses to this series, and, as requested, here is an overview of index types available in PostgreSQL. The supporting video with bonus material can be found here.
PostgreSQL has several popular index types including B-tree, Hash, GiST, SP-GiST, GIN, and BRIN. Actually, a couple of those are frameworks that allow you to take advantage of special cases someone else has engineered or allow you to make your own index handler. Each index type uses a different algorithm that is best suited to different types of queries.
By default, the CREATE INDEX command creates B-tree (actually b+ trees) indexes. Yup, just like MySQL. B+ Trees are great for binary and range searches.
|
1 |
test=# CREATE TABLE staff (id) as SELECT 'Employee' || x FROM generate_series(1,500) as g(x);<br>SELECT 500<br>test=# CREATE INDEX staff_id ON staff(id);<br>CREATE INDEX<br>test=# EXPLAIN (verbose) SELECT * FROM staff WHERE id='Employee101';<br> QUERY PLAN<br>-----------------------------------------------------------------------------------<br> Index Only Scan using staff_id on public.staff (cost=0.27..8.29 rows=1 width=11)<br> Output: id<br> Index Cond: (staff.id = 'Employee101'::text)<br>(3 rows) |
The above example shows a simple table created and populated with data. Then an index is created on the sole column in the table. Using EXPLAIN (with the VERBOSE option) shows us that the indexed is used to speed up the query.
Bitmaps are cool as they are used when an index lookup might generate multiple hits on the same heap (data) page or using multiple indexes for a single query is useful. Bitmap indexes allow heap pages to be visited only once for multiple matches, they can merge the results from several indexes with AND/OR filtering, and they are automatically enabled by the optimizer.
MySQL has had hash joins for a while now and you are probably already used to them. Hash indexes store a 32-bit hash code derived from the value of the indexed column. These values can only be compared by simple equality comparisons. The query planner will consider using a hash index whenever an indexed column is involved in a comparison using the equal operator.
GiST is an infrastructure within which many different tree-based indexing strategies can be implemented.
GIST Indexes are not a single kind of index.
Some of the contributed models:
GiST indexes are also capable of optimizing “nearest-neighbor” searches, such asSELECT * FROM places
ORDER BY location point '(101,456)'
LIMIT 10;
This will find the ten places closest to a given target point.
SP-GiST indexes, like GiST indexes, offer an infrastructure that supports various kinds of searches. SP-GiST permits the implementation of a wide range of different non-balanced disk-based data structures, such as quadtrees, k-d trees, and radix trees (tries).
As an example, the standard distribution of PostgreSQL includes SP-GiST operator classes for two-dimensional points
GIN indexes are “inverted indexes” which are appropriate for data values that contain multiple component values, such as arrays. Think text or JSON data for this type of index. And GIN is similar to MySQL’s multi-valued indexes where a key is stored once with many row pointers.
BRIN or Block Range INdexes indexes store summaries of the values stored in consecutive physical block ranges of a table. Thus, they are most effective for columns whose values are well-correlated with the physical order of the table rows. If the data is truly random, or if there is much churn of the key values in a ‘hot’ database, the assumptions underlying BRIN may break down. Like GiST, SP-GiST and GIN, BRIN can support many different indexing strategies, and the particular operators with which a BRIN index can be used vary depending on the indexing strategy. For data types that have a linear sort order, the indexed data corresponds to the minimum and maximum values of the values in the column for each block range.
This supports indexed queries using these operators: < = >
In the following example note that the BRIN index is measured in KILObytes not MEGAbytes.
|
1 |
test=# create table demo as SELECT generate_series(1,1000000) as x;<br>SELECT 1000000<br>test=# create index btree_idx on demo(x);<br>CREATE INDEX<br>test=# create index brin_idx on demo using brin(x);<br>CREATE INDEX<br>test=# SELECT relname, pg_size_pretty(pg_relation_size(oid)) <br> FROM pg_class WHERE relname LIKE 'brin_%' OR relname = 'btree_idx' <br> ORDER BY relname;<br> relname | pg_size_pretty<br>-------------------+----------------<br> brin_idx | 24 kB<br> brin_random_x | 24 kB<br> btree_idx | 21 MB<br> |
Suggested Reading
I used these two sources in preparing this post. The PostgreSQL manual is an amazing resource and Bruce Momjian is a treasure (and you should be reading all his presentations):
https://www.postgresql.org/docs/current/indexes-types.html
https://momjian.us/main/writings/pgsql/indexing.pdf
The past videos for PostgreSQL for MySQL Database Administrators (DBA) can be found here: episode one, episode two, episode three, episode four, episode five, episode six, episode seven, episode eight, episode nine, and episode ten.