Hypothetical Indexes in PostgreSQL

PostgreSQL LogoAt recent conferences, we have received a curious question from users who have used Oracle in the past and are now using PostgreSQL: “Do we have hypothetical indexes in PostgreSQL ?“. The answer to that question is YES. The true meaning of hypothetical is imaginary but not real. We can try creating indexes that are imaginary for the PostgreSQL optimizer which don’t exist in reality. Let’s look at some detailed examples.

How are hypothetical indexes helpful?

The execution time of an SQL in most of the relational databases depends on the cost of the execution. An optimizer chooses the plan that has the lowest cost, and thus considers further phases such as execute and fetch using that plan. One of the easiest ways of optimizing a well-written SQL is through appropriate indexes that are suitable for that query. An index may reduce the number of pages fetched from disk and may live in the cache due to its size (<<< table size). So, indexing is always a low-hanging fruit for admins and developers wishing to tune an SQL.

But often developers wish to see if an index can be really helpful. The only option available is to create the index and check to see if the queries are able to use it with a reduced cost. Creating the index may be fine if it is in a test environment, or a table that is not huge in size, but if for some reason you wish to see if an index can be helpful for an SQL without actually creating it, then you may test and try this extension to create hypothetical indexes.

PostgreSQL Extension for hypothetical indexes

In order to achieve the functionality of creating imaginary indexes in PostgreSQL, we need to use an extension named hypopg. This extension is made available in the PGDG repository from PostgreSQL 9.2 and works until the latest active release PostgreSQL 11.3. In order to create this extension, you may just use yum or apt depending on your linux distro or compile it from source.

Installing hypopg

In RedHat/CentOS

For Debian/Ubuntu, you must make sure you have the development package named: postgresql-server-dev-X where X is the major version.

Creating the Extension

In order to show you the list of functions created by hypopg, I have created this extension in a different schema. Here is the list:

hypopg Functions

hypopg_create_index: This function is used to create a hypothetical index. We do not have to specify the name of the index because it is ignored by this function anyway.

hypopg_drop_index: This function is used to drop a hypothetical index upon testing. As soon as you disconnect your session, the index automatically gets dropped as it is only created in the private memory of that session.

hypopg_get_indexdef: The definition of the hypothetical index we have created using this function.

hypopg_list_indexes: This function helps us list all the hypothetical indexes we have created in our session.

hypopg_relation_size: This function helps us estimate the index size, so we know the approximate amount of space this index could occupy.

hypopg_reset: This function drops all the hypothetical indexes created by us in a single command.

How does a hypothetical index work in reality?

When you use the hypopg_create_index function to create a hypothetical index, it creates an index in our connection’s private memory. None of the catalog tables nor the original tables are actually touched. The only way to see if we can benefit from that index is by running an EXPLAIN <QUERY>. However, if you wish to run an EXPLAIN ANALYZE that runs that SQL and provides the run time stats, it would not be able to use that hypothetical index. This is because a hypothetical index does not exist in reality.

Testing hypothetical indexes

While testing hypothetical indexes, I want to check if there are any cases where it may not work. Because our main reason for using this extension is to test whether the optimizer could identify the index we wish to create, I have thought of testing almost all types of the most-used indexes in PostgreSQL such as BTREE, HASH, GIN, GiST, BRIN, and BLOOM indexes. Only BTREE indexes are currently mentioned as applicable for hypothetical indexes.

For this reason, I have created the following table which may not make much sense for a real-time use case but helps us test all the types of indexes we could create.

Btree Indexes

We may try creating a BTREE index on the quantity column to speed up the performance of queries that use that column as a predicate. As you can see in the following log, I have created a hypothetical btree index, and the EXPLAIN on the SQL shows an index scan where the cost of using an index is much less than a sequence scan without the index. With this exercise, we know that creating this index could help optimize the SQL we tested.

BLOOM Indexes

The documentation does not mention that hypothetical indexes currently work for BLOOM indexes. However, I have tested to see if an EXPLAIN could show a plan that uses the hypothetical bloom index, and it did indeed work.

BRIN Indexes

I have then tried to create a hypothetical BRIN index to see if I get any errors because only btree hypothetical indexes are currently supported. To my surprise, I didn’t see any errors. But then I do see a strange error when trying to do an EXPLAIN on the SQL, or even when I try to run a SELECT on that table, as you see in the following log. So, just because it allows you to create the brin hypothetical index doesn’t mean it will work.

Hash, GIN, GiST or other Indexes

Unlike BRIN indexes, when we try to create any other type of hypothetical index, it throws an exact error message that states that the index type we have specified is not supported.

Conclusion

It is great to see some effort towards helping developers and admins with an extension which helps them know if an index can be used without actually having to create it. This is something we could use to automate index recommendations with a combination of another extension called: pg_qualstats, which we shall discuss in my next blog post. Thanks to all the contributors who have helped us achieve this functionality in PostgreSQL.

Discuss on HackerNews.

Share this post

Comment (1)

  • Julien Rouhaud Reply

    Thanks for this article! Here are a few comments:

    – the issue with BRIN hypothetical indexes is a actually regression introduced in pg10, I submitted a patch to fix that (https://commitfest.postgresql.org/23/2179/), I hope it’ll be fixed with the next minor release
    – I’ll make it clearer in the documentation that multiple AM can be used with hypopg.
    – about pg_qualstats and index suggestion, you can look at PoWA (https://powa.readthedocs.io/en/latest/), which has a decent global index suggestion feature, among other things. You can see it in action in the demo website, the “tpc” database is quite relevant for that.

    June 28, 2019 at 7:02 am

Leave a Reply