Automatic Index Recommendations in PostgreSQL using pg_qualstats and hypopg

PostgreSQLIn my previous blog post, we have seen how the extension hypopg can be helpful in creating hypothetical indexes in PostgreSQL. If you have read it already, you know that it is very useful in verifying whether an Index can improve the performance of an SQL without having to create it in reality. Considering this, is there also a way to automatically suggest which indexes can improve the performance of some of your SQL’s? The answer is… YES!

In this blog post, we will see how we can get a direct recommendation to improve a specific query, as seen in the following snippet.

Without any further delay, let’s discuss the extension pg_qualstats which enables us to achieve this requirement for PostgreSQL versions 9.4 or later. Following that, we will take a look at a logic which could automatically suggest what indexes would be helpful for query optimizations – without much manual work.

All of the logic discussed in this blog post is reproducible, so please feel free to do so using the commands and try tuning your custom queries in test environments.

pg_qualstats

pg_qualstats is an extension developed by the POWA Team to uncover the need for storing predicates (quals). It stores the predicates found in WHERE clauses and JOIN conditions. This helps us analyze a query execution and opens up the possibility of automatic query optimizations.

When we query pg_stat_statements like extensions, we only see the prepared SQL or the SQL without any bind variable values. As you cannot perform an EXPLAIN (to see the execution plan of a Query) without the quals, the only option available is to see if that query was logged in the PostgreSQL log file and then identify the parameters passed to it. Or you can maybe use some arbitrary values, but this requires some manual intervention and time. But, when you create this extension: pg_qualstats, it stores queries along with the actual quals based on the sample rate ( pg_qualstats.sample_rate) specified.

Creating the extension: pg_qualstats

For RedHat/CentOS, we can install it using the packages available in the PGDG repository. Once you have added the PGDG repo, simply run the following command:

Similarly for Ubuntu/Debian:

Once installed, you must add pg_qualstats to shared_preload_libraries. This requires a restart. As I am also using pg_stat_statements to get the queryid associated with each query recorded by pg_qualstats, I have the following setting in my postgresql.conf file:

Modifying the above parameter requires a restart of your PostgreSQL instance.

Some GUCs you should know

In order to start taking advantage of this extension, you may have to set some of the GUCs (Grand Unified Configuration) in your PostgreSQL server. These can be set using ALTER SYSTEM or by manually adding the associated entries in your postgresql.conf or postgresql.auto.conf files.

pg_qualstats.enabled: true or false (to enable or to disable pg_qualstats). ON by default.

pg_qualstats.track_constants: true or false (to enable tracking of each constant. False would reduce the number of entries to track predicates.)

pg_qualstats.max: The number of queries tracked. Defaults to 1000.

pg_qualstats.resolve_oids: Just store the oids or resolve them and store at query time. This takes additional space.

pg_qualstats.track_pg_catalog: Defaults to false. Whether or not the predicates of the objects in pg_catalog schema should be computed.

pg_qualstats.sample_rate: Default is -1. The fraction of queries to be sampled. -1 defaults to (1/max_connections). When set to 1, everything is sampled. Similarly when set to 0.1, one out of 10 queries are sampled.

Quals and Query Examples

In order to see this in action, let us use sysbench-tpcc to generate some SQL traffic, and then see some of the details captured.

Before running sysbench-tpcc, I have created all the required extensions as seen in the following log. In order to see the queryid (same as the queryid column of pg_stat_statements) associated with each qual captured, it is important to have the extension: pg_stat_statements created. Similarly, to create hypothetical indexes, we need to have the extension: hypopg created.

For this test, I have set  pg_qualstats.sample_rate to 1. Setting this to 1 captured every qual. We can, of course, reduce the sampling to ensure not everything is captured.

Installing and running sysbench-tpcc on RedHat/CentOS

After running the benchmark for 10 seconds, we are now ready to query the view: pg_qualstats_indexes that gives us a nice view of the columns on which there are no indexes.