I am very passionate about database observability, and I believe query performance observability is the most important insight you can get in your database. Why? Because if you look from an application developer’s point of view, once a database is provisioned and you can connect to it, responding to your queries promptly and correctly is essentially all that you need from the database. This applies both to the databases which you deploy on-prem or in cloud-based DBaaS offerings.
PostgreSQL has a fantastic extension for capturing query performance called pg_stat_statements which captures a lot of query execution details; however, it did not capture all the details we wanted to capture in order to provide deep query performance insights in Percona Monitoring and Management. So, we developed a new pg_stat_monitor plugin which has slightly different design goals and usage patterns, while providing everything you’re used to from pg_stat_statements, and more.
At this point, pg_stat_monitor is released as a separate project from Percona, so it is easier for us to experiment with new approaches and ideas. At the same time, we’re very much looking forward to working with the PostgreSQL community to see if there are any features that have enough consensus to be incorporated into pg_stat_statements, too.
- Time Bucketing: Instead of supplying one set of ever-increasing counts, pg_stat_monitor computes stats for a configured number of time intervals. This allows for much better data accuracy, especially in the case of high resolution or unreliable network.
- Multi-Dimensional grouping: While pg_stat_statements groups counters by (userid, dbid, queryid), pg_stat_monitor uses (userid, clientip, dbid, queryid). This allows you to drill down into the performance of queries coming from particular client addresses, which we at Percona have found to be very valuable in a number of cases.
- Capture Actual Parameters in the Queries: pg_stat_monitor allows you to choose if you want to see queries with placeholders for parameters or actual query examples. We found having the full query example is very helpful, as you can run EXPLAIN on it or easily play with modifying the query to make it run better, as well as making communication about the query clearer when discussing with other DBAs and application developers.
- Store Information about tables accessed by statement: This allows us to identify all queries which accessed a given table easily. Such an approach is more reliable than parsing queries to extract such information.
- Response time distribution histogram: While min/max/avg query execution statistics are great, especially when computed over a short time bucket, they are hard to understand when trying to look at them for a long period of time. For this reason, we have added a query response time histogram which can offer better insights.
It is important to note that we are able to achieve these additional features while having performance overhead comparable to the original pg_stat_statements extension.
pg_stat_monitor Planned Features
There are a number of features we want to implement but have not yet figured out how to achieve them with the PostgreSQL extension (and if it is possible at all without patching PostgreSQL code). If you have ideas on how to achieve any of these, let us know.
- Capturing All the Queries: Capture all queries (not only successful queries) so the queries which terminated with errors are not missed (or require another data source to capture). Once we have this, we can measure performance for successful and failed queries separately, so that an elevated failure rate can be seen as a different signal than different query performance.
- Errors and Warnings Histogram: Not all queries succeed and when they fail you really want to know why. We would like to see not only the number of times a given query failed but what the cause of failure was. The same query may fail due to a permission error or foreign key violation which requires an entirely different action to remediate. It is also very helpful to know which query triggers particular warnings.
- Accurate Wait Event Attribution to Queries: For a given query type it would be great to see where its response time comes from. Wait Events is one way to capture this and currently, query attribution can be done through a high-frequency sampling of the pg_stat_activity table. However, it is not accurate and not scalable with a large number of active backends.
pg_stat_monitor is now available as a Technical Preview and we’re releasing it to get feedback from the community on what we’re doing right and what we should do differently before we release pg_stat_monitor as a generally available version to be supported for years to come. Please check it out, drop us a note, file an issue, or make a pull request!
Our white paper “Why Choose PostgreSQL?” looks at the features and benefits of PostgreSQL and presents some practical usage examples. We also examine how PostgreSQL can be useful for companies looking to migrate from Oracle.