This PoC demonstrates how to install and configure pg_stat_monitor in order to extract useful and actionable metrics from a PostgreSQL database and display them on a Grafana dashboard.
In order to investigate the opportunities for deriving constructive and useful metrics from PostgreSQL into Grafana, load is generated using pgbench.
For our purposes, the Grafana datasource used in this PoC is also the Postgres data cluster that generates the data being monitored.

pg_stat_monitor is a query performance monitoring tool for PostgreSQL. It collects statistics such as query metrics, query plans, SQL comments, and other performance insights. The collected data is aggregated and presented in a single view.
pg_stat_monitor takes its inspiration from pg_stat_statements. Unlike pg_stat_statements, which aggregates its metrics from the last time it was zeroed, pg_stat_monitor can bucket its output into a set number of time-based aggregates, sparing users the effort of doing this themselves.
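The practical difference can be sketched with a toy calculation (plain awk, nothing pg_stat_monitor-specific): given hypothetical cumulative call counts sampled once per minute, as pg_stat_statements would report them, the per-minute deltas below are what pg_stat_monitor hands you directly as buckets.

```shell
# Hypothetical cumulative 'calls' samples, one per minute, as read from a
# pg_stat_statements-style counter; print the per-interval (bucket-like) deltas.
printf '100\n250\n250\n400\n' |
awk 'NR==1 {prev = $1; print $1; next} {print $1 - prev; prev = $1}'
```

For these sample values the deltas printed are 100, 150, 0, and 150 — the zero interval (an idle minute) is visible at a glance, which a cumulative counter hides.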
pg_stat_monitor tracks the following operations:
The simplest way to get pg_stat_monitor is to install it via Percona Distribution for PostgreSQL.
The following instructions demonstrate installing Percona Distribution for PostgreSQL and pg_stat_monitor on a CentOS 8 Linux distribution:
# Install the Percona repository
dnf install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
percona-release setup ppg14

# Install the PostgreSQL community repository
dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
dnf update -y
dnf install -y pg_stat_monitor_14

# Perform standard initialization and systemd configurations
/usr/pgsql-14/bin/postgresql-14-setup initdb

# Configure postgres to use pg_stat_monitor
echo "
shared_preload_libraries=pg_stat_monitor
" >> /var/lib/pgsql/14/data/postgresql.auto.conf

# Complete postgres configuration
systemctl enable postgresql-14
systemctl start postgresql-14
The pg_stat_monitor extension can be created in any database, but for the purposes of this PoC, it is placed in the database pgbench.
postgres=# create database pgbench;
postgres=# \c pgbench
pgbench=# create extension pg_stat_monitor;

pgbench=# \d
           List of relations
 Schema |      Name       | Type |  Owner
--------+-----------------+------+----------
 public | pg_stat_monitor | view | postgres
                    View "public.pg_stat_monitor"
       Column        |           Type           | Collation | Nullable | Default
---------------------+--------------------------+-----------+----------+---------
 bucket              | bigint                   |           |          |
 bucket_start_time   | timestamp with time zone |           |          |
 userid              | oid                      |           |          |
 username            | text                     |           |          |
 dbid                | oid                      |           |          |
 datname             | text                     |           |          |
 client_ip           | inet                     |           |          |
 pgsm_query_id       | bigint                   |           |          |
 queryid             | bigint                   |           |          |
 toplevel            | boolean                  |           |          |
 top_queryid         | bigint                   |           |          |
 query               | text                     |           |          |
 comments            | text                     |           |          |
 planid              | bigint                   |           |          |
 query_plan          | text                     |           |          |
 top_query           | text                     |           |          |
 application_name    | text                     |           |          |
 relations           | text[]                   |           |          |
 cmd_type            | integer                  |           |          |
 cmd_type_text       | text                     |           |          |
 elevel              | integer                  |           |          |
 sqlcode             | text                     |           |          |
 message             | text                     |           |          |
 calls               | bigint                   |           |          |
 total_exec_time     | double precision         |           |          |
 min_exec_time       | double precision         |           |          |
 max_exec_time       | double precision         |           |          |
 mean_exec_time      | double precision         |           |          |
 stddev_exec_time    | double precision         |           |          |
 rows                | bigint                   |           |          |
 shared_blks_hit     | bigint                   |           |          |
 shared_blks_read    | bigint                   |           |          |
 shared_blks_dirtied | bigint                   |           |          |
 shared_blks_written | bigint                   |           |          |
 local_blks_hit      | bigint                   |           |          |
 local_blks_read     | bigint                   |           |          |
 local_blks_dirtied  | bigint                   |           |          |
 local_blks_written  | bigint                   |           |          |
 temp_blks_read      | bigint                   |           |          |
 temp_blks_written   | bigint                   |           |          |
 blk_read_time       | double precision         |           |          |
 blk_write_time      | double precision         |           |          |
 resp_calls          | text[]                   |           |          |
 cpu_user_time       | double precision         |           |          |
 cpu_sys_time        | double precision         |           |          |
 wal_records         | bigint                   |           |          |
 wal_fpi             | bigint                   |           |          |
 wal_bytes           | numeric                  |           |          |
 bucket_done         | boolean                  |           |          |
 plans               | bigint                   |           |          |
 total_plan_time     | double precision         |           |          |
 min_plan_time       | double precision         |           |          |
 max_plan_time       | double precision         |           |          |
 mean_plan_time      | double precision         |           |          |
 stddev_plan_time    | double precision         |           |          |
pgbench is a simple program executing benchmark tests on PostgreSQL by running the same sequence of SQL commands over and over. pgbench is capable of executing multiple concurrent database sessions and can calculate the average transaction rate (TPS) at the end of a run. Although the default configuration simulates loading based loosely upon TPC-B, it is nevertheless easy to test other use cases by writing one’s own transaction script files.
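The custom-script capability mentioned above is not used in this PoC, but a minimal sketch gives the idea. The file name `select_only.sql` and the read-only transaction are assumptions for illustration; the `100000 * :scale` arithmetic mirrors pgbench's default of 100,000 account rows per scale unit.

```shell
# Write a hypothetical custom transaction script for pgbench's -f option.
cat > select_only.sql <<'_eof_'
\set aid random(1, 100000 * :scale)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
_eof_

# Against a prepared pgbench database, it would then be run with, e.g.:
#   pgbench -c 4 -j 2 -T 60 -f select_only.sql pgbench
echo "wrote select_only.sql"
```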
While it is reasonable to create panels showing real-time load in order to better explore the types of queries that can be run against pg_stat_monitor, it is more practical to copy the data into tables and query it there after the benchmarking run has completed.
Save the data generated from a recently completed benchmark run into an archive table:
select * into pg_stat_monitor_archive
from pg_stat_monitor
order by bucket_start_time asc;
               Table "public.pg_stat_monitor_archive"
       Column        |           Type           | Collation | Nullable | Default
---------------------+--------------------------+-----------+----------+---------
 bucket              | bigint                   |           |          |
 bucket_start_time   | timestamp with time zone |           |          |
 userid              | oid                      |           |          |
 username            | text                     |           |          |
 dbid                | oid                      |           |          |
 datname             | text                     |           |          |
 client_ip           | inet                     |           |          |
 pgsm_query_id       | bigint                   |           |          |
 queryid             | bigint                   |           |          |
 toplevel            | boolean                  |           |          |
 top_queryid         | bigint                   |           |          |
 query               | text                     |           |          |
 comments            | text                     |           |          |
 planid              | bigint                   |           |          |
 query_plan          | text                     |           |          |
 top_query           | text                     |           |          |
 application_name    | text                     |           |          |
 relations           | text[]                   |           |          |
 cmd_type            | integer                  |           |          |
 cmd_type_text       | text                     |           |          |
 elevel              | integer                  |           |          |
 sqlcode             | text                     |           |          |
 message             | text                     |           |          |
 calls               | bigint                   |           |          |
 total_exec_time     | double precision         |           |          |
 min_exec_time       | double precision         |           |          |
 max_exec_time       | double precision         |           |          |
 mean_exec_time      | double precision         |           |          |
 stddev_exec_time    | double precision         |           |          |
 rows                | bigint                   |           |          |
 shared_blks_hit     | bigint                   |           |          |
 shared_blks_read    | bigint                   |           |          |
 shared_blks_dirtied | bigint                   |           |          |
 shared_blks_written | bigint                   |           |          |
 local_blks_hit      | bigint                   |           |          |
 local_blks_read     | bigint                   |           |          |
 local_blks_dirtied  | bigint                   |           |          |
 local_blks_written  | bigint                   |           |          |
 temp_blks_read      | bigint                   |           |          |
 temp_blks_written   | bigint                   |           |          |
 blk_read_time       | double precision         |           |          |
 blk_write_time      | double precision         |           |          |
 resp_calls          | text[]                   |           |          |
 cpu_user_time       | double precision         |           |          |
 cpu_sys_time        | double precision         |           |          |
 wal_records         | bigint                   |           |          |
 wal_fpi             | bigint                   |           |          |
 wal_bytes           | numeric                  |           |          |
 bucket_done         | boolean                  |           |          |
 plans               | bigint                   |           |          |
 total_plan_time     | double precision         |           |          |
 min_plan_time       | double precision         |           |          |
 max_plan_time       | double precision         |           |          |
 mean_plan_time      | double precision         |           |          |
 stddev_plan_time    | double precision         |           |          |
Extract the first metric of interest, i.e., time vs. total execution time:
select bucket_start_time, pgsm_query_id, queryid, total_exec_time
into pg_stat_monitor_qry
from pg_stat_monitor_archive
order by bucket_start_time asc;
pgbench=# \d pg_stat_monitor_qry
             Table "public.pg_stat_monitor_qry"
      Column       |           Type           | Collation | Nullable | Default
-------------------+--------------------------+-----------+----------+---------
 bucket_start_time | timestamp with time zone |           |          |
 pgsm_query_id     | bigint                   |           |          |
 queryid           | bigint                   |           |          |
 total_exec_time   | double precision         |           |          |
Extract the next metric of interest, i.e., time vs. shared block I/O:
select bucket_start_time, pgsm_query_id, queryid,
       shared_blks_hit, shared_blks_read,
       shared_blks_dirtied, shared_blks_written
into pg_stat_monitor_shared_blk_io
from pg_stat_monitor_archive
order by bucket_start_time asc;
pgbench=# \d pg_stat_monitor_shared_blk_io
        Table "public.pg_stat_monitor_shared_blk_io"
       Column        |           Type           | Collation | Nullable | Default
---------------------+--------------------------+-----------+----------+---------
 bucket_start_time   | timestamp with time zone |           |          |
 pgsm_query_id       | bigint                   |           |          |
 queryid             | bigint                   |           |          |
 shared_blks_hit     | bigint                   |           |          |
 shared_blks_read    | bigint                   |           |          |
 shared_blks_dirtied | bigint                   |           |          |
 shared_blks_written | bigint                   |           |          |
Note: this metric requires the runtime parameter track_io_timing to be enabled.
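The note above concerns blk_read_time and blk_write_time, which remain at zero unless I/O timing is collected. A minimal config fragment for enabling it cluster-wide might look like the following (an assumption on my part, not part of the PoC script; it requires superuser privileges, and track_io_timing can be changed without a restart):

```sql
-- enable per-block I/O timing, then reload the server configuration
alter system set track_io_timing = on;
select pg_reload_conf();
```

Be aware that track_io_timing adds a small timing overhead on some platforms.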
Extract this metric of interest, i.e., time vs. block I/O:
select bucket_start_time, pgsm_query_id, queryid, blk_read_time, blk_write_time
into pg_stat_monitor_blk_io
from pg_stat_monitor_archive
order by bucket_start_time asc;
Save a copy of all unique query IDs in order to parse out future queries from the view.
The column pgsm_query_id identifies a query in such a way that the same query can still be recognized even when it is generated on other platforms, under different loading conditions, and with different data:
with a as (select distinct on (pgsm_query_id) *
           from pg_stat_monitor_archive
           where application_name='pgbench')
select cmd_type, cmd_type_text, pgsm_query_id, queryid, query as example_query
into pg_stat_monitor_uniq_id
from a
order by cmd_type;
pgbench=# \d pg_stat_monitor_uniq_id
     Table "public.pg_stat_monitor_uniq_id"
    Column     |  Type   | Collation | Nullable | Default
---------------+---------+-----------+----------+---------
 cmd_type      | integer |           |          |
 cmd_type_text | text    |           |          |
 pgsm_query_id | bigint  |           |          |
 queryid       | bigint  |           |          |
 example_query | text    |           |          |
This is an example set of queries generated by pgbench. Note that the values in column pgsm_query_id are always the same, irrespective of host or environment:
select cmd_type_text, pgsm_query_id, example_query
from pg_stat_monitor_uniq_id
where cmd_type > 0;
 cmd_type_text |    pgsm_query_id     | example_query
---------------+----------------------+------------------------------------------------------------
 SELECT        | -7455620703706695456 | SELECT abalance FROM pgbench_accounts WHERE aid = 16416498
 UPDATE        |  -510321339504955469 | UPDATE pgbench_accounts SET abalance = abalance + 2063
               |                      |   WHERE aid = 1482568
 UPDATE        |  5276535447716615446 | UPDATE pgbench_branches SET bbalance = bbalance + 1384
               |                      |   WHERE bid = 7
 UPDATE        |  3629195281782908951 | UPDATE pgbench_tellers SET tbalance = tbalance + -2966
               |                      |   WHERE tid = 330
 INSERT        | -8751124061964589929 | INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
               |                      |   VALUES (321, 56, 21104880, 4211, CURRENT_TIMESTAMP)
Two types of performance monitoring are profiled: post-run analysis of archived results, and real-time monitoring of a live run.
Although the results of the benchmarking can be viewed by querying the view pg_stat_monitor directly, the bash script and SQL statements below immediately copy and save the contents of the view into a collection of tables. This is because the data disappears over time as pg_stat_monitor cycles through its allotted number of buckets.
A script executing a benchmarking run:
#!/bin/bash

# REFERENCE
#   https://docs.percona.com/pg-stat-monitor/reference.html
#
set -e

export PGPASSWORD="MYPASSWORD"
export PGHOST='MYHOST'
export PGPORT=5434
export PGUSER=postgres

#
# initialize benchmarking database
#
dropdb --if-exists pgbench
createdb pgbench
/usr/pgsql-14/bin/pgbench -i --foreign-keys -s 300 pgbench
psql pgbench -c 'create extension pg_stat_monitor'

#
# configure pg_stat_monitor; requires a server restart
#
psql postgres <<_eof_
-- set bucket time range, default is normally 60 seconds
    alter system set pg_stat_monitor.pgsm_bucket_time = '1min';
-- set number of buckets, default is normally 10
    alter system set pg_stat_monitor.pgsm_max_buckets = 75;
_eof_

systemctl restart postgresql-14

psql pgbench <<_eof_
-- zero pg_stat_monitor stats
    select * from pg_stat_monitor_reset();
_eof_

#
# begin benchmarking run
#
# 4500 seconds (75 minutes)
/usr/pgsql-14/bin/pgbench -U postgres -c 4 -j 2 -T 4500 -P 5 -b tpcb-like pgbench

#
# copy and save the benchmarking run into tables
#
psql pgbench <<_eof_
    drop table if exists pg_stat_monitor_archive,
                         pg_stat_monitor_qry,
                         pg_stat_monitor_uniq_id;

    select * into pg_stat_monitor_archive from pg_stat_monitor order by bucket_start_time;

    select bucket_start_time, pgsm_query_id, queryid, total_exec_time
    into pg_stat_monitor_qry
    from pg_stat_monitor_archive
    where application_name='pgbench';

    with a as (select distinct on (pgsm_query_id) *
               from pg_stat_monitor_archive
               where application_name='pgbench')
    select cmd_type, cmd_type_text, pgsm_query_id, queryid, query as example_query
    into pg_stat_monitor_uniq_id
    from a
    order by cmd_type;
_eof_

echo "DONE"
progress: 4435.0 s, 341.2 tps, lat 11.718 ms stddev 3.951
progress: 4440.0 s, 361.2 tps, lat 11.075 ms stddev 3.519
progress: 4445.0 s, 348.0 tps, lat 11.483 ms stddev 5.246
progress: 4450.0 s, 383.8 tps, lat 10.418 ms stddev 4.514
progress: 4455.0 s, 363.6 tps, lat 10.988 ms stddev 4.326
progress: 4460.0 s, 344.0 tps, lat 11.621 ms stddev 3.981
progress: 4465.0 s, 360.4 tps, lat 11.093 ms stddev 4.457
progress: 4470.0 s, 383.8 tps, lat 10.423 ms stddev 5.615
progress: 4475.0 s, 369.6 tps, lat 10.811 ms stddev 3.784
progress: 4480.0 s, 355.6 tps, lat 11.227 ms stddev 3.954
progress: 4485.0 s, 378.8 tps, lat 10.580 ms stddev 2.890
progress: 4490.0 s, 370.8 tps, lat 10.770 ms stddev 2.879
progress: 4495.0 s, 365.2 tps, lat 10.947 ms stddev 4.997
progress: 4500.0 s, 379.2 tps, lat 10.549 ms stddev 2.832
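Progress lines like these are easy to post-process. A small sketch (not part of the PoC script) averages the per-interval tps figures in a saved progress log; here it is fed only the first two intervals shown above:

```shell
# Average the tps column ($4) of pgbench progress lines read on stdin.
awk '/tps,/ {sum += $4; n++} END {printf "%.2f\n", sum / n}' <<'_eof_'
progress: 4435.0 s, 341.2 tps, lat 11.718 ms stddev 3.951
progress: 4440.0 s, 361.2 tps, lat 11.075 ms stddev 3.519
_eof_
```

For the two sample lines this prints 351.20; pointed at the full log of a run, it gives the mean throughput across all reported intervals.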
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 300
query mode: simple
number of clients: 4
number of threads: 2
duration: 4500 s
number of transactions actually processed: 1564704
latency average = 11.497 ms
latency stddev = 4.800 ms
tps = 347.711175 (including connections establishing)
tps = 347.711731 (excluding connections establishing)
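As a quick sanity check on the summary, the reported tps agrees with the transaction count divided by the run duration:

```shell
# 1564704 transactions over 4500 seconds matches the reported ~347.71 tps.
awk 'BEGIN {printf "%.3f\n", 1564704 / 4500}'
```

This prints 347.712; the reported figure of 347.711175 is fractionally lower because it also accounts for connection-establishment time.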
Five (5) SQL statements are used to create this panel:
-- SELECT --
select bucket_start_time, total_exec_time as "SELECT"
from pg_stat_monitor_qry
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text='SELECT'
order by 1 asc;
-- INSERT --
select bucket_start_time, total_exec_time as "INSERT"
from pg_stat_monitor_qry
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text='INSERT'
order by 1 asc;
-- UPDATE 1 --
select bucket_start_time, total_exec_time as "UPDATE 1"
from pg_stat_monitor_qry
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text='UPDATE'
and pgsm_query_id = -510321339504955469
order by 1 asc;
-- UPDATE 2 --
select bucket_start_time, total_exec_time as "UPDATE 2"
from pg_stat_monitor_qry
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text='UPDATE'
and pgsm_query_id = 5276535447716615446
order by 1 asc;
-- UPDATE 3 --
select bucket_start_time, total_exec_time as "UPDATE 3"
from pg_stat_monitor_qry
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text='UPDATE'
and pgsm_query_id = 3629195281782908951
order by 1 asc;
-- INSERT (ins_[hit|read|dirty|write]) --
select bucket_start_time,
       shared_blks_hit as ins_hit, shared_blks_read as ins_read,
       shared_blks_dirtied as ins_dirt, shared_blks_written as ins_writ
from pg_stat_monitor_shared_blk_io
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text='INSERT'
order by 1 asc;
-- UPDATE 1 (update1_[hit|read|dirty|write]) --
select bucket_start_time,
       shared_blks_hit as update1_hit, shared_blks_read as update1_read,
       shared_blks_dirtied as update1_dirt, shared_blks_written as update1_writ
from pg_stat_monitor_shared_blk_io
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text='UPDATE'
and pgsm_query_id = -510321339504955469
order by 1 asc;
-- UPDATE 2 (update2_[hit|read|dirty|write]) --
select bucket_start_time,
       shared_blks_hit as update2_hit, shared_blks_read as update2_read,
       shared_blks_dirtied as update2_dirt, shared_blks_written as update2_writ
from pg_stat_monitor_shared_blk_io
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text='UPDATE'
and pgsm_query_id = 5276535447716615446
order by 1 asc;
-- UPDATE 3 (update3_[hit|read|dirty|write]) --
select bucket_start_time,
       shared_blks_hit as update3_hit, shared_blks_read as update3_read,
       shared_blks_dirtied as update3_dirt, shared_blks_written as update3_writ
from pg_stat_monitor_shared_blk_io
join pg_stat_monitor_uniq_id using (pgsm_query_id)
where cmd_type_text='UPDATE'
and pgsm_query_id = 3629195281782908951
order by 1 asc;

Here are some example patterns that can be discerned:
These two panels show read/write I/O performance to persistent storage during a live benchmarking run.
-- SELECT --
select bucket_start_time, total_exec_time as "SELECT"
from pg_stat_monitor
join pg_stat_monitor_uniq_id a using (pgsm_query_id)
where a.cmd_type_text='SELECT'
order by 1 asc;
-- INSERT --
select bucket_start_time, total_exec_time as "INSERT"
from pg_stat_monitor
join pg_stat_monitor_uniq_id a using (pgsm_query_id)
where a.cmd_type_text='INSERT'
order by 1 asc;
-- UPDATE 1 --
select bucket_start_time, total_exec_time as "UPDATE 1"
from pg_stat_monitor
join pg_stat_monitor_uniq_id a using (pgsm_query_id)
where a.cmd_type_text='UPDATE'
and pgsm_query_id = -510321339504955469
order by 1 asc;
-- UPDATE 2 --
select bucket_start_time, total_exec_time as "UPDATE 2"
from pg_stat_monitor
join pg_stat_monitor_uniq_id a using (pgsm_query_id)
where a.cmd_type_text='UPDATE'
and pgsm_query_id = 5276535447716615446
order by 1 asc;
-- UPDATE 3 --
select bucket_start_time, total_exec_time as "UPDATE 3"
from pg_stat_monitor
join pg_stat_monitor_uniq_id a using (pgsm_query_id)
where a.cmd_type_text='UPDATE'
and pgsm_query_id = 3629195281782908951
order by 1 asc;
-- time vs read/write blocks (blk_read_time, blk_write_time)
-- track_io_timing is on
select bucket_start_time, blk_read_time, blk_write_time from public.pg_stat_monitor;

It’s quite easy to observe that SQL statement UPDATE 1 represents the bulk of the read operations.
-- example SQL statement
UPDATE pgbench_accounts SET abalance = abalance + 2063 WHERE aid = 1482568;
Interestingly, writes are not as significant as reads.
I’m excited about pg_stat_monitor. Not only can it be used in Grafana, but it’s easily implemented in any monitoring solution, including our own Percona Monitoring and Management. It’s also incorporated in our latest version of Percona Operator for PostgreSQL.
pg_stat_monitor is an obvious, common-sense answer to pg_stat_statements' greatest limitation, i.e., its inability to bucket metrics over time intervals. Frankly, I can see the pg_stat_monitor extension eventually replacing pg_stat_statements as the de facto extension for monitoring Postgres when it comes to real-time analysis.
Happy monitoring!
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.