Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

PostgreSQL Tuple-Level Statistics With pgstattuple

October 9, 2023

Author

Share this Post:

Since Postgres table bloat degrades database performance, we can improve its performance by removing the table bloat. We can use the pgstattuple extension to identify the bloated tables.

This extension provides several functions for obtaining tuple-level statistics. Because the pgstattuple functions produce extensive page-level information, access to them is, by default, limited. Only the pg_stat_scan_tables role has the ability to EXECUTE the pgstattuple functions by default. The pgstattuple functions are not restricted to the superuser.

Using the pgstattuple function, we can list the tables with a high percentage of dead tuples and run a manual VACUUM to reclaim the space occupied by the dead tuples.

In this blog, we will discuss the pgstattuple extension, which provides various functions to obtain tuple-level statistics.

First, create the pgstattuple extension. The latest version of pgstattuple is 1.5.

postgres@ip-172-31-46-212:~$ psql
psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1))
Type "help" for help.
postgres=# select * from pg_available_extensions where name='pgstattuple';
    name     | default_version | installed_version |           comment
-------------+-----------------+-------------------+-----------------------------
 pgstattuple | 1.5             |                   | show tuple-level statistics
(1 row)
postgres=#
postgres=# create extension pgstattuple;
CREATE EXTENSION
postgres=#
postgres=# select * from pg_available_extensions where name='pgstattuple';
    name     | default_version | installed_version |           comment
-------------+-----------------+-------------------+-----------------------------
 pgstattuple | 1.5             | 1.5               | show tuple-level statistics
(1 row)
postgres=#

postgres@ip-172-31-46-212:~$ psql

psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1))

Type "help" for help.

postgres=# select * from pg_available_extensions where name='pgstattuple';

name | default_version | installed_version | comment

-------------+-----------------+-------------------+-----------------------------

pgstattuple | 1.5 | | show tuple-level statistics

(1 row)

postgres=#

postgres=# create extension pgstattuple;

CREATE EXTENSION

postgres=#

postgres=# select * from pg_available_extensions where name='pgstattuple';

name | default_version | installed_version | comment

-------------+-----------------+-------------------+-----------------------------

pgstattuple | 1.5 | 1.5 | show tuple-level statistics

(1 row)

postgres=#

By default, only the superuser has access to the pgstattuple functions; however, you can grant non-superuser access by granting the pg_stat_scan_tables role to the non-superuser.

postgres=# grant pg_stat_scan_tables to <nonsuperuser>;
GRANT ROLE

postgres=> du <nonsuperuser>
                 List of roles
 Role name | Attributes |       Member of
———–+————+———————–
 <nonsuperuser>        |            | {pg_stat_scan_tables}
postgres=>

postgres=# grant pg_stat_scan_tables to <nonsuperuser>;

GRANT ROLE

postgres=> du <nonsuperuser>

List of roles

Role name | Attributes | Member of

———–+————+———————–

<nonsuperuser> | | {pg_stat_scan_tables}

postgres=>

Next, before using the pgstattuple functions, let’s create a table and index for demonstration.

postgres=# create table workshop (jobno int);
CREATE TABLE

postgres=# insert into workshop values (generate_series(1,80000));
INSERT 0 80000

postgres=# create index workshop_index on workshop (jobno);
CREATE INDEX
postgres=#

postgres=# create table workshop (jobno int);

CREATE TABLE

postgres=# insert into workshop values (generate_series(1,80000));

INSERT 0 80000

postgres=# create index workshop_index on workshop (jobno);

CREATE INDEX

postgres=#

pgstattuple functions

pgstattuple(regclass): This function returns a relation’s physical length, percentage of “dead” tuples, and other info. This may help users to determine whether a vacuum is necessary or not. The argument is the target relation’s name (optionally schema-qualified) or OID.

For example:

SELECT * FROM pgstattuple('pg_catalog.pg_proc');

SELECT * FROM pgstattuple('workshop');

SELECT * FROM pgstattuple('pg_catalog.pg_proc');

SELECT * FROM pgstattuple('workshop');

pgstattuple only obtains a read lock on the relation. As a result, the pgstattuple output does not represent an instantaneous snapshot. The concurrent updates will change the output of the pgstattuple.

The example below shows the tuple statistics of the workshop table.

postgres=# SELECT * FROM pgstattuple('workshop');

-[ RECORD 1 ]------+--------
table_len          | 2899968
tuple_count        | 80000
tuple_len          | 2240000
tuple_percent      | 77.24
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 10056
free_percent       | 0.35

postgres=# SELECT * FROM pgstattuple('workshop');

-[ RECORD 1 ]------+--------

table_len | 2899968

tuple_count | 80000

tuple_len | 2240000

tuple_percent | 77.24

dead_tuple_count | 0

dead_tuple_len | 0

dead_tuple_percent | 0

free_space | 10056

free_percent | 0.35

Below are the pgstattuple output columns and their description.

Column	Type	Description
table_len	bigint	Physical relation length in bytes
tuple_count	bigint	Number of live tuples
tuple_len	bigint	Total length of live tuples in bytes
tuple_percent	float8	Percentage of live tuples
dead_tuple_count	bigint	Number of dead tuples
dead_tuple_len	bigint	Total length of dead tuples in bytes
dead_tuple_percent	float8	Percentage of dead tuples
free_space	bigint	Total free space in bytes
free_percent	float8	Percentage of free space

Let’s check the tuple statistics when we delete or update the rows. Below are the current tuple statistics for the reference.

postgres=# SELECT * FROM pgstattuple('workshop');
-[ RECORD 1 ]------+--------
table_len          | 2899968
tuple_count        | 80000
tuple_len          | 2240000
tuple_percent      | 77.24
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 10056
free_percent       | 0.35

postgres=# SELECT * FROM pgstattuple('workshop');

-[ RECORD 1 ]------+--------

table_len | 2899968

tuple_count | 80000

tuple_len | 2240000

tuple_percent | 77.24

dead_tuple_count | 0

dead_tuple_len | 0

dead_tuple_percent | 0

free_space | 10056

free_percent | 0.35

Delete the few ROWS from the table.

postgres=# DELETE FROM workshop WHERE jobno % 8 = 0;
DELETE 10000
postgres=#

postgres=# DELETE FROM workshop WHERE jobno % 8 = 0;

DELETE 10000

postgres=#

In the below output, the dead_tuple_count shows that Postgres marked the rows as deleted but did not remove them from the table, as the length of the table is the same after the deletion of the rows.

postgres=# SELECT * FROM pgstattuple('workshop');
-[ RECORD 1 ]------+--------
table_len          | 2899968
tuple_count        | 70000
tuple_len          | 1960000
tuple_percent      | 67.59
dead_tuple_count   | 10000
dead_tuple_len     | 280000
dead_tuple_percent | 9.66
free_space         | 10056
free_percent       | 0.35

postgres=# SELECT * FROM pgstattuple('workshop');

-[ RECORD 1 ]------+--------

table_len | 2899968

tuple_count | 70000

tuple_len | 1960000

tuple_percent | 67.59

dead_tuple_count | 10000

dead_tuple_len | 280000

dead_tuple_percent | 9.66

free_space | 10056

free_percent | 0.35

Now execute the VACUUM on the table; after running the plain VACUUM, we see that the:

1. free_space has increased. Vacuum has reclaimed the space and made it available for re-use at the database level.

1. Dead rows have been removed.

1. The table_len is the same as O.S. level space is not reclaimed.

postgres=# vacuum workshop;
VACUUM

postgres=# SELECT * FROM pgstattuple('workshop');
-[ RECORD 1 ]------+--------
table_len          | 2899968
tuple_count        | 70000
tuple_len          | 1960000
tuple_percent      | 67.59
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 330412
free_percent       | 11.39

postgres=# vacuum workshop;

VACUUM

postgres=# SELECT * FROM pgstattuple('workshop');

-[ RECORD 1 ]------+--------

table_len | 2899968

tuple_count | 70000

tuple_len | 1960000

tuple_percent | 67.59

dead_tuple_count | 0

dead_tuple_len | 0

dead_tuple_percent | 0

free_space | 330412

free_percent | 11.39

After running the VACUUM FULL, we can see that table_len has decreased. It shows that O.S. level space is reclaimed after running the VACUUM FULL.

postgres=# vacuum full workshop;
VACUUM

postgres=# SELECT * FROM pgstattuple('workshop');
-[ RECORD 1 ]------+--------
table_len          | 2539520
tuple_count        | 70000
tuple_len          | 1960000
tuple_percent      | 77.18
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 10840
free_percent       | 0.43
postgres=#

postgres=# vacuum full workshop;

VACUUM

postgres=# SELECT * FROM pgstattuple('workshop');

-[ RECORD 1 ]------+--------

table_len | 2539520

tuple_count | 70000

tuple_len | 1960000

tuple_percent | 77.18

dead_tuple_count | 0

dead_tuple_len | 0

dead_tuple_percent | 0

free_space | 10840

free_percent | 0.43

postgres=#

pgstattuple queries to check table bloat

We can list the tables where the dead tuple percentage is high by using the query below.

postgres=#select relname,(pgstattuple(oid)).dead_tuple_percent from pg_class where relkind = 'r' order by dead_tuple_percent desc;
 relname                  | dead_tuple_percent
--------------------------+--------------------
 pg_init_privs            |               2.56
 pg_class                 |               1.69
 pg_statistic             |               1.45
 pg_authid                |               1.37
 pg_extension             |               1.28
 pg_type                  |               1.17
 pg_namespace             |               1.17

postgres=#select relname,(pgstattuple(oid)).dead_tuple_percent from pg_class where relkind = 'r' order by dead_tuple_percent desc;

relname | dead_tuple_percent

--------------------------+--------------------

pg_init_privs | 2.56

pg_class | 1.69

pg_statistic | 1.45

pg_authid | 1.37

pg_extension | 1.28

pg_type | 1.17

pg_namespace | 1.17

The below query will show you the tuple statistics of the tables in detail where the dead tuple percentage is high.

SELECT relname,oid,relowner,(pgstattuple(oid)).dead_tuple_percent ,(pgstattuple(oid)).* FROM pg_class WHERE relkind = 'r' order by dead_tuple_percent desc;

-[ RECORD 1 ]------+-------------------------
relname            | pg_init_privs
oid                | 3394
relowner           | 10
dead_tuple_percent | 2.56
table_len          | 24576
tuple_count        | 222
tuple_len          | 17316
tuple_percent      | 70.46
dead_tuple_count   | 9
dead_tuple_len     | 630
dead_tuple_percent | 2.56
free_space         | 4568
free_percent       | 18.59

-[ RECORD 2 ]------+-------------------------
relname            | pg_class
oid                | 1259
relowner           | 10
dead_tuple_percent | 1.69
table_len          | 114688
tuple_count        | 415
tuple_len          | 80473
tuple_percent      | 70.17
dead_tuple_count   | 11
dead_tuple_len     | 1937
dead_tuple_percent | 1.69
free_space         | 26976
free_percent       | 23.52

SELECT relname,oid,relowner,(pgstattuple(oid)).dead_tuple_percent ,(pgstattuple(oid)).* FROM pg_class WHERE relkind = 'r' order by dead_tuple_percent desc;

-[ RECORD 1 ]------+-------------------------

relname | pg_init_privs

oid | 3394

relowner | 10

dead_tuple_percent | 2.56

table_len | 24576

tuple_count | 222

tuple_len | 17316

tuple_percent | 70.46

dead_tuple_count | 9

dead_tuple_len | 630

dead_tuple_percent | 2.56

free_space | 4568

free_percent | 18.59

-[ RECORD 2 ]------+-------------------------

relname | pg_class

oid | 1259

relowner | 10

dead_tuple_percent | 1.69

table_len | 114688

tuple_count | 415

tuple_len | 80473

tuple_percent | 70.17

dead_tuple_count | 11

dead_tuple_len | 1937

dead_tuple_percent | 1.69

free_space | 26976

free_percent | 23.52

pgstatindex(regclass): This function returns a record showing information about a B-tree index.

For example:

postgres=# SELECT * FROM pgstatindex ('workshop_index');

-[ RECORD 1 ]------+--------
version            | 4
tree_level         | 1
index_size         | 1589248
root_block_no      | 3
internal_pages     | 1
leaf_pages         | 192
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 89.74
leaf_fragmentation | 0

postgres=# SELECT * FROM pgstatindex ('workshop_index');

-[ RECORD 1 ]------+--------

version | 4

tree_level | 1

index_size | 1589248

root_block_no | 3

internal_pages | 1

leaf_pages | 192

empty_pages | 0

deleted_pages | 0

avg_leaf_density | 89.74

leaf_fragmentation | 0

The output columns and their description.

Column	Type	Description
version	integer	B-tree version number
tree_level	integer	Tree level of the root page
index_size	bigint	Total index size in bytes
root_block_no	bigint	Location of root page (zero if none)
internal_pages	bigint	Number of “internal” (upper-level) pages
leaf_pages	bigint	Number of leaf pages
empty_pages	bigint	Number of empty pages
deleted_pages	bigint	Number of deleted pages

pgstatginindex(regclass): This function returns a record showing information about a GIN index.

For example:

SELECT * FROM pgstatginindex('table_gin_index');

1	SELECT * FROM pgstatginindex('table_gin_index');

pgstathashindex(regclass): This function returns a record showing information about a HASH index.

For example:

select * from pgstathashindex('table_hash_index');

1	select * from pgstathashindex('table_hash_index');

In this blog, we have seen how the pgstattuple can be used for tuple-level analysis and to find bloated tables or vacuuming candidates. https://www.postgresql.org/docs/current/pgstattuple.html has some additional information on the extension.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

0 0 votes

Article Rating

1 Comment

Oldest

Newest Most Voted

Yeshas Siddi MS

2 years ago

Hi Shivam,

Thank you for the blog, really helpful. I tried pgstattuple and noticed that the iops consumption goes through the roof as the size of the tables increases, which made me curious about pgstattuple_approx(regclass). This function does not do a full table scan, with the caveat that the table should be vacuumed frequently so that the visibility map provides up to date info about visible tuples.

Regards,
Yeshas Siddi M S