Since Postgres table bloat degrades database performance, we can improve its performance by removing the table bloat. We can use the pgstattuple extension to identify the bloated tables.
This extension provides several functions for obtaining tuple-level statistics. Because the pgstattuple functions produce extensive page-level information, access to them is, by default, limited. Only the pg_stat_scan_tables role has the ability to EXECUTE the pgstattuple functions by default. The pgstattuple functions are not restricted to the superuser.
Using the pgstattuple function, we can list the tables with a high percentage of dead tuples and run a manual VACUUM to reclaim the space occupied by the dead tuples.
In this blog, we will discuss the pgstattuple extension, which provides various functions to obtain tuple-level statistics.
First, create the pgstattuple extension. The latest version of pgstattuple is 1.5.
|
1 |
postgres@ip-172-31-46-212:~$ psql<br>psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1))<br>Type "help" for help.<br>postgres=# select * from pg_available_extensions where name='pgstattuple';<br> name | default_version | installed_version | comment<br>-------------+-----------------+-------------------+-----------------------------<br> pgstattuple | 1.5 | | show tuple-level statistics<br>(1 row)<br>postgres=#<br>postgres=# create extension pgstattuple;<br>CREATE EXTENSION<br>postgres=#<br>postgres=# select * from pg_available_extensions where name='pgstattuple';<br> name | default_version | installed_version | comment<br>-------------+-----------------+-------------------+-----------------------------<br> pgstattuple | 1.5 | 1.5 | show tuple-level statistics<br>(1 row)<br>postgres=# |
By default, only the superuser has access to the pgstattuple functions; however, you can grant non-superuser access by granting the pg_stat_scan_tables role to the non-superuser.
|
1 |
postgres=# grant pg_stat_scan_tables to <nonsuperuser>;<br>GRANT ROLE<br><br>postgres=> du <nonsuperuser><br> List of roles<br> Role name | Attributes | Member of<br>———–+————+———————–<br> <nonsuperuser> | | {pg_stat_scan_tables}<br>postgres=> |
Next, before using the pgstattuple functions, let’s create a table and index for demonstration.
|
1 |
postgres=# create table workshop (jobno int);<br>CREATE TABLE<br><br>postgres=# insert into workshop values (generate_series(1,80000));<br>INSERT 0 80000<br><br>postgres=# create index workshop_index on workshop (jobno);<br>CREATE INDEX<br>postgres=# |
pgstattuple(regclass): This function returns a relation’s physical length, percentage of “dead” tuples, and other info. This may help users to determine whether a vacuum is necessary or not. The argument is the target relation’s name (optionally schema-qualified) or OID.
For example:
|
1 |
SELECT * FROM pgstattuple('pg_catalog.pg_proc');<br><br>SELECT * FROM pgstattuple('workshop'); |
pgstattuple only obtains a read lock on the relation. As a result, the pgstattuple output does not represent an instantaneous snapshot. The concurrent updates will change the output of the pgstattuple.
The example below shows the tuple statistics of the workshop table.
|
1 |
postgres=# SELECT * FROM pgstattuple('workshop');<br><br>-[ RECORD 1 ]------+--------<br>table_len | 2899968<br>tuple_count | 80000<br>tuple_len | 2240000<br>tuple_percent | 77.24<br>dead_tuple_count | 0<br>dead_tuple_len | 0<br>dead_tuple_percent | 0<br>free_space | 10056<br>free_percent | 0.35 |
Below are the pgstattuple output columns and their description.
| Column | Type | Description |
| table_len | bigint | Physical relation length in bytes |
| tuple_count | bigint | Number of live tuples |
| tuple_len | bigint | Total length of live tuples in bytes |
| tuple_percent | float8 | Percentage of live tuples |
| dead_tuple_count | bigint | Number of dead tuples |
| dead_tuple_len | bigint | Total length of dead tuples in bytes |
| dead_tuple_percent | float8 | Percentage of dead tuples |
| free_space | bigint | Total free space in bytes |
| free_percent | float8 | Percentage of free space |
Let’s check the tuple statistics when we delete or update the rows. Below are the current tuple statistics for the reference.
|
1 |
postgres=# SELECT * FROM pgstattuple('workshop');<br>-[ RECORD 1 ]------+--------<br>table_len | 2899968<br>tuple_count | 80000<br>tuple_len | 2240000<br>tuple_percent | 77.24<br>dead_tuple_count | 0<br>dead_tuple_len | 0<br>dead_tuple_percent | 0<br>free_space | 10056<br>free_percent | 0.35 |
Delete the few ROWS from the table.
|
1 |
postgres=# DELETE FROM workshop WHERE jobno % 8 = 0;<br>DELETE 10000<br>postgres=# |
In the below output, the dead_tuple_count shows that Postgres marked the rows as deleted but did not remove them from the table, as the length of the table is the same after the deletion of the rows.
|
1 |
postgres=# SELECT * FROM pgstattuple('workshop');<br>-[ RECORD 1 ]------+--------<br>table_len | 2899968<br>tuple_count | 70000<br>tuple_len | 1960000<br>tuple_percent | 67.59<br>dead_tuple_count | 10000<br>dead_tuple_len | 280000<br>dead_tuple_percent | 9.66<br>free_space | 10056<br>free_percent | 0.35 |
Now execute the VACUUM on the table; after running the plain VACUUM, we see that the:
|
1 |
postgres=# vacuum workshop;<br>VACUUM<br><br>postgres=# SELECT * FROM pgstattuple('workshop');<br>-[ RECORD 1 ]------+--------<br>table_len | 2899968<br>tuple_count | 70000<br>tuple_len | 1960000<br>tuple_percent | 67.59<br>dead_tuple_count | 0<br>dead_tuple_len | 0<br>dead_tuple_percent | 0<br>free_space | 330412<br>free_percent | 11.39 |
After running the VACUUM FULL, we can see that table_len has decreased. It shows that O.S. level space is reclaimed after running the VACUUM FULL.
|
1 |
postgres=# vacuum full workshop;<br>VACUUM<br><br>postgres=# SELECT * FROM pgstattuple('workshop');<br>-[ RECORD 1 ]------+--------<br>table_len | 2539520<br>tuple_count | 70000<br>tuple_len | 1960000<br>tuple_percent | 77.18<br>dead_tuple_count | 0<br>dead_tuple_len | 0<br>dead_tuple_percent | 0<br>free_space | 10840<br>free_percent | 0.43<br>postgres=# |
We can list the tables where the dead tuple percentage is high by using the query below.
|
1 |
postgres=#select relname,(pgstattuple(oid)).dead_tuple_percent from pg_class where relkind = 'r' order by dead_tuple_percent desc;<br> relname | dead_tuple_percent<br>--------------------------+--------------------<br> pg_init_privs | 2.56<br> pg_class | 1.69<br> pg_statistic | 1.45<br> pg_authid | 1.37<br> pg_extension | 1.28<br> pg_type | 1.17<br> pg_namespace | 1.17 |
The below query will show you the tuple statistics of the tables in detail where the dead tuple percentage is high.
|
1 |
SELECT relname,oid,relowner,(pgstattuple(oid)).dead_tuple_percent ,(pgstattuple(oid)).* FROM pg_class WHERE relkind = 'r' order by dead_tuple_percent desc;<br><br>-[ RECORD 1 ]------+-------------------------<br>relname | pg_init_privs<br>oid | 3394<br>relowner | 10<br>dead_tuple_percent | 2.56<br>table_len | 24576<br>tuple_count | 222<br>tuple_len | 17316<br>tuple_percent | 70.46<br>dead_tuple_count | 9<br>dead_tuple_len | 630<br>dead_tuple_percent | 2.56<br>free_space | 4568<br>free_percent | 18.59<br><br>-[ RECORD 2 ]------+-------------------------<br>relname | pg_class<br>oid | 1259<br>relowner | 10<br>dead_tuple_percent | 1.69<br>table_len | 114688<br>tuple_count | 415<br>tuple_len | 80473<br>tuple_percent | 70.17<br>dead_tuple_count | 11<br>dead_tuple_len | 1937<br>dead_tuple_percent | 1.69<br>free_space | 26976<br>free_percent | 23.52 |
pgstatindex(regclass): This function returns a record showing information about a B-tree index.
For example:
|
1 |
postgres=# SELECT * FROM pgstatindex ('workshop_index');<br><br>-[ RECORD 1 ]------+--------<br>version | 4<br>tree_level | 1<br>index_size | 1589248<br>root_block_no | 3<br>internal_pages | 1<br>leaf_pages | 192<br>empty_pages | 0<br>deleted_pages | 0<br>avg_leaf_density | 89.74<br>leaf_fragmentation | 0 |
The output columns and their description.
| Column | Type | Description |
| version | integer | B-tree version number |
| tree_level | integer | Tree level of the root page |
| index_size | bigint | Total index size in bytes |
| root_block_no | bigint | Location of root page (zero if none) |
| internal_pages | bigint | Number of “internal” (upper-level) pages |
| leaf_pages | bigint | Number of leaf pages |
| empty_pages | bigint | Number of empty pages |
| deleted_pages | bigint | Number of deleted pages |
pgstatginindex(regclass): This function returns a record showing information about a GIN index.
For example:
|
1 |
SELECT * FROM pgstatginindex('table_gin_index'); |
pgstathashindex(regclass): This function returns a record showing information about a HASH index.
For example:
|
1 |
select * from pgstathashindex('table_hash_index'); |
In this blog, we have seen how the pgstattuple can be used for tuple-level analysis and to find bloated tables or vacuuming candidates. https://www.postgresql.org/docs/current/pgstattuple.html has some additional information on the extension.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.
Resources
RELATED POSTS