We have encountered various levels of database corruption in PostgreSQL, and our colleague has written multiple blog posts on the subject.
In this blog, we discuss the scenario where a data file belonging to a table goes missing, either because of an OS or hardware problem or because of human error, such as someone unintentionally deleting a data file at the OS level. Touching the files under the data directory's base/ subdirectory (here, /var/lib/postgresql/14/main/base/) is never recommended, but it sometimes happens.
Our current database was running fine with the below structure:
                                                                    List of databases
       Name    |  Owner   | Encoding | Collate | Ctype   |   Access privileges   |  Size   | Tablespace |                Description
    -----------+----------+----------+---------+---------+-----------------------+---------+------------+--------------------------------------------
     percona   | postgres | UTF8     | C.UTF-8 | C.UTF-8 |                       | 9561 kB | pg_default |
     postgres  | postgres | UTF8     | C.UTF-8 | C.UTF-8 |                       | 8553 kB | pg_default | default administrative connection database
     template0 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +| 8401 kB | pg_default | unmodifiable empty database
               |          |          |         |         | postgres=CTc/postgres |         |            |
     template1 | postgres | UTF8     | C.UTF-8 | C.UTF-8 | =c/postgres          +| 8553 kB | pg_default | default template for new databases
               |          |          |         |         | postgres=CTc/postgres |         |            |
    (4 rows)
At some point, the following error messages started appearing in the PostgreSQL logs:
    2023-06-14 09:58:06.408 UTC [4056] LOG: listening on IPv4 address "127.0.0.1", port 5432
    2023-06-14 09:58:06.412 UTC [4056] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
    2023-06-14 09:58:06.423 UTC [4057] LOG: database system was shut down at 2023-06-14 09:58:04 UTC
    2023-06-14 09:58:06.432 UTC [4056] LOG: database system is ready to accept connections
    2023-06-16 10:00:58.130 UTC [35062] postgres@percona ERROR: could not open file "base/16384/16391": No such file or directory
    2023-06-16 10:00:58.130 UTC [35062] postgres@percona STATEMENT: select * from test limit 1;
    2023-06-16 10:01:59.191 UTC [35224] postgres@percona ERROR: could not open file "base/16384/16391": No such file or directory
    2023-06-16 10:01:59.191 UTC [35224] postgres@percona STATEMENT: select * from test limit 1;
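When the log is long, it can help to pull out just the affected file paths. The following is a minimal sketch rather than part of the original procedure: the function name `missing_files_from_log` is hypothetical, and it assumes the error text has exactly the form shown in the log excerpt above.

```shell
# Print each distinct relation file that PostgreSQL reported as missing.
# Reads log text on stdin and writes one path per line.
missing_files_from_log() {
    # Keep only the quoted path from matching ERROR lines, then de-duplicate.
    sed -n 's/.*could not open file "\([^"]*\)": No such file or directory.*/\1/p' \
        | sort -u
}
```

For example, `missing_files_from_log < /path/to/postgresql.log` would print `base/16384/16391` once for the excerpt above. The log path is a placeholder, since its location depends on the installation.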
Upon checking, we found that the error was caused by one file (base/16384/16391) having been removed. To confirm this, we check whether base/16384/16391 still exists under the data directory:
    postgres@ip-172-xx-xx-xx:~/14/main$ ls -l base/16384/16391
    ls: cannot access 'base/16384/16391': No such file or directory
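The path in the error message has the form base/<database OID>/<filenode>, so the same check can be wrapped in a small helper that also reports which database and filenode are involved. This is only a sketch: `check_relation_file` is a hypothetical name, and the data directory is passed in by the caller instead of being detected.

```shell
# Report whether a relation file named in an error message exists on disk.
# $1: data directory (in this post: /var/lib/postgresql/14/main)
# $2: path relative to it, e.g. base/16384/16391
check_relation_file() {
    local datadir="$1" relpath="$2"
    local db_oid filenode
    # The parent directory name is the database OID; the file name is the filenode.
    db_oid=$(basename "$(dirname "$relpath")")
    filenode=$(basename "$relpath")
    if [ -f "$datadir/$relpath" ]; then
        echo "present: database OID $db_oid, filenode $filenode"
    else
        echo "missing: database OID $db_oid, filenode $filenode"
        return 1
    fi
}
```

In the situation described here, `check_relation_file /var/lib/postgresql/14/main base/16384/16391` would print `missing: database OID 16384, filenode 16391` and return a non-zero status. The sketch checks only the first segment file. By default, PostgreSQL stores tables larger than 1 GB in additional segments named 16391.1, 16391.2, and so on.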
We can also check at the database level with this SQL query:
    percona=# SELECT relid, relname FROM pg_catalog.pg_statio_user_tables
    WHERE relid = '16391';
     relid | relname
    -------+---------
     16391 | test
    (1 row)
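One caveat about the query above: the file name on disk is the relation's filenode. It matches the table's OID for a freshly created table, as it does here, but it can change after operations such as TRUNCATE or VACUUM FULL. A lookup by filenode with the built-in `pg_filenode_relation` function avoids that assumption. The helper below is a hypothetical sketch (the name `filenode_lookup_sql` is ours) that only prints the query, so it can be reviewed before being sent to psql.

```shell
# Print a query that maps an on-disk filenode back to its relation.
# $1: filenode taken from the missing path (16391 in this post)
filenode_lookup_sql() {
    local filenode="$1"
    # Reject anything that is not a plain non-negative integer.
    case "$filenode" in
        ''|*[!0-9]*) echo "filenode must be numeric: $filenode" >&2; return 1 ;;
    esac
    # A tablespace OID of 0 means the database's default tablespace.
    printf 'SELECT pg_filenode_relation(0, %s) AS relation;\n' "$filenode"
}
```

Running `filenode_lookup_sql 16391 | psql -X -d percona` should return `test` in this scenario, since the lookup reads the catalogs and not the missing data file.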
From the above, we have identified that the file for the table “test” with relid 16391 was deleted. Next, we need to determine whether it was deleted manually by mistake or lost due to a hardware failure.
In the case of a hardware failure, we first need to fix the hardware issue or migrate the database to new hardware, and then perform a restore as described below.
To restore, we can follow either of the approaches below:
Abdul Sayeed co-authored this article.