Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Using pg_repack to Rebuild PostgreSQL Database Objects Online

February 5, 2019

Author

Avinash Vallarapu

Insight for DBAs

PostgreSQL

Share this Post:

In this blog post, we’ll look at how to use pg_repack to rebuild PostgreSQL database objects online.

We’ve seen a lot of questions regarding the options available in PostgreSQL for rebuilding a table online. We created this blog post to explain the pg_repack extension, available in PostgreSQL for this requirement. pg_repack is a well-known extension that was created and is maintained as an open source project by several authors.

There are three main reasons why you need to use pg_repack in a PostgreSQL server:

1. Reclaim free space from a table to disk, after deleting a huge chunk of records

1. Rebuild a table to re-order the records and shrink/pack them to lesser number of pages. This may let a query fetch just one page ( or < n pages) instead of n pages from disk. In other words, less IO and more performance.

1. Reclaim free space from a table that has grown in size with a lot of bloat due to improper autovacuum settings.

You might have already read our previous articles that explained what bloat is, and discussed the internals of autovacuum. After reading these articles, you can see there is an autovacuum background process that removes dead tuples from a table and allows the space to be re-used by future updates/inserts on that table. Over a period of time, tables that take the maximum number of updates or deletes may have a lot of bloated space due to poorly tuned autovacuum settings. This leads to slow performing queries on these tables. Rebuilding the table is the best way to avoid this.

Why is just autovacuum not enough for tables with bloat?

We have discussed several parameters that change the behavior of an autovacuum process in this blog post. There cannot be more than autovacuum_max_workers number of autovacuum processes running in a database cluster at a time. At the same time, due to untuned autovacuum settings and no manual vacuuming of the database as a weekly or monthy jobs, many tables can be skipped from autovacuum. We have discussed in this post that the default autovacuum settings run autovacuum on a table with ten records more times than a table with a million records. So, it is very important to tune your autovacuum settings, set table-level customized autovacuum parameters and enable automated jobs to identify tables with huge bloat and run manual vacuum on them as scheduled jobs during low peak times (after thorough testing).

VACUUM FULL

VACUUM FULL is the default option available with a PostgreSQL installation that allows us to rebuild a table. This is similar to ALTER TABLE in MySQL. However, this command acquires an exclusive lock and locks reads and writes on a table.

VACUUM FULL tablename;

1	VACUUM FULL tablename;

pg_repack

pg_repack is an extension available for PostgreSQL that helps us rebuild a table online. This is similar to pt-online-schema-change for online table rebuild/reorg in MySQL. However, pg_repack works for tables with a Primary key or a NOT NULL Unique key only.

Installing pg_repack extension

In RedHat/CentOS/OEL from PGDG Repo

Obtain the latest PGDG repo from https://yum.postgresql.org/ and perform the following step:

# yum install pg_repack11 (This works for PostgreSQL 11)

Similarly, for PostgreSQL 10,
# yum install pg_repack10

# yum install pg_repack11 (This works for PostgreSQL 11)

Similarly, for PostgreSQL 10,

# yum install pg_repack10

In Debian/Ubuntu from PGDG repo

Add certificates, repo and install pg_repack:

Following certificate may change. Please validate before you perform these steps. 

# sudo apt-get install wget ca-certificates
# wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
# sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
# sudo apt-get update
# apt-get install postgresql-server-dev-11
# apt-get install postgresql-11-repack

Following certificate may change. Please validate before you perform these steps.

# sudo apt-get install wget ca-certificates

# wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -

# sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'

# sudo apt-get update

# apt-get install postgresql-server-dev-11

# apt-get install postgresql-11-repack

Loading and creating pg_repack extension

Step 1 :

You need to add pg_repack to shared_preload_libraries. For that, just set this parameter in postgresql.conf or postgresql.auto.conf file.

shared_preload_libraries = 'pg_repack'

1	shared_preload_libraries = 'pg_repack'

Setting this parameter requires a restart.

$ pg_ctl -D $PGDATA restart -mf

1	$ pg_ctl -D $PGDATA restart -mf

Step 2 :

In order to start using pg_repack, you must create this extension in each database where you wish to run it:

$ psql 
c percona
CREATE EXTENSION pg_repack;

$ psql

c percona

CREATE EXTENSION pg_repack;

Using pg_repack to Rebuild Tables Online

Similar to pt-online-schema-change, you can use the option --dry-run to see if this table can be rebuilt using pg_repack. When you rebuild a table using pg_repack, all its associated Indexes does get rebuild automatically. You can also use -t instead of --table as an argument to rebuild a specific table.

Success message you see when a table satisfies the requirements for pg_repack.

$ pg_repack --dry-run -d percona --table scott.employee
INFO: Dry run enabled, not executing repack
INFO: repacking table "scott.employee"

$ pg_repack --dry-run -d percona --table scott.employee

INFO: Dry run enabled, not executing repack

INFO: repacking table "scott.employee"

Error message when a table does not satisfy the requirements for pg_repack.

$ pg_repack --dry-run -d percona --table scott.sales
INFO: Dry run enabled, not executing repack
WARNING: relation "scott.sales" must have a primary key or not-null unique keys

$ pg_repack --dry-run -d percona --table scott.sales

INFO: Dry run enabled, not executing repack

WARNING: relation "scott.sales" must have a primary key or not-null unique keys

Now to execute the rebuild of a table: scott.employee ONLINE, you can use the following command. It is just the previous command without --dry-run.

$ pg_repack -d percona --table scott.employee
INFO: repacking table "scott.employee"

1 2	$ pg_repack -d percona --table scott.employee INFO: repacking table "scott.employee"

Rebuilding Multiple Tables using pg_repack

Use an additional --table for each table you wish to rebuild.

Dry Run

$ pg_repack --dry-run -d percona --table scott.employee --table scott.departments
INFO: Dry run enabled, not executing repack
INFO: repacking table "scott.departments"
INFO: repacking table "scott.employee"

$ pg_repack --dry-run -d percona --table scott.employee --table scott.departments

INFO: Dry run enabled, not executing repack

INFO: repacking table "scott.departments"

INFO: repacking table "scott.employee"

Execute

$ pg_repack -d percona --table scott.employee --table scott.departments
INFO: repacking table "scott.departments"
INFO: repacking table "scott.employee"

$ pg_repack -d percona --table scott.employee --table scott.departments

INFO: repacking table "scott.departments"

INFO: repacking table "scott.employee"

Rebuilding an entire Database using pg_repack

You can rebuild an entire database online using -d. Any table that is not eligible for pg_repack is skipped automatically.

Dry Run

$ pg_repack --dry-run -d percona 
INFO: Dry run enabled, not executing repack
INFO: repacking table "scott.departments"
INFO: repacking table "scott.employee"

$ pg_repack --dry-run -d percona

INFO: Dry run enabled, not executing repack

INFO: repacking table "scott.departments"

INFO: repacking table "scott.employee"

Execute

$ pg_repack -d percona 
INFO: repacking table "scott.departments"
INFO: repacking table "scott.employee"

$ pg_repack -d percona

INFO: repacking table "scott.departments"

INFO: repacking table "scott.employee"

Running pg_repack in parallel jobs

To perform a parallel rebuild of a table, you can use the option -j. Please ensure that you have sufficient free CPUs that can be allocated to run pg_repack in parallel.

$ pg_repack -d percona -t scott.employee -j 4
NOTICE: Setting up workers.conns
INFO: repacking table "scott.employee"

$ pg_repack -d percona -t scott.employee -j 4

NOTICE: Setting up workers.conns

INFO: repacking table "scott.employee"

Running pg_repack remotely

You can always run pg_repack from a Remote Machine. This helps in scenarios where we have PostgreSQL databases deployed on Amazon RDS. To run pg_repack from a remote machine, you must have the same version of pg_repack installed in the remote server as well as the database server (say AWS RDS).

0 0 votes

Article Rating

4 Comments

Oldest

Newest Most Voted

abfnt1

7 years ago

Nice article! Another example you may want to include is running pg_repack on a production database using the D, –no-kill-backend parameter (don’t kill other backends when timed out). List of parameters are at http://reorg.github.io/pg_repack/.

Ashutosh

7 years ago

Nice Post. One of the disadvantage of pg_repack that we found is – If you have to cancel repack execution in middle (say due to disk space), then it will leave behind all its triggers & indexes on live table, which needs to be cleaned manually.

Javier Zayas

6 years ago

Quick question. If i run a “re_pack -d database_name” do I need twice as much disk as the entire database, or do I just need twice as much disk as the largest table? I ask because I’m trying to figure out if I need to repack by table since I am limited on disk space and am wondering if the extension is smart enough to do a table at a time and clean up after itself as it goes