In this blog post, we will explore pg_dump / pg_restore, one of the most commonly used options for performing a PostgreSQL upgrade. It is important to understand the scenarios under which pg_dump and pg_restore utilities will be helpful.
This post is the second of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series where we’ll be exploring different methods available to upgrade your PostgreSQL databases.
pg_dump is a utility to perform a backup of a single database. You cannot back up multiple databases with it unless you run separate commands for each, possibly in parallel. If your upgrade plan needs global objects to be copied over, pg_dump needs to be supplemented by pg_dumpall. To learn more about pg_dumpall, you may refer to our previous blog post.
pg_dump can produce dumps in multiple formats, plain text and custom among them, each with its own advantages. When you take a dump in the custom format (-Fc), you must use pg_restore to restore it.
If the dump is taken using a plain-text format, pg_dump generates a script file of multiple SQL commands. It can be restored using psql.
A custom format dump, however, is compressed and is not human-readable.
A dump taken in plain text format is typically larger than a custom format dump, since the latter is compressed.
At times, you may wish to perform schema changes in your target PostgreSQL database before restore, for example, table partitioning. Or you may wish to restore only a selected list of objects from a dump file.
In such cases, you cannot restore a selected list of tables from a plain format dump of a database. If you take the dump in custom format, you can use pg_restore to choose the specific set of tables to restore.
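Beyond the -t flag, pg_restore can restore from an edited table of contents. The sketch below uses placeholder names (mydb, sales, audit_log); the dump and restore commands are shown as comments, while the TOC filtering step is runnable against sample pg_restore -l output.

```shell
# Selective restore from a custom-format dump via an edited table of
# contents (database, table, and file names are placeholders):
#
#   pg_dump -Fc -d mydb > /tmp/mydb.dump
#   pg_restore -l /tmp/mydb.dump > /tmp/toc.txt
#   pg_restore -L /tmp/toc_filtered.txt -d mydb_new /tmp/mydb.dump
#
# Sample lines in the style of pg_restore -l output:
cat > /tmp/toc.txt <<'EOF'
215; 1259 16386 TABLE public sales postgres
216; 1259 16390 TABLE public audit_log postgres
2906; 0 16386 TABLE DATA public sales postgres
EOF

# Keep only the entries for the sales table (schema and data):
grep ' sales ' /tmp/toc.txt > /tmp/toc_filtered.txt
cat /tmp/toc_filtered.txt
```

Alternatively, `pg_restore -t sales -d mydb_new /tmp/mydb.dump` restores a single table directly, without editing a TOC file.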
Steps involved in upgrade
The most important point to remember is that both dump and restore should be performed using the binaries of the newer version. For example, if we need to migrate from version 9.3 to version 11, we should be using the pg_dump binary of PostgreSQL 11 to connect to 9.3.
When a server is equipped with two different versions of binaries, it is good practice to specify the full path to the pg_dump of the newer version, as follows:
/usr/lib/postgresql/11/bin/pg_dump <connection_info_of_source_system> <options>
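Before starting, it can help to confirm which binary you are actually invoking. A minimal sketch, assuming the Debian/Ubuntu install path; the version line is hard-coded here so the check runs without PostgreSQL installed:

```shell
# Verify that the pg_dump in use belongs to the newer major version.
# In a real migration, capture the version line with:
#   version_line="$(/usr/lib/postgresql/11/bin/pg_dump --version)"
version_line="pg_dump (PostgreSQL) 11.22"
required_major=11

# "pg_dump (PostgreSQL) 11.22" -> third field "11.22" -> major "11"
actual_major=$(echo "$version_line" | awk '{print $3}' | cut -d. -f1)

if [ "$actual_major" -ge "$required_major" ]; then
    echo "OK: pg_dump major version $actual_major"
else
    echo "ERROR: pg_dump is too old ($actual_major < $required_major)" >&2
    exit 1
fi
```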
Getting the global dumps
In PostgreSQL, users/roles are global to the database cluster, and the same user can have privileges on objects in different databases. They are called “globals” because they apply to all the databases within the instance. Creating the globals in the target system at the earliest opportunity is very important, because the rest of the DDL may contain GRANTs to these users/roles. It is good practice to dump the globals into a file and examine it before importing it into the destination system. This can be achieved using the following command:
/usr/lib/postgresql/11/bin/pg_dumpall -g -p 5432 > /tmp/globals_only.sql
Since this produces a plain SQL dump file, it can be fed to psql connected to the destination server. If no modifications are required, the globals can be piped directly to the destination server, as in the next example.
/usr/lib/postgresql/11/bin/pg_dumpall -g <source_connection_info> | psql <destination_connection_info>
The above command works for an upgrade on a local server. Add the -h argument with the hostname to <destination_connection_info> if you are performing an upgrade to a remote database server.
Schema Only Dumps
The next stage of the migration involves the creation of schema objects. At this point, you might want to move different database objects to different tablespaces, or partition a few of the tables. If such schema modifications are part of the plan, we should extract the schema definition into a plain text file. Here’s an example command that can be used to achieve this:
/usr/lib/postgresql/11/bin/pg_dump -s -d databasename -p 5432 > /tmp/schema_only.sql
In general, the majority of the database objects won’t need any modifications. In such cases, it is good practice to pipe the schema objects straight into the destination database, using a command similar to this:
/usr/lib/postgresql/11/bin/pg_dump -s -d <databasename> <source_connection> | psql -d <databasename> <destination_connection>
Once all the schema objects are created, we should be able to drop only those objects which need modification. We can then recreate them with their modified definition.
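One way to isolate the definition of a single table from the schema dump, so it can be edited (for example, to add PARTITION BY) and re-applied with psql, is an awk range match. A minimal sketch with a hypothetical schema file and table names; a real schema_only.sql from pg_dump follows the same CREATE TABLE ... ); layout:

```shell
# Sample schema dump (stand-in for the pg_dump -s output):
cat > /tmp/schema_only.sql <<'EOF'
CREATE TABLE public.customers (
    id integer NOT NULL
);

CREATE TABLE public.orders (
    id integer NOT NULL,
    ordered_at timestamp NOT NULL
);
EOF

# Print everything from the table's CREATE statement to its closing ");"
awk '/^CREATE TABLE public\.orders /,/^\);/' /tmp/schema_only.sql > /tmp/orders_ddl.sql
cat /tmp/orders_ddl.sql

# After editing the extracted DDL, drop the original object and apply it:
#   psql -d <databasename> -f /tmp/orders_ddl.sql
```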
Data-only dumps
This is the stage at which the majority of the data is transferred between the database servers. If there is good bandwidth between source and destination, we should look to achieve maximum parallelism at this stage. In many situations, we can analyze the foreign key dependency hierarchy and import data in parallel batches for groups of tables. Data-only copying is possible using the -a or --data-only flag of pg_dump.
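Batching by foreign key dependency can be sketched as follows. The copy_table function and the table groupings here are placeholders; in a real migration, copy_table would run the pg_dump -a | psql pipeline for its table, and the batches would come from your own dependency analysis:

```shell
# Record each completed table so the run can be verified afterwards.
: > /tmp/copied_tables.txt

copy_table() {
    # Placeholder for the real per-table pipeline:
    #   pg_dump <source_connection_info> -a -t "$1" | psql <destination_connection_info>
    echo "copied $1" >> /tmp/copied_tables.txt
}

# Hypothetical grouping: foreign key parents load before their children.
batch1="customers products"
batch2="orders order_items"

for batch in "$batch1" "$batch2"; do
    for t in $batch; do
        copy_table "$t" &   # copy each table in the batch in parallel
    done
    wait                    # finish the batch before starting dependents
done
```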
Copying the data of individual tables
You might have to incorporate schema changes as part of an upgrade. In this case, you can copy the data of a few tables individually. We provide an example here:
/usr/lib/postgresql/11/bin/pg_dump <sourcedb_connection_info> -d <database> -a -t schema.tablename | psql <destinationdb_connection_info> <databasename>
There could be special situations where you need to append only a subset of the data, which is especially common with time-series data. In such cases, you can use a COPY command with a WHERE clause to extract and import specific data. You can see this in the following example:
/usr/lib/postgresql/11/bin/psql <sourcedb_connection_info> -c "COPY (SELECT * FROM <table> WHERE <filter condition>) TO STDOUT" > /tmp/selected_table_data.sql
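The file produced above contains raw rows in COPY's tab-separated text format, which the destination accepts through COPY ... FROM STDIN. A sketch of the load side, with the data simulated so it runs without a server (connection details, table, and column values are placeholders):

```shell
# Stand-in for the rows extracted by the filtered COPY above:
printf '1\t2019-01-01\t42.50\n2\t2019-01-02\t13.00\n' > /tmp/selected_table_data.sql

# On the destination, feed the file back through COPY ... FROM STDIN:
#   psql <destinationdb_connection_info> -d <databasename> \
#       -c "COPY <table> FROM STDIN" < /tmp/selected_table_data.sql
wc -l < /tmp/selected_table_data.sql
```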
pg_dump/pg_restore may be useful if you need to perform a faster upgrade of a PostgreSQL server with a modified schema and bloat-free relations. To see more about this method in action, please subscribe to our webinar here.