When you need to upgrade your PostgreSQL databases, there are a number of options available to you. In this post we’ll take a look at how you can upgrade PostgreSQL versions using pg_upgrade, a built-in tool that allows in-place upgrade of your software. Using pg_upgrade allows you, potentially, to minimize your downtime, an essential consideration for many organizations. It also allows you to perform a postgres upgrade with very minimal effort.
In our previous posts, we discussed various methods and tools that can help us perform a PostgreSQL upgrade – (1) pg_dumpall, (2) pg_dump and pg_restore with pg_dumpall, (3) logical replication and pglogical, and (4) slony. Methods 1 and 2 can involve additional downtime compared to the approaches taken in 3 and 4. Whilst performing an upgrade using logical replication or slony may be time consuming and require a lot of monitoring, it can be worth it if you can minimize downtime. If you have large databases that are busy with a lot of transactions, you may be better served using logical replication or slony.
This post is the fifth of our Upgrading or Migrating Your Legacy PostgreSQL to Newer PostgreSQL Versions series. These posts lead up to a live webinar, where we’ll be exploring different methods available to upgrade your PostgreSQL databases. If it’s beyond the live webinar date when you read this, you’ll find the recording at that same link.
pg_upgrade (formerly pg_migrator – until PostgreSQL 8.4) is a built-in tool that helps in upgrading a legacy PostgreSQL server to a newer version without the need of a dump and restore. The oldest version from when you can upgrade your PostgreSQL using pg_upgrade is 8.4.x. It is capable of performing faster upgrades by taking into consideration that system tables are the ones that undergo the most change between two major versions. The internal data storage format is less often affected.
In fact, in one of our tests we were able to perform an upgrade of a 2 TB database server from PostgreSQL 9.6.5 to 11.1 in less than 10 seconds. Now that is fast!
Overview of the process
To understand how it works, consider a PostgreSQL server running on 9.3.3 that needs to upgrade to PostgreSQL 11. You should install the latest binaries for your new PostgreSQL version on the server – let’s say PostgreSQL 11.2 – before you begin the upgrade process.
Preparation and consistency checks
Once you have installed the new PostgreSQL version, initialize a new data directory using the new binaries and start it on another port i.e. a different port to the one used by PostgreSQL 9.3.3, in our example. Use pg_upgrade to perform consistency checks between the two servers – PG 9.3.3 and PG 11.2 – running on two different ports. If you get any errors, such as a missing extension, you need to to fix these before you proceeding to the upgrade. Once the consistency check has been passed, you can proceed.
Here is how the log looks if you should get an error while performing consistency checks.
$ /usr/pgsql-11/bin/pg_upgrade -b /usr/pgsql-9.3/bin -B /usr/pgsql-11/bin -d /var/lib/pgsql/9.3/data -D /var/lib/pgsql/11/data_new -c
Performing Consistency Checks on Old Live Server
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for invalid "unknown" user columns ok
Checking for hash indexes ok
Checking for roles starting with "pg_" ok
Checking for incompatible "line" data type ok
Checking for presence of required libraries fatal
Your installation references loadable libraries that are missing from the
new installation. You can add these libraries to the new installation,
or remove the functions using them from the old installation. A list of
problem libraries is in the file:
$ cat loadable_libraries.txt
could not load library "$libdir/pg_repack": ERROR: could not access file "$libdir/pg_repack": No such file or directory
To proceed beyond the error, in this example you’d need to install this missing extension pg_repack for the new PostgreSQL version, and rerun the check to make sure that you receive no errors and all the checks are passed.
Carrying out the upgrade
Once passed, you can proceed in one of two ways. One option is to let pg_upgrade copy the datafiles of the old data directory to the new data directory initialized by the new PostgreSQL version. The second option is to let pg_upgrade use hard links instead of copying data files. Copying a database of several terabytes may be time consuming. Using the hard links method makes the process really quick as it does not involve copying files.
To use hard links with pg_upgrade, you pass an additional argument -k as you can see in the following command.
$ /usr/pgsql-11/bin/pg_upgrade -b /usr/pgsql-9.3/bin -B /usr/pgsql-11/bin -d /var/lib/pgsql/9.3/data -D /var/lib/pgsql/11/data_new -k
In the Unix file system, a file or a directory is a link to an inode (index node) that stores metadata (disk block location, attributes, etc) of the data stored in them. Each inode is identified by an integer or an inode number. When you use pg_upgrade with hard links, it internally creates another file/directory in the new data directory that links to the same inode as it was in the older data directory for that file/directory. So, it skips the physical copy of the objects, but creates each object and links them to the same inode.
This reduces the disk IO and avoids the need for additional space in the server. An important point to note is that this option works only when you are upgrading your PostgreSQL on the same file system. This means, for example, if you want to upgrade to a new or a faster disk during the database upgrade, the hard link option does not work. In that case, you would need to use the file copy method.
So far, we have seen a high level overview of how pg_upgrade with hard links help you to perform an upgrade with lowest possible downtime. Come see more in action during our Webinar. And don’t forget at Percona Live in Austin, May 28-30 2019, we’ll have two days of PostgreSQL content in a postgres dedicated track.