One of the routine operations when administering PostgreSQL is periodically updating the database system’s architecture. PostgreSQL does a good job of allowing one to update a schema, add types, functions, or triggers, alter a table by adding and removing columns, update column data types, and perform related operations in a reliable manner. However, there is no built-in mechanism to identify differences, let alone generate the necessary SQL, to easily accomplish updates from the development environment to the production environment.
So let’s talk about possible approaches to schema changes.
The easiest way to identify changes between schemas from one database to another is to compare schema dump manifests.
The following example demonstrates an approach one can take when looking for differences between schemas on different databases.
Example:
|
1 2 3 |
-- Create database schemas. CREATE DATABASE db01; CREATE DATABASE db02; |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
-- db01: version 1. CREATE TABLE t1 ( c1 int, c2 text, c4 date ); CREATE TABLE t2 ( c1 int, c2 varchar(3), c3 timestamp, c4 date ); |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
-- db02: version 2. CREATE TABLE t1 ( c1 serial PRIMARY KEY, c2 varchar(256), c3 date DEFAULT now() ); CREATE TABLE t2 ( c1 serial PRIMARY KEY, c2 varchar(3), c3 varchar(50), c4 timestamp with time zone DEFAULT now(), c5 int REFERENCES t1(c1) ); CREATE INDEX ON t2(c5); |
|
1 2 3 |
# Generate schema dumps. pg_dump -s db01 -Fc > db01.db pg_dump -s db02 -Fc > db02.db |
|
1 2 3 |
# Generate manifests. pg_restore -l db01.db > db01_manifest.ini pg_restore -l db02.db > db02_manifest.ini |
This snippet demonstrates looking for differences by comparing the MD5 checksums:
|
1 2 3 4 5 6 7 8 |
# EX 1: Generate checksums. md5sum \ <(tail +16 db01_manifest.ini | cut -d ' ' -f 4-) \ <(tail +16 db02_manifest.ini | cut -d ' ' -f 4-) # Output. $ 9d76c028259f2d8bed966308c256943e /dev/fd/63 $ ba124f9410ea623085c237dc4398388a /dev/fd/62 |
This next snippet diffs the differences between the two manifests, identifying only the objects and attributes that have changed. Notice that redundant information, the first 16 lines, is skipped:
|
1 2 3 4 |
# EX 2: Perform diff. diff \ <(tail +16 db01_manifest.ini | cut -d ' ' -f 4-) \ <(tail +16 db02_manifest.ini | cut -d ' ' -f 4-) |
This resultant diff shows the changes made between the two schemas:
|
1 2 3 4 5 6 7 8 9 10 11 12 |
1a2,3 > SEQUENCE public t1_c1_seq postgres > SEQUENCE OWNED BY public t1_c1_seq postgres 2a5,12 > SEQUENCE public t2_c1_seq postgres > SEQUENCE OWNED BY public t2_c1_seq postgres > DEFAULT public t1 c1 postgres > DEFAULT public t2 c1 postgres > CONSTRAINT public t1 t1_pkey postgres > CONSTRAINT public t2 t2_pkey postgres > INDEX public t2_c5_idx postgres > FK CONSTRAINT public t2 t2_c5_fkey postgres |
The good news is that there are a number of existing tools that can reconcile differences between a proposed schema design and the target schema:
The following is an example implementation of the open source tool apgdiff.
Apgdiff can be found in the Postgres community repository. It compares two schema dump files and creates an SQL output file that is, for the most part, suitable for upgrades of old schemata:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
Package: apgdiff Version: 2.7.0-1.pgdg18.04+1 Architecture: all Maintainer: Debian PostgreSQL Maintainers <team+postgresql@tracker.debian.org> Installed-Size: 173 Depends: default-jre-headless | java2-runtime-headless Homepage: https://www.apgdiff.com/ Priority: optional Section: database Filename: pool/main/a/apgdiff/apgdiff_2.7.0-1.pgdg18.04+1_all.deb Size: 154800 SHA256: 9a83fcf54aed00e1a28c3d00eabe1c166977af1e26e91035e15f88b5215b181b SHA1: ea713acb55898f07374dadd1bebb09ec2fa4b589 MD5sum: e70a97903cb23b8df8a887da4c54e945 |
The following example demonstrates how one can update differences between the development environment and the production database schema using apgdiff.
Example:
|
1 |
apt install -y apgdiff |
|
1 2 3 4 5 6 7 8 9 |
# EX 1: Dump as SQL statements. pg_dump -s db01 -Fp > db01.sql pg_dump -s db02 -Fp > db02.sql createdb db03 --template=db01 apgdiff --ignore-start-with db01.sql db02.sql > db01-db02.sql # "psql -1" encapsulates statements within a transaction. psql -1 -f db01-db02.sql db03 |
|
1 2 3 4 5 6 7 8 9 10 11 |
# EX 2: Use logical dumps. # Notice that the dumps are standard logical dumps and include data. pg_dump db01 -Fc > db01.db pg_dump db02 -Fc > db02.db createdb db03 --template=db01 # This invocation assumes the resultant diff does not require editing. apgdiff --ignore-start-with \ <(pg_restore -s -f - db01.db) \ <(pg_restore -s -f - db02.db) \ | psql -1 db03 |
There is more you can accomplish with these simple approaches. By incorporating variations of them, one can create fairly sophisticated shell scripts with little code and, with a little luck, not much effort.
Resources
RELATED POSTS
This is really amazing. I was struggling with it.