Good database maintenance includes not only standard operations such as adding, updating, and deleting records, but also periodic edits to the table schema. Operations such as adding, editing, and removing table columns are part of today’s lifecycle reality as new functionality is constantly added.
In quieter and less demanding times, one could get away with schema updates with minimal impact by performing the operation during low-load periods. However, as database systems have become more critical to the business bottom line, maintenance windows have become much smaller and more time-sensitive.
So the question is: how can one update a multi-terabyte table with near-zero downtime?
Looking deep into our Postgres bag of tricks, we look not at the most recent and advanced features, but instead leverage a very old capability that has been part of Postgres since it was first released as an open source project: its object-relational implementation of inheritance.
Before going into the details of the solution, let’s define what we are trying to solve.
Strictly speaking, there are two use cases that come to mind when using inheritance as the primary ETL data migration mechanism:
Life starts getting complicated when dealing with issues such as updating data types. We will discuss this at the end of the blog.
Unlike object-oriented programming languages, where the child inherits attributes from the parent, in Postgres the parent has the ability to inherit from the child. If only this were true in real life. Thus, a table column from a child has the potential to become available in the parent relation.
Consider the following snippet of code. Two parents and three children are created and populated with records:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
BEGIN; DROP TABLE IF EXISTS father, mother CASCADE; CREATE TABLE father(c1 int, c2 int, c3 int); CREATE TABLE mother(c1 int, c2 int, c4 int); -- Notice that although column "c4" from mother is not declared, -- it is added to these tables. CREATE TABLE son(c1 int, c2 int, c3 int) INHERITS (father, mother); CREATE TABLE daughter(c2 int, c4 int, c5 int) INHERITS (father, mother); -- This table inherits only columns from "father". CREATE TABLE cousin(c1 int, c2 int, c3 int, c6 int) INHERITS (father); COMMIT; |
|
1 2 3 4 5 6 7 |
BEGIN; INSERT INTO son VALUES (1, 1, 1, 1); INSERT INTO daughter VALUES (2, 2, 2, 2, 2); INSERT INTO cousin VALUES (3, 3, 3, 3); INSERT INTO father VALUES (10, 10, 10); INSERT INTO mother VALUES (11, 11, 11); COMMIT; |
Columns declared in the parents must exist in the child; they are merged. However, columns unique to the child are not necessarily propagated to the parents:
|
1 2 3 4 5 6 7 |
Table "public.father" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- c1 | integer | | | c2 | integer | | | c3 | integer | | | Number of child tables: 3 (Use \d+ to list them.) |
|
1 2 3 4 5 6 7 |
Table "public.mother" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- c1 | integer | | | c2 | integer | | | c4 | integer | | | Number of child tables: 2 (Use \d+ to list them.) |
|
1 2 3 4 5 6 7 8 9 |
Table "public.son" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- c1 | integer | | | c2 | integer | | | c3 | integer | | | c4 | integer | | | Inherits: father, mother |
|
1 2 3 4 5 6 7 8 9 10 |
Table "public.daughter" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- c1 | integer | | | c2 | integer | | | c3 | integer | | | c4 | integer | | | c5 | integer | | | Inherits: father, mother |
|
1 2 3 4 5 6 7 8 |
Table "public.cousin" Column | Type | Collation | Nullable | Default --------+---------+-----------+----------+--------- c1 | integer | | | c2 | integer | | | c3 | integer | | | c6 | integer | | | Inherits: father |
Even though records populated in the child can be seen by the parent, the reverse is not true. Records populated into the parent are not seen by the child:
|
1 2 3 4 5 6 7 |
db02=# SELECT * FROM father; c1 | c2 | c3 ----+----+---- 10 | 10 | 10 1 | 1 | 1 2 | 2 | 2 3 | 3 | 3 |
|
1 2 3 4 5 6 |
db02=# SELECT * FROM mother; c1 | c2 | c4 ----+----+---- 11 | 11 | 11 1 | 1 | 1 2 | 2 | 2 |
|
1 2 3 4 |
db02=# SELECT * FROM son; c1 | c2 | c3 | c4 ----+----+----+---- 1 | 1 | 1 | 1 |
|
1 2 3 4 |
db02=# SELECT * FROM daughter; c1 | c2 | c3 | c4 | c5 ----+----+----+----+---- 2 | 2 | 2 | 2 | 2 |
|
1 2 3 4 |
db02=# SELECT * FROM cousin; c1 | c2 | c3 | c6 ----+----+----+---- 3 | 3 | 3 | 3 |
Performing data migration under production conditions should take into consideration these four distinct query operations:
SELECT from both the target and source tables at the same time.UPDATE and/or DELETE records from both the target and source tables.INSERT new records into the target table.For the sake of discussion, we will demonstrate using one source table and one target table, each inheriting from a single parent:
|
1 2 3 |
CREATE TABLE parent(c1 int PRIMARY KEY, c2 int, c3 int); CREATE TABLE source(LIKE parent INCLUDING ALL) INHERITS (parent); CREATE TABLE target(LIKE parent INCLUDING ALL) INHERITS (parent); |
Inheritance makes the SELECT query a straightforward operation. This query checks all tables for the queried records:
|
1 |
EXPLAIN SELECT * FROM parent; |
|
1 2 3 4 5 6 |
QUERY PLAN -------------------------------------------------------------------------- Append (cost=0.00..81.21 rows=4081 width=12) -> Seq Scan on parent parent_1 (cost=0.00..0.00 rows=1 width=12) -> Seq Scan on source parent_2 (cost=0.00..30.40 rows=2040 width=12) -> Seq Scan on target parent_3 (cost=0.00..30.40 rows=2040 width=12) |
Similar to SELECT queries, one does not have to edit the existing application’s DML operations when performing UPDATE and DELETE. Notice how both the source and target tables are queried, along with the parent:
|
1 |
EXPLAIN UPDATE parent SET c2 = 0 WHERE c1 = 1; |
|
1 2 3 4 5 6 7 8 9 10 11 12 |
QUERY PLAN --------------------------------------------------------------------------------- Update on parent (cost=0.00..16.34 rows=3 width=18) Update on parent Update on source Update on target -> Seq Scan on parent (cost=0.00..0.00 rows=1 width=18) Filter: (c1 = 1) -> Index Scan using source_pkey on source (cost=0.15..8.17 rows=1 width=18) Index Cond: (c1 = 1) -> Index Scan using target_pkey on target (cost=0.15..8.17 rows=1 width=18) Index Cond: (c1 = 1) |
The thing to keep in mind about INSERT is that, without a redirect mechanism, all records are inserted into the parent.
Since everyone already knows about triggers, I thought it would be fun to use a rewrite rule instead:
|
1 2 3 4 |
CREATE RULE child_insert AS ON INSERT TO parent DO INSTEAD INSERT INTO target VALUES (NEW.*); |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
Table "public.parent" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+---------+-----------+----------+---------+---------+--------------+------------- c1 | integer | | not null | | plain | | c2 | integer | | | | plain | | c3 | integer | | | | plain | | Indexes: "parent_pkey" PRIMARY KEY, btree (c1) Rules: child_insert AS ON INSERT TO parent DO INSTEAD INSERT INTO target (c1, c2, c3) VALUES (new.c1, new.c2, new.c3) Child tables: source, target |
Here is our validation. Notice how the INSERT is redirected from parent to target:
|
1 |
EXPLAIN INSERT INTO parent (c1, c2, c3) VALUES (1, 1, 1); |
|
1 2 3 4 |
QUERY PLAN ----------------------------------------------------- Insert on target (cost=0.00..0.01 rows=1 width=12) -> Result (cost=0.00..0.01 rows=1 width=12) |
It is time to introduce the last mechanism needed to perform the actual data migration. Essentially, the data is moved in batches. Otherwise, if you can afford the downtime of moving your records in one very large transaction, this dog-and-pony show is redundant.
In the real world, we need to anticipate multiple processes attempting simultaneous exclusive locks. If one or more records are locked by another operation, the following example demonstrates how you can simply skip over them:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
-- Insert a single record into table "source". INSERT INTO source (c1, c2, c3) VALUES (2, 2, 2); -- Move 1,000 records at a time from table "source" to "target". WITH a AS ( SELECT * FROM source FOR UPDATE SKIP LOCKED LIMIT 1000 ), b AS ( DELETE FROM source USING a WHERE c1 = a.c1 RETURNING source.c1, source.c2, source.c3 ) INSERT INTO target SELECT * FROM b; |
|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Insert on target (cost=27.92..41.52 rows=680 width=12) (actual time=0.082..0.082 rows=0 loops=1) CTE b -> Delete on source (cost=9.42..27.92 rows=680 width=6) (actual time=0.050..0.053 rows=1 loops=1) -> Bitmap Heap Scan on source (cost=9.42..27.92 rows=680 width=6) (actual time=0.021..0.023 rows=1 loops=1) Recheck Cond: (c1 >= 0) Heap Blocks: exact=1 -> Bitmap Index Scan on source_pkey (cost=0.00..9.25 rows=680 width=0) (actual time=0.010..0.011 rows=1 loops=1) Index Cond: (c1 >= 0) -> CTE Scan on b (cost=0.00..13.60 rows=680 width=12) (actual time=0.054..0.057 rows=1 loops=1) Planning Time: 0.237 ms Execution Time: 0.173 ms |
It is time to demonstrate a proof of concept using pgbench.
Initialize database db02:
|
1 2 3 |
dropdb --if-exists db02 createdb db02 pgbench -s 10 -i db02 |
|
1 2 3 4 5 6 7 |
List of relations Schema | Name | Type | Owner | Size | Description --------+------------------+-------+----------+---------+------------- public | pgbench_accounts | table | postgres | 128 MB | public | pgbench_branches | table | postgres | 40 kB | public | pgbench_history | table | postgres | 0 bytes | public | pgbench_tellers | table | postgres | 40 kB | |
Create the tables parent and child.
Note: In order to demonstrate data migration from a deprecated table, table pgbench_accounts is altered by adding OIDs.
|
1 2 3 4 5 6 7 |
CREATE TABLE parent (LIKE pgbench_accounts INCLUDING ALL) WITHOUT OIDS; CREATE TABLE child (LIKE pgbench_accounts INCLUDING ALL) INHERITS (parent) WITHOUT OIDS; ALTER TABLE pgbench_accounts SET WITH OIDS, INHERIT parent; ALTER TABLE pgbench_accounts RENAME TO pgbench_accounts_deprecated; ALTER TABLE parent RENAME TO pgbench_accounts; |
This query is at the heart of the solution. Any exclusively locked record that it tries to move is automatically skipped, and a new attempt can be made the next time this script is invoked.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
WITH a AS ( SELECT * FROM pgbench_accounts_deprecated ORDER BY 1 FOR UPDATE SKIP LOCKED LIMIT 10 ), b AS ( DELETE FROM pgbench_accounts_deprecated USING a WHERE pgbench_accounts_deprecated.aid = a.aid RETURNING pgbench_accounts_deprecated.aid, pgbench_accounts_deprecated.bid, pgbench_accounts_deprecated.abalance, pgbench_accounts_deprecated.filler ) INSERT INTO child SELECT * FROM b; |
|
1 |
EXPLAIN ANALYZE SELECT * FROM pgbench_accounts ORDER BY 1 LIMIT 13; |
|
1 2 3 4 5 6 7 8 9 10 |
QUERY PLAN -------------------------------------------------------------------------------------------------- Limit (cost=0.72..1.45 rows=13 width=97) (actual time=0.012..0.016 rows=13 loops=1) -> Merge Append (cost=0.72..56212.42 rows=1000211 width=97) (actual time=0.011..0.013 rows=13 loops=1) Sort Key: pgbench_accounts.aid -> Index Scan using parent_pkey on pgbench_accounts (cost=0.12..8.14 rows=1 width=352) (actual time=0.002..0.002 rows=0 loops=1) -> Index Scan using pgbench_accounts_pkey on pgbench_accounts_deprecated (cost=0.42..43225.43 rows=1000000 width=97) (actual time=0.006..0.007 rows=3 loops=1) -> Index Scan using child_pkey on child (cost=0.14..51.30 rows=210 width=352) (actual time=0.002..0.003 rows=10 loops=1) Planning Time: 0.084 ms Execution Time: 0.030 ms |
The query has been incorporated into this script, moving 1,000 records every five seconds.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
#!/bin/bash set -e export PGHOST=/tmp export PGPORT=10011 export PGDATABASE=db02 export PGUSER=postgres SLEEP=5 REC=1000 SQL=" WITH a AS ( SELECT * FROM pgbench_accounts_deprecated ORDER BY 1 FOR UPDATE SKIP LOCKED LIMIT ${REC} ), b AS ( DELETE FROM pgbench_accounts_deprecated USING a WHERE pgbench_accounts_deprecated.aid = a.aid RETURNING pgbench_accounts_deprecated.aid, pgbench_accounts_deprecated.bid, pgbench_accounts_deprecated.abalance, pgbench_accounts_deprecated.filler ) INSERT INTO child SELECT * FROM b; WITH a(count_child) AS ( SELECT count(*) FROM child ), b(count_accounts) AS ( SELECT count(*) FROM pgbench_accounts_deprecated ) SELECT a.count_child, b.count_accounts FROM a, b; " while true; do echo "--- $(date): Executing query, moving ${REC} records now ... ---" psql <<< "${SQL}" echo "Sleeping: ${SLEEP} seconds ..." sleep "${SLEEP}" done |
While the aforementioned script is active, pgbench is running the benchmarking.
|
1 2 |
# Does not block. pgbench -c 2 -j 4 --protocol=simple -T 120 db02 |
This is a simple and powerful method, but there are limitations. Common columns between tables must be of the same data type.
For example, if column c1 in table source is of data type int, and you want to migrate the data into table target with the same column c1 but with data type bigint, then this method will not work. An alternative solution could take advantage of updatable views, which you can read more about here, along with appropriate triggers and rewrite rules.
Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community in a single distribution, designed and tested to work together.
Resources
RELATED POSTS