This blog was originally published in July 2023 and was updated in April 2024.
Partitioning is the concept of logically splitting large tables into smaller pieces for better database performance. In this article, we'll discuss how to use the traditional partitioning methods in PostgreSQL to split tables effectively and improve database performance.

Range partitioning is a database partitioning method based on a continuous range of column values, such as dates or numeric values.
As an example, I created a range-partitioned table on a date column, with one partition per quarter.
```sql
CREATE TABLE employees (
    id     INT NOT NULL,
    fname  VARCHAR(20),
    lname  VARCHAR(20),
    dob    DATE NOT NULL,
    joined DATE NOT NULL
) PARTITION BY RANGE (joined);
```
```sql
CREATE TABLE employees_q1 PARTITION OF employees FOR VALUES FROM ('2022-01-01') TO ('2022-04-01');
CREATE TABLE employees_q2 PARTITION OF employees FOR VALUES FROM ('2022-04-01') TO ('2022-07-01');
CREATE TABLE employees_q3 PARTITION OF employees FOR VALUES FROM ('2022-07-01') TO ('2022-10-01');
CREATE TABLE employees_q4 PARTITION OF employees FOR VALUES FROM ('2022-10-01') TO ('2023-01-01');
```
Range partitions are seen below in the table structure.
```
\d+ employees
                                     Partitioned table "public.employees"
 Column |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 id     | integer               |           | not null |         | plain    |             |              |
 fname  | character varying(20) |           |          |         | extended |             |              |
 lname  | character varying(20) |           |          |         | extended |             |              |
 dob    | date                  |           | not null |         | plain    |             |              |
 joined | date                  |           | not null |         | plain    |             |              |
Partition key: RANGE (joined)
Partitions: employees_q1 FOR VALUES FROM ('2022-01-01') TO ('2022-04-01'),
            employees_q2 FOR VALUES FROM ('2022-04-01') TO ('2022-07-01'),
            employees_q3 FOR VALUES FROM ('2022-07-01') TO ('2022-10-01'),
            employees_q4 FOR VALUES FROM ('2022-10-01') TO ('2023-01-01')
```
Next, I inserted random data, one row per day of 2022 (365 rows in total).
```sql
INSERT INTO employees (id, fname, lname, dob, joined)
VALUES (generate_series(1, 365),
        (array['Oswald', 'Henry', 'Bob', 'Vennie'])[floor(random() * 4 + 1)],
        (array['Leo', 'Jack', 'Den', 'Daisy', 'Woody'])[floor(random() * 5 + 1)],
        '1995-01-01'::date + trunc(random() * 366 * 3)::int,
        generate_series('1/1/2022'::date, '12/31/2022'::date, '1 day'));
```
The data is distributed among the range partitions as shown below:
```
SELECT employees_q1, employees_q2, employees_q3, employees_q4, employees_totalcnt FROM
( SELECT COUNT(*) FROM employees_q1 ) AS employees_q1,
( SELECT COUNT(*) FROM employees_q2 ) AS employees_q2,
( SELECT COUNT(*) FROM employees_q3 ) AS employees_q3,
( SELECT COUNT(*) FROM employees_q4 ) AS employees_q4,
( SELECT COUNT(*) FROM employees ) AS employees_totalcnt;

 employees_q1 | employees_q2 | employees_q3 | employees_q4 | employees_totalcnt
--------------+--------------+--------------+--------------+--------------------
 (90)         | (91)         | (92)         | (92)         | (365)
(1 row)
```
For comparison, I created an unpartitioned table and inserted the same data.
When a query's filter matches a single partition (or only a few), the planner scans just those partitions, which yields a cheaper plan than scanning the whole table.
```sql
CREATE TABLE employees_nopartition (
    id     INT NOT NULL,
    fname  VARCHAR(20),
    lname  VARCHAR(20),
    dob    DATE NOT NULL,
    joined DATE NOT NULL
);
```
```sql
INSERT INTO employees_nopartition (id, fname, lname, dob, joined)
VALUES (generate_series(1, 365),
        (array['Oswald', 'Henry', 'Bob', 'Vennie'])[floor(random() * 4 + 1)],
        (array['Leo', 'Jack', 'Den', 'Daisy', 'Woody'])[floor(random() * 5 + 1)],
        '1995-01-01'::date + trunc(random() * 366 * 3)::int,
        generate_series('1/1/2022'::date, '12/31/2022'::date, '1 day'));
```
```
EXPLAIN SELECT * FROM employees_nopartition WHERE joined >= '2022-05-12' AND joined < '2022-06-10';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on employees_nopartition  (cost=0.00..8.47 rows=29 width=22)
   Filter: ((joined >= '2022-05-12'::date) AND (joined < '2022-06-10'::date))
(2 rows)
```
The same query against the partitioned table produces a cheaper plan: partition pruning limits the scan to the single matching partition, employees_q2.
```
EXPLAIN SELECT * FROM employees WHERE joined >= '2022-05-12' AND joined < '2022-06-10';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on employees_q2  (cost=0.00..2.37 rows=29 width=22)
   Filter: ((joined >= '2022-05-12'::date) AND (joined < '2022-06-10'::date))
(2 rows)
```
List partitioning is a database partitioning method based on discrete key values. The partition key can also be an expression on a column, as in PARTITION BY LIST (expression), which is demonstrated below.
For example, I created a list-partitioned table with a few partitions and inserted 1,000 rows of random data.
```sql
CREATE TABLE sales (id INT NOT NULL, branch VARCHAR(3), type TEXT, amount INT) PARTITION BY LIST (branch);
```
```sql
CREATE TABLE HYD_sales PARTITION OF sales FOR VALUES IN ('HYD');
CREATE TABLE BLR_sales PARTITION OF sales FOR VALUES IN ('BLR');
CREATE TABLE DEL_sales PARTITION OF sales FOR VALUES IN ('DEL');
CREATE TABLE TPT_sales PARTITION OF sales FOR VALUES IN ('TPT');
```
```sql
INSERT INTO sales (id, branch, type, amount)
VALUES (generate_series(1, 1000),
        (array['HYD', 'BLR', 'DEL', 'TPT'])[floor(random() * 4 + 1)],
        (array['Laptops', 'Printers', 'Hardisks', 'Desktops', 'Monitors'])[floor(random() * 5 + 1)],
        (random() * 200000)::int);
```
List partitions are seen in the table definition below:
```
\d+ sales
                                Partitioned table "public.sales"
 Column |         Type         | Collation | Nullable | Default | Storage  | Stats target | Description
--------+----------------------+-----------+----------+---------+----------+--------------+-------------
 id     | integer              |           | not null |         | plain    |              |
 branch | character varying(3) |           |          |         | extended |              |
 type   | text                 |           |          |         | extended |              |
 amount | integer              |           |          |         | plain    |              |
Partition key: LIST (branch)
Partitions: blr_sales FOR VALUES IN ('BLR'),
            del_sales FOR VALUES IN ('DEL'),
            hyd_sales FOR VALUES IN ('HYD'),
            tpt_sales FOR VALUES IN ('TPT')
```
Partitioned data distributed among its partitions is seen below:
```
SELECT blr_sales, del_sales, hyd_sales, tpt_sales, total_cnt FROM
( SELECT COUNT(*) FROM blr_sales ) AS blr_sales,
( SELECT COUNT(*) FROM del_sales ) AS del_sales,
( SELECT COUNT(*) FROM hyd_sales ) AS hyd_sales,
( SELECT COUNT(*) FROM tpt_sales ) AS tpt_sales,
( SELECT COUNT(*) FROM sales ) AS total_cnt;

 blr_sales | del_sales | hyd_sales | tpt_sales | total_cnt
-----------+-----------+-----------+-----------+-----------
 (262)     | (258)     | (228)     | (252)     | (1001)
(1 row)
```
Next, I created a table that is list partitioned on an expression of a column.
```sql
CREATE TABLE donors (
    id           INT NOT NULL,
    name         VARCHAR(20),
    bloodgroup   VARCHAR(15),
    last_donated DATE,
    contact_num  VARCHAR(10)
) PARTITION BY LIST (left(upper(bloodgroup), 3));
```
```sql
CREATE TABLE A_positive  PARTITION OF donors FOR VALUES IN ('A+ ');
CREATE TABLE A_negative  PARTITION OF donors FOR VALUES IN ('A- ');
CREATE TABLE B_positive  PARTITION OF donors FOR VALUES IN ('B+ ');
CREATE TABLE B_negative  PARTITION OF donors FOR VALUES IN ('B- ');
CREATE TABLE AB_positive PARTITION OF donors FOR VALUES IN ('AB+');
CREATE TABLE AB_negative PARTITION OF donors FOR VALUES IN ('AB-');
CREATE TABLE O_positive  PARTITION OF donors FOR VALUES IN ('O+ ');
CREATE TABLE O_negative  PARTITION OF donors FOR VALUES IN ('O- ');
```
List partitions are seen in the table definition below:
```
\d+ donors
                                        Partitioned table "public.donors"
    Column    |         Type          | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------------+-----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 id           | integer               |           | not null |         | plain    |             |              |
 name         | character varying(20) |           |          |         | extended |             |              |
 bloodgroup   | character varying(15) |           |          |         | extended |             |              |
 last_donated | date                  |           |          |         | plain    |             |              |
 contact_num  | character varying(10) |           |          |         | extended |             |              |
Partition key: LIST ("left"(upper((bloodgroup)::text), 3))
Partitions: a_negative FOR VALUES IN ('A- '),
            a_positive FOR VALUES IN ('A+ '),
            ab_negative FOR VALUES IN ('AB-'),
            ab_positive FOR VALUES IN ('AB+'),
            b_negative FOR VALUES IN ('B- '),
            b_positive FOR VALUES IN ('B+ '),
            o_negative FOR VALUES IN ('O- '),
            o_positive FOR VALUES IN ('O+ ')
```
Here, I inserted 100 rows of random data.
```sql
-- The generated contact number is cast to text so it can be assigned
-- to the varchar(10) column (a bigint cannot be inserted directly).
INSERT INTO donors (id, name, bloodgroup, last_donated, contact_num)
VALUES (generate_series(1, 100),
        'user_' || trunc(random() * 100),
        (array['A+ group', 'A- group', 'O- group', 'O+ group',
               'AB+ group', 'AB- group', 'B+ group', 'B- group'])[floor(random() * 8 + 1)],
        '2022-01-01'::date + trunc(random() * 366 * 1)::int,
        CAST(1000000000 + floor(random() * 9000000000) AS bigint)::varchar(10));
```
The data, distributed among the expression-based list partitions, is seen below:
```
SELECT a_negative, a_positive, ab_negative, ab_positive, b_negative, b_positive, o_negative, o_positive, total_cnt FROM
( SELECT COUNT(*) FROM a_negative )  AS a_negative,
( SELECT COUNT(*) FROM a_positive )  AS a_positive,
( SELECT COUNT(*) FROM ab_negative ) AS ab_negative,
( SELECT COUNT(*) FROM ab_positive ) AS ab_positive,
( SELECT COUNT(*) FROM b_negative )  AS b_negative,
( SELECT COUNT(*) FROM b_positive )  AS b_positive,
( SELECT COUNT(*) FROM o_negative )  AS o_negative,
( SELECT COUNT(*) FROM o_positive )  AS o_positive,
( SELECT COUNT(*) FROM donors )      AS total_cnt;

 a_negative | a_positive | ab_negative | ab_positive | b_negative | b_positive | o_negative | o_positive | total_cnt
------------+------------+-------------+-------------+------------+------------+------------+------------+-----------
 (9)        | (19)       | (10)        | (12)        | (12)       | (10)       | (18)       | (10)       | (100)
(1 row)
```
For comparison, I again created an unpartitioned table and inserted the same 1,000 rows of random data to compare query performance.
```sql
CREATE TABLE sales_nopartition (id INT NOT NULL, branch VARCHAR(3), type TEXT, amount INT);
```
```sql
INSERT INTO sales_nopartition (id, branch, type, amount)
VALUES (generate_series(1, 1000),
        (array['HYD', 'BLR', 'DEL', 'TPT'])[floor(random() * 4 + 1)],
        (array['Laptops', 'Printers', 'Hardisks', 'Desktops', 'Monitors'])[floor(random() * 5 + 1)],
        (random() * 200000)::int);
```
UPDATE Query Performance
```
EXPLAIN UPDATE sales_nopartition SET type = 'Smart Watches' WHERE branch = 'HYD';
                                 QUERY PLAN
---------------------------------------------------------------------------
 Update on sales_nopartition  (cost=0.00..19.50 rows=229 width=50)
   ->  Seq Scan on sales_nopartition  (cost=0.00..19.50 rows=229 width=50)
         Filter: ((branch)::text = 'HYD'::text)
(3 rows)
```
```
EXPLAIN UPDATE sales SET type = 'Smart Watches' WHERE branch = 'HYD';
                            QUERY PLAN
------------------------------------------------------------------
 Update on sales  (cost=0.00..5.10 rows=248 width=50)
   Update on hyd_sales
   ->  Seq Scan on hyd_sales  (cost=0.00..5.10 rows=248 width=50)
         Filter: ((branch)::text = 'HYD'::text)
(4 rows)
```
DELETE Query Performance
```
EXPLAIN DELETE FROM sales_nopartition WHERE branch = 'HYD';
                                 QUERY PLAN
--------------------------------------------------------------------------
 Delete on sales_nopartition  (cost=0.00..19.50 rows=229 width=6)
   ->  Seq Scan on sales_nopartition  (cost=0.00..19.50 rows=229 width=6)
         Filter: ((branch)::text = 'HYD'::text)
(3 rows)
```
```
EXPLAIN DELETE FROM sales WHERE branch = 'HYD';
                            QUERY PLAN
-----------------------------------------------------------------
 Delete on sales  (cost=0.00..5.10 rows=248 width=6)
   Delete on hyd_sales
   ->  Seq Scan on hyd_sales  (cost=0.00..5.10 rows=248 width=6)
         Filter: ((branch)::text = 'HYD'::text)
(4 rows)
```
The examples above show that UPDATE and DELETE operations against the partitioned table get a cheaper plan: only the hyd_sales partition is scanned, while the unpartitioned table must be scanned in full.
In hash partitioning, a table is partitioned by specifying a modulus and a remainder for each partition: a row is routed to the partition whose remainder matches the hash of the partition key modulo the modulus.
For example, I created a hash-partitioned table with five partitions using modulus 5.
```sql
CREATE TABLE students (
    id     INT NOT NULL,
    name   VARCHAR(30) NOT NULL,
    course VARCHAR(100),
    joined DATE
) PARTITION BY HASH (id);
```
```sql
CREATE TABLE student_0 PARTITION OF students FOR VALUES WITH (MODULUS 5, REMAINDER 0);
CREATE TABLE student_1 PARTITION OF students FOR VALUES WITH (MODULUS 5, REMAINDER 1);
CREATE TABLE student_2 PARTITION OF students FOR VALUES WITH (MODULUS 5, REMAINDER 2);
CREATE TABLE student_3 PARTITION OF students FOR VALUES WITH (MODULUS 5, REMAINDER 3);
CREATE TABLE student_4 PARTITION OF students FOR VALUES WITH (MODULUS 5, REMAINDER 4);
```
The table structure looks like the one below with five created partitions:
```
\d+ students
                                Partitioned table "public.students"
 Column |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description
--------+------------------------+-----------+----------+---------+----------+--------------+-------------
 id     | integer                |           | not null |         | plain    |              |
 name   | character varying(30)  |           | not null |         | extended |              |
 course | character varying(100) |           |          |         | extended |              |
 joined | date                   |           |          |         | plain    |              |
Partition key: HASH (id)
Partitions: student_0 FOR VALUES WITH (modulus 5, remainder 0),
            student_1 FOR VALUES WITH (modulus 5, remainder 1),
            student_2 FOR VALUES WITH (modulus 5, remainder 2),
            student_3 FOR VALUES WITH (modulus 5, remainder 3),
            student_4 FOR VALUES WITH (modulus 5, remainder 4)
```
Here, I inserted 100,000 rows of random data.
```sql
INSERT INTO students (id, name, course, joined)
VALUES (generate_series(1, 100000),
        'student_' || trunc(random() * 1000),
        (array['Finance & Accounts', 'Business Statistics', 'Environmental Science'])[floor(random() * 3 + 1)],
        '2019-01-01'::date + trunc(random() * 366 * 3)::int);
```
The row counts below show the data spread roughly evenly across the hash partitions (reltuples is the planner's row estimate, kept current by autovacuum/ANALYZE, so it may lag slightly behind exact counts).
```
SELECT relname, reltuples AS rows FROM pg_class
WHERE relname IN ('student_0','student_1','student_2','student_3','student_4')
ORDER BY relname;

  relname  | rows
-----------+-------
 student_0 | 19851
 student_1 | 20223
 student_2 | 19969
 student_3 | 19952
 student_4 | 20005
(5 rows)
```
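Queries that filter on the hash key benefit from partition pruning just as with range and list partitioning. A quick sanity check (the exact partition chosen and the costs will vary with your data and PostgreSQL version):

```sql
EXPLAIN SELECT * FROM students WHERE id = 42;
-- The plan should show a scan of only one student_N partition,
-- with Filter: (id = 42), rather than all five partitions.
```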
What about rows that don't match any existing partition? For range- and list-partitioned tables, out-of-range data can be stored temporarily by creating a default partition, and an appropriate partition can be created later.
Hash-partitioned tables may not have a default partition: with a complete set of remainders, every row already maps to some partition, so a default would serve no purpose.
Below, we see what happens when I try to insert a row for which no partition exists, and how a default partition helps in this case.
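As an aside, PostgreSQL enforces the hash-partitioning restriction directly; attempting to add a default partition to the students table from the hash example above should fail with an error along these lines:

```sql
CREATE TABLE student_default PARTITION OF students DEFAULT;
-- ERROR:  a hash-partitioned table may not have a default partition
```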
```
INSERT INTO sales VALUES (1001, 'MYS', 'Scanners', 190000);
ERROR:  no partition of relation "sales" found for row
DETAIL:  Partition key of the failing row contains (branch) = (MYS).
```
```
CREATE TABLE sales_default PARTITION OF sales DEFAULT;
CREATE TABLE

INSERT INTO sales VALUES (1001, 'MYS', 'Scanners', 190000);
INSERT 0 1
```
```
SELECT * FROM sales_default;
  id  | branch |   type   | amount
------+--------+----------+--------
 1001 | MYS    | Scanners | 190000
(1 row)
```
So the inserted row lands in the default partition, and proper partitions can be created later based on the data accumulated there.
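One practical detail: PostgreSQL will refuse to create a new partition while the default partition still holds rows that would belong to it. A common workaround, sketched below using the table and branch value from the example above, is to detach the default partition first, create the new partition, move the rows, and re-attach:

```sql
BEGIN;
-- Detach so the rows in sales_default no longer block the new partition
ALTER TABLE sales DETACH PARTITION sales_default;
-- Create the proper partition for the new branch
CREATE TABLE MYS_sales PARTITION OF sales FOR VALUES IN ('MYS');
-- Move the matching rows into the new partition
INSERT INTO sales SELECT * FROM sales_default WHERE branch = 'MYS';
DELETE FROM sales_default WHERE branch = 'MYS';
-- Re-attach the (now smaller) default partition
ALTER TABLE sales ATTACH PARTITION sales_default DEFAULT;
COMMIT;
```

Doing this inside one transaction keeps the table consistent for concurrent readers, at the cost of holding locks for the duration.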
Here, we discussed the traditional single-column partitioning methods in PostgreSQL; partitioning can also be done on multiple columns, and PostgreSQL Partition Manager (pg_partman) can be used to create and manage partitions effectively. Further details will be explained in upcoming blogs.
Also, please find below the related blogs for reference:
PostgreSQL Performance Tuning: Optimizing Database Parameters for Maximum Efficiency
Performing ETL Using Inheritance in PostgreSQL
Partitioning in PostgreSQL With pg_partman (Serial-Based & Trigger-Based)
PostgreSQL Partitioning Made Easy Using pg_partman (TimeBased)
In PostgreSQL, partitioning involves dividing a large table into smaller, more manageable pieces, known as partitions, based on certain criteria like ranges or list values. This can significantly improve query performance on large datasets and simplify data management. Indexing, on the other hand, involves creating a data structure that improves the speed of data retrieval operations on a table. While indexing speeds up query processing by providing quick lookups, partitioning optimizes performance by limiting the number of rows to scan.
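The two techniques also combine well: since PostgreSQL 11, an index created on a partitioned parent is automatically created on every partition. A small sketch using the employees table from earlier (the index name is my own choice):

```sql
-- Creates a matching index on employees_q1 .. employees_q4 as well
CREATE INDEX idx_employees_joined ON employees (joined);
-- A pruned query can then use the per-partition index:
EXPLAIN SELECT * FROM employees WHERE joined = '2022-05-12';
```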
Sharding and partitioning in PostgreSQL both deal with distributing data but in subtly different ways. Sharding typically refers to distributing data across multiple databases or servers, potentially across different physical locations, to balance load and improve performance. Partitioning, however, occurs within a single database and involves dividing a table into segments that are easier to manage and query. While both techniques can help with scaling, sharding is more about the horizontal distribution of data across a cluster, whereas partitioning is about organizing data within a database.
Partitioning a database is most appropriate when dealing with large tables that significantly impact query performance and maintenance tasks and is particularly beneficial when a database frequently executes queries that scan large segments of data based on specific keys or ranges, such as dates or geographic locations. Partitioning can also be advantageous when archival operations are routine, allowing for easier management of historical data by segregating older entries into separate partitions.
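For the archival case, detaching a partition turns it into an ordinary standalone table, so historical data can be archived and dropped without touching live partitions. A sketch based on the quarterly employees example above:

```sql
-- employees_q1 becomes a regular table holding only Q1 2022 rows
ALTER TABLE employees DETACH PARTITION employees_q1;
-- Archive it (e.g., with pg_dump -t employees_q1), then remove it:
DROP TABLE employees_q1;
```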