
This is one of the most common questions asked by developers writing SQL queries against PostgreSQL. There are multiple ways to structure subqueries or lookups, and PostgreSQL’s optimizer is quite effective at transforming queries for better performance.
Let’s walk through an example using the pgbench schema.
Note: pgbench is a benchmarking tool included with PostgreSQL. You can initialize sample data with:
|
1 |
pgbench -i -s 10 |
Update some sample data:
|
1 |
update pgbench_branches set bbalance=4500000 where bid in (4,7); |
Goal: Find the number of accounts per branch where branch balance is greater than zero.
|
1 2 3 |
SELECT count(aid),bid FROM pgbench_accounts WHERE bid IN (SELECT bid FROM pgbench_branches WHERE bbalance > 0) GROUP BY bid; |
|
1 2 3 |
SELECT count(aid),bid FROM pgbench_accounts WHERE bid = ANY(SELECT bid FROM pgbench_branches WHERE bbalance > 0) GROUP BY bid; |
|
1 2 3 4 5 6 7 8 |
SELECT count(aid),bid FROM pgbench_accounts WHERE EXISTS ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 AND pgbench_accounts.bid = pgbench_branches.bid ) GROUP BY bid; |
|
1 2 3 4 5 |
SELECT count(aid),a.bid FROM pgbench_accounts a JOIN pgbench_branches b ON a.bid = b.bid WHERE b.bbalance > 0 GROUP BY a.bid; |
PostgreSQL produces the same execution plan for all of these approaches.
|
1 2 3 4 |
HashAggregate -> Hash Join -> Seq Scan on pgbench_accounts -> Seq Scan on pgbench_branches (Filter: bbalance > 0) |
This means you can typically choose the syntax you prefer.
Goal: Find accounts per branch excluding branches with positive balances.
|
1 2 3 |
SELECT count(aid),bid FROM pgbench_accounts WHERE bid NOT IN (SELECT bid FROM pgbench_branches WHERE bbalance > 0) GROUP BY bid; |
|
1 2 3 |
SELECT count(aid),bid FROM pgbench_accounts WHERE bid <> ALL(SELECT bid FROM pgbench_branches WHERE bbalance > 0) GROUP BY bid; |
|
1 2 3 4 5 6 7 8 |
SELECT count(aid),bid FROM pgbench_accounts WHERE NOT EXISTS ( SELECT bid FROM pgbench_branches WHERE bbalance > 0 AND pgbench_accounts.bid = pgbench_branches.bid ) GROUP BY bid; |
|
1 2 3 4 5 6 |
SELECT count(aid),a.bid FROM pgbench_accounts a LEFT JOIN pgbench_branches b ON a.bid = b.bid AND b.bbalance > 0 WHERE b.bid IS NULL GROUP BY a.bid; |
NOT EXISTS and LEFT JOIN produce better execution plans (hash anti-joins), while NOT IN and <> ALL may generate subplans.
With small datasets, PostgreSQL optimizes NOT IN well using hashed subplans. But with large subqueries, performance degrades significantly:
|
1 2 3 4 |
CREATE TABLE t1 AS SELECT * FROM generate_series(0, 500000) id; CREATE TABLE t2 AS SELECT (random() * 4000000)::integer id FROM generate_series(0, 4000000); EXPLAIN SELECT id FROM t1 WHERE id NOT IN (SELECT id FROM t2); |
This results in expensive materialization and poor performance.
Different syntax can introduce implicit casts:
|
1 |
EXPLAIN ANALYZE SELECT * FROM emp WHERE gen = ANY(ARRAY['M','F']); |
This may cast values to text, adding overhead.
Using IN avoids unnecessary casting:
|
1 |
SELECT * FROM emp WHERE gen IN ('M','F'); |
General approach:
Never assume performance based on small datasets—test at scale.
Our white paper “Why Choose PostgreSQL?” explores features, benefits, and migration strategies.
Resources
RELATED POSTS