SQL Optimizations in PostgreSQL: IN vs EXISTS vs ANY/ALL vs JOIN

SQL optimizations in PostgreSQLThis is one of the most common questions asked by developers who write SQL queries against the PostgreSQL database. There are multiple ways in which a sub select or lookup can be framed in a SQL statement. PostgreSQL optimizer is very smart at optimizing queries, and many of the queries can be rewritten/transformed for better performance.

Let’s discuss the topic with an example, for which I am using schema created by pgbench.

Note: For those not familiar with pgbench, it is a micro benchmarking tool shipped with PostgreSQL. A sample pgbench schema can be initialized with some data as follows:

For this example, I have updated the branch balance of a couple of branches:

INclusion Queries

The SQL Challenge for this example is: Find out the number of accounts per branch from pgbench_accounts for those branches where branch level balance is greater than zero. This query can be written in four different ways, as per ANSI SQL Standards.

1.  Using IN Clause

2. Using ANY Clause

3. Using EXISTS Clause

4. Using INNER JOIN

While writing the query, one might assume that EXISTS and INNER JOIN might be better because they can use all the logic and optimization for joining two tables, while IN and ANY clauses need to deal with subqueries. However, PostgreSQL (at least PG 10 and above) is smart enough to produce the same execution plan for all four options!.

All of the above queries will be generating the same execution plan as follows:

Note: Suppress the parallel execution for better readability and a simple execution plan. Even with a parallel execution plan, all the queries are producing the same execution plan.

So can we conclude that we can write the query as we are comfortable and PostgreSQL’s intelligence will take care of the rest? Wait! Things can go differently if we take the exclusion scenario.

Exclusion Queries

The SQL challenge becomes: Find out the number of accounts per branch from pgbench_accounts EXCEPT for those branches where branch level balance is greater than zero.

So the four ways to write queries becomes:

1. Using NOT IN

2. Using <> ALL

3. Using NOT EXISTS

4. Using LEFT JOIN and IS NULL

The “NOT IN” and “<> ALL” produces an execution plan with sub-queries (SubPlan). They are respectively: