Where the open source database community meets: Use code PERCONA75 and secure your spot for Percona Live.  Register

SQL Optimizations in PostgreSQL: IN vs EXISTS vs ANY/ALL vs JOIN

April 16, 2020
Author
Jobin Augustine
Share this Post:

SQL optimizations in PostgreSQL
This is one of the most common questions asked by developers writing SQL queries against PostgreSQL. There are multiple ways to structure subqueries or lookups, and PostgreSQL’s optimizer is quite effective at transforming queries for better performance.

Let’s walk through an example using the pgbench schema.

Note: pgbench is a benchmarking tool included with PostgreSQL. You can initialize sample data with:

Update some sample data:

Inclusion Queries

Goal: Find the number of accounts per branch where branch balance is greater than zero.

1. Using IN

2. Using ANY

3. Using EXISTS

4. Using INNER JOIN

PostgreSQL produces the same execution plan for all of these approaches.

This means you can typically choose the syntax you prefer.

Exclusion Queries

Goal: Find accounts per branch excluding branches with positive balances.

1. Using NOT IN

2. Using <> ALL

3. Using NOT EXISTS

4. Using LEFT JOIN

NOT EXISTS and LEFT JOIN produce better execution plans (hash anti-joins), while NOT IN and <> ALL may generate subplans.

Large Subquery Considerations

With small datasets, PostgreSQL optimizes NOT IN well using hashed subplans. But with large subqueries, performance degrades significantly:

This results in expensive materialization and poor performance.

Datatype Conversion Considerations

Different syntax can introduce implicit casts:

This may cast values to text, adding overhead.

Using IN avoids unnecessary casting:

Summary

  • PostgreSQL often optimizes different query styles into the same plan
  • EXISTS and JOIN are generally safer for exclusion queries
  • IN works well for small subqueries but can degrade with large datasets
  • Be aware of implicit datatype conversions
  • Always validate with EXPLAIN

General approach:

  1. Identify required tables
  2. Determine joins
  3. Minimize rows in joins

Never assume performance based on small datasets—test at scale.


Our white paper “Why Choose PostgreSQL?” explores features, benefits, and migration strategies.


Download PDF


Deploy Highly Available PostgreSQL from Percona

0 0 votes
Article Rating
Subscribe
Notify of
guest

5 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Stanislav Sumariuk
Stanislav Sumariuk
6 years ago

I will definitely revisit some of my queries. Thank you for this post.

Scott C.
Scott C.
6 years ago

The exclusion queries are not equivalent and can produce vastly different results. Just try it where the exclusion set contains 100 values, one of which is null.

Francisco Puga
Francisco Puga
5 years ago

Really nice post.

I know that is centered on performance but as some people points that NOT IN is a don’t [1], include a note about how NULLs are handled in the different options or at least writing a warn about it will be nice.

[1] https://wiki.postgresql.org/wiki/Don't_Do_This

harisai hari
5 years ago

I personally like the first scenario, no problem you can call me a lazy DBA. I always afraid of exclusion queries triggered by tall developers.

Rajat Gupta
Rajat Gupta
5 years ago

Trying to assess a situation where from a heavy table in PostgreSQL, data is fetched using an API and filter is applied making use of an IN clause and the values to filter are passed in a measure of few thousands. Will psql still use the in clause optimally? or some other construct suits well?

Far
Enough.

Said no pioneer ever.
MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.
© 2026 Percona All Rights Reserved