Where the open source database community meets: Use code PERCONA75 and secure your spot for Percona Live.  Register

Powering PostgreSQL 15 With Columnar Tables

December 14, 2022
Author
Robert Bernier
Share this Post:

Investigating Columnar Tables in PostgreSQL with the Citus Columnar Extension

Prologue

This blog is the result of my investigation into column-wise tables. Hopefully, you will find it useful in your own deliberations.

When I started writing, this was meant to be a “good news” blurb. I was optimistic that I would have truly impressive numbers to share. As it turns out, while there is potential for significant advantage, a detailed testing regime of the Citus columnar store extension should be carried out before implementing it in your own environment.

The Missing Sizzle

Sizzle: A phrase used to show affirmation or approval. It references the sound one hears when cooking certain delicious foods, such as bacon or fajitas, and transfers those feelings to non-edible objects or ideas.

There is a lot to take in with every new version of PostgreSQL, and I am often impressed by each release. Nevertheless, there has been one long-standing feature that I have always felt was missing. This weakness is being addressed by the industry. Ironically, some of these implementations are themselves Postgres derivatives.

What is this sizzling feature?

Column-wise tables.

When to Consider Column-Wise Tables

Consider using the Citus columnar extension under one or more of the following conditions:

Table Attributes

  • The tables are wide, with many columns.
  • The table size is at least in the tens of gigabytes.
  • The typical row is byte-heavy.

System Considerations

  • System space is at a premium.
  • It is worth your while to manage space utilization closely.
  • OLAP is a major component of overall activity, with many different kinds of SELECT statements.
  • Insert performance is not a priority.
  • Indexes are not feasible, or you need to reduce reliance on them.

Creating expression indexes on columnar tables is faster by orders of magnitude than on heap tables.

Caveat: You cannot perform UPDATE or DELETE operations on a columnar table.

OLTP vs. OLAP

Let’s get back to basics. In the database world, there are essentially two types of database operations.

Online Transaction Processing, or OLTP

Online transaction processing applications have high throughput and are insert- or update-intensive. These applications are used concurrently by hundreds of users. The key goals of OLTP applications are availability, speed, concurrency, and recoverability.

Online Analytical Processing, or OLAP

Online analytical processing applications enable users to analyze multidimensional data interactively from multiple perspectives. OLAP consists of three basic analytical operations:

  • Consolidation, or roll-up
  • Drill-down
  • Slicing and dicing

About Columnar Tables

As an RDBMS, PostgreSQL is geared more toward OLTP than OLAP operations. It handles data manipulation, such as inserting, updating, and deleting, very well. While PostgreSQL is capable of performing OLAP workloads, it is not always as efficient for those use cases.

The primary reason is a common characteristic among most relational database systems: data is stored and processed as individual records, also known as row-wise processing.

Suppose you are performing a deep-dive analysis involving a table with 50 columns. Further, suppose your query only needs one column. In a row-oriented table, the query may still need to read data from all 50 columns per row to access that single column. If the query processes 100 million rows, that is not trivial.

Now consider reorienting the table as column-wise. In this case, the query only reads the needed column instead of all 50. The result is a lighter query that requires less I/O and less processing power, while achieving the same result in significantly less time.

Introducing the Citus Columnar Extension

As a general-purpose database management system, PostgreSQL can be reconfigured for different purposes. It is possible to enhance PostgreSQL with columnar tables by using the Citus columnar table extension.

The Citus columnar extension is one part of a larger set of capabilities. When fully implemented, Citus can create a scalable, distributed Postgres database system.

The Citus columnar extension feature set includes:

Highly Compressed Tables

  • Reduces storage requirements.
  • Reduces the I/O needed to scan the table.

Projection Pushdown

Only the columns included in the SELECT statement are returned, further reducing I/O.

Chunk Group Filtering

Allows queries to skip over whole groups of unrelated data without processing them.

The complete Citus feature set, except for the columnar storage component, is not covered in this blog. It includes:

  • Distributed tables
  • Reference tables
  • A distributed query engine that routes and parallelizes SELECT, DML, and other operations
  • Columnar storage, which:
    • Compresses data
    • Speeds up scans
    • Supports fast projections
  • Querying from any node

Getting It

This is a bad-news, good-news, excellent-news situation.

Bad news: At the time of writing this blog, the only publicly available packages on the Postgres repositories that I could find were the cstore foreign data wrappers, which cover only version 12 and older versions of Postgres.

Good news: The Citus extension is available on the Citus Data download page.

Excellent news: As part of writing this blog, custom DEB and RPM packages were created for PostgreSQL 15 for CentOS 7, CentOS 8, Ubuntu 18.04, and Ubuntu 20.04. These are available for download from GitHub.

First-Time Installation

Assuming you opted for the easy way and are installing the packages made for this blog, follow these steps.

First, go to the PostgreSQL download page and configure your packaging system for your Linux distribution.

Then download the appropriate custom-made columnar package for your version of Linux:

Install the package. All dependent packages, including the Postgres server, will be automatically installed onto your host. Pay attention to CentOS 7, which also requires the epel-release repository.

Initialize the Postgres data cluster and configure it appropriately for your Linux distribution so you can log in to Postgres.

A Working Example of Columnar Tables

What follows are the results of my analysis. Of course, there is always more that can be said, but this should give you an overview of the possibilities of this extension.

Installing the extension into your database is standard fare:

Here are two tables, one using the heap access method and one using the columnar access method. These will be used for the initial investigation. Notice that it took less time to create the regular heap table than the columnar table.

Table t1 uses the heap access method:

Table t2 uses the columnar access method:

The number of records is 10 million:

The columnar compression does a great job of shrinking the table:

SQL Statements, Preliminary

Let’s begin by comparing basic administration and SELECT statements for heap and columnar tables.

Examining the results, much of the performance indicates that columnar tables either perform similarly to heap tables or, in many cases, take more time executing the same operations.

Using a psql session, the following SQL statements were executed and examined for performance differences.

Creating Tables

Adding Five Million Records

Creating Indexes

SQL Statement Query Plans

Part One: 10 Million Records, Two-Column Tables

Using the table definitions above, the following metrics were generated with the runtime parameter max_parallel_workers_per_gather = 4.

For these two tables, there is no clear performance benefit from a columnar table over a regular heap table.

Part Two: Five Million Records, 100-Column Tables

To get a better idea of performance differences, a second set of tables was created at a greater scale. This time, while the number of records was halved, the number of columns was increased from two to 100.

Even though most of the columns are copies of one another, the compression achieved by the columnar table is remarkable. The default size was reduced by a factor of 752x.

Examining the indexes shows them to be the same size:

Unlike the first set of query plans, these clearly demonstrate a significant performance improvement.

Changing max_parallel_workers_per_gather did not appear to change performance much.

Working with Indexes

General observations: B-tree indexes are similar in size between heap and columnar tables. Overall, their performance also appears similar, although the columnar table’s index is somewhat slower. This is likely due to the extra processing required to decompress the table’s values.

Expression Indexes

Creating an expression index on a columnar table is significantly faster than creating one on a heap table:

Runtime Parameter: max_parallel_workers_per_gather

Index performance varies considerably on heap tables depending on the value set for max_parallel_workers_per_gather.

The following results highlight that, depending on the type of table used, this can become important when hardware resources and server costs are a consideration.

About Table Constraints and Access Methods

Overall, indexes, constraints, and access methods are still evolving, with many features still to be implemented.

Let’s start with a major issue: DELETE and UPDATE are not supported.

Creating indexes on a columnar table is restricted to B-tree indexes:

Creating foreign key constraints is not implemented:

Partitioning

Columnar tables can be used as partitions. A partitioned table can be made up of any combination of row and columnar partitions.

An excellent use case is insert-once, read-only table partitions. In this scenario, you can leverage both columnar compression and better-performing OLAP-style queries for very large tables.

Caveats

As of version 11.1, the Citus columnar extension has several limitations:

  • Creating a columnar table takes more time than creating a standard heap table.
  • Query performance is equal to or slower than heap tables when using smaller tables.
  • UPDATE and DELETE operations are not possible in a columnar table.
  • Indexes are limited to B-tree indexes as of version 10.0.
  • Logical replication is not supported.
  • Additional limitations are documented in the extension documentation.

Author’s note: Regarding the custom packages created for this blog, the entire Citus suite is designed to enhance Postgres as a distributed database solution. Only the columnar table component was covered in this blog. The complete feature set is part of this package install and should work, but it has not been tested here.

Percona and PostgreSQL work better together. Try Percona Distribution for PostgreSQL today.

Conclusion

Despite its current limitations, there are use cases where this extension can make a meaningful difference. It also speaks well of the extension’s future as the team continues development and adds new capabilities. Watch for updates on its GitHub source repository.

0 0 votes
Article Rating
Subscribe
Notify of
guest

2 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Artur
Artur
3 years ago

Due to the GDPR, I had to resign from CITUS. DELETE could be a killer feature.

Far
Enough.

Said no pioneer ever.
MySQL, PostgreSQL, InnoDB, MariaDB, MongoDB and Kubernetes are trademarks for their respective owners.
© 2026 Percona All Rights Reserved