PostgreSQL: Access ClickHouse, One of the Fastest Column DBMSs, With clickhousedb_fdw

PREVIOUS POST
NEXT POST

Database management systems are meant to house data but, occasionally, they may need to talk with another DBMS. For example, to access an external server which may be hosting a different DBMS. With heterogeneous environments becoming more and more common, a bridge between the servers is established. We call this bridge a “Foreign Data Wrapper” (FDW). PostgreSQL completed its support of SQL/MED (SQL Management of External Data) with release 9.3 in 2013. A foreign data wrapper is a shared library that is loaded by a PostgreSQL server. It enables the creation of foreign tables in PostgreSQL that act as proxies for another data source.

When you query a foreign table, Postgres passes the request to the associated foreign data wrapper. The FDW creates the connection and retrieves or updates the data in the external data store. Since PostgreSQL planner is involved in all of this process as well, it may perform certain operations like aggregate or joins on the data when retrieved from the data source. I cover some of these later in this post.

ClickHouse Database

ClickHouse is an open source column based database management system which claims to be 100–1,000x faster than traditional approaches, capable of processing of more than a billion rows in less than a second.

clickhousedb_fdw

clickhousedb_fdw is an open source project – Apache licensed – from Percona. Here’s the link for GitHub project repository:

https://github.com/Percona-Lab/clickhousedb_fdw

It is an FDW for ClickHouse that allows you to SELECT from, and INSERT INTO, a ClickHouse database from within a PostgreSQL v11 server.

The FDW supports advanced features like aggregate pushdown and joins pushdown. These significantly improve performance by utilizing the remote server’s resources for these resource intensive operations.

If you would like to follow this post and try the FDW between Postgres and ClickHouse, you can download and set up the ontime dataset for ClickHouse.  After following the instructions, the test that you have the desired data. The ClickHouse client is a client CLI for the ClickHouse Database.

Prepare Data for ClickHouse

Now the data is ready in ClickHouse, the next step is to set up PostgreSQL. We need to create a ClickHouse foreign server, user mapping, and foreign tables.

Install the clickhousedb_fdw extension

There are manual ways to install the clickhousedb_fdw, but clickhousedb_fdw uses PostgreSQL’s coolest extension install feature. By just entering a SQL command you can use the extension:

Performance Features

PostgreSQL has improved foreign data wrapper processing by added the pushdown feature. Push down improves performance significantly, as the processing of data takes place earlier in the processing chain. Push down abilities include:

  • Operator and function Pushdown
  • Predicate Pushdown
  • Aggregate Pushdown
  • Join Pushdown

Operator and function Pushdown

The function and operators send to Clickhouse instead of calculating and filtering at the PostgreSQL end.

Predicate Pushdown

Instead of filtering the data at PostgreSQL, clickhousedb_fdw send the predicate to Clikhouse Database.

Aggregate Pushdown

Aggregate push down is a new feature of PostgreSQL FDW. There are currently very few foreign data wrappers that support aggregate push down – clickhousedb_fdw is one of them. Planner decides which aggregates are pushed down and which aren’t. Here is an example for both cases.

Join Pushdown

Again, this is a new feature in PostgreSQL FDW, and our clickhousedb_fdw also supports join push down. Here’s an example of that.

Percona’s support for PostgreSQL

As part of our commitment to being unbiased champions of the open source database eco-system, Percona offers support for PostgreSQL – you can read more about that here. And as you can see, as part of our support commitment, we’re now developing our own open source PostgreSQL projects such as the clickhousedb_fdw. Subscribe to the blog to be amongst the first to know of PostgreSQL and other open source projects from Percona.

As an author of the new clickhousdb_fdw – as well as other  FDWs – I’d be really happy to hear of your use cases and your experience of using this feature.


Photo by Hidde Rensink on Unsplash

PREVIOUS POST
NEXT POST

Share this post

Comments (14)

  • Andrey Reply

    it would be great if someone made a similar extension for MySQL

    April 5, 2019 at 10:19 am
    • Steven Lee Reply

      please google Using ClickHouse like MySQL by ProxySQL

      April 21, 2019 at 8:48 pm
  • faibistes Reply

    What functions are supported in function pushdown? IIUC, it means that it translates postgresql functions and operators to clickhouse equivalents, if there is one. Am I right?

    April 12, 2019 at 7:30 am
    • Ibrar Ahmed Reply

      I tried to translate operators, but not the function. If function exists in PostgreSQL and clickhouse then it will work.

      April 12, 2019 at 1:59 pm
      • faibistes Reply

        Does it mean that it will fail if it uses a non-translatable function in a join or aggregate, or that it will do it on the postgresql side?

        April 13, 2019 at 4:53 am
        • Ibrar Ahmed Reply

          In this current state it will fail, I am working on the translation functions. It will fail like this function does not exists in Clickhouse. You need to call a function which is available in Clickhouse, otherwise it will fail.

          April 13, 2019 at 1:16 pm
  • Magnus Falch Reply

    CREATE EXTENSION clickhousedb_fdw;
    Which distros/repos will this work with currently?

    April 21, 2019 at 3:24 pm
    • Ibrar Ahmed Reply

      I have tested with this environment only
      PostgreSQL 11
      OS: Ubuntu and Centos

      April 22, 2019 at 1:24 pm
      • Magnus Brun Falch Reply

        Could you please provide some more details?
        I’ve tried Ubuntu 18.10/19.04 and Centos7. Couldn’t get it to run out of the box.
        Tried setting up a dev environment but no files matching the name of the .so file in the documentation were produced by the make install step.
        I’m working on a more exhaustive dependency list for Centos7 that would make getting started a lot easier for someone who hasn’t tinkered with building extensions for PostgreSQL before.

        June 4, 2019 at 2:17 am
        • Ibrar Ahmed Reply

          Please create an issue with detail error message, I will look at that.

          June 4, 2019 at 2:20 am
  • Steven Lee Reply

    Hi,
    I translated it into Chinese, and want to publish on my blog, hope you can allow it.
    and I have successfully compiled, installed and tested it, it is great.
    and I wrote a compiling method in details, http://blog.rubypdf.com/2019/04/22/how-to-compile-and-install-clickhousedb_fdw/

    April 21, 2019 at 8:45 pm
  • xiatian Reply

    why my test cannot push down,the follwing is my case:
    PostgreSQL 11
    OS: Centos

    postgres# explain (analyze,verbose) select count(year) from ontime;
    QUERY PLAN

    Aggregate (cost=0.00..0.01 rows=1 width=8) (actual time=31441.737..31441.737 rows=1 loops=1)
    Output: count(year)
    -> Foreign Scan on public.ontime (cost=0.00..0.00 rows=0 width=4) (actual time=13.313..30108.858 rows=17189046 loops=1)
    Output: year, quarter, month, dayofmonth, dayofweek, flightdate, uniquecarrier, airlineid, carrier, tailnum, flightnum,
    originairportid, originairportseqid, origincitymarketid, origin, origincityname, originstate, originstatefips, originstatename, o
    riginwac, destairportid, destairportseqid, destcitymarketid, dest, destcityname, deststate, deststatefips, deststatename, destwac
    , crsdeptime, deptime, depdelay, depdelayminutes, depdel15, departuredelaygroups, deptimeblk, taxiout, wheelsoff, wheelson, taxii
    n, crsarrtime, arrtime, arrdelay, arrdelayminutes, arrdel15, arrivaldelaygroups, arrtimeblk, cancelled, cancellationcode, diverte
    d, crselapsedtime, actualelapsedtime, airtime, flights, distance, distancegroup, carrierdelay, weatherdelay, nasdelay, securityde
    lay, lateaircraftdelay, firstdeptime, totaladdgtime, longestaddgtime, divairportlandings, divreacheddest, divactualelapsedtime, d
    ivarrdelay, divdistance, div1airport, div1airportid, div1airportseqid, div1wheelson, div1totalgtime, div1longestgtime, div1wheels
    off, div1tailnum, div2airport, div2airportid, div2airportseqid, div2wheelson, div2totalgtime, div2longestgtime, div2wheelsoff, di
    v2tailnum, div3airport, div3airportid, div3airportseqid, div3wheelson, div3totalgtime, div3longestgtime, div3wheelsoff, div3tailn
    um, div4airport, div4airportid, div4airportseqid, div4wheelson, div4totalgtime, div4longestgtime, div4wheelsoff, div4tailnum, div
    5airport, div5airportid, div5airportseqid, div5wheelson, div5totalgtime, div5longestgtime, div5wheelsoff, div5tailnum
    Remote SQL: SELECT year FROM “default”.ontime
    Planning Time: 0.111 ms
    Execution Time: 31441.813 ms
    (7 rows)

    Time: 31442.579 ms (00:31.443)

    May 24, 2019 at 2:42 am
    • Ibrar Ahmed Reply

      @xiatian, Please create a issue on the Github, I will respond on that.

      May 28, 2019 at 4:54 pm
  • Konstantin Reply

    is it possible to see datatype array(string) via clickhouse_fdw?

    May 31, 2019 at 8:30 am

Leave a Reply