ClickHouse and ColumnStore in the Star Schema Benchmark

There were two new releases in the open source analytical databases space, which made me want to evaluate how they perform on the Star Schema Benchmark.

I have covered the Star Schema Benchmark a few times before:

What are the new releases?

MariaDB 10.5 comes with built-in ColumnStore and an interesting capability to replicate data from InnoDB to ColumnStore in real time within the same instance, which they named “hybrid transactional and analytical processing”.

For ClickHouse, it was not a single release but rather three years of continuous improvements: support for JOIN queries, support for updating data, and better integration with MySQL.

For the ClickHouse testing, I will use version 20.4.6.


For the benchmark I will use the Star Schema Benchmark with scale factor 2500, which amounts to:

  • 15 bln rows in the lineorder table
  • 1.7TB of raw data for the lineorder table

The Star Schema Benchmark prescribes 13 queries, and I will measure the execution time for each query.

You can find examples of the table schemas and queries in my results repo:

Side note: the ClickHouse documentation provides examples for the Star Schema queries; however, it takes a denormalization approach, which is valid but does not allow testing how the original queries with 3-way JOINs perform, which is my goal in this experiment.
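For orientation, a ClickHouse definition of the lineorder table looks roughly like the following. This is a sketch based on the publicly documented SSB schema; the exact DDL I used (and the ColumnStore variant) is in the repo:

```sql
CREATE TABLE lineorder (
    LO_ORDERKEY       UInt32,
    LO_LINENUMBER     UInt8,
    LO_CUSTKEY        UInt32,
    LO_PARTKEY        UInt32,
    LO_SUPPKEY        UInt32,
    LO_ORDERDATE      Date,
    LO_ORDERPRIORITY  LowCardinality(String),
    LO_SHIPPRIORITY   UInt8,
    LO_QUANTITY       UInt8,
    LO_EXTENDEDPRICE  UInt32,
    LO_ORDTOTALPRICE  UInt32,
    LO_DISCOUNT       UInt8,
    LO_REVENUE        UInt32,
    LO_SUPPLYCOST     UInt32,
    LO_TAX            UInt8,
    LO_COMMITDATE     Date,
    LO_SHIPMODE       LowCardinality(String)
) ENGINE = MergeTree
ORDER BY (LO_ORDERDATE, LO_ORDERKEY);
```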

Load Data Results

The first result I want to see is how long it takes to load 15 bln rows into each database and what the final table size is.


ClickHouse load:

time clickhouse-client --query "INSERT INTO lineorder FORMAT CSV" < lineorder.tbl

real    283m44.217s
user    557m13.079s
sys     17m28.207s


ColumnStore load:

time cpimport -m1 -s"," -E '"' sbtest lineorder lineorder.tbl

real    800m11.175s
user    767m50.866s
sys     22m22.703s



root@nodesm:/data/sdd/clickhouse/data# du -sh default
355G    default


root@nodesm:/data/sdd/cs# du -sh data1
625G    data1

ClickHouse and ColumnStore benchmark


Now I want to compare query execution times for ColumnStore and ClickHouse. I will use queries as close to the original as possible, but for ClickHouse I still need to adjust some queries in order to execute them.

(Chart: query execution times, ClickHouse vs. ColumnStore.)

We can see that ClickHouse outperforms ColumnStore, sometimes by a huge margin, in all queries but one: Query 3.1. Query 4.1 is also particularly hard for ClickHouse.

Note on Queries in ClickHouse

I mentioned that queries still need modifications to run in ClickHouse, so let’s take a look.

Three years ago (see the blog I mentioned earlier, ClickHouse in a General Analytical Workload – Based on a Star Schema Benchmark), a 3-way JOIN query like this:
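For reference, here is SSB Query 3.1 in its standard multi-table form (a sketch following the benchmark specification; I use the table name dates here, and the exact text is in the results repo):

```sql
SELECT C_NATION, S_NATION, D_YEAR, sum(LO_REVENUE) AS revenue
FROM customer, lineorder, supplier, dates
WHERE LO_CUSTKEY = C_CUSTKEY
  AND LO_SUPPKEY = S_SUPPKEY
  AND LO_ORDERDATE = D_DATEKEY
  AND C_REGION = 'ASIA'
  AND S_REGION = 'ASIA'
  AND D_YEAR >= 1992 AND D_YEAR <= 1997
GROUP BY C_NATION, S_NATION, D_YEAR
ORDER BY D_YEAR ASC, revenue DESC;
```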

had to be written as:
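I did not preserve the exact old query, but the pre-JOIN-support rewrite followed this nested ANY INNER JOIN … USING pattern, with each dimension pre-filtered in a subquery and its key renamed to match the fact table column (a reconstruction, not verbatim):

```sql
SELECT C_NATION, S_NATION, D_YEAR, sum(LO_REVENUE) AS revenue
FROM (
    SELECT LO_ORDERDATE, LO_REVENUE, C_NATION, S_NATION
    FROM (
        SELECT LO_SUPPKEY, LO_ORDERDATE, LO_REVENUE, C_NATION
        FROM lineorder
        ANY INNER JOIN (
            SELECT C_CUSTKEY AS LO_CUSTKEY, C_NATION
            FROM customer
            WHERE C_REGION = 'ASIA'
        ) USING (LO_CUSTKEY)
    )
    ANY INNER JOIN (
        SELECT S_SUPPKEY AS LO_SUPPKEY, S_NATION
        FROM supplier
        WHERE S_REGION = 'ASIA'
    ) USING (LO_SUPPKEY)
)
ANY INNER JOIN (
    SELECT D_DATEKEY AS LO_ORDERDATE, D_YEAR
    FROM dates
    WHERE D_YEAR >= 1992 AND D_YEAR <= 1997
) USING (LO_ORDERDATE)
GROUP BY C_NATION, S_NATION, D_YEAR
ORDER BY D_YEAR ASC, revenue DESC;
```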

Now, the multi-table queries can be written as:
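With a current ClickHouse, the same Query 3.1 can be expressed with standard JOIN syntax (again a sketch; the exact queries are in the repo):

```sql
SELECT C_NATION, S_NATION, D_YEAR, sum(LO_REVENUE) AS revenue
FROM lineorder
INNER JOIN customer ON LO_CUSTKEY = C_CUSTKEY
INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
INNER JOIN dates    ON LO_ORDERDATE = D_DATEKEY
WHERE C_REGION = 'ASIA'
  AND S_REGION = 'ASIA'
  AND D_YEAR >= 1992 AND D_YEAR <= 1997
GROUP BY C_NATION, S_NATION, D_YEAR
ORDER BY D_YEAR ASC, revenue DESC;
```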

Pretty much a query with standard JOIN syntax.

Word of Caution:

Even though queries in ClickHouse look like normal JOIN queries, ClickHouse DOES NOT HAVE a query optimizer. That means tables will be joined in the order you have written them, and ClickHouse will not try to reorder tables for optimal execution. This task is still on the developers.

With human intervention, it is still possible to improve Query 3.1 for ClickHouse if we write it in the following form (thanks to Alexander Zaytsev from Altinity for the help!):
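I am reconstructing the shape of that rewrite here (the exact query is in the repo): the idea is to pre-filter the dimension tables in subqueries, so ClickHouse builds small hash tables before scanning lineorder:

```sql
SELECT C_NATION, S_NATION, D_YEAR, sum(LO_REVENUE) AS revenue
FROM lineorder
INNER JOIN (
    SELECT C_CUSTKEY, C_NATION
    FROM customer
    WHERE C_REGION = 'ASIA'
) AS customer ON LO_CUSTKEY = C_CUSTKEY
INNER JOIN (
    SELECT S_SUPPKEY, S_NATION
    FROM supplier
    WHERE S_REGION = 'ASIA'
) AS supplier ON LO_SUPPKEY = S_SUPPKEY
INNER JOIN (
    SELECT D_DATEKEY, D_YEAR
    FROM dates
    WHERE D_YEAR >= 1992 AND D_YEAR <= 1997
) AS dates ON LO_ORDERDATE = D_DATEKEY
GROUP BY C_NATION, S_NATION, D_YEAR
ORDER BY D_YEAR ASC, revenue DESC;
```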

Execution time for query 3.1 is reduced from 7 min 33 sec to 5 min 44 sec.

It is possible that other queries can also be improved, but I want to keep them in their original form.

Update Queries

Update queries are not part of the Star Schema Benchmark, but this question comes up regularly: can we update the data, and how well do UPDATE queries perform? Updates can be a challenge for analytical databases.

For this, ClickHouse introduced a special syntax, which looks like ALTER TABLE … UPDATE, to highlight that this is not a typical UPDATE statement.

It is also worth noting that ALTER TABLE … UPDATE in ClickHouse is executed asynchronously, meaning that until it finishes, you may see inconsistent results while the changes are being applied.

So let’s execute the query:
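An illustrative statement of the kind I ran, touching every row of the table (a hypothetical stand-in; the exact query is in the repo — note that ALTER TABLE … UPDATE requires a WHERE clause, and WHERE 1 matches all rows):

```sql
ALTER TABLE lineorder UPDATE LO_QUANTITY = LO_QUANTITY + 1 WHERE 1;
```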


Please note this query will update 15 bln rows, which is not a trivial task!

We can see how long it takes to apply the changes. For this, I will run the following query in a loop to see when sum(LO_QUANTITY) reflects the new values.
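The check query is simply an aggregate over the updated column:

```sql
SELECT sum(LO_QUANTITY) FROM lineorder;
```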

The result: it took about two minutes after the ALTER TABLE query for the SELECT query to reflect the new values.


For ColumnStore, I will use the normal UPDATE syntax:
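For example, the equivalent whole-table update in standard syntax (illustrative, matching the ClickHouse statement above):

```sql
UPDATE lineorder SET LO_QUANTITY = LO_QUANTITY + 1;
```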

The query is performed synchronously, and it is transactional.

I waited 6 hours for the query to finish, and at that point, I had to kill it, as I could not wait any longer.

Again, please note the difference in executing updates:

ClickHouse – the update runs asynchronously and is NOT transactional, which will likely result in read queries returning inconsistent results while the UPDATE is running.

ColumnStore – the update is synchronous and transactional. This is likely why it takes so long to update 15 bln rows.

Final words

So obviously, when it comes to preference, I would choose ClickHouse, given its query execution times, data load times, and the new improved JOIN syntax.

The benefit of ColumnStore is that it comes in a single MySQL-like package (MariaDB flavor), and there is an easy way to integrate the data flow from InnoDB to ColumnStore.

ClickHouse keeps improving its integration with MySQL (see ClickHouse and MySQL – Better Together), but there is still room for improvement!


For reference – the hardware I used is:

Architecture:        x86_64
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  2
Core(s) per socket:  12
Socket(s):           2
NUMA node(s):        2
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
with SATA SSD storage and 128GB of RAM

You can find queries and the results on our GitHub repository:


Comments

  • Alexey Milovidov

    > Also it is worth noting that ALTER TABLE UPDATE in ClickHouse is executed in an asynchronous way, meaning that until it is finished you may see inconsistencies during the period while changes are applied.

    There is a setting mutations_sync. When set to 1, the query will wait for mutation to finish.

    July 27, 2020 at 3:33 pm
  • Artem

    > Now, the multi-table queries can be written as

    You could leave the query unchanged. ClickHouse tries to rewrite commas into CROSS JOINs and CROSS JOINs into INNER ones itself. But it’s better to check that there are no CROSS JOINs left. You could do it with queries like:

    set enable_debug_queries = 1;
    analyze ;

    You could find some CROSS JOINs not rewritten to INNER ones cause ClickHouse can’t change table order in JOIN for now and can not split some complex WHERE clauses to extract an ON section for INNER JOIN.