Migrate to Percona software for MySQL – an open source, production-ready, and enterprise-grade MySQL alternative.
This blog shares some column store database benchmark results and compares the query performance of MariaDB ColumnStore v.1.0.7 (based on InfiniDB), ClickHouse, and Apache Spark.
I’ve already written about ClickHouse.
The purpose of the benchmark is to evaluate how these three solutions perform on a single large server with many CPU cores and large amounts of RAM. All systems are massively parallel (MPP) databases, designed to utilize many cores for SELECT queries.
This post focuses on Wikipedia page counts. Other datasets will be covered separately.
All tests were run on a single server.
| Dataset | ColumnStore | ClickHouse | MySQL | Spark (Parquet) | Spark (ORC) |
|---|---|---|---|---|---|
| Wikistat | 374.24 GB | 211.3 GB | n/a (>2 TB) | 395 GB | 273 GB |
| Query metrics | 61.23 GB | 28.35 GB | 520 GB | ||
| Store Orders | 9.3 GB | 4.01 GB | 46.55 GB |

| Query | Spark | ClickHouse | ColumnStore |
|---|---|---|---|
| count(*) | 5.37 | 2.14 | 30.77 |
| group by month | 205.75 | 16.36 | 259.09 |
| top 100 pages | 750.35 | 171.22 | 1640.7 |

| Query | Spark | ClickHouse | ColumnStore |
|---|---|---|---|
| count(*) | 21.93 | 8.01 | 139.01 |
| group by month | 217.88 | 16.65 | 420.77 |
| top 100 pages | 887.43 | 182.56 | 1703.19 |

ClickHouse uses primary keys to scan only relevant data chunks:
|
1 2 3 4 |
SELECT count(*), toMonth(date) AS mon FROM wikistat WHERE toYear(date)=2008 AND toMonth(date)=1 GROUP BY mon; |
ColumnStore requires rewriting queries using date ranges for partition elimination:
|
1 2 3 4 |
SELECT count(*), month(date) AS mon FROM wikistat WHERE date BETWEEN '2008-01-01' AND '2008-01-31' GROUP BY mon; |
Large GROUP BY operations require significant memory due to hash table usage:
|
1 2 3 4 5 |
SELECT path, count(*), sum(hits) FROM wikistat GROUP BY path ORDER BY sum_hits DESC LIMIT 100; |
ColumnStore does not support disk spill for GROUP BY, so memory tuning may be required:
|
1 |
<NumBlocksPct>10</NumBlocksPct> |
| Feature | Spark | ClickHouse | ColumnStore |
|---|---|---|---|
| INSERT | Yes | Yes | Yes |
| UPDATE | No | No | Yes |
| DELETE | No | No | Yes |
| Window functions | Yes | No | Yes |
| Solution | Advantages | Disadvantages |
|---|---|---|
| MariaDB ColumnStore |
|
|
| ClickHouse |
|
|
| Apache Spark |
|
|
ClickHouse is the clear winner in this benchmark, showing significantly better performance and compression.
However, ColumnStore provides a MySQL-compatible interface, making it a strong option for migrations from MySQL.
|
1 2 3 4 5 6 7 8 9 |
CREATE TABLE wikistat ( date date, time datetime, project varchar(20), subproject varchar(2), path varchar(1024), hits bigint, size bigint ) ENGINE=Columnstore; |
|
1 |
select count(*) from wikistat; |
|
1 2 3 4 |
select count(*), month(date) as mon from wikistat where date between '2008-01-01' and '2008-10-31' group by mon; |
|
1 2 3 4 5 |
SELECT path, count(*), sum(hits) FROM wikistat GROUP BY path ORDER BY sum_hits DESC LIMIT 100; |
Resources
RELATED POSTS