Column Store Database Benchmarks: MariaDB ColumnStore vs. Clickhouse vs. Apache Spark

March 17, 2017
Author: Alexander Rubin

This blog post shares column store database benchmark results and compares the query performance of MariaDB ColumnStore v1.0.7 (based on InfiniDB), ClickHouse, and Apache Spark.

I’ve already written about ClickHouse.

The purpose of the benchmark is to evaluate how these three solutions perform on a single large server with many CPU cores and a large amount of RAM. All three are massively parallel (MPP) systems designed to use many cores for SELECT queries.

Datasets

  1. Wikipedia page counts (~26 billion rows for 2008)
  2. Query analytics data from Percona Monitoring and Management
  3. Online shop orders

This post focuses on Wikipedia page counts. Other datasets will be covered separately.

Databases, Versions, and Storage Engines

  • MariaDB ColumnStore v1.0.7 (ColumnStore engine)
  • Yandex ClickHouse v1.1.54164 (MergeTree engine)
  • Apache Spark v2.1.0 (Parquet and ORC)

All tests were run on a single server.

Hardware

  • CPU: 2 physical CPUs, 32 cores in total (64 threads)
  • RAM: 256 GB
  • Disk: Samsung SSD 960 PRO 1TB (NVMe)

Data Sizes

Dataset         ColumnStore   ClickHouse   MySQL          Spark (Parquet)   Spark (ORC)
Wikistat        374.24 GB     211.3 GB     n/a (>2 TB)    395 GB            273 GB
Query metrics   61.23 GB      28.35 GB     520 GB         -                 -
Store Orders    9.3 GB        4.01 GB      46.55 GB       -                 -
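
To make the size comparison easier to read, the ColumnStore and ClickHouse figures can be turned into ratios. The snippet below only restates numbers from the table; the dictionary and the function name are illustrative, not part of the benchmark.

```python
# On-disk sizes in GB, copied from the data size table above.
sizes_gb = {
    "Wikistat": {"ColumnStore": 374.24, "ClickHouse": 211.3},
    "Query metrics": {"ColumnStore": 61.23, "ClickHouse": 28.35},
    "Store Orders": {"ColumnStore": 9.3, "ClickHouse": 4.01},
}


def size_ratio(dataset, bigger, smaller):
    """Return how many times larger `bigger` is than `smaller` for one dataset."""
    row = sizes_gb[dataset]
    return row[bigger] / row[smaller]


for name in sizes_gb:
    ratio = size_ratio(name, "ColumnStore", "ClickHouse")
    print(f"{name}: ColumnStore is {ratio:.2f}x the size of ClickHouse")
```

For every dataset the ColumnStore copy is roughly 1.8x to 2.3x the size of the ClickHouse copy.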

Query Performance

Warm Cache

Query time (seconds):

Query            Spark     ClickHouse   ColumnStore
count(*)         5.37      2.14         30.77
group by month   205.75    16.36        259.09
top 100 pages    750.35    171.22       1640.7

Cold Cache

Query time (seconds):

Query            Spark     ClickHouse   ColumnStore
count(*)         21.93     8.01         139.01
group by month   217.88    16.65        420.77
top 100 pages    887.43    182.56       1703.19
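
The gaps are easier to judge as ratios against ClickHouse. The sketch below copies the timings from the warm-cache and cold-cache tables; the helper name is illustrative. Because it divides one time by another, the result does not depend on the unit.

```python
# Query times copied from the tables above, in (Spark, ClickHouse, ColumnStore) order.
warm = {
    "count(*)": (5.37, 2.14, 30.77),
    "group by month": (205.75, 16.36, 259.09),
    "top 100 pages": (750.35, 171.22, 1640.7),
}
cold = {
    "count(*)": (21.93, 8.01, 139.01),
    "group by month": (217.88, 16.65, 420.77),
    "top 100 pages": (887.43, 182.56, 1703.19),
}


def slowdown_vs_clickhouse(times):
    """Return (Spark / ClickHouse, ColumnStore / ClickHouse) for one query."""
    spark, clickhouse, columnstore = times
    return spark / clickhouse, columnstore / clickhouse


for label, table in (("warm", warm), ("cold", cold)):
    for query, times in table.items():
        spark_x, columnstore_x = slowdown_vs_clickhouse(times)
        print(f"{label:4} {query:15} Spark {spark_x:5.1f}x  ColumnStore {columnstore_x:5.1f}x")
```

In every row both ratios are above 1, which means ClickHouse returned each query fastest in both cache states.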

Partitioning and Primary Keys

ClickHouse uses the table's primary key to scan only the relevant chunks of data, so a query that filters on the key does not have to read the whole table.
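
The idea of skipping chunks through a primary key can be sketched with a sparse index over sorted data. This is a generic illustration, not ClickHouse's implementation: the chunk size, the function names, and the integer keys are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # rows per chunk; tiny on purpose (hypothetical value)


def build_sparse_index(sorted_keys, granule=GRANULE):
    """Keep only the first key of every chunk of `granule` rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule)]


def chunks_to_scan(index, lo, hi):
    """Return the chunk numbers that may hold keys in the range [lo, hi]."""
    # The chunk just before the first index entry >= lo may still end with lo
    # (or with larger keys), so it has to be included.
    first = max(bisect_left(index, lo) - 1, 0)
    # Chunks whose first key is greater than hi cannot contain a match.
    last = bisect_right(index, hi)
    return range(first, last)


keys = list(range(1, 21))          # 20 sorted keys, e.g. day numbers
index = build_sparse_index(keys)   # first key of each of the 5 chunks
print(list(chunks_to_scan(index, 6, 10)))
```

Only the chunks that can overlap the requested key range are read; the rest of the table is never touched.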

ColumnStore needs the query rewritten to filter on an explicit date range before it can eliminate partitions.
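
Why a plain date range helps can be sketched as follows, assuming the engine keeps a minimum and maximum date for each block of rows and compares the filter against them. The extents, their boundaries, and the function names below are hypothetical and serve only as illustration.

```python
from datetime import date

# Hypothetical per-extent metadata: (min_date, max_date, row_count).
extents = [
    (date(2008, 1, 1), date(2008, 1, 31), 1000),
    (date(2008, 2, 1), date(2008, 2, 29), 1000),
    (date(2008, 3, 1), date(2008, 3, 31), 1000),
]


def extents_for_range(extents, lo, hi):
    """Keep only extents whose [min, max] interval overlaps [lo, hi]."""
    return [e for e in extents if e[0] <= hi and e[1] >= lo]


def extents_for_opaque_predicate(extents):
    """A filter on a computed value (for example, a function of the date)
    cannot be compared with min/max metadata, so every extent is read."""
    return list(extents)


february = extents_for_range(extents, date(2008, 2, 1), date(2008, 2, 29))
print(len(february), "extent(s) read with a date range")
print(len(extents_for_opaque_predicate(extents)), "extent(s) read otherwise")
```

Under this assumption, a filter written as a range on the date column prunes most of the data, while the same condition expressed through a function forces a full scan.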

Working with Large Datasets

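Large GROUP BY operations need a lot of memory because the engine builds a hash table with one entry per distinct group, so grouping by page title is far more demanding than grouping by month.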
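
A minimal sketch of hash-based aggregation shows where the memory goes. It is a generic illustration with made-up rows, not the code of any of the three engines.

```python
from collections import defaultdict


def group_by_sum(rows):
    """Aggregate (key, value) pairs with an in-memory hash table."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value  # one hash table entry per distinct key
    return dict(totals)


# Made-up rows: (month, page, hits).
rows = [(m, f"page_{i}", 1) for m in range(1, 13) for i in range(1000)]

by_month = group_by_sum((month, hits) for month, _, hits in rows)
by_page = group_by_sum((page, hits) for _, page, hits in rows)

print(len(by_month), "hash table entries when grouping by month")
print(len(by_page), "hash table entries when grouping by page")
```

The table's size follows the number of distinct keys rather than the number of rows, which is why a "top pages" query over billions of rows is the hard case.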

ColumnStore does not support spilling GROUP BY to disk, so its memory settings may need tuning for queries like this.
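
For contrast, one generic way an engine can aggregate more groups than fit in memory is to partition rows by key hash into temporary files and then aggregate one partition at a time. The sketch below illustrates that textbook technique only; it does not describe how ClickHouse or any other engine implements it, and the function name and partition count are assumptions.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def group_by_sum_spilling(rows, partitions=4):
    """Sum values per key while holding one partition's hash table at a time."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part{i}.bin") for i in range(partitions)]
        files = [open(p, "wb") for p in paths]
        try:
            # Pass 1: route every row to a partition file chosen by key hash.
            for key, value in rows:
                pickle.dump((key, value), files[hash(key) % partitions])
        finally:
            for f in files:
                f.close()
        # Pass 2: aggregate each partition separately. A key always lands in
        # the same partition, so per-partition totals are already final.
        for path in paths:
            totals = defaultdict(int)
            with open(path, "rb") as f:
                while True:
                    try:
                        key, value = pickle.load(f)
                    except EOFError:
                        break
                    totals[key] += value
            yield from totals.items()


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 5)]
print(dict(group_by_sum_spilling(rows)))
```

Peak memory is bounded by the largest partition instead of the full set of groups, at the cost of writing and re-reading the data once.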

SQL Support

Feature            Spark   ClickHouse   ColumnStore
INSERT             Yes     Yes          Yes
UPDATE             No      No           Yes
DELETE             No      No           Yes
Window functions   Yes     No           Yes

Comparison

MariaDB ColumnStore
  Advantages:
  • MySQL compatibility
  • Supports UPDATE/DELETE
  • Window functions
  Disadvantages:
  • Slower SELECT queries
  • No GROUP BY disk spill
  • No MySQL replication

ClickHouse
  Advantages:
  • Fastest performance
  • Better compression
  • Disk-based GROUP BY
  Disadvantages:
  • No MySQL protocol

Apache Spark
  Advantages:
  • Flexible storage
  • ML integration
  Disadvantages:
  • Slower queries
  • No MySQL protocol

Conclusion

ClickHouse is the clear winner in this benchmark: it returned every query fastest with both warm and cold caches, and it stored the Wikistat data in 211.3 GB, compared with 374.24 GB for ColumnStore.

However, ColumnStore provides a MySQL-compatible interface, making it a strong option for migrations from MySQL.

Table Structure

Queries
