Massive Parallel Log Processing with ClickHouse

In this post, I'll look at how to use ClickHouse for parallel log processing.

Percona is known primarily for our expertise in MySQL and MongoDB (at this time), but neither is well suited to heavy analytical workloads. There is a constant need to analyze data sets, and crunching log files is a very common task. Below I'll show how ClickHouse can be used to perform this task efficiently. ClickHouse is attractive because it offers multi-core parallel query processing, and it can even execute a single query using multiple CPUs in the background.

I am going to check how ClickHouse utilizes multiple CPU cores and threads. I will use a server with two sockets, each equipped with an "Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz". That gives 28 CPU cores / 56 CPU threads in total.

To analyze the workload, I'll use an Apache log file from one of Percona's servers. The log has 1.56 billion rows and takes 274G uncompressed. When inserted into ClickHouse, the table takes 9G on disk.

How do we insert the data into ClickHouse? There are a lot of scripts that transform the Apache log format into CSV, which ClickHouse can accept. As a base, I used this one:

https://gist.github.com/sepehr/fff4d777509fa7834531

and you can find my modified version here:

https://github.com/vadimtk/clickhouse-misc/blob/master/apachelog-to-csv.pl
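With the log converted to CSV, loading is a matter of piping the converter's output into clickhouse-client. A minimal sketch of the pipeline (the file name access_log and the table name apachelog are illustrative; the exact invocation may differ):

    # convert the Apache log to CSV and stream it into ClickHouse
    cat access_log | perl apachelog-to-csv.pl | \
        clickhouse-client --query="INSERT INTO apachelog FORMAT CSV"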

The ClickHouse table definition:
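A minimal sketch of a suitable schema (my reconstruction, assuming the fields produced by the Apache combined log format and the old-style MergeTree syntax of that era; the actual definition may differ):

    CREATE TABLE apachelog
    (
        remote_host    String,    -- client IP or hostname
        user           String,    -- authenticated user, if any
        access_date    Date,      -- request date, used as the MergeTree date column
        access_time    DateTime,  -- full request timestamp
        timezone       String,
        request_method String,    -- GET, POST, ...
        request_uri    String,
        status         UInt32,    -- HTTP response code
        bytes          UInt32,    -- response body size
        referer        String,
        user_agent     String
    ) ENGINE = MergeTree(access_date, (access_date, remote_host), 8192);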

To test how ClickHouse scales on multiple CPU cores/threads, I will execute the same query by allocating from 1 to 56 CPU threads for ClickHouse processes. This can be done as:
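One way to do this on Linux is with taskset against the running server process (a sketch of the approach; numactl --physcpubind would work similarly):

    # pin the running server to CPU threads 0..$i
    # (pidof assumes a single clickhouse-server process)
    taskset -cp 0-$i $(pidof clickhouse-server)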

where $i is (N CPUs - 1), so the server is allowed to run on CPUs 0 through $i.

We must also take into account that not all queries are equal. Some are easier to execute in parallel than others. So I will test three different queries. In the end, we can’t get around Amdahl’s Law!
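As a reminder, Amdahl's Law says that if only a fraction p of the work can be parallelized across N CPUs, the speedup is bounded by:

    Speedup(N) = 1 / ((1 - p) + p / N)

so even with unlimited CPUs the speedup can never exceed 1 / (1 - p).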

The first query should be easy to execute in parallel:
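Something like the following filtered full-scan count fits this profile (the predicate is illustrative, not the exact query from the benchmark):

    -- a full scan with a cheap filter: each thread scans its own part
    SELECT count(*)
    FROM apachelog
    WHERE request_uri LIKE '%favicon.ico%';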

Speedup:

CPUs  Time, sec  Speedup to 1 CPU
1     823.646    1
2     413.832    1.990291
3     274.548    3.000007
4     205.961    3.999039
5     164.997    4.991885
6     137.455    5.992114
7     118.079    6.975381
8     103.015    7.995399
9     92.01      8.951701
10    82.853     9.941052
11    75.334     10.93326
12    69.23      11.89724
13    63.848     12.90011
14    59.388     13.8689
15    55.433     14.85841
16    52.158     15.79136
17    49.054     16.7906
18    46.331     17.77743
19    43.985     18.72561
20    41.795     19.70681
21    39.763     20.71388
22    38.031     21.65723
23    36.347     22.66063
24    34.917     23.58868
25    33.626     24.49432
26    32.42      25.40549
27    31.21      26.39045
28    30.135     27.33187
29    29.947     27.50346
30    29.709     27.72379
31    29.283     28.1271
32    28.979     28.42217
33    28.807     28.59187
34    28.477     28.9232
35    28.146     29.26334
36    27.921     29.49916
37    27.613     29.8282
38    27.366     30.09742
39    27.06      30.43777
40    26.817     30.71358
41    26.644     30.913
42    26.394     31.2058
43    26.215     31.41888
44    25.994     31.686
45    25.762     31.97135
46    25.554     32.23159
47    25.243     32.62869
48    25.102     32.81197
49    24.946     33.01716
50    24.668     33.38925
51    24.537     33.56751
52    24.278     33.92561
53    24.035     34.26861
54    23.839     34.55036
55    23.734     34.70321
56    23.587     34.91949


It’s much more interesting to chart these results:

From the chart, we can see that the query scales linearly up to 28 cores. After that, it continues to scale up to 56 threads, but with a smaller slope. I think this is related to the CPU architecture (remember we have 28 physical cores and 56 CPU "threads"). Looking at the endpoints: with one available CPU, the query took 823.6 sec to execute; with all available CPUs, it took 23.6 sec. So the total speedup is 34.9 times.

But let’s consider a query that allows a lesser degree of parallelism. For example, this one:
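A distinct count of URIs is representative here; in ClickHouse it would typically be written as follows (illustrative form):

    -- uniq() is ClickHouse's approximate distinct-count aggregate;
    -- uniqExact() is the exact (and heavier) variant
    SELECT uniq(request_uri)
    FROM apachelog;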

This query uses an aggregation that counts unique URIs, which I am fairly sure forces the counting through a single shared structure, so part of the execution is effectively serial. I won't show the full results for all 1 to 56 CPUs, but with one CPU the execution time is 177.715 sec, and with 56 CPUs it is 11.564 sec. The total speedup is 15.4 times.
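Plugging these numbers into Amdahl's Law gives a back-of-the-envelope estimate of the parallel fraction:

    1 / ((1 - p) + p / 56) = 15.4   =>   p ≈ 0.95

so roughly 5% of this query's execution appears to be serial.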

The speedup chart looks like this:

As we suspected, this query allows less parallelism. What about even heavier queries? Let’s consider this one:
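For instance, ranking the most-requested URIs through a derived table (again, an illustrative form):

    -- the inner GROUP BY materializes a derived table,
    -- and the outer sort runs over that intermediate result
    SELECT request_uri, cnt
    FROM
    (
        SELECT request_uri, count(*) AS cnt
        FROM apachelog
        GROUP BY request_uri
    )
    ORDER BY cnt DESC
    LIMIT 10;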

In this query, we build a derived table to resolve the subquery, and I expect that to limit the parallelism even further. It does: with one CPU the query takes 183.063 sec to execute, and with 56 CPUs it takes 28.572 sec. So the speedup is only 6.4 times.
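By the same Amdahl's Law arithmetic, a 6.4 times speedup on 56 threads corresponds to a parallel fraction of only about p ≈ 0.86, i.e., roughly 14% of the work is serial.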

The chart is:

Conclusions

ClickHouse can capably utilize multiple CPU cores available on the server, and query execution is not limited by a single CPU (as it is in MySQL). The degree of parallelism is defined by the complexity of the query, and in the best case we see linear scalability with the number of CPU cores. For scaling across multiple servers, see my previous post:

https://www.percona.com/blog/2017/06/22/clickhouse-general-analytical-workload-based-star-schema-benchmark/

However, any serial portion of query execution limits the achievable speedup, as described by Amdahl's Law.

As the 1.56 billion record Apache log shows, ClickHouse can execute complex analytical queries on data at this scale within tens of seconds.
