Nested Data Structures in ClickHouse

August 30, 2017 | Posted In: Column Store Database, MySQL, Percona Monitoring and Management

In this blog post, we’ll look at nested data structures in ClickHouse and how they can be used with PMM to look at queries.

Nested structures are not common in relational database management systems, which usually offer only flat tables. Sometimes, though, it would be convenient to store less structured information in a structured database.

We are working to adapt ClickHouse as long-term storage for Percona Monitoring and Management (PMM), particularly to store detailed information about queries. One of the problems we are trying to solve is counting the different errors that cause a particular query to fail.

For example, suppose that on 2017-08-17 a particular query was executed 1000 times: 25 times it failed with error code “1212”, and eight times it failed with error code “1250”. Of course, the traditional way to store this in a relational database would be to have a separate table with the columns "Date, QueryID, ErrorCode, ErrorCnt" and then perform a JOIN to this table. Unfortunately, columnar databases don’t perform well with multiple joins, and the usual recommendation is to have de-normalized tables.
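To make the traditional layout concrete, here is a minimal sketch in generic SQL. The table names `query_stats` and `query_errors`, the `ExecCnt` column, and the column types are assumptions made for illustration. The date column is called `Period` here rather than `Date` to avoid clashing with the type name.

```sql
-- Hypothetical per-query table: one row per (date, query).
CREATE TABLE query_stats
(
    Period  DATE,
    QueryID INTEGER,
    ExecCnt INTEGER          -- e.g. 1000 executions on 2017-08-17
);

-- Hypothetical error table: one row per (date, query, error code).
CREATE TABLE query_errors
(
    Period    DATE,
    QueryID   INTEGER,
    ErrorCode VARCHAR(16),
    ErrorCnt  INTEGER
);

-- Any report that needs error counts has to JOIN the two tables.
SELECT s.Period, s.QueryID, s.ExecCnt, e.ErrorCode, e.ErrorCnt
FROM query_stats AS s
JOIN query_errors AS e
  ON e.Period = s.Period AND e.QueryID = s.QueryID;
```

This is the JOIN that a de-normalized design tries to avoid.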

We can create a column for each possible ErrorCode, but this is not an optimal solution. There could be thousands of them, and most of the time they would be empty.

For this case, ClickHouse offers Nested data structures. A Nested column holds, for every row, a small table of its own, which in our case is a list of (ErrorCode, ErrorCnt) pairs.
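A table definition along these lines could look like the sketch below. The table name `queries`, the column types, and the engine, partitioning and sort-key clauses are assumptions chosen for illustration, not necessarily the schema PMM uses.

```sql
-- Sketch only: table name, types and engine settings are assumptions.
CREATE TABLE queries
(
    Period  Date,
    QueryID UInt32,
    Errors  Nested
    (
        ErrorCode String,   -- error code stored as text, e.g. '1212'
        ErrorCnt  UInt32    -- how many times that error occurred
    )
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(Period)
ORDER BY (QueryID, Period);
```

Internally, ClickHouse stores the nested structure as two array columns, `Errors.ErrorCode` and `Errors.ErrorCnt`, whose arrays must have the same length within each row.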

This solution raises obvious questions: How do we insert data into this table? How do we extract it?

Let’s start with INSERT. Each nested column is passed as a separate array, and the arrays in a row must have the same length. For example, a row can record that on 2017-08-17 the query produced error 1220 five times, error 1230 six times and error 1212 two times.
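An INSERT for the row described above might look like the following sketch. It assumes a table named `queries` whose columns are, in order, `Period`, `QueryID` and a Nested column `Errors(ErrorCode String, ErrorCnt UInt32)`. The QueryID value 5 is made up for illustration.

```sql
-- Column order: Period, QueryID, Errors.ErrorCode, Errors.ErrorCnt.
-- The two arrays are matched by position: '1220' -> 5, '1230' -> 6, '1212' -> 2.
INSERT INTO queries VALUES
    ('2017-08-17', 5, ['1220', '1230', '1212'], [5, 6, 2]);
```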

On a different date, the same query might produce a different set of errors, which simply becomes another row with its own arrays.
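As a sketch, a second row for the following day could be inserted like this, under the same assumptions (a `queries` table with `Period`, `QueryID` and a Nested `Errors(ErrorCode, ErrorCnt)` column). The date, error codes and counts here are invented purely for illustration.

```sql
-- Hypothetical values: a different day with only two distinct errors.
-- Arrays may differ in length between rows, but not within one row.
INSERT INTO queries VALUES
    ('2017-08-18', 5, ['1220', '1250'], [3, 1]);
```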

Let’s take a look at ways to SELECT data. A very basic SELECT returns each nested column as an array, with one output row per stored row.
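A basic query of this kind could be sketched as follows, again assuming a table `queries` with `Period`, `QueryID` and a Nested column `Errors(ErrorCode, ErrorCnt)`.

```sql
-- Errors.ErrorCode and Errors.ErrorCnt each come back as one array per row.
SELECT Period, QueryID, Errors.ErrorCode, Errors.ErrorCnt
FROM queries;
```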

If we want a more familiar tabular output, we can use the ARRAY JOIN extension, which produces one output row per array element.
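A minimal sketch of ARRAY JOIN over the nested structure, assuming the same hypothetical `queries` table with a Nested column `Errors(ErrorCode, ErrorCnt)`:

```sql
-- ARRAY JOIN Errors unrolls all arrays of the nested structure in parallel,
-- so each (ErrorCode, ErrorCnt) pair becomes its own output row.
SELECT Period, QueryID, Errors.ErrorCode, Errors.ErrorCnt
FROM queries
ARRAY JOIN Errors;
```

After the ARRAY JOIN, `Errors.ErrorCode` and `Errors.ErrorCnt` refer to single values rather than arrays.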

Usually, however, we want to see totals aggregated over multiple periods, which can be done with traditional aggregate functions.
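One way to sketch such an aggregation is to unroll the nested arrays with ARRAY JOIN and then group as usual. The table name `queries` and the alias `TotalCnt` are assumptions for illustration.

```sql
-- Sum error counts per query and error code across all dates.
SELECT
    QueryID,
    Errors.ErrorCode,
    sum(Errors.ErrorCnt) AS TotalCnt
FROM queries
ARRAY JOIN Errors
GROUP BY QueryID, Errors.ErrorCode
ORDER BY QueryID, Errors.ErrorCode;
```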

If we want to get really creative and return only one row per QueryID, we can do that as well by collecting the per-error totals back into an array.
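A possible sketch uses a subquery to compute the totals per error code and then `groupArray` to fold them into one array of (code, total) tuples per query. The table name `queries` and the aliases `code`, `cnt`, `total` and `ErrorTotals` are hypothetical.

```sql
SELECT
    QueryID,
    groupArray((code, total)) AS ErrorTotals   -- array of (ErrorCode, total) tuples
FROM
(
    -- Inner query: one row per (QueryID, error code) with the summed count.
    SELECT QueryID, code, sum(cnt) AS total
    FROM queries
    ARRAY JOIN Errors.ErrorCode AS code, Errors.ErrorCnt AS cnt
    GROUP BY QueryID, code
)
GROUP BY QueryID;
```

The order of tuples inside the resulting array is not guaranteed unless the inner query is explicitly sorted.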

Conclusion

ClickHouse provides flexible ways to store data in a less structured manner and a variety of functions to extract and aggregate it, despite being a columnar database.

Happy data warehousing!

Vadim Tkachenko

Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming helps optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products. He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication 3rd Edition.

7 Comments

  • Been storing this type of data in Elasticsearch for years (well, 3-4 years). It’s fast, flexible, and does not require relational setups. Seriously worth a look, especially with the machine learning plugin, which can pick up anomalies for you and warn you when they happen. Add the graphing capability of Kibana, and you are set.

    Just my viewpoint 🙂

    • For relational data you will find ClickHouse significantly faster. Here are some third-party benchmarks that compare ClickHouse and Elastic for some SQL queries on the same hardware: http://tech.marksblogg.com/benchmarks.html

  • How do you increment ErrorCnt? Every time a query fails you want to increment the appropriate ErrorCnt, right? With a nested structure, how do you do that?
