ClickHouse and MySQL – Better Together

clickhouse and mysqlIt’s been a while since I wrote about ClickHouse, there are a lot of new features that are worth mentioning, and for this, I recommend to follow the Altinity Blog but today I want to look at the improved integration of ClickHouse and MySQL.

There are two features:

  1. Using MySQL protocol and MySQL client to connect to ClickHouse
  2. Use MySQL tables to select and join with ClickHouse tables

Using MySQL Protocol

By enabling MySQL protocol in ClickHouse server, you will allow for the MySQL command line tool or applications that typically connect to MySQL to connect to ClickHouse and execute queries.

With this, keep in mind:

  • It does not automatically allow you to use MySQL SQL syntax, you still need to write queries in ClickHouse-dialect.
  • The same issue with datatypes. ClickHouse uses Int32/Int64 instead of INT/BIGINT and String instead of VARCHAR, so most likely management tools like MySQL Workbench won’t work.

How do you enable MySQL protocol? Actually it is easy, you add to config:

And after a ClickHouse restart you can connect with mysql command line:

And after that, we can work as we usually work from MySQL command line. Some examples with ontime databases:

Select count from ontime tables (I loaded data only for the Jan-2017 to Nov-2019 timeframe)

And now we can run some analytical queries, e.g., what ten airports had the most departures for the given time frame:

And what the most popular routes are:

Well, I can do this all day long, but let me just show the last query which highlights the difference in syntax.

Assume we want to get the data from the previous query but per year (ten most popular routes per year). The most elegant solution to get this is to use WINDOW functions, which are supported in MySQL 8 but not yet supported in ClickHouse. However, ClickHouse has a workaround for this one special case. We can get the result we need using the LIMIT BY extension (not to be confused with LIMIT).

For ClickHouse we write:

JOIN with MySQL Tables.

The feature to get data from MySQL using dictionaries in ClickHouse was implemented long ago, but it was not convenient, leading to using non-standard SQL extensions. Since then, two new features were implemented in ClickHouse:

  • Support of JOIN syntax
  • Support of external tables

This allows us to run more familiar queries with the mix of MySQL and ClickHouse tables. Before we jump to an example, let’s review why this is needed.

The typical data analytics design assumes there are big fact tables with references to dimension tables (aka dictionaries if using ClickHouse lexicon). For a detailed example, see Star Schema.

Dimension tables may be changed/updated more frequently, and ClickHouse does not quite like it, as it operates in more append-only like mode. So we may want to store Dimension tables in OLTP databases, like MySQL. This, by the way, also helps with GDPR policies for personalized data removal. In case you store all personal data in a denormalized fact table, it becomes very complicated to remove it. In contrast, if the personal data is stored in a dimension table, it is quite easy to anonymize it.

So let’s go back to our example.

Ontime table (fact tables) has a reference field AirlineID, which is not very helpful, as we want to see the airline name in our report.

The data for the AirlineID – AirlineName lookup (dimension) table, can be found on the BTS website.

It looks like this:

We load this data into a MySQL table:

Now, how do we connect this table to ClickHouse? For this, in ClickHouse we create a table with “MySQL table engine”:

Clickhouse -> (and we can connect to it with mysql client tool, see part one).

Now we can execute a query which joins ClickHouse fact table, ontime, and MySQL dimension table airlines, to report the top ten airlines with the most departures:

To summarize what just happened: We executed a query in ClickHouse using normal human-readable JOIN syntax (compare to queries in ClickHouse in a General Analytical Workload (Based on a Star Schema Benchmark) and join tables stored in different servers: MySQL and ClickHouse.

If I am to illustrate:

 

Share this post

Comments (5)

  • Timur Solodovnikov Reply

    Is it possible to use MySQL protocol for DML(bulk inserts)?

    February 3, 2020 at 1:53 pm
  • Adnan Reply

    Thanks for the informative article, i already got hand on Clickhouse with mysql, clickhouse also provide database engine=MySQL so you can have full database from MySQL to Clickhouse. its good that clickhouse keep releasing better updates every time.

    February 3, 2020 at 3:18 pm
  • Kursat Sahin Reply

    It gives an nNo module named ‘MySQLdb’ in SuperSet. What can be the reason ?

    March 30, 2020 at 3:13 pm
  • vipulsharda Reply

    Can I access the clickhouse using mysql from another machines as well?

    July 1, 2020 at 2:33 am
  • vadimtk Reply

    vipulsharda,

    You absolutely can access from remote machine using mysql or an application to works over mysql protocol

    July 1, 2020 at 8:24 am

Leave a Reply