It’s been a while since I wrote about ClickHouse, there are a lot of new features that are worth mentioning, and for this, I recommend to follow the Altinity Blog but today I want to look at the improved integration of ClickHouse and MySQL.
There are two features:
- Using MySQL protocol and MySQL client to connect to ClickHouse
- Use MySQL tables to select and join with ClickHouse tables
Using MySQL Protocol
By enabling MySQL protocol in ClickHouse server, you will allow for the MySQL command line tool or applications that typically connect to MySQL to connect to ClickHouse and execute queries.
With this, keep in mind:
- It does not automatically allow you to use MySQL SQL syntax, you still need to write queries in ClickHouse-dialect.
- The same issue with datatypes. ClickHouse uses Int32/Int64 instead of INT/BIGINT and String instead of VARCHAR, so most likely management tools like MySQL Workbench won’t work.
How do you enable MySQL protocol? Actually it is easy, you add to config:
1 2 3 4 |
/etc/clickhouse-server/config.xml: ... <mysql_port>9001</mysql_port> ... |
And after a ClickHouse restart you can connect with mysql command line:
1 |
mysql -h127.0.0.1 -P9001 -udefault -padmin |
And after that, we can work as we usually work from MySQL command line. Some examples with ontime databases:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
mysql> use default; Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Database changed mysql> show tables; +--------+ | name | +--------+ | ontime | +--------+ 1 row in set (0.00 sec) Read 1 rows, 31.00 B in 0.000 sec., 2038 rows/sec., 61.71 KiB/sec. |
Select count from ontime tables (I loaded data only for the Jan-2017 to Nov-2019 timeframe)
1 2 3 4 5 6 7 8 |
select count(*) from ontime; +----------+ | count() | +----------+ | 19684341 | +----------+ 1 row in set (0.00 sec) Read 1 rows, 4.01 KiB in 0.000 sec., 2150 rows/sec., 8.42 MiB/sec. |
And now we can run some analytical queries, e.g., what ten airports had the most departures for the given time frame:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
mysql> SELECT Origin,count(*) cnt FROM ontime GROUP BY Origin ORDER BY cnt DESC LIMIT 10; +--------+---------+ | Origin | cnt | +--------+---------+ | ATL | 1117414 | | ORD | 911572 | | DFW | 739334 | | DEN | 689847 | | LAX | 637070 | | CLT | 562160 | | SFO | 507377 | | PHX | 488398 | | IAH | 467543 | | LAS | 463627 | +--------+---------+ 10 rows in set (0.03 sec) Read 19684341 rows, 93.86 MiB in 0.027 sec., 726992466 rows/sec., 3.39 GiB/sec. |
And what the most popular routes are:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
mysql> SELECT Origin,Dest,count(*) cnt FROM ontime GROUP BY Origin,Dest ORDER BY cnt DESC LIMIT 10; +--------+-------+-------+ | Origin | Dest | cnt | +--------+-------+-------+ | SFO | LAX | 44419 | | LAX | SFO | 44204 | | LGA | ORD | 39276 | | ORD | LGA | 39141 | | LAX | JFK | 37309 | | JFK | LAX | 37256 | | LAX | LAS | 33780 | | LAS | LAX | 33757 | | OGG | HNL | 29619 | | HNL | OGG | 29616 | +--------+-------+-------+ 10 rows in set (0.05 sec) Read 19684341 rows, 187.73 MiB in 0.053 sec., 372125496 rows/sec., 3.47 GiB/sec. |
Well, I can do this all day long, but let me just show the last query which highlights the difference in syntax.
Assume we want to get the data from the previous query but per year (ten most popular routes per year). The most elegant solution to get this is to use WINDOW functions, which are supported in MySQL 8 but not yet supported in ClickHouse. However, ClickHouse has a workaround for this one special case. We can get the result we need using the LIMIT BY extension (not to be confused with LIMIT).
For ClickHouse we write:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
mysql> SELECT Year,Origin,Dest,count(*) cnt FROM ontime GROUP BY Year,Origin,Dest ORDER BY Year asc,cnt DESC LIMIT 5 BY Year; +------+--------+-------+-------+ | Year | Origin | Dest | cnt | +------+--------+-------+-------+ | 2017 | SFO | LAX | 15786 | | 2017 | LAX | SFO | 15564 | | 2017 | JFK | LAX | 12758 | | 2017 | LAX | JFK | 12752 | | 2017 | LAX | LAS | 11032 | | 2018 | ORD | LGA | 15281 | | 2018 | LGA | ORD | 15279 | | 2018 | SFO | LAX | 15158 | | 2018 | LAX | SFO | 15143 | | 2018 | LAX | JFK | 12791 | | 2019 | ORD | LGA | 13696 | | 2019 | LGA | ORD | 13691 | | 2019 | |