This blog post is the first in a series on Percona’s MongoDB 3.4 bundle release. This release includes Percona Server for MongoDB, Percona Monitoring and Management, and Percona Toolkit. In this post, we’ll look at the features included in the release.
We have a lot of great MongoDB content coming your way in the next few weeks. However, I first wanted to give you a quick list of the major things to be on the lookout for.
This new bundled release ensures a robust, secure database that you can adapt to changing business requirements. It helps demonstrate how organizations can use MongoDB (and Percona Server for MongoDB), PMM and Percona Toolkit together to benefit from the cost savings and agility provided by free and proven open source software.
Percona Server for MongoDB 3.4 delivers all the latest MongoDB 3.4 Community Edition features, additional Enterprise features and a greater choice of storage engines.
Some of these new features include:
For all of these topics, you will see more blogs in the next few weeks that cover them in detail. Some people have asked what Percona’s MongoDB commitment looks like. Hopefully, this series of blogs helps show how improving open source databases is central to the Percona vision. We are here to make the world better for developers, DBAs and other MongoDB users.
Percona announces the availability of Percona Toolkit 3.0.1 on February 20, 2017. This is the first general availability (GA) release in the 3.0 series, with a focus on adding MongoDB tools:
Downloads are available from the Percona Software Repositories.
NOTE: If you are upgrading using Percona’s yum repositories, make sure that you enable the basearch repo, because Percona Toolkit 3.0 is not available in the noarch repo.
Percona Toolkit is a collection of advanced command-line tools that perform a variety of MySQL and MongoDB server and system tasks too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.
This release includes changes from the previous 3.0.0 RC and the following additional changes:
Percona announces the release of Percona Monitoring and Management 1.1.1 on February 20, 2017. This is the first general availability (GA) release in the PMM 1.1 series with a focus on providing alternative deployment options for PMM Server:
NOTE: The AMIs and VirtualBox images above are still experimental. For production, it is recommended to run Docker images.
There are no changes compared to the previous 1.1.0 Beta release, except small fixes for the MongoDB metrics dashboards.
A live demo of PMM is available at pmmdemo.percona.com.
We welcome your feedback and questions on our PMM forum.
About Percona Monitoring and Management
Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.
PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.
Percona announces the release of Percona Server for MongoDB 3.4.2-1.2 on February 20, 2017. It is the first general availability (GA) release in the 3.4 series. Download the latest version from the Percona web site or the Percona Software Repositories.
Percona Server for MongoDB is an enhanced, open source, fully compatible, highly-scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features:
Percona Server for MongoDB requires no changes to MongoDB applications or code.
Percona Server for MongoDB 3.4.2-1.2 release notes are available in the official documentation.
Recently I worked on a ticket where a customer performed a point-in-time recovery (PITR) using a large set of binary logs. Normally we handle this by applying the last backup, then re-applying all binary logs created since the last backup. In the middle of the procedure, their new server crashed. We identified the binary log position and tried to restart the PITR from there. However, using the option --start-position, the restore failed with the error “The BINLOG statement of type Table_map was not preceded by a format description BINLOG statement.” This is a known bug and is reported as MySQL Bug #72804: “BINLOG statement can no longer be used to apply Query events.”
I created a small test to demonstrate a workaround that we implemented (and worked).
First, I ran a large import process that created several binary logs. I used a small value in max_binlog_size and tested using the database “employees” (a standard database used for testing). Then I dropped the database:

```
mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.33 sec)

mysql> drop database employees;
Query OK, 8 rows affected (1.25 sec)
```
To demonstrate the recovery process, I joined all the binary log files into one SQL file and started an import:

```
sveta@Thinkie:~/build/ps-5.7/mysql-test$ ../bin/mysqlbinlog var/mysqld.1/data/master.000001 var/mysqld.1/data/master.000002 var/mysqld.1/data/master.000003 var/mysqld.1/data/master.000004 var/mysqld.1/data/master.000005 > binlogs.sql
sveta@Thinkie:~/build/ps-5.7/mysql-test$ GENERATE_ERROR.sh binlogs.sql
sveta@Thinkie:~/build/ps-5.7/mysql-test$ mysql < binlogs.sql
ERROR 1064 (42000) at line 9020: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'inserting error
```
I intentionally generated a syntax error in the resulting file with the help of the GENERATE_ERROR.sh script (which just inserts a bogus SQL statement in a random row). The error message clearly showed where the import stopped: line 9020. I then created a file that cropped out the part that had already been imported (lines 1-9020), and tried to import this new file:

```
sveta@Thinkie:~/build/ps-5.7/mysql-test$ tail -n +9021 binlogs.sql > binlogs_rest.sql
sveta@Thinkie:~/build/ps-5.7/mysql-test$ mysql < binlogs_rest.sql
ERROR 1609 (HY000) at line 134: The BINLOG statement of type `Table_map` was not preceded by a format description BINLOG statement.
```
Again, the import failed with exactly the same error as the customer. The reason for this error is that the BINLOG statement – which applies changes from the binary log – expects that the format description event gets run in the same session as the binary log import, but before it. The format description existed initially at the start of the import that failed at line 9020. The later import (from line 9021 on) doesn’t contain this format statement.
Fortunately, this format description event is the same for any binary log created by the same server version! We can simply take it from the beginning of the SQL file (or the original binary log file) and put it into the file created after the crash, the one without lines 1-9020.
With MySQL versions 5.6 and 5.7, this event is located in the first 11 rows:

```
sveta@Thinkie:~/build/ps-5.7/mysql-test$ head -n 11 binlogs.sql | cat -n
     1  /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
     2  /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
     3  DELIMITER /*!*/;
     4  # at 4
     5  #170128 17:58:11 server id 1  end_log_pos 123 CRC32 0xccda074a  Start: binlog v 4, server v 5.7.16-9-debug-log created 170128 17:58:11 at startup
     6  ROLLBACK/*!*/;
     7  BINLOG '
     8  g7GMWA8BAAAAdwAAAHsAAAAAAAQANS43LjE2LTktZGVidWctbG9nAAAAAAAAAAAAAAAAAAAAAAAA
     9  AAAAAAAAAAAAAAAAAACDsYxYEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
    10  AUoH2sw=
    11  '/*!*/;
```
The first six rows are meta information, and rows 7-11 are the format description event itself. The only thing we need to export into our resulting file is these 11 lines:

```
sveta@Thinkie:~/build/ps-5.7/mysql-test$ head -n 11 binlogs.sql > binlogs_rest_with_format.sql
sveta@Thinkie:~/build/ps-5.7/mysql-test$ cat binlogs_rest.sql >> binlogs_rest_with_format.sql
sveta@Thinkie:~/build/ps-5.7/mysql-test$ mysql < binlogs_rest_with_format.sql
sveta@Thinkie:~/build/ps-5.7/mysql-test$
```
After this, the import succeeded!
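The manual steps above can be sketched as a small shell helper. This is illustrative only: the function name is mine, and the hard-coded 11-line format header only holds for dumps like the 5.6/5.7 ones shown above (verify with `head` first on your own dump).

```shell
# restart_import DUMP FAILED_LINE OUT
# Builds a file that can resume a crashed binary-log import:
# the format description event (first 11 lines of the original dump)
# followed by everything after the line where the import stopped.
restart_import() {
  dump=$1; failed=$2; out=$3
  head -n 11 "$dump" > "$out"                   # format description event
  tail -n "+$((failed + 1))" "$dump" >> "$out"  # the not-yet-applied part
}

# Usage (file names from the example above):
#   restart_import binlogs.sql 9020 binlogs_rest_with_format.sql
#   mysql < binlogs_rest_with_format.sql
```

The same two-command idea works for any failure line, as long as the format description event from the top of the dump is carried over.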
Late last year, we started a poll on which backend programming languages are being used by the open source community. The three components of the backend – server, application, and database – are what make a website or application work. Below are the results of Percona’s poll on backend programming languages in use by the community:
One of the best-known and earliest web service stacks is the LAMP stack, which spelled out refers to Linux, Apache, MySQL and PHP/Perl/Python. We can see that this early model is still popular when it comes to the backend.
PHP still remains a very common choice for a backend programming language, with Python moving up the list as well. Perl seems to be fading in popularity, despite being used a lot in the MySQL world.
Finally, Go is a language to look out for. Go is an open source programming language created by Google. It first appeared in 2009, and is already more popular than Perl or Ruby according to this poll.
Thanks to the community for participating in our poll. You can take our latest poll, on which database engine you are using to store time series data, here.
The MariaDB Corporation is organizing a conference called M17 on the East Coast in April. Some Perconians (Peter Zaitsev, Vadim Tkachenko, Sveta Smirnova, Alex Rubin, Colin Charles) decided to submit some interesting talks for that conference. Percona also offered to sponsor the conference.
As of this post, the talks haven’t been accepted, and we were politely told that we couldn’t sponsor.
Some of the proposed talks were:
At Percona, we think MariaDB Server is an important part of the MySQL ecosystem. This is why the Percona Live Open Source Database Conference 2017 in Santa Clara has a MariaDB mini-track, consisting of talks from various Percona and MariaDB experts:
If any of these topics look enticing, come to the conference. We have MariaDB at Percona Live.
To make your decision easier, we’ve created a special promo code that gets you $75 off a full conference pass! Just use MariaDB@PL17 at checkout.
In the meantime, we will continue to write and discuss MariaDB, and any other open source database technologies. The power of the open source community is the free exchange of ideas, healthy competition and open dialog within the community.
Here are some more past presentations that are also relevant:
With Oracle clearly entering the “open source high availability solutions” arena with the release of their brand new Group Replication solution, I believe it is time to review the quality of the first GA (production ready) release.
TL;DR: Having examined the technology, it is my conclusion that Oracle seems to have released the GA version of Group Replication too early. While the product is definitely “working prototype” quality, the release seems rushed and unfinished. I found a significant number of issues, and I would personally not recommend it for production use.
It is obvious that Oracle is trying hard to ship technology to compete with Percona XtraDB Cluster, which is probably why they rushed to claim Group Replication GA quality.
If you’re all set to follow along and test Group Replication yourself, simplify the initial setup by using this Docker image. We can review some of the issues you might face together.
For the record, I tested the version based on the MySQL 5.7.17 release.

No automatic provisioning
First off, you’ll find that there is NO way to automatically set up a new node.
If you need to set up a new node, or recover an existing node from a fatal failure, you’ll need to manually provision the slave.
Of course, you can clone a slave using Percona XtraBackup or LVM by employing some self-developed scripts. But given the high availability nature of the product, one would expect Group Replication to automatically re-provision any failed node.

Bug: stale reads on nodes
Please see this bug:
One line summary: while any secondary nodes are “catching up” to whatever happened on a first node (it takes time to apply changes on secondary nodes), reads on a secondary node could return stale data (as shown in the bug report).
This behavior brings us back to the traditional asynchronous replication slave behavior (i.e., Group Replication’s predecessor).
It also contradicts the Group Replication documentation, which states: “There is a built-in group membership service that keeps the view of the group consistent and available for all servers at any given point in time.” (See https://dev.mysql.com/doc/refman/5.7/en/group-replication.html.)
I might also mention here that Percona XtraDB Cluster prevents stale reads (see https://www.percona.com/doc/percona-xtradb-cluster/5.7/wsrep-system-index.html#wsrep_sync_wait).

Bug: nodes become unusable after a big transaction, refusing to execute further transactions
There are two related bugs:
One line summary: after running a big transaction, any secondary nodes become unusable and refuse to perform any further transactions.

Obscure error messages
It is not uncommon to see cryptic error messages while testing Group Replication. For example:

```
mysql> commit;
ERROR 3100 (HY000): Error on observer while running replication hook 'before_commit'.
```
This is fairly useless and provides little help until I check the mysqld error log. The log provides a little bit more information:

```
2017-02-09T02:05:36.996776Z 18 [ERROR] Plugin group_replication reported: '[GCS] Gcs_packet's payload is too big. Only the packets smaller than 2113929216 bytes can be compressed.'
```

Discussion:
The items highlighted above might not seem too bad at first, and you could assume that your workload won’t be affected. However, stale reads and node dysfunctions basically prevented me from running a more comprehensive evaluation.

My recommendation:
If you care about your data, then I recommend not using Group Replication in production. Currently, it looks like it might cause plenty of headaches, and it is easy to get inconsistent results.
For the moment, Group Replication appears to be an advanced – but broken – form of traditional MySQL asynchronous replication.
I understand Oracle’s dilemma. Usually people are hesitant to test a product that is not GA. So in order to get feedback from users, Oracle needs to push the product to GA. Oracle must absolutely solve the issues above during future QA cycles.
Our most recent release of Percona Server for MySQL (Percona Server for MySQL 5.7.17) comes with Group Replication plugins. Unfortunately, since this technology is very new, it requires some fairly complicated steps to set up and get running. To help with that process, I’ve prepared Docker images that simplify the setup procedure.
You can find the image here: https://hub.docker.com/r/perconalab/pgr-57/.
To start the first node (bootstrap the group):

```
docker run -d -p 3306 --net=clusternet -e MYSQL_ROOT_PASSWORD=passw0rd -e CLUSTER_NAME=cluster1 perconalab/pgr-57
```
To add nodes into the group afterwards:

```
docker run -d -p 3306 --net=clusternet -e MYSQL_ROOT_PASSWORD=passw0rd -e CLUSTER_NAME=cluster1 -e CLUSTER_JOIN=CONTAINER_ID_FROM_THE_FIRST_STEP perconalab/pgr-57
```
You can also get a full script that starts “N” number of nodes, here: https://github.com/Percona-Lab/percona-docker/blob/master/pgr-57/start_node.sh
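As a rough sketch of what such a script does, here is a dry-run helper that just prints the commands for an N-node group (the `<FIRST_CONTAINER_ID>` placeholder and the function name are mine; in a real run you would capture the container ID printed by the first command):

```shell
# Print the docker commands for an N-node group: the first node bootstraps
# the group, every later node joins it via CLUSTER_JOIN.
print_group_cmds() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    if [ "$i" -eq 1 ]; then
      echo "docker run -d -p 3306 --net=clusternet -e MYSQL_ROOT_PASSWORD=passw0rd -e CLUSTER_NAME=cluster1 perconalab/pgr-57"
    else
      # Placeholder: substitute the ID printed by the bootstrap command
      echo "docker run -d -p 3306 --net=clusternet -e MYSQL_ROOT_PASSWORD=passw0rd -e CLUSTER_NAME=cluster1 -e CLUSTER_JOIN=<FIRST_CONTAINER_ID> perconalab/pgr-57"
    fi
    i=$((i + 1))
  done
}

print_group_cmds 3
```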
The Percona Live Open Source Database Conference 2017 will once again host crash courses on MySQL and MongoDB. Read below to get an outstanding discount on either the MySQL or MongoDB crash course (or both).
The database community constantly tells us how hard it is to find someone with MySQL and MongoDB DBA skills who can help with the day-to-day management of their databases. This is especially difficult when companies don’t have a full-time requirement for a DBA. Developers, system administrators and IT staff spend too much time trying to solve basic database problems that keep them from doing their other job duties. Eventually, the little problems or performance inefficiencies that start to pile up lead to big problems.
In answer to this growing need, Percona Live is once again hosting crash courses for developers, systems administrators and other technical resources. A crash course is a one-day training session on either MySQL 101 or MongoDB 101.
Don’t let the name fool you: these courses are led by Percona database experts who will show you the fundamentals of MySQL or MongoDB tools and techniques.
And it’s not just for DBAs: developers are encouraged to attend to hone their database skills.
Below is a list of the topics covered in each course this year:

MySQL 101 Topics

MongoDB 101 Topics
Attendees will return ready to quickly and correctly take care of the day-to-day and week-to-week management of their MySQL or MongoDB environment.
The schedule and cost for the 101 courses (without a full-conference pass) are:
(Tickets to the 101 sessions do not grant access to the main Percona Live breakout sessions. Full Percona Live conference passes grant admission to the 101 sessions. 101 Crash Course attendees will have full access to Percona Live keynote speakers, the exhibit hall and receptions.)
As a special promo, the first 101 people to purchase the single 101 talks receive a $299.00 discount off the ticket price! Each session only costs $101! Get both sessions for a mere $202 and save $598.00! Register now using the following codes for your discount:
Click here to register.
Register for Percona Live 2017 now! Advanced Registration lasts until March 5, 2017. Percona Live is a very popular conference: this year’s Percona Live Europe sold out, and we’re looking to do the same for Percona Live 2017. Don’t miss your chance to get your ticket at its most affordable price. Click here to register.
Percona Live 2017 sponsorship opportunities are available now. Click here to find out how to sponsor.
Percona today announced a Percona Server for MongoDB 3.4 solution bundle of updated products. This release enables any organization to create a robust, secure database environment that can be adapted to changing business requirements.
Percona Server for MongoDB 3.4, Percona Monitoring and Management 1.1, and Percona Toolkit 3.0 offer more features and benefits, with enhancements for both MongoDB® and MySQL® database environments. When these Percona products are used together, organizations gain all the cost and agility benefits provided by free, proven open source software that delivers all the latest MongoDB Community Edition 3.4 features, additional Enterprise features, and a greater choice of storage engines. Along with improved insight into the database environment, the solution provides enhanced control options for optimizing a wider range of database workloads with greater reliability and security.
The solution will be generally available the week of Feb. 20.

New Features and Benefits Summary
Percona Server for MongoDB 3.4
Percona Monitoring and Management 1.1
Percona Toolkit 3.0
For more information, see Percona’s press release.
For this blog post, I’ve decided to try ClickHouse: an open source column-oriented database management system developed by Yandex (it currently powers Yandex.Metrica, the world’s second-largest web analytics platform).
In my previous set of posts, I tested Apache Spark for big data analysis and used Wikipedia page statistics as a data source. I’ve used the same data as in the Apache Spark blog post: Wikipedia Page Counts. This allows me to compare ClickHouse’s performance to Spark’s.
I’ve spent some time testing ClickHouse for relatively large volumes of data (1.2Tb uncompressed). Here is a list of ClickHouse advantages and disadvantages that I saw:
Here is a full list of ClickHouse features
Full list of ClickHouse limitations

Group by: in-memory vs. on-disk
Running out of memory is one of the potential problems you may encounter when working with large datasets in ClickHouse:

```
SELECT
    min(toMonth(date)),
    max(toMonth(date)),
    path,
    count(*),
    sum(hits),
    sum(hits) / count(*) AS hit_ratio
FROM wikistat
WHERE (project = 'en')
GROUP BY path
ORDER BY hit_ratio DESC
LIMIT 10

↖ Progress: 1.83 billion rows, 85.31 GB (68.80 million rows/s., 3.21 GB/s.) ██████████▋ 6%

Received exception from server:
Code: 241. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: Memory limit (for query) exceeded: would use 9.31 GiB (attempt to allocate chunk of 1048576 bytes), maximum: 9.31 GiB: (while reading column hits):
```
By default, ClickHouse limits the amount of memory for GROUP BY (it uses a hash table for GROUP BY). This is easily fixed – if you have free memory, increase this parameter:

```
SET max_memory_usage = 128000000000; #128G
```
If you don’t have that much memory available, ClickHouse can “spill” data to disk by setting this:

```
set max_bytes_before_external_group_by=20000000000; #20G
set max_memory_usage=40000000000; #40G
```
According to the documentation, if you need to use max_bytes_before_external_group_by it is recommended to set max_memory_usage to be ~2x of the size of max_bytes_before_external_group_by.
(The reason for this is that the aggregation is performed in two phases: (1) reading and building intermediate data, and (2) merging the intermediate data. The spill to disk can only happen during the first phase. If there is no spill, ClickHouse might need the same amount of RAM for stages 1 and 2.)

Benchmarks: ClickHouse vs. Spark
Both ClickHouse and Spark can be distributed. However, for the purpose of this test I’ve run a single node for both ClickHouse and Spark. The results are quite impressive.
Benchmark summary

| Size / compression | Spark v. 2.0.2 | ClickHouse |
|---|---|---|
| Data storage format | Parquet, compressed: snappy | Internal storage, compressed |
| Size (uncompressed: 1.2TB) | 395G | 212G |
| Test | Spark v. 2.0.2 | ClickHouse | Diff |
|---|---|---|---|
| Query 1: count (warm) | 7.37 sec (no disk IO) | 6.61 sec | ~same |
| Query 2: simple group (warm) | 792.55 sec (no disk IO) | 37.45 sec | 21x better |
| Query 3: complex group by | 2522.9 sec | 398.55 sec | 6.3x better |
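The “Diff” column is just the ratio of the elapsed times. As a quick check (a throwaway helper, not part of the benchmark itself):

```shell
# speedup SPARK_SECONDS CLICKHOUSE_SECONDS -> how many times faster ClickHouse was
speedup() {
  awk -v s="$1" -v c="$2" 'BEGIN { printf "%.1fx\n", s / c }'
}

speedup 792.55 37.45    # Query 2
speedup 2522.9 398.55   # Query 3
```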
ClickHouse vs. MySQL
I wanted to see how ClickHouse compared to MySQL. Obviously, we can’t compare some workloads. For example:
Usually big data systems provide us with real-time queries. Systems based on map/reduce (i.e., Hive on top of HDFS) are just too slow for real-time queries, as it takes a long time to initialize the map/reduce job and send the code to all nodes.
Potentially, you can use ClickHouse for real-time queries. It does not support secondary indexes, however. This means it will probably scan lots of rows, but it can do it very quickly.
To do this test, I’m using the data from the Percona Monitoring and Management system. The table I’m using has 150 columns, so it is good for column storage. The size in MySQL is ~250G:

```
mysql> show table status like 'query_class_metrics'\G
*************************** 1. row ***************************
           Name: query_class_metrics
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 364184844
 Avg_row_length: 599
    Data_length: 218191888384
Max_data_length: 0
   Index_length: 18590056448
      Data_free: 6291456
 Auto_increment: 416994305
```
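As a sanity check on the “~250G” figure, the status output itself gives the on-disk footprint (a throwaway calculation; the ~250G above presumably also counts overhead beyond data and indexes):

```shell
# Data_length + Index_length from the SHOW TABLE STATUS output above,
# expressed in GB (10^9 bytes).
total_gb=$(awk 'BEGIN { printf "%.0f", (218191888384 + 18590056448) / 1e9 }')
echo "${total_gb} GB"   # data + indexes
```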
Scanning the whole table is significantly faster in ClickHouse. Retrieving just ten rows by key is faster in MySQL (especially from memory).
But what if we only need to scan a limited number of rows and do a GROUP BY? In this case, ClickHouse may be faster. Here is an example (a real query used to create sparklines):
MySQL:

```
SELECT
    (1480888800 - UNIX_TIMESTAMP(start_ts)) / 11520 AS point,
    FROM_UNIXTIME(1480888800 - (SELECT point) * 11520) AS ts,
    COALESCE(SUM(query_count), 0) / 11520 AS query_count_per_sec,
    COALESCE(SUM(Query_time_sum), 0) / 11520 AS query_time_sum_per_sec,
    COALESCE(SUM(Lock_time_sum), 0) / 11520 AS lock_time_sum_per_sec,
    COALESCE(SUM(Rows_sent_sum), 0) / 11520 AS rows_sent_sum_per_sec,
    COALESCE(SUM(Rows_examined_sum), 0) / 11520 AS rows_examined_sum_per_sec
FROM query_class_metrics
WHERE query_class_id = 7
  AND instance_id = 1259
  AND (start_ts >= '2014-11-27 00:00:00' AND start_ts < '2014-12-05 00:00:00')
GROUP BY point;
...
61 rows in set (0.10 sec)

# Query_time: 0.101203  Lock_time: 0.000407  Rows_sent: 61  Rows_examined: 11639  Rows_affected: 0

explain SELECT ...
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: query_class_metrics
   partitions: NULL
         type: range
possible_keys: agent_class_ts,agent_ts
          key: agent_class_ts
      key_len: 12
          ref: NULL
         rows: 21686
     filtered: 100.00
        Extra: Using index condition; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: No tables used
2 rows in set, 2 warnings (0.00 sec)
```
It is relatively fast.
ClickHouse (some functions are different, so we will have to rewrite the query):

```
SELECT
    intDiv(1480888800 - toRelativeSecondNum(start_ts), 11520) AS point,
    toDateTime(1480888800 - (point * 11520)) AS ts,
    SUM(query_count) / 11520 AS query_count_per_sec,
    SUM(Query_time_sum) / 11520 AS query_time_sum_per_sec,
    SUM(Lock_time_sum) / 11520 AS lock_time_sum_per_sec,
    SUM(Rows_sent_sum) / 11520 AS rows_sent_sum_per_sec,
    SUM(Rows_examined_sum) / 11520 AS rows_examined_sum_per_sec,
    SUM(Rows_affected_sum) / 11520 AS rows_affected_sum_per_sec
FROM query_class_metrics
WHERE (query_class_id = 7) AND (instance_id = 1259)
  AND ((start_ts >= '2014-11-27 00:00:00') AND (start_ts < '2014-12-05 00:00:00'))
GROUP BY point;

61 rows in set. Elapsed: 0.017 sec. Processed 270.34 thousand rows, 14.06 MB (15.73 million rows/s., 817.98 MB/s.)
```
As we can see, even though ClickHouse scans more rows (270K vs. 11K – over 20x more) it is faster to execute the ClickHouse query (0.10 seconds in MySQL compared to 0.017 seconds in ClickHouse). The column store format helps a lot here, as MySQL has to read all 150 columns (stored inside InnoDB pages) and ClickHouse only needs to read seven columns.
Wikipedia trending article of the month
Inspired by the article about finding trending topics using Google Books n-grams data, I decided to implement the same algorithm on top of the Wikipedia page visit statistics data. My goal here is to find the “article trending this month,” which has significantly more visits this month compared to the previous month. As I was implementing the algorithm, I came across another ClickHouse limitation: join syntax is limited. In ClickHouse, you can only do join with the “using” keyword. This means that the fields you’re joining need to have the same name. If the field name is different, we have to use a subquery.
Below is an example.
First, create a temporary table to aggregate the visits per month per page:

```
CREATE TABLE wikistat_by_month ENGINE = Memory AS
SELECT
    path,
    mon,
    sum(hits) / total_hits AS ratio
FROM
(
    SELECT
        path,
        hits,
        toMonth(date) AS mon
    FROM wikistat
    WHERE (project = 'en') AND (lower(path) NOT LIKE '%special%') AND (lower(path) NOT LIKE '%page%') AND (lower(path) NOT LIKE '%test%') AND (lower(path) NOT LIKE '%wiki%') AND (lower(path) NOT LIKE '%index.html%')
) AS a
ANY INNER JOIN
(
    SELECT
        toMonth(date) AS mon,
        sum(hits) AS total_hits
    FROM wikistat
    WHERE (project = 'en') AND (lower(path) NOT LIKE '%special%') AND (lower(path) NOT LIKE '%page%') AND (lower(path) NOT LIKE '%test%') AND (lower(path) NOT LIKE '%wiki%') AND (lower(path) NOT LIKE '%index.html%')
    GROUP BY toMonth(date)
) AS b USING (mon)
GROUP BY path, mon, total_hits
ORDER BY ratio DESC

Ok.
0 rows in set. Elapsed: 543.607 sec. Processed 53.77 billion rows, 2.57 TB (98.91 million rows/s., 4.73 GB/s.)
```
Second, calculate the actual list:

```
SELECT
    path,
    mon + 1,
    a_ratio AS ratio,
    a_ratio / b_ratio AS increase
FROM
(
    SELECT path, mon, ratio AS a_ratio
    FROM wikistat_by_month
    WHERE ratio > 0.0001
) AS a
ALL INNER JOIN
(
    SELECT path, CAST((mon - 1) AS UInt8) AS mon, ratio AS b_ratio
    FROM wikistat_by_month
    WHERE ratio > 0.0001
) AS b USING (path, mon)
WHERE (mon > 0) AND (increase > 2)
ORDER BY mon ASC, increase DESC
LIMIT 100

┌─path─┬─plus(mon, 1)─┬─ratio─┬─increase─┐
│ Heath_Ledger │ 2 │ 0.0008467223172121601 │ 6.853825241458039 │
│ Cloverfield │ 2 │ 0.0009372609760313347 │ 3.758937474560766 │
│ The_Dark_Knight_(film) │ 2 │ 0.0003508532447770276 │ 2.8858100355450484 │
│ Scientology │ 2 │ 0.0003300109101992719 │ 2.52497180013816 │
│ Barack_Obama │ 3 │ 0.0005786473399980557 │ 2.323409928527576 │
│ Canine_reproduction │ 3 │ 0.0004836300843539438 │ 2.0058985801174662 │
│ Iron_Man │ 6 │ 0.00036261003907049 │ 3.5301196568303888 │
│ Iron_Man_(film) │ 6 │ 0.00035634745198422497 │ 3.3815325090507193 │
│ Grand_Theft_Auto_IV │ 6 │ 0.0004036713142943461 │ 3.2112732008504885 │
│ Indiana_Jones_and_the_Kingdom_of_the_Crystal_Skull │ 6 │ 0.0002856570195547951 │ 2.683443198030021 │
│ Tha_Carter_III │ 7 │ 0.00033954377342889735 │ 2.820114216429247 │
│ EBay │ 7 │ 0.0006575000133427979 │ 2.5483158977946787 │
│ Bebo │ 7 │ 0.0003958340022793501 │ 2.3260912792668162 │
│ Facebook │ 7 │ 0.001683658379576915 │ 2.16460972864883 │
│ Yahoo!_Mail │ 7 │ 0.0002190640575012259 │ 2.1075879062784737 │
│ MySpace │ 7 │ 0.001395608643577507 │ 2.103263660621813 │
│ Gmail │ 7 │ 0.0005449834079575953 │ 2.0675919337716757 │
│ Hotmail │ 7 │ 0.0009126863121737026 │ 2.052471735190232 │
│ Google │ 7 │ 0.000601645849087389 │ 2.0155448612416644 │
│ Barack_Obama │ 7 │ 0.00027336526076130943 │ 2.0031305241832302 │
│ Facebook │ 8 │ 0.0007778115183044431 │ 2.543477658022576 │
│ MySpace │ 8 │ 0.000663544314346641 │ 2.534512981232934 │
│ Two-Face │ 8 │ 0.00026975137404447024 │ 2.4171743959768803 │
│ YouTube │ 8 │ 0.001482456447101451 │ 2.3884527929836152 │
│ Hotmail │ 8 │ 0.00044467667764940547 │ 2.2265750216262954 │
│ The_Dark_Knight_(film) │ 8 │ 0.0010482536106662156 │ 2.190078096294301 │
│ Google │ 8 │ 0.0002985028319919154 │ 2.0028812075734637 │
│ Joe_Biden │ 9 │ 0.00045067411455437264 │ 2.692262662620829 │
│ The_Dark_Knight_(film) │ 9 │ 0.00047863754833213585 │ 2.420864550676665 │
│ Sarah_Palin │ 10 │ 0.0012459220318907518 │ 2.607063205782761 │
│ Barack_Obama │ 12 │ 0.0034487235202817087 │ 15.615409029600414 │
│ George_W._Bush │ 12 │ 0.00042708730873936023 │ 3.6303098900144937 │
│ Fallout_3 │ 12 │ 0.0003568429236849597 │ 2.6193094036745155 │
└──────┴──────────────┴───────┴──────────┘

34 rows in set. Elapsed: 1.062 sec. Processed 1.22 billion rows, 49.03 GB (1.14 billion rows/s., 46.16 GB/s.)
```
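As an aside, the same month-over-month test is easy to sketch outside ClickHouse. Here is a toy awk version over synthetic data (the threshold of 2 mirrors the query above; the input values and file name are made up for illustration):

```shell
# Input rows: "month path hits". Flag pages whose share of the month's total
# hits more than doubles from month 1 to month 2.
cat > /tmp/visits.txt <<'EOF'
1 Foo 100
1 Bar 900
2 Foo 500
2 Bar 500
EOF

trending=$(awk '
  { hits[$1 "," $2] = $3; total[$1] += $3; pages[$2] = 1 }
  END {
    for (p in pages) {
      prev = hits["1," p] / total[1]   # share of hits in month 1
      cur  = hits["2," p] / total[2]   # share of hits in month 2
      if (prev > 0 && cur / prev > 2)
        printf "%s %.1f\n", p, cur / prev
    }
  }' /tmp/visits.txt)
echo "$trending"   # prints: Foo 5.0
```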
The response times are really good, considering the amount of data ClickHouse needed to scan (the first query scanned 2.57 TB of data).

Conclusion
The ClickHouse column-oriented database looks promising for data analytics, as well as for storing and processing structural event data and time series data. ClickHouse can be ~10x faster than Spark for some workloads.

Appendix: Benchmark details
Query 1:

```
select count(*) from wikistat
```
ClickHouse:

```
:) select count(*) from wikistat;

SELECT count(*)
FROM wikistat

┌─────count()─┐
│ 26935251789 │
└─────────────┘

1 rows in set. Elapsed: 6.610 sec. Processed 26.88 billion rows, 53.77 GB (4.07 billion rows/s., 8.13 GB/s.)
```
Spark:

```
spark-sql> select count(*) from wikistat;
26935251789
Time taken: 7.369 seconds, Fetched 1 row(s)
```
Query 2:

```
select count(*), month(dt) as mon
from wikistat
where year(dt)=2008 and month(dt) between 1 and 10
group by month(dt)
order by month(dt);
```
ClickHouse:

```
:) select count(*), toMonth(date) as mon from wikistat where toYear(date)=2008 and toMonth(date) between 1 and 10 group by mon;

SELECT
    count(*),
    toMonth(date) AS mon
FROM wikistat
WHERE (toYear(date) = 2008) AND ((toMonth(date) >= 1) AND (toMonth(date) <= 10))
GROUP BY mon

┌────count()─┬─mon─┐
│ 2100162604 │   1 │
│ 1969757069 │   2 │
│ 2081371530 │   3 │
│ 2156878512 │   4 │
│ 2476890621 │   5 │
│ 2526662896 │   6 │
│ 2489723244 │   7 │
│ 2480356358 │   8 │
│ 2522746544 │   9 │
│ 2614372352 │  10 │
└────────────┴─────┘

10 rows in set. Elapsed: 37.450 sec. Processed 23.37 billion rows, 46.74 GB (623.97 million rows/s., 1.25 GB/s.)
```
Spark:

```
spark-sql> select count(*), month(dt) as mon from wikistat where year(dt)=2008 and month(dt) between 1 and 10 group by month(dt) order by month(dt);
2100162604  1
1969757069  2
2081371530  3
2156878512  4
2476890621  5
2526662896  6
2489723244  7
2480356358  8
2522746544  9
2614372352  10
Time taken: 792.552 seconds, Fetched 10 row(s)
```
Query 3:

```
SELECT
    path,
    count(*),
    sum(hits) AS sum_hits,
    round(sum(hits) / count(*), 2) AS hit_ratio
FROM wikistat
WHERE project = 'en'
GROUP BY path
ORDER BY sum_hits DESC
LIMIT 100;
```
ClickHouse:

```
SELECT
    path,
    count(*),
    sum(hits) AS sum_hits,
    round(sum(hits) / count(*), 2) AS hit_ratio
FROM wikistat
WHERE project = 'en'
GROUP BY path
ORDER BY sum_hits DESC
LIMIT 100

┌─path─┬─count()─┬───sum_hits─┬─hit_ratio─┐
│ Special:Search │ 44795 │ 4544605711 │ 101453.41 │
│ Main_Page │ 31930 │ 2115896977 │ 66266.74 │
│ Special:Random │ 30159 │ 533830534 │ 17700.54 │
│ Wiki │ 10237 │ 40488416 │ 3955.11 │
│ Special:Watchlist │ 38206 │ 37200069 │ 973.67 │
│ YouTube │ 9960 │ 34349804 │ 3448.78 │
│ Special:Randompage │ 8085 │ 28959624 │ 3581.9 │
│ Special:AutoLogin │ 34413 │ 24436845 │ 710.11 │
│ Facebook │ 7153 │ 18263353 │ 2553.24 │
│ Wikipedia │ 23732 │ 17848385 │ 752.08 │
│ Barack_Obama │ 13832 │ 16965775 │ 1226.56 │
│ index.html │ 6658 │ 16921583 │ 2541.54 │
…

100 rows in set. Elapsed: 398.550 sec. Processed 26.88 billion rows, 1.24 TB (67.45 million rows/s., 3.10 GB/s.)
```
Spark:

```
spark-sql> SELECT
         >     path,
         >     count(*),
         >     sum(hits) AS sum_hits,
         >     round(sum(hits) / count(*), 2) AS hit_ratio
         > FROM wikistat
         > WHERE (project = 'en')
         > GROUP BY path
         > ORDER BY sum_hits DESC
         > LIMIT 100;
...
Time taken: 2522.903 seconds, Fetched 100 row(s)
```