Please join Sveta Smirnova, Percona’s Principal Technical Services Engineer, as she presents Troubleshooting MySQL Access Privileges Issues on
February 23, 2017 at 11:00 am PST / 2:00 pm EST (UTC-8).
Do you have registered users who can’t connect to the MySQL server? Strangers modifying data to which they shouldn’t have access?
MySQL supports a rich set of user privilege options and allows you to fine tune access to every object in the server. The latest versions support authentication plugins that help to create more access patterns.
However, finding errors in such a big set of options can be problematic. This is especially true for environments with hundreds of users, all with different privileges on multiple objects. In this webinar, I will show you how to decipher error messages and unravel the complicated setups that can lead to access errors. We will also cover network errors that mimic access privileges errors.
In this webinar, we will discuss:
To register for this webinar please click here.
This post is part of the series of Percona’s MongoDB 3.4 bundle release blogs. In mid-2016, Percona Monitoring and Management (PMM) added support for RocksDB with MongoDB, also known as “MongoRocks.” In this blog, we will go over the Percona Monitoring and Management (PMM) 1.1.0 version of the MongoDB RocksDB dashboard, how PMM is useful in the day-to-day monitoring of MongoDB and what we plan to add and extend.
Percona Monitoring and Management (PMM)
Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB, developed by Percona on top of open-source technology. Behind the scenes, the graphing features this article covers use Prometheus (a popular time-series data store), Grafana (a popular visualization tool), mongodb_exporter (our MongoDB database metric exporter) plus other technologies to provide database and operating system metric graphs for your database instances.
The mongodb_exporter tool, which provides our monitoring platform with MongoDB metrics, uses RocksDB status output and optional counters to provide detailed insight into RocksDB performance. Percona’s MongoDB 3.4 release enables RocksDB’s optional counters by default. On 3.2, however, you must set the following in /etc/mongod.conf to enable this: storage.rocksdb.counters: true .
This article shows a live demo of our MongoDB RocksDB graphs: https://pmmdemo.percona.com/graph/dashboard/db/mongodb-rocksdb.
RocksDB is a storage engine available since version 3.2 in Percona’s fork of MongoDB: Percona Server for MongoDB.
The first thing to know about monitoring RocksDB is compaction. RocksDB stores its data on disk using several tiered levels of immutable files. Changes written to disk are written to the first RocksDB level (Level0). Later the internal compactions merge the changes down to the next RocksDB level when Level0 fills. Each level before the last is essentially deltas to the resting data set that soon merges down to the bottom.
We can see the effect of the tiered levels in our “RocksDB Compaction Level Size” graph, which reflects the size of each level in RocksDB on-disk:
Note that most of the database data is in the final level “L6” (Level 6). Levels L0, L4 and L5 hold relatively smaller amounts of data changes. These get merged down to L6 via compaction.
More about this design is explained in detail by the developers of MongoRocks, here: https://www.percona.com/live/plam16/sessions/everything-you-wanted-know-about-mongorocks.
Most importantly, RocksDB compactions try to happen in the background. They generally do not “block” the database. However, the additional resource usage of compactions can potentially cause some spikes in latency, making compaction important to watch. When compactions occur, between levels L4 and L5 for example, L4 and L5 are read and merged with the result being written out as a new L5.
The memtable in MongoRocks is a 64mb in-memory table. Changes initially get written to the memtable. Reads check the memtable to see if there are unwritten changes to consider. When the memtable has filled to 100%, RocksDB performs a compaction of the memtable data to Level0, the first on-disk level in RocksDB.
In PMM we have added a single-stat panel for the percentage of the memtable usage. This is very useful in indicating when you can expect a memtable-to-level0 compaction to occur:
Above we can see the memtable is 125% used, which means RocksDB is late to finish (or start) a compaction due to high activity. Shortly after taking this screenshot above, however, our test system began a compaction of the memtable and this can be seen at the drop in active memtable entries below:
Following this compaction further through PMM’s graphs, we can see from the (very useful) “RocksDB Compaction Time” graph that this compaction took 5 seconds.
In the graph above, I have singled-out “L0” to show Level0’s compaction time. However, any level can be selected either per-graph (by clicking on the legend-item) or dashboard-wide (by using the RocksDB Level drop-down at the top of the page).
In terms of throughput, we can see from our “RocksDB Write Activity” graph (Read Activity is also graphed) that this compaction required about 33MBps of disk write activity:
On top of additional resource consumption such as the write activity above, compactions cause caches to get cleared. One example is the OS cache due to new level files being written. These factors can cause some increases to read latencies, demonstrated in this example below by the bump in L4 read latency (top graph) caused by the L4 compaction (bottom graph):
This pattern above is one area to check if you see latency spikes in RocksDB.
When RocksDB is unable to perform compaction promptly, it uses a feature called “stalls” to try and slow down the amount of data coming into the engine. In my experience, stalls almost always mean something below RocksDB is not up to the task (likely the storage system).
Here is the “RocksDB Stall Time” graph of a host experiencing frequent stalls:
PMM can graph the different types of RocksDB stalls in the “RocksDB Stalls” graph. In our case here, we have 0.3-0.5 stalls per second due to “level0_slowdown” and “level0_slowdown_with_compaction.” This happens when Level0 stalls the engine due to slow compaction performance below its level.
Another metric reflecting the poor compaction performance is the pending compactions in “RocksDB Pending Operations”:
As I mentioned earlier, this almost always means something below RocksDB itself cannot keep up. In the top-right of PMM, we have OS-level metrics in a drop-down, I recommend you look at “Disk Performance” in these scenarios:
On the “Disk Performance” dashboard you can see the “sda” disk has an average write time of 212ms, and a max of 1100ms (1.1 seconds). This is fairly slow.
Further, on the same dashboard I can see the CPU is waiting on disk I/O 98.70% of the time on average. This explains why RocksDB needs to stall to hold back some of the load!
The disks seem too busy to keep up! Looking at the “Mongod – Document Activity” graph, it explains the cause of the high disk usage: 10,000-60,000 inserts per second:
Here we can draw the conclusion that this volume of inserts on this system configuration causes some stalling in RocksDB.
RocksDB Block Cache
The RocksDB Block Cache is the in-heap cache RocksDB uses to cache uncompressed pages. Generally, deployments benefit from dedicating most of their memory to the Linux file system cache vs. the RocksDB Block Cache. We recommend using only 20-30% of the host RAM for block cache.
PMM can take away some of the guesswork with the “RocksDB Block Cache Hit Ratio” graph, showing the efficiency of the block cache:
It is difficult to define a “good” and “bad” number for this metric, as the number varies for every deployment. However, one important thing to look for is significant changes in this graph. In this example, the Block Cache has a page in cache 3000 times for every 1 time it does not.
If you wanted to test increasing your block cache, this graph becomes very useful. If you increase your block cache and do not see an improvement in the hit ratio after a lengthy period of testing, this usually means more block cache memory is not necessary.
RocksDB Read Latency Graphs
PMM graphs Read Latency metrics for RocksDB in several different graphs, one dedicated to Level0:
And three other graphs display Average, 99th Percentile and Maximum latencies for each RocksDB level. Here is an example from the 99th Percentile latency metrics:
Percona Monitoring and Management needs to add some more metrics that explain the performance of the engine. The rate of deletes/tombstones in the system affects RocksDB’s performance. Currently, this metric is not something our system can easily gather like other engine metrics. Percona Monitoring and Management can’t easily graph the efficiency of the Bloom filter yet, either. These are currently open feature requests to the MongoRocks (and likely RocksDB) team(s) to add in future versions.
Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.New features:
Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.
This release is the current GA (Generally Available) stable release in the 2.3 series.New Features
Other bugs fixed: #1655278.
The purpose of this blog post is to demonstrate current best-practices for an in-place Percona Monitoring and Management (PMM) upgrade. Following this method allows you to retain data previously collected by PMM in your MySQL or MongoDB environment, while upgrading to the latest version.Step 1: Housekeeping
Before beginning this process, I recommend that you use a package manager that installs directly from Percona’s official software repository. The install instructions vary by distro, but for Ubuntu users the commands are:wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.debsudo dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb Step 2: PMM Server Upgrade
Now that we have ensured we’re using Percona’s official software repository, we can continue with the upgrade. To check which version of PMM server is running, execute the following command on your PMM server host:docker ps
This command shows a list of all running Docker containers. The version of PMM server you are running is found in the image description.
Once you’ve verified you are on an older version, it’s time to upgrade!
The first step is to stop and remove your docker pmm-server container with the following command:docker stop pmm-server && docker rm pmm-server
Please note that this command may take several seconds to complete.
The next step is to create and run the image with the new version tag. In this case, we are installing version 1.1.0. Please make sure to verify the correct image name in the install instructions.
Run the command below to create and run the new image.docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server --restart always percona/pmm-server:1.1.0
We can confirm our new image is running with the following command:docker ps
As you can see, the latest version of PMM server is installed. The final step in the process is to update the PMM client on each host to be monitored.Step 3: PMM Client Upgrade
The GA version of Percona Monitoring and Management supports in-place upgrades. Instructions can be found in our documentation. On the client side, update the local apt cache, and upgrade to the new version of pmm-client by running the following commands:apt-get updateapt-get install pmm-client
Congrats! We’ve successfully upgraded to the latest PMM version. As you can tell from the graph below, there is a slight gap in our polling data due to the downtime necessary to upgrade the version. However, we have verified that the data that existed prior to the upgrade is still available and new data is being gathered.Conclusion
I hope this blog post has given you the confidence to do an in-place Percona Monitoring and Management upgrade. As always, please submit your feedback on our forums with regards to any PMM-related suggestions or questions. Our goal is to make PMM the best-available open-source MySQL and MongoDB monitoring tool.
Join Percona’s MongoDB Practice Manager David Murphy on Wednesday, February 22, 2017 at 10:00 am PST / 1:00 pm EST (UTC-8) as he reviews and discusses the Percona Server for MongoDB, Percona Monitoring and Management (PMM) and Percona Toolkit product bundle release.
The webinar covers how this new bundled release ensures a robust, secure database that can be adapted to changing business requirements. It demonstrates how MongoDB, PMM and Percona Toolkit are used together so that organizations benefit from the cost savings and agility provided by free and proven open source software.
Percona Server for MongoDB 3.4 delivers all the latest MongoDB 3.4 Community Edition features, additional Enterprise features and a greater choice of storage engines.
Along with improved insight into the database environment, the solution provides enhanced control options for optimizing a wider range of database workloads with greater reliability and security.
Some of the features that will be discussed are:
David joined Percona in October 2015 as Practice Manager for MongoDB. Prior to that, David joined the ObjectRocket by Rackspace team as the Lead DBA in Sept 2013. With the growth involved with a any recently acquired startup, David’s role covered a wide range from evangelism, research, run book development, knowledge base design, consulting, technical account management, mentoring and much more.
Prior to the world of MongoDB, David was a MySQL and NoSQL architect at Electronic Arts. There, he worked with some of the largest titles in the world like FIFA, SimCity, and Battle Field providing tuning, design, and technology choice responsibilities. David maintains an active interest in database speaking and exploring new technologies.
This post is part of a series of Percona’s MongoDB 3.4 bundle release blogs. In this blog, we’ll look at the process for installing Percona Monitoring and Management (PMM) for the first time.Installing Percona Monitoring and Management
Percona Monitoring and Management (PMM) is Percona’s open source tool for monitoring databases. You can use it with either MongoDB and MySQL databases.
PMM requires the installation of a server and client component on each database server to be monitored. You can install the server component on a local or remote server, and monitor any MySQL or MongoDB instance (including Amazon RDS environments).What is it?
PMM provides a graphical view of the status of monitored databases. You can use it to perform query analytics and metrics review. The graphical component relies on Grafana, and uses Prometheus for information processing. It includes a Query Analytics module that allows you to analyze queries over a period of time and it uses Orchestrator for replication. Since the integration of these items is the most difficult part, the server is distributed as a preconfigured Docker image.
PMM works with any variety of MongoDB or MySQL.How do you install it?
As mentioned, there is a server component to PMM. You can install it on any server in your database environment, but be aware that if the server on which it is installed goes down or runs out of space, monitoring also fails. The server should have at least 5G of available disk space for each monitored client.
You must install the client component on each monitored database server. It is available as a package for a variety of Linux distributions.
To install the server
Next, install the client on each server to be monitored. It is installed based on the Linux distribution of the server.
Last, you connect the client(s) to the PMM server and monitoring begins.
The Query Analytics tool monitors and reviews queries run in the environment. It displays the current top 10 most time intensive queries. You can click on a query to view a detailed analysis of the query.
The Metrics Monitor gives you a historical view of queries. PMM separates time-based graphs by theme for additional clarity.
PMM includes Orchestrator for replication management and visualization. You must enable it separately, then you can view the metrics on the Discover page in Orchestrator.Ongoing administration and maintenance
PMM includes an administration tool that adds, removes or monitors services. The pmm-admin tool requires a user with root or sudo access.
You can enable HTTP password protection to add authentication when accessing the PMM Server web interface or use SSL encryption to secure traffic between PMM Client and PMM Server.
You can edit the config files located in <your Docker container>/etc/grafana to set up alerting. Log files are stored in <your Docker container>/var/log.
This blog post is the first in a series on Percona’s MongoDB 3.4 bundle release. This release includes Percona Server for MongoDB, Percona Monitoring and Management, and Percona Toolkit. In this post, we’ll look at the features included in the release.
We have a lot of great MongoDB content coming your way in the next few weeks. However, I wanted first to give you a quick list of the major things to be on the look out for.
This new bundled release ensures a robust, secure database that you can adapt to changing business requirements. It helps demonstrate how organizations can use MongoDB (and Percona Server for MongoDB), PMM and Percona Toolkit together to benefit from the cost savings and agility provided by free and proven open source software.
Percona Server for MongoDB 3.4 delivers all the latest MongoDB 3.4 Community Edition features, additional Enterprise features and a greater choice of storage engines.
Some of these new features include:
For all of these topics, you will see more blogs in the next few weeks that cover them in detail. Some people have asked what Percona’s MongoDB commitment looks like. Hopefully, this series of blogs help show how improving open source databases is central to the Percona vision. We are here to make the world better for developers, DBAs and other MongoDB users.
Percona announces the availability of Percona Toolkit 3.0.1 on February 20, 2017. This is the first general availability (GA) release in the 3.0 series with a focus on padding MongoDB tools:
Downloads are available from the Percona Software Repositories.
NOTE: If you are upgrading using Percona’s yum repositories, make sure that the you enable the basearch repo, because Percona Toolkit 3.0 is not available in the noarch repo.
Percona Toolkit is a collection of advanced command-line tools that perform a variety of MySQL and MongoDB server and system tasks too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.
This release includes changes from the previous 3.0.0 RC and the following additional changes:
Percona announces the release of Percona Monitoring and Management 1.1.1 on February 20, 2017. This is the first general availability (GA) release in the PMM 1.1 series with a focus on providing alternative deployment options for PMM Server:
NOTE: The AMIs and VirtualBox images above are still experimental. For production, it is recommended to run Docker images.
There are no changes compared to previous 1.1.0 Beta release, except small fixes for MongoDB metrics dashboards.
A live demo of PMM is available at pmmdemo.percona.com.
We welcome your feedback and questions on our PMM forum.
About Percona Monitoring and Management
Percona Monitoring and Management is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.
PMM is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.
Percona announces the release of Percona Server for MongoDB 3.4.2-1.2 on February 20, 2017. It is the first general availability (GA) release in the 3.4 series. Download the latest version from the Percona web site or the Percona Software Repositories.
Percona Server for MongoDB is an enhanced, open source, fully compatible, highly-scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features:
Percona Server for MongoDB requires no changes to MongoDB applications or code.
Percona Server for MongoDB 3.4.2-1.2 release notes are available in the official documentation.
Recently I worked on a ticket where a customer performed a point-in-time recovery PITR using a large set of binary logs. Normally we handle this by applying the last backup, then re-applying all binary logs created since the last backup. In the middle of the procedure, their new server crashed. We identified the binary log position and tried to restart the PITR from there. However, using the option --start-position, the restore failed with the error “The BINLOG statement of type Table_map was not preceded by a format description BINLOG statement.” This is a known bug and is reported as MySQL Bug #72804: “BINLOG statement can no longer be used to apply Query events.”
I created a small test to demonstrate a workaround that we implemented (and worked).
First, I ran a large import process that created several binary logs. I used a small value in max_binlog_size and tested using the database “employees” (a standard database used for testing).Then I dropped the database.mysql> set sql_log_bin=0; Query OK, 0 rows affected (0.33 sec) mysql> drop database employees; Query OK, 8 rows affected (1.25 sec)
To demonstrate the recovery process, I joined all the binary log files into one SQL file and started an import.sveta@Thinkie:~/build/ps-5.7/mysql-test$ ../bin/mysqlbinlog var/mysqld.1/data/master.000001 var/mysqld.1/data/master.000002 var/mysqld.1/data/master.000003 var/mysqld.1/data/master.000004 var/mysqld.1/data/master.000005 > binlogs.sql sveta@Thinkie:~/build/ps-5.7/mysql-test$ GENERATE_ERROR.sh binlogs.sql sveta@Thinkie:~/build/ps-5.7/mysql-test$ mysql < binlogs.sql ERROR 1064 (42000) at line 9020: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'inserting error
I intentionally generated a syntax error in the resulting file with the help of the GENERATE_ERROR.sh script (which just inserts a bogus SQL statement in a random row). The error message clearly showed where the import stopped: line 9020. I then created a file that cropped out the part that had already been imported (lines 1- 9020), and tried to import this new file.sveta@Thinkie:~/build/ps-5.7/mysql-test$ tail -n +9021 binlogs.sql >binlogs_rest.sql sveta@Thinkie:~/build/ps-5.7/mysql-test$ mysql < binlogs_rest.sql ERROR 1609 (HY000) at line 134: The BINLOG statement of type `Table_map` was not preceded by a format description BINLOG statement.
Again, the import failed with exactly the same error as the customer. The reason for this error is that the BINLOG statement – which applies changes from the binary log – expects that the format description event gets run in the same session as the binary log import, but before it. The format description existed initially at the start of the import that failed at line 9020. The later import (from line 9021 on) doesn’t contain this format statement.
Fortunately, this format is the same for the same version! We can simply take it from the beginning the SQL log file (or the original binary file) and put into the file created after the crash without lines 1-9020.
With MySQL versions 5.6 and 5.7, this event is located in the first 11 rows:sveta@Thinkie:~/build/ps-5.7/mysql-test$ head -n 11 binlogs.sql | cat -n 1 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; 2 /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; 3 DELIMITER /*!*/; 4 # at 4 5 #170128 17:58:11 server id 1 end_log_pos 123 CRC32 0xccda074a Start: binlog v 4, server v 5.7.16-9-debug-log created 170128 17:58:11 at startup 6 ROLLBACK/*!*/; 7 BINLOG ' 8 g7GMWA8BAAAAdwAAAHsAAAAAAAQANS43LjE2LTktZGVidWctbG9nAAAAAAAAAAAAAAAAAAAAAAAA 9 AAAAAAAAAAAAAAAAAACDsYxYEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA 10 AUoH2sw= 11 '/*!*/;
The first six rows are meta information, and rows 6-11 are the format event itself. The only thing we need to export into our resulting file is these 11 lines:sveta@Thinkie:~/build/ps-5.7/mysql-test$ head -n 11 binlogs.sql > binlogs_rest_with_format.sql sveta@Thinkie:~/build/ps-5.7/mysql-test$ cat binlogs_rest.sql >> binlogs_rest_with_format.sql sveta@Thinkie:~/build/ps-5.7/mysql-test$ mysql < binlogs_rest_with_format.sql sveta@Thinkie:~/build/ps-5.7/mysql-test$
After this, the import succeeded!
Late last year we started a poll on what backend programming languages are being used by the open source community. The three components of the backend – server, application, and database – are what makes a website or application work. Below are the results of Percona’s poll on backend programming languages in use by the community:Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.
One of the best-known and earliest web service stacks is the LAMP stack, which spelled out refers to Linux, Apache, MySQL and PHP/Perl/Python. We can see that this early model is still popular when it comes to the backend.
PHP still remains a very common choice for a backend programming language, with Python moving up the list as well. Perl seems to be fading in popularity, despite being used a lot in the MySQL world.
Finally, Go is a language to look out for. Go is an open source programming language created by Google. It first appeared in 2009, and is already more popular than Perl or Ruby according to this poll.
Thanks to the community for participating in our poll. You can take our latest poll on what database engine are you using to store time series data here.
The MariaDB Corporation is organizing a conference called M17 on the East Coast in April. Some Perconians (Peter Zaitsev, Vadim Tchachenko, Sveta Smirnova, Alex Rubin, Colin Charles) decided to submit some interesting talks for that conference. Percona also offered to sponsor the conference.
As of this post, the talks haven’t been accepted, and we were politely told that we couldn’t sponsor.
Some of the proposed talks were:
At Percona, we think MariaDB Server is an important part of the MySQL ecosystem. This is why the Percona Live Open Source Database Conference 2017 in Santa Clara has a MariaDB mini-track, consisting of talks from various Percona and MariaDB experts:
If any of these topics look enticing, come to the conference. We have MariaDB at Percona Live.
To make your decision easier, we’ve created a special promo code that gets you $75 off a full conference pass! Just use MariaDB@PL17 at checkout.
In the meantime, we will continue to write and discuss MariaDB, and any other open source database technologies. The power of the open source community is the free exchange of ideas, healthy competition and open dialog within the community.
Here are some more past presentations that are also relevant:
With Oracle clearly entering the “open source high availability solutions” arena with the release of their brand new Group Replication solution, I believe it is time to review the quality of the first GA (production ready) release.
TL;DR: Having examined the technology, it is my conclusion that Oracle seems to have released the GA version of Group Replication too early. While the product is definitely “working prototype” quality, the release seems rushed and unfinished. I found a significant number of issues, and I would personally not recommend it for production use.
It is obvious that Oracle is trying hard to ship technology to compete with Percona XtraDB Cluster, which is probably why they rushed to claim Group Replication GA quality.
If you’re all set to follow along and test Group Replication yourself, simplify the initial setup by using this Docker image. We can review some of the issues you might face together.
For the record, I tested the version based on MySQL 5.7.17 release.No automatic provisioning
First off, the first thing you’ll find is there is NO way to automatically setup of a new node.
If you need to setup new node or recover an existing node from a fatal failure, you’ll need to manually provision the slave.
Of course, you can clone a slave using Percona XtraBackup or LVM by employing some self-developed scripts. But given the high availability nature of the product, one would expect Group Replication to automatically re-provision any failed node.Bug: stale reads on nodes
Please see this bug:
One line summary: while any secondary nodes are “catching up” to whatever happened on a first node (it takes time to apply changes on secondary nodes), reads on a secondary node could return stale data (as shown in the bug report).
This behavior brings us back to the traditional asynchronous replication slave behavior (i.e., Group Replication’s predecessor).
It also contradicts the Group Replication documentation, which states: “There is a built-in group membership service that keeps the view of the group consistent and available for all servers at any given point in time.” (See https://dev.mysql.com/doc/refman/5.7/en/group-replication.html.)
I might also mention here that Percona XtraDB Cluster prevents stale reads (see https://www.percona.com/doc/percona-xtradb-cluster/5.7/wsrep-system-index.html#wsrep_sync_wait).Bug: nodes become unusable after a big transaction, refusing to execute further transactions
There are two related bugs:
One line summary: after running a big transaction, any secondary nodes become unusable and refuse to perform any further transactions.Obscure error messages
It is not uncommon to see cryptic error messages while testing Group Replication. For example:mysql> commit; ERROR 3100 (HY000): Error on observer while running replication hook 'before_commit'.
This is fairly useless and provides little help until I check the mysqld error log. The log provides a little bit more information:2017-02-09T02:05:36.996776Z 18 [ERROR] Plugin group_replication reported: '[GCS] Gcs_packet's payload is too big. Only the packets smaller than 2113929216 bytes can be compressed.' Discussion:
The items highlighted above might not seem too bad at first, and you could assume that your workload won’t be affected. However, stale reads and node dysfunctions basically prevent me from running a more comprehensive evaluation.My recommendation:
If you care about your data, then I recommend not using Group Replication in production. Currently, it looks like it might cause plenty of headaches, and it is easy to get inconsistent results.
For the moment, Group Replication appears an advanced – but broken – traditional MySQL asynchronous replication.
I understand Oracle’s dilemma. Usually people are hesitant to test a product that is not GA. So in order to get feedback from users, Oracle needs to push the product to GA. Oracle must absolutely solve the issues above during future QA cycles.
Our most recent release of Percona Server for MySQ (Percona Server for MySQL 5.7.17) comes with Group Replication plugins. Unfortunately, since this technology is very new, it requires some fairly complicated steps to setup and get running. To help with that process, I’ve prepare Docker images that simplify its setup procedures.
You can find the image here: https://hub.docker.com/r/perconalab/pgr-57/.
To start the first node (bootstrap the group):docker run -d -p 3306 --net=clusternet -e MYSQL_ROOT_PASSWORD=passw0rd -e CLUSTER_NAME=cluster1 perconalab/pgr-57
To add nodes into the group after:docker run -d -p 3306 --net=clusternet -e MYSQL_ROOT_PASSWORD=passw0rd -e CLUSTER_NAME=cluster1 -e CLUSTER_JOIN=CONTAINER_ID_FROM_THE_FIRST_STEP perconalab/pgr-57
You can also get a full script that starts “N” number of nodes, here: https://github.com/Percona-Lab/percona-docker/blob/master/pgr-57/start_node.sh
The Percona Live Open Source Database Conference 2017 will once again host crash courses on MySQL and MongoDB. Read below to get an outstanding discount on either the MySQL or MongoDB crash course (or both).
The database community constantly tells us how hard it is to find someone with MySQL and MongoDB DBA skills who can help with the day-to-day management of their databases. This is especially difficult when companies don’t have a full-time requirement for a DBA. Developers, system administrators and IT staff spend too much time trying to solve basic database problems that keep them from doing their other job duties. Eventually, the little problems or performance inefficiencies that start to pile up lead to big problems.
In answer to this growing need, Percona Live is once again hosting crash courses for developers, systems administrators and other technical resources. A crash course is a one-day training session on either MySQL 101 or MongoDB 101.
Don’t let the name fool you: these courses are led by Percona database experts who will show you the fundamentals of MySQL or MongoDB tools and techniques.
And it’s not just for DBAs: developers are encouraged to attend to hone their database skills.
Below is a list of the topics covered in each course this year:MySQL 101 Topics MongoDB 101 Topics
Attendees will return ready to quickly and correctly take care of the day-to-day and week-to-week management of your MySQL or MongoDB environment.
The schedule and cost for the 101 courses (without a full-conference pass) are:
(Tickets to the 101 sessions do not grant access to the main Percona Live breakout sessions. Full Percona Live conferences passes grant admission to the 101 sessions. 101 Crash Course attendees will have full access to Percona Live keynote speakers the exhibit hall and receptions.)
As a special promo, the first 101 people to purchase the single 101 talks receive a $299.00 discount off the ticket price! Each session only costs $101! Get both sessions for a mere $202 and save $498.00! Register now using the following codes for your discount:
Click here to register.
Register for Percona Live 2017 now! Advanced Registration lasts until March 5, 2017. Percona Live is a very popular conference: this year’s Percona Live Europe sold out, and we’re looking to do the same for Percona Live 2017. Don’t miss your chance to get your ticket at its most affordable price. Click here to register.
Percona Live 2017 sponsorship opportunities are available now. Click here to find out how to sponsor.
Percona today announced a Percona Server for MongoDB 3.4 solution bundle of updated products. This release enables any organization to create a robust, secure database environment that can be adapted to changing business requirements.
Percona Server for MongoDB 3.4, Percona Monitoring and Management 1.1, and Percona Toolkit 3.0 offer more features and benefits, with enhancements for both MongoDB® and MySQL® database environments. When these Percona products are used together, organizations gain all the cost and agility benefits provided by free, proven open source software that delivers all the latest MongoDB Community Edition 3.4 features, additional Enterprise features, and a greater choice of storage engines. Along with improved insight into the database environment, the solution provides enhanced control options for optimizing a wider range of database workloads with greater reliability and security.
The solution will be generally available the week of Feb. 20.New Features and Benefits Summary
Percona Server for MongoDB 3.4
Percona Monitoring and Management 1.1
Percona Toolkit 3.0
For more information, see Percona’s press release.
For this blog post, I’ve decided to try ClickHouse: an open source column-oriented database management system developed by Yandex (it currently powers Yandex.Metrica, the world’s second-largest web analytics platform).
In my previous set of posts, I tested Apache Spark for big data analysis and used Wikipedia page statistics as a data source. I’ve used the same data as in the Apache Spark blog post: Wikipedia Page Counts. This allows me to compare ClickHouse’s performance to Spark’s.
I’ve spent some time testing ClickHouse for relatively large volumes of data (1.2Tb uncompressed). Here is a list of ClickHouse advantages and disadvantages that I saw:
Here is a full list of ClickHouse features
Full list of ClickHouse limitationsGroup by: in-memory vs. on-disk
Running out of memory is one of the potential problems you may encounter when working with large datasets in ClickHouse:SELECT min(toMonth(date)), max(toMonth(date)), path, count(*), sum(hits), sum(hits) / count(*) AS hit_ratio FROM wikistat WHERE (project = 'en') GROUP BY path ORDER BY hit_ratio DESC LIMIT 10 ↖ Progress: 1.83 billion rows, 85.31 GB (68.80 million rows/s., 3.21 GB/s.) ██████████▋ 6%Received exception from server: Code: 241. DB::Exception: Received from localhost:9000, 127.0.0.1. DB::Exception: Memory limit (for query) exceeded: would use 9.31 GiB (attempt to allocate chunk of 1048576 bytes), maximum: 9.31 GiB: (while reading column hits):
By default, ClickHouse limits the amount of memory for group by (it uses a hash table for group by). This is easily fixed – if you have free memory, increase this parameter:SET max_memory_usage = 128000000000; #128G
If you don’t have that much memory available, ClickHouse can “spill” data to disk by setting this:set max_bytes_before_external_group_by=20000000000; #20G set max_memory_usage=40000000000; #40G
According to the documentation, if you need to use max_bytes_before_external_group_by it is recommended to set max_memory_usage to be ~2x of the size of max_bytes_before_external_group_by.
(The reason for this is that the aggregation is performed in two phases: (1) reading and building an intermediate data, and (2) merging the intermediate data. The spill to disk can only happen during the first phase. If there won’t be spill, ClickHouse might need the same amount of RAM for stage 1 and 2.)Benchmarks: ClickHouse vs. Spark
Both ClickHouse and Spark can be distributed. However, for the purpose of this test I’ve run a single node for both ClickHouse and Spark. The results are quite impressive.
Benchmark summarySize / compression Spark v. 2.0.2 ClickHouse Data storage format Parquet, compressed: snappy Internal storage, compressed Size (uncompressed: 1.2TB) 395G 212G
Test Spark v. 2.0.2 ClickHouse Diff Query 1: count (warm) 7.37 sec (no disk IO) 6.61 sec ~same Query 2: simple group (warm) 792.55 sec (no disk IO) 37.45 sec 21x better Query 3: complex group by 2522.9 sec 398.55 sec 6.3x better
ClickHouse vs. MySQL
I wanted to see how ClickHouse compared to MySQL. Obviously, we can’t compare some workloads. For example:
Usually big data systems provide us with real-time queries. Systems based on map/reduce (i.e., Hive on top of HDFS) are just too slow for real-time queries, as it takes a long time to initialize the map/reduce job and send the code to all nodes.
Potentially, you can use ClickHouse for real-time queries. It does not support secondary indexes, however. This means it will probably scan lots of rows, but it can do it very quickly.
To do this test, I’m using the data from the Percona Monitoring and Management system. The table I’m using has 150 columns, so it is good for column storage. The size in MySQL is ~250G:mysql> show table status like 'query_class_metrics'G *************************** 1. row *************************** Name: query_class_metrics Engine: InnoDB Version: 10 Row_format: Compact Rows: 364184844 Avg_row_length: 599 Data_length: 218191888384 Max_data_length: 0 Index_length: 18590056448 Data_free: 6291456 Auto_increment: 416994305
Scanning the whole table is significantly faster in ClickHouse. Retrieving just ten rows by key is faster in MySQL (especially from memory).
But what if we only need to scan limited amount of rows and do a group by? In this case, ClickHouse may be faster. Here is the example (real query used to create sparklines):
MySQLSELECT (1480888800 - UNIX_TIMESTAMP(start_ts)) / 11520 as point, FROM_UNIXTIME(1480888800 - (SELECT point) * 11520) AS ts, COALESCE(SUM(query_count), 0) / 11520 AS query_count_per_sec, COALESCE(SUM(Query_time_sum), 0) / 11520 AS query_time_sum_per_sec, COALESCE(SUM(Lock_time_sum), 0) / 11520 AS lock_time_sum_per_sec, COALESCE(SUM(Rows_sent_sum), 0) / 11520 AS rows_sent_sum_per_sec, COALESCE(SUM(Rows_examined_sum), 0) / 11520 AS rows_examined_sum_per_sec FROM query_class_metrics WHERE query_class_id = 7 AND instance_id = 1259 AND (start_ts >= '2014-11-27 00:00:00' AND start_ts < '2014-12-05 00:00:00') GROUP BY point; ... 61 rows in set (0.10 sec) # Query_time: 0.101203 Lock_time: 0.000407 Rows_sent: 61 Rows_examined: 11639 Rows_affected: 0 explain SELECT ... *************************** 1. row *************************** id: 1 select_type: PRIMARY table: query_class_metrics partitions: NULL type: range possible_keys: agent_class_ts,agent_ts key: agent_class_ts key_len: 12 ref: NULL rows: 21686 filtered: 100.00 Extra: Using index condition; Using temporary; Using filesort *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: NULL partitions: NULL type: NULL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL filtered: NULL Extra: No tables used 2 rows in set, 2 warnings (0.00 sec)
It is relatively fast.
ClickHouse (some functions are different, so we will have to rewrite the query):SELECT intDiv(1480888800 - toRelativeSecondNum(start_ts), 11520) AS point, toDateTime(1480888800 - (point * 11520)) AS ts, SUM(query_count) / 11520 AS query_count_per_sec, SUM(Query_time_sum) / 11520 AS query_time_sum_per_sec, SUM(Lock_time_sum) / 11520 AS lock_time_sum_per_sec, SUM(Rows_sent_sum) / 11520 AS rows_sent_sum_per_sec, SUM(Rows_examined_sum) / 11520 AS rows_examined_sum_per_sec, SUM(Rows_affected_sum) / 11520 AS rows_affected_sum_per_sec FROM query_class_metrics WHERE (query_class_id = 7) AND (instance_id = 1259) AND ((start_ts >= '2014-11-27 00:00:00') AND (start_ts < '2014-12-05 00:00:00')) GROUP BY point; 61 rows in set. Elapsed: 0.017 sec. Processed 270.34 thousand rows, 14.06 MB (15.73 million rows/s., 817.98 MB/s.)
As we can see, even though ClickHouse scans more rows (270K vs. 11K – over 20x more) it is faster to execute the ClickHouse query (0.10 seconds in MySQL compared to 0.01 second in ClickHouse). The column store format helps a lot here, as MySQL has to read all 150 columns (stored inside InnoDB pages) and ClickHouse only needs to read seven columns.
Wikipedia trending article of the month
Inspired by the article about finding trending topics using Google Books n-grams data, I decided to implement the same algorithm on top of the Wikipedia page visit statistics data. My goal here is to find the “article trending this month,” which has significantly more visits this month compared to the previous month. As I was implementing the algorithm, I came across another ClickHouse limitation: join syntax is limited. In ClickHouse, you can only do join with the “using” keyword. This means that the fields you’re joining need to have the same name. If the field name is different, we have to use a subquery.
Below is an example.
First, create a temporary table to aggregate the visits per month per page:CREATE TABLE wikistat_by_month ENGINE = Memory AS SELECT path, mon, sum(hits) / total_hits AS ratio FROM ( SELECT path, hits, toMonth(date) AS mon FROM wikistat WHERE (project = 'en') AND (lower(path) NOT LIKE '%special%') AND (lower(path) NOT LIKE '%page%') AND (lower(path) NOT LIKE '%test%') AND (lower(path) NOT LIKE '%wiki%') AND (lower(path) NOT LIKE '%index.html%') ) AS a ANY INNER JOIN ( SELECT toMonth(date) AS mon, sum(hits) AS total_hits FROM wikistat WHERE (project = 'en') AND (lower(path) NOT LIKE '%special%') AND (lower(path) NOT LIKE '%page%') AND (lower(path) NOT LIKE '%test%') AND (lower(path) NOT LIKE '%wiki%') AND (lower(path) NOT LIKE '%index.html%') GROUP BY toMonth(date) ) AS b USING (mon) GROUP BY path, mon, total_hits ORDER BY ratio DESC Ok. 0 rows in set. Elapsed: 543.607 sec. Processed 53.77 billion rows, 2.57 TB (98.91 million rows/s., 4.73 GB/s.)
Second, calculate the actual list:SELECT path, mon + 1, a_ratio AS ratio, a_ratio / b_ratio AS increase FROM ( SELECT path, mon, ratio AS a_ratio FROM wikistat_by_month WHERE ratio > 0.0001 ) AS a ALL INNER JOIN ( SELECT path, CAST((mon - 1) AS UInt8) AS mon, ratio AS b_ratio FROM wikistat_by_month WHERE ratio > 0.0001 ) AS b USING (path, mon) WHERE (mon > 0) AND (increase > 2) ORDER BY mon ASC, increase DESC LIMIT 100 ┌─path───────────────────────────────────────────────┬─plus(mon, 1)─┬──────────────────ratio─┬───────────increase─┐ │ Heath_Ledger │ 2 │ 0.0008467223172121601 │ 6.853825241458039 │ │ Cloverfield │ 2 │ 0.0009372609760313347 │ 3.758937474560766 │ │ The_Dark_Knight_(film) │ 2 │ 0.0003508532447770276 │ 2.8858100355450484 │ │ Scientology │ 2 │ 0.0003300109101992719 │ 2.52497180013816 │ │ Barack_Obama │ 3 │ 0.0005786473399980557 │ 2.323409928527576 │ │ Canine_reproduction │ 3 │ 0.0004836300843539438 │ 2.0058985801174662 │ │ Iron_Man │ 6 │ 0.00036261003907049 │ 3.5301196568303888 │ │ Iron_Man_(film) │ 6 │ 0.00035634745198422497 │ 3.3815325090507193 │ │ Grand_Theft_Auto_IV │ 6 │ 0.0004036713142943461 │ 3.2112732008504885 │ │ Indiana_Jones_and_the_Kingdom_of_the_Crystal_Skull │ 6 │ 0.0002856570195547951 │ 2.683443198030021 │ │ Tha_Carter_III │ 7 │ 0.00033954377342889735 │ 2.820114216429247 │ │ EBay │ 7 │ 0.0006575000133427979 │ 2.5483158977946787 │ │ Bebo │ 7 │ 0.0003958340022793501 │ 2.3260912792668162 │ │ Facebook │ 7 │ 0.001683658379576915 │ 2.16460972864883 │ │ Yahoo!_Mail │ 7 │ 0.0002190640575012259 │ 2.1075879062784737 │ │ MySpace │ 7 │ 0.001395608643577507 │ 2.103263660621813 │ │ Gmail │ 7 │ 0.0005449834079575953 │ 2.0675919337716757 │ │ Hotmail │ 7 │ 0.0009126863121737026 │ 2.052471735190232 │ │ Google │ 7 │ 0.000601645849087389 │ 2.0155448612416644 │ │ Barack_Obama │ 7 │ 0.00027336526076130943 │ 2.0031305241832302 │ │ Facebook │ 8 │ 0.0007778115183044431 │ 2.543477658022576 │ │ MySpace │ 8 │ 0.000663544314346641 │ 2.534512981232934 │ │ Two-Face │ 8 │ 0.00026975137404447024 │ 2.4171743959768803 │ │ YouTube │ 8 │ 0.001482456447101451 │ 2.3884527929836152 │ │ Hotmail │ 8 │ 0.00044467667764940547 │ 2.2265750216262954 │ │ The_Dark_Knight_(film) │ 8 │ 0.0010482536106662156 │ 2.190078096294301 │ │ Google │ 8 │ 0.0002985028319919154 │ 2.0028812075734637 │ │ Joe_Biden │ 9 │ 0.00045067411455437264 │ 2.692262662620829 │ │ The_Dark_Knight_(film) │ 9 │ 0.00047863754833213585 │ 2.420864550676665 │ │ Sarah_Palin │ 10 │ 0.0012459220318907518 │ 2.607063205782761 │ │ Barack_Obama │ 12 │ 0.0034487235202817087 │ 15.615409029600414 │ │ George_W._Bush │ 12 │ 0.00042708730873936023 │ 3.6303098900144937 │ │ Fallout_3 │ 12 │ 0.0003568429236849597 │ 2.6193094036745155 │ └────────────────────────────────────────────────────┴──────────────┴────────────────────────┴────────────────────┘ 34 rows in set. Elapsed: 1.062 sec. Processed 1.22 billion rows, 49.03 GB (1.14 billion rows/s., 46.16 GB/s.)
Their response time is really good, considering the amount of data it needed to scan (the first query scanned 2.57 TB of data).Conclusion
The ClickHouse column-oriented database looks promising for data analytics, as well as for storing and processing structural event data and time series data. ClickHouse can be ~10x faster than Spark for some workloads.Appendix: Benchmark details
Query 1select count(*) from wikistat
ClickHouse::) select count(*) from wikistat; SELECT count(*) FROM wikistat ┌─────count()─┐ │ 26935251789 │ └─────────────┘ 1 rows in set. Elapsed: 6.610 sec. Processed 26.88 billion rows, 53.77 GB (4.07 billion rows/s., 8.13 GB/s.)
Spark:spark-sql> select count(*) from wikistat; 26935251789 Time taken: 7.369 seconds, Fetched 1 row(s)
Query 2select count(*), month(dt) as mon from wikistat where year(dt)=2008 and month(dt) between 1 and 10 group by month(dt) order by month(dt);
ClickHouse::) select count(*), toMonth(date) as mon from wikistat where toYear(date)=2008 and toMonth(date) between 1 and 10 group by mon; SELECT count(*), toMonth(date) AS mon FROM wikistat WHERE (toYear(date) = 2008) AND ((toMonth(date) >= 1) AND (toMonth(date) <= 10)) GROUP BY mon ┌────count()─┬─mon─┐ │ 2100162604 │ 1 │ │ 1969757069 │ 2 │ │ 2081371530 │ 3 │ │ 2156878512 │ 4 │ │ 2476890621 │ 5 │ │ 2526662896 │ 6 │ │ 2489723244 │ 7 │ │ 2480356358 │ 8 │ │ 2522746544 │ 9 │ │ 2614372352 │ 10 │ └────────────┴─────┘ 10 rows in set. Elapsed: 37.450 sec. Processed 23.37 billion rows, 46.74 GB (623.97 million rows/s., 1.25 GB/s.)
Spark:spark-sql> select count(*), month(dt) as mon from wikistat where year(dt)=2008 and month(dt) between 1 and 10 group by month(dt) order by month(dt); 2100162604 1 1969757069 2 2081371530 3 2156878512 4 2476890621 5 2526662896 6 2489723244 7 2480356358 8 2522746544 9 2614372352 10 Time taken: 792.552 seconds, Fetched 10 row(s)
Query 3SELECT path, count(*), sum(hits) AS sum_hits, round(sum(hits) / count(*), 2) AS hit_ratio FROM wikistat WHERE project = 'en' GROUP BY path ORDER BY sum_hits DESC LIMIT 100;
ClickHouse::) SELECT :-] path, :-] count(*), :-] sum(hits) AS sum_hits, :-] round(sum(hits) / count(*), 2) AS hit_ratio :-] FROM wikistat :-] WHERE (project = 'en') :-] GROUP BY path :-] ORDER BY sum_hits DESC :-] LIMIT 100; SELECT path, count(*), sum(hits) AS sum_hits, round(sum(hits) / count(*), 2) AS hit_ratio FROM wikistat WHERE project = 'en' GROUP BY path ORDER BY sum_hits DESC LIMIT 100 ┌─path────────────────────────────────────────────────┬─count()─┬───sum_hits─┬─hit_ratio─┐ │ Special:Search │ 44795 │ 4544605711 │ 101453.41 │ │ Main_Page │ 31930 │ 2115896977 │ 66266.74 │ │ Special:Random │ 30159 │ 533830534 │ 17700.54 │ │ Wiki │ 10237 │ 40488416 │ 3955.11 │ │ Special:Watchlist │ 38206 │ 37200069 │ 973.67 │ │ YouTube │ 9960 │ 34349804 │ 3448.78 │ │ Special:Randompage │ 8085 │ 28959624 │ 3581.9 │ │ Special:AutoLogin │ 34413 │ 24436845 │ 710.11 │ │ Facebook │ 7153 │ 18263353 │ 2553.24 │ │ Wikipedia │ 23732 │ 17848385 │ 752.08 │ │ Barack_Obama │ 13832 │ 16965775 │ 1226.56 │ │ index.html │ 6658 │ 16921583 │ 2541.54 │ … 100 rows in set. Elapsed: 398.550 sec. Processed 26.88 billion rows, 1.24 TB (67.45 million rows/s., 3.10 GB/s.)
Spark:spark-sql> SELECT > path, > count(*), > sum(hits) AS sum_hits, > round(sum(hits) / count(*), 2) AS hit_ratio > FROM wikistat > WHERE (project = 'en') > GROUP BY path > ORDER BY sum_hits DESC > LIMIT 100; ... Time taken: 2522.903 seconds, Fetched 100 row(s)
Time series data is some of the most actionable data available when it comes to analyzing trends and making predictions. Simply put, time series data is data that is indexed not just by value, but by time as well – allowing you to view value changes over time as they occur. Obvious uses include the stock market, web traffic, user behavior, etc.
With the increasing number of smart devices in the Internet of Things (IoT), being able to track data over time is more and more important. With time series data, you can measure and make predictions on things like energy consumption, pH values, water consumption, data from environment-aware machines like smart cars, etc. The sensors used in IoT devices and systems generate huge amounts of time-series data.
How is all of this data collected, segmented and stored? We’d like to hear from you: what database engine are you using to store time series data? Please take a few seconds and answer the following poll. Which are you using? Help the community learn what database engines help solve critical database issues. Please select from one to three database engines as they apply to your environment. Feel free to add comments below if your engine isn’t listed.Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.
For general inquiries, please send us your question and someone will contact you.