
MySQL CDC, Streaming Binary Logs and Asynchronous Triggers

September 13, 2016 - 3:21pm

In this post, we’ll look at MySQL CDC, streaming binary logs and asynchronous triggers.

What is Change Data Capture and why do we need it?

Change Data Capture (CDC) tracks data changes (usually close to realtime). In MySQL, the easiest and probably most efficient way to track data changes is to use binary logs. However, other approaches exist. For example:

  • General log or Audit Log Plugin (which logs all queries, not just the changes)
  • MySQL triggers (not recommended, as it can slow down the application — more below)

One of the first implementations of CDC for MySQL was the FlexCDC project by Justin Swanhart. Nowadays, there are a lot of CDC implementations (see mysql-cdc-projects wiki for a long list).

CDC can be implemented for various tasks such as auditing, copying data to another system or processing (and reacting to) events. In this blog post, I will demonstrate how to use a CDC approach to stream MySQL binary logs, process events and save them (stream them) to another MySQL instance (or MongoDB). In addition, I will show how to implement asynchronous triggers by streaming binary logs.

Streaming binary logs 

You can read binary logs using the mysqlbinlog utility, by adding “-vvv” (verbose option). mysqlbinlog can also show a human-readable version of ROW-based replication events. For example:

# mysqlbinlog -vvv /var/lib/mysql/master.000001
BINLOG '
JxiqVxMBAAAALAAAAI7LegAAAHQAAAAAAAEABHRlc3QAAWEAAQMAAUTAFAY=
JxiqVx4BAAAAKAAAALbLegAAAHQAAAAAAAEAAgAB//5kAAAAedRLHg==
'/*!*/;
### INSERT INTO `test`.`a`
### SET
###   @1=100 /* INT meta=0 nullable=1 is_null=0 */
# at 8047542
#160809 17:51:35 server id 1  end_log_pos 8047573 CRC32 0x56b36ca5  Xid = 24453
COMMIT/*!*/;

Starting with MySQL 5.6, mysqlbinlog can also read the binary log events from a remote master (acting as a “fake” replication slave).

Reading binary logs is a great basis for CDC. However, there are still some challenges:

  1. ROW-based replication is probably the easiest way to get the raw changes; otherwise we would have to parse SQL. At the same time, ROW-based replication binary logs don’t contain the table metadata, i.e., they do not record the field names, only the field number (as in the example above, “@1” is the first field in table “a”).
  2. We will need to somehow record and store the binary log positions so that the tool can be restarted at any time and proceed from the last position (like a MySQL replication slave). A sketch of how a tool can handle both points follows this list.
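
For illustration, here is a minimal sketch of how both points can be handled with the python-mysql-replication package (one of the many CDC projects mentioned above). The connection settings, server_id and the positions file are assumptions made for this example; a production tool would persist positions far more carefully, which is exactly what the tool discussed next does for you.

# Minimal CDC sketch using the python-mysql-replication package (pip install mysql-replication).
# Connection settings, server_id and the positions file below are assumptions for this example.
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent

stream = BinLogStreamReader(
    connection_settings={"host": "127.0.0.1", "port": 3306, "user": "repl", "passwd": "xxx"},
    server_id=100,          # must be unique among the replication slaves
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    blocking=True,          # keep waiting for new events, like a replication slave
    resume_stream=True,
)

for event in stream:
    for row in event.rows:
        # The library resolves row images to dicts keyed by column name,
        # which works around the "only field numbers in the binlog" problem (point 1).
        print(event.schema, event.table, row)
    # Record the current binlog position so a restart can resume from here (point 2).
    with open("binlog_position.txt", "w") as f:
        f.write("{0}:{1}".format(stream.log_file, stream.log_pos))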

Maxwell’s daemon (Maxwell = Mysql + Kafka), an application recently released by Zendesk, reads MySQL binlogs and writes row updates as JSON (it can write to Kafka, which is its primary goal, but can also write to stdout and can be extended for other purposes). Maxwell stores the metadata about MySQL tables and binary log events (and other metadata) inside MySQL, so it solves the potential issues from the above list.

Here is a quick demo of Maxwell:

Session 1 (Insert into MySQL):

mysql> insert into a (i) values (151);
Query OK, 1 row affected (0.00 sec)

mysql> update a set i = 300 limit 5;
Query OK, 5 rows affected (0.01 sec)
Rows matched: 5  Changed: 5  Warnings: 0

Session 2 (starting Maxwell):

$ ./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout
16:00:15,303 INFO Maxwell - Maxwell is booting (StdoutProducer), starting at BinlogPosition[master.000001:15494460]
16:00:15,327 INFO TransportImpl - connecting to host: 127.0.0.1, port: 3306
16:00:15,350 INFO TransportImpl - connected to host: 127.0.0.1, port: 3306, context: AbstractTransport.Context[threadId=9,...
16:00:15,350 INFO AuthenticatorImpl - start to login, user: maxwell, host: 127.0.0.1, port: 3306
16:00:15,354 INFO AuthenticatorImpl - login successfully, user: maxwell, detail: OKPacket[packetMarker=0,affectedRows=0,insertId=0,serverStatus=2,warningCount=0,message=<null>]
16:00:15,533 INFO MysqlSavedSchema - Restoring schema id 1 (last modified at BinlogPosition[master.000001:3921])
{"database":"test","table":"a","type":"insert","ts":1472937475,"xid":211209,"commit":true,"data":{"i":151}}
{"database":"test","table":"a","type":"insert","ts":1472937475,"xid":211209,"commit":true,"data":{"i":151}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"data":{"i":300},"old":{"i":150}}
{"database":"test","table":"a","type":"update","ts":1472937535,"xid":211333,"commit":true,"data":{"i":300},"old":{"i":150}}

As we can see in this example, Maxwell gets the events from the MySQL replication stream and outputs them to stdout (if we change the producer, it can save them to Apache Kafka).

Saving binlog events to MySQL document store or MongoDB

If we want to save the events somewhere else, we can use MongoDB or MySQL JSON fields and the document store (as Maxwell provides us with JSON documents). For a simple proof of concept, I’ve created a nodeJS script to implement a CDC “pipeline”:

var mysqlx = require('mysqlx');
var mySession = mysqlx.getSession({
    host: '10.0.0.2',
    port: 33060,
    dbUser: 'root',
    dbPassword: 'xxx'
});

process.on('SIGINT', function() {
    console.log("Caught interrupt signal. Exiting...");
    process.exit()
});

process.stdin.setEncoding('utf8');

process.stdin.on('readable', () => {
    var chunk = process.stdin.read();
    if (chunk != null) {
        process.stdout.write(`data: ${chunk}`);
        mySession.then(session => {
            session.getSchema("mysqlcdc").getCollection("mysqlcdc")
                .add(JSON.parse(chunk))
                .execute(function (row) {
                    // can log something here
                }).catch(err => {
                    console.log(err);
                })
                .then(function (notices) {
                    console.log("Wrote to MySQL: " + JSON.stringify(notices))
                });
        }).catch(function (err) {
            console.log(err);
            process.exit();
        });
    }
});

process.stdin.on('end', () => {
    process.stdout.write('end');
    process.stdin.resume();
});

And to run it we can use the pipeline:

./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout --log_level=ERROR | node ./maxwell_to_mysql.js

The same approach can be used to save the CDC events to MongoDB with mongoimport:

$ ./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout --log_level=ERROR |mongoimport -d mysqlcdc -c mysqlcdc --host localhost:27017

Reacting to binary log events: asynchronous triggers

In the above example, we only recorded the binary log events. Now we can add “reactions”.

One of the practical applications is re-implementing MySQL triggers to something more performant. MySQL triggers are executed for each row, and are synchronous (the query will not return until the trigger event finishes). This was known to cause poor performance, and can significantly slow down bulk operations (i.e., “load data infile” or “insert into … values (…), (…)”). With triggers, MySQL will have to process the “bulk” operations row by row, killing the performance. In addition, when using statement-based replication, triggers on the slave can slow down the replication thread (it is much less relevant nowadays with ROW-based replication and potentially multithreaded slaves).

With the ability to read binary logs from MySQL (using Maxwell), we can process the events and re-implement triggers — now in asynchronous mode — without delaying MySQL operations. As Maxwell gives us a JSON document with the “new” and “old” values (with the default option binlog_row_image=FULL, MySQL records the previous values for updates and deletes) we can use it to create triggers.

Not all triggers can be easily re-implemented based on the binary logs. However, in my experience most of the triggers in MySQL are used for:

  • auditing (if you deleted a row, what was the previous value and/or who did it and when)
  • enriching the existing table (i.e., update the field in the same table)

Here is a quick algorithm for how to re-implement the triggers with Maxwell:

  • Find the trigger table and trigger event text (SQL)
  • Create an app or a script to parse JSON for the trigger table
  • Create a new version of the SQL changing the NEW.<field> to “data.field” (from JSON) and OLD.<field> to “old.field” (from JSON)

For example, if I want to audit all deletes in the “transactions” table, I can do it with Maxwell and a simple Python script (do not use this in production, it is a very basic sample):

import json, sys

line = sys.stdin.readline()
while line:
    print line,
    obj = json.loads(line)
    if obj["type"] == "delete":
        print "INSERT INTO transactions_delete_log VALUES ('" + str(obj["data"]) + "', Now() )"
    line = sys.stdin.readline()

MySQL:

mysql> delete from transactions where user_id = 2;
Query OK, 1 row affected (0.00 sec)

Maxwell pipeline:

$ ./bin/maxwell --user='maxwell' --password='maxwell' --host='127.0.0.1' --producer=stdout --log_level=ERROR | python trigger.py
{"database":"test","table":"transactions","type":"delete","ts":1472942384,"xid":214395,"commit":true,"data":{"id":2,"user_id":2,"value":2,"last_updated":"2016-09-03 22:39:31"}}
INSERT INTO transactions_delete_log VALUES ('{u'last_updated': u'2016-09-03 22:39:31', u'user_id': 2, u'id': 2, u'value': 2}', Now() )

Maxwell limitations

Maxwell was designed for MySQL 5.6 with ROW-based replication. Although it can work with MySQL 5.7, it does not support new MySQL 5.7 data types (i.e., JSON fields). Maxwell does not support GTID, and can’t fail over based on GTID (it can parse events with GTIDs, though).

Conclusion

Streaming MySQL binary logs (for example, with the Maxwell application) can help implement CDC for auditing and other purposes, as well as asynchronous triggers (removing MySQL-level triggers can increase MySQL performance).

ProxySQL and MHA Integration

September 13, 2016 - 11:03am

This blog post discusses ProxySQL and MHA integration, and how they work together.

MHA (Master High Availability Manager and tools for MySQL) is almost fully integrated with the ProxySQL process. This means you can count on the MHA standard feature to manage failover, and ProxySQL to manage the traffic and shift from one server to another.

This is one of the main differences between MHA and VIP, and MHA and ProxySQL: with MHA/ProxySQL, there is no need to move IPs or re-define DNS.

The following is an example of an MHA configuration file for use with ProxySQL:

[server default]
user=mha
password=mha
ssh_user=root
repl_password=replica
manager_log=/tmp/mha.log
manager_workdir=/tmp
remote_workdir=/tmp
master_binlog_dir=/opt/mysql_instances/mha1/logs
client_bindir=/opt/mysql_templates/mysql-57/bin
client_libdir=/opt/mysql_templates/mysql-57/lib
master_ip_failover_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_failover
master_ip_online_change_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_online_change
log_level=debug

[server1]
hostname=mha1r
ip=192.168.1.104
candidate_master=1

[server2]
hostname=mha2r
ip=192.168.1.107
candidate_master=1

[server3]
hostname=mha3r
ip=192.168.1.111
candidate_master=1

[server4]
hostname=mha4r
ip=192.168.1.109
no_master=1

NOTE: Be sure to comment out the “FIX ME” lines in the sample scripts.

After that, just install MHA as you normally would.

In ProxySQL, be sure to have all the MHA users and servers set up.

When using ProxySQL with standard replication, it’s important to set additional privileges for the ProxySQL monitor user. It must also have REPLICATION CLIENT, or it will fail to check the slave lag. The servers MUST have a defined value for the max_replication_lag attribute, or the check will be ignored.

As a reminder:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',600,3306,1000,0);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.107',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.111',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.109',601,3306,1000,10);
INSERT INTO mysql_replication_hostgroups VALUES (600,601);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

insert into mysql_query_rules (username,destination_hostgroup,active) values('mha_W',600,1);
insert into mysql_query_rules (username,destination_hostgroup,active) values('mha_R',601,1);
insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('mha_RW',600,1,3,'^SELECT.*FOR UPDATE');
insert into mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) values('mha_RW',601,1,3,'^SELECT');
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;

insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('mha_W','test',1,600,'test_mha');
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('mha_R','test',1,601,'test_mha');
insert into mysql_users (username,password,active,default_hostgroup,default_schema) values ('mha_RW','test',1,600,'test_mha');
LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;

OK, now that all is ready,  let’s rock’n’roll!

Controlled fail-over

First of all, the masterha_manager should not be running or you will get an error.

Now let’s start some traffic:

Write load:

sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 --report-interval=10 --mysql-ignore-errors=all run

Read-only load:

sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-tablesize=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query --oltp-skip-trx=on --report-interval=10 --mysql-ignore-errors=all run

Let it run for a bit, then check:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.104 | 3306     | ONLINE | 10       | 0        | 20     | 0       | 551256  | 44307633        | 0               | 285        | <--- current Master
| 601       | 192.168.1.111 | 3306     | ONLINE | 5        | 3        | 11     | 0       | 1053685 | 52798199        | 4245883580      | 1133       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 3        | 5        | 10     | 0       | 1006880 | 50473746        | 4052079567      | 369        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1040524 | 52102581        | 4178965796      | 604        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 7        | 1        | 16     | 0       | 987548  | 49458526        | 3954722258      | 285        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Now perform the failover. To do this, instruct MHA to do a switch, and to set the OLD master as a new slave:

masterha_master_switch --master_state=alive --conf=/etc/mha.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=0

Check what happened:

[ 160s] threads: 10, tps: 354.50, reads: 3191.10, writes: 1418.50, response time: 48.96ms (95%), errors: 0.00, reconnects: 0.00
[ 170s] threads: 10, tps: 322.50, reads: 2901.98, writes: 1289.89, response time: 55.45ms (95%), errors: 0.00, reconnects: 0.00
[ 180s] threads: 10, tps: 304.60, reads: 2743.12, writes: 1219.91, response time: 58.09ms (95%), errors: 0.10, reconnects: 0.00 <--- moment of the switch
[ 190s] threads: 10, tps: 330.40, reads: 2973.40, writes: 1321.00, response time: 50.52ms (95%), errors: 0.00, reconnects: 0.00
[ 200s] threads: 10, tps: 304.20, reads: 2745.60, writes: 1217.60, response time: 58.40ms (95%), errors: 0.00, reconnects: 1.00
[ 210s] threads: 10, tps: 353.80, reads: 3183.80, writes: 1414.40, response time: 48.15ms (95%), errors: 0.00, reconnects: 0.00

Check ProxySQL:

mysql> select * from stats_mysql_connection_pool where hostgroup between 600 and 601 order by hostgroup,srv_host desc;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.107 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 123457  | 9922280         | 0               | 658        | <--- new master
| 601       | 192.168.1.111 | 3306     | ONLINE | 2        | 6        | 14     | 0       | 1848302 | 91513537        | 7590137770      | 1044       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 5        | 3        | 12     | 0       | 1688789 | 83717258        | 6927354689      | 220        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1834415 | 90789405        | 7524861792      | 658        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 6        | 2        | 24     | 0       | 1667252 | 82509124        | 6789724589      | 265        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

In this case, the servers weren’t behind the master, and the switch happened quite fast.

We can see that the WRITE operations that normally are an issue, given the need to move around a VIP or change name resolution, had a limited hiccup.

Read operations were not affected, at all. Nice, eh?

Do you know how long it takes to do a switch under these conditions?

real 0m2.710s

Yes, 2.7 seconds.

This is more evidence that most of the time in an MHA-based switch is spent redirecting traffic from A to B over the network.

Crash fail-over

What happens if, instead of an easy switch, we have to cover a real failover?

First of all, let’s start masterha_manager:

nohup masterha_manager --conf=/etc/mha.cnf --wait_on_monitor_error=60 --wait_on_failover_error=60 >> /tmp/mha.log 2>&1

Then let’s start the load again. Finally, go to the MySQL node that hosts the master (xxx.xxx.xxx.107):

ps aux | grep mysql
mysql 18755 0.0 0.0 113248 1608 pts/0 S Aug28 0:00 /bin/sh /opt/mysql_templates/mysql-57/bin/mysqld_safe --defaults-file=/opt/mysql_instances/mha1/my.cnf
mysql 21975 3.2 30.4 4398248 941748 pts/0 Sl Aug28 93:21 /opt/mysql_templates/mysql-57/bin/mysqld --defaults-file=/opt/mysql_instances/mha1/my.cnf --basedir=/opt/mysql_templates/mysql-57/ --datadir=/opt/mysql_instances/mha1/data --plugin-dir=/opt/mysql_templates/mysql-57//lib/plugin --log-error=/opt/mysql_instances/mha1/mysql-3306.err --open-files-limit=65536 --pid-file=/opt/mysql_instances/mha1/mysql.pid --socket=/opt/mysql_instances/mha1/mysql.sock --port=3306

And kill the MySQL process:

kill -9 21975 18755

As before, check what happened on the application side:

[ 80s] threads: 4, tps: 213.20, reads: 1919.10, writes: 853.20, response time: 28.74ms (95%), errors: 0.00, reconnects: 0.00
[ 90s] threads: 4, tps: 211.30, reads: 1901.80, writes: 844.70, response time: 28.63ms (95%), errors: 0.00, reconnects: 0.00
[ 100s] threads: 4, tps: 211.90, reads: 1906.40, writes: 847.90, response time: 28.60ms (95%), errors: 0.00, reconnects: 0.00
[ 110s] threads: 4, tps: 211.10, reads: 1903.10, writes: 845.30, response time: 29.27ms (95%), errors: 0.30, reconnects: 0.00 <-- issue starts
[ 120s] threads: 4, tps: 198.30, reads: 1785.10, writes: 792.40, response time: 28.43ms (95%), errors: 0.00, reconnects: 0.00
[ 130s] threads: 4, tps: 0.00, reads: 0.60, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.40 <-- total stop in write
[ 140s] threads: 4, tps: 173.80, reads: 1567.80, writes: 696.30, response time: 34.89ms (95%), errors: 0.40, reconnects: 0.00 <-- writes restart
[ 150s] threads: 4, tps: 195.20, reads: 1755.10, writes: 780.50, response time: 33.98ms (95%), errors: 0.00, reconnects: 0.00
[ 160s] threads: 4, tps: 196.90, reads: 1771.30, writes: 786.80, response time: 33.49ms (95%), errors: 0.00, reconnects: 0.00
[ 170s] threads: 4, tps: 193.70, reads: 1745.40, writes: 775.40, response time: 34.39ms (95%), errors: 0.00, reconnects: 0.00
[ 180s] threads: 4, tps: 191.60, reads: 1723.70, writes: 766.20, response time: 35.82ms (95%), errors: 0.00, reconnects: 0.00

So it takes ~10 seconds to perform failover.

To understand this better, let’s see what happened in MHA-land:

Tue Aug 30 09:33:33 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Aug 30 09:33:33 2016 - [info] Reading application default configuration from /etc/mha.cnf..
...                                                       Read conf and start
Tue Aug 30 09:33:47 2016 - [debug] Trying to get advisory lock..
Tue Aug 30 09:33:47 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
...                                                       Wait for errors
Tue Aug 30 09:34:47 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)   <--- Error time
Tue Aug 30 09:34:56 2016 - [warning] Connection failed 4 time(s)..                                       <--- Finally MHA decide to do something
Tue Aug 30 09:34:56 2016 - [warning] Master is not reachable from health checker!
Tue Aug 30 09:34:56 2016 - [warning] Master mha2r(192.168.1.107:3306) is not reachable!
Tue Aug 30 09:34:56 2016 - [warning] SSH is reachable.
Tue Aug 30 09:34:58 2016 - [info] Master failover to mha1r(192.168.1.104:3306) completed successfully.   <--- end of the failover

MHA sees the server failing at xx:47, but because of the retry and checks validation, it actually fully acknowledges the downtime at xx:56 (~8 seconds after).

To perform the whole failover, it only takes ~2 seconds (again). Because no movable IPs or DNS changes were involved, the operations were fast. This is true when the servers already have the binary-log data there, but it’s a different story if MHA also has to manage and push data from the binary log to MySQL.

As you can see, ProxySQL can also help reduce the timing for this scenario, totally skipping the network-related operations. These operations are the ones causing the most trouble in these cases.

Percona Monitoring and Management (PMM) is now available

September 13, 2016 - 3:00am

Percona announces the availability of Percona Monitoring and Management (PMM), an open source software database monitoring and management tool. Completely open source and free to download and use, Percona Monitoring and Management provides point-in-time visibility and historical trending of database performance that enables DBAs and application developers to optimize the performance of their MySQL and MongoDB databases.

Percona Monitoring and Management combines several best-of-breed tools, including Grafana, Prometheus, and Consul, in a single, easy-to-manage virtual appliance, along with Percona-developed query analytics, administration, API, agent and exporter components. Percona Monitoring and Management monitors and provides actionable performance data for MySQL variants, including Oracle MySQL Community Edition, Oracle MySQL Enterprise Edition, Percona Server for MySQL, and MariaDB, as well as MongoDB variants, including MongoDB Community Edition, and Percona Server for MongoDB.

PMM is an on-premises solution that keeps all of your performance and query data inside the confines of your environment, with no requirement for any data to cross the Internet.

Percona Monitoring and Management Highlights:

  • Provides query and metric information that enables administrators to optimize database performance
  • Displays current queries and highlights potential query issues to enable faster issue resolution
  • Maps queries against metrics to help make informed decisions about crucial database resources: platform needs, system growth, team focus and the most important database activities.

PMM provides database maintenance teams with better visibility into database and query activity, in order to implement actionable strategies and issue resolution more quickly. More information allows you to concentrate efforts on the areas that yield the highest value.

Like prior versions, PMM is distributed through Docker Hub and is free to download. Full instructions for downloading and installing the server and client are available in the documentation.

A PMM demonstration is available at pmmdemo.percona.com. We have also implemented forums for PMM discussions.

There will also be a webinar with Percona’s Roman Vynar, Lead Platform Engineer, on Thursday, September 15, 2016 at 10:00 a.m. PDT (UTC-7) about “Identifying and Solving Database Performance Issues with PMM.” Register here for the webinar to learn more about PMM. Can’t attend the webinar? We’ve got you covered! Register anyway and we’ll send you the recording and slides even if you can’t attend.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system.

As always, thanks for your continued support of Percona!

Is Your Database Affected by CVE-2016-6662?

September 12, 2016 - 3:55pm

In this blog post, I will discuss the CVE-2016-6662 vulnerability, how to tell if it affects you, and how to prevent the vulnerability from affecting you if you have an older version of MySQL.

I’ll also list which MySQL versions include the vulnerability fixes.

As we announced in a previous post, there are certain scenarios in Percona Server (and MySQL) that can allow a remote root code execution (CVE-2016-6662).

Vulnerability approach

The website legalhackers.com contains the full, current explanation of the CVE-2016-6662 vulnerability.

To summarize, the methods used to gain root privileges require multiple conditions:

  1. A remote (or even local) MySQL user that has FILE permissions (or SUPER, which encompasses all of them).
  2. Improper OS files/directories permissions around MySQL configuration files that allow the MySQL system user access to modify or create new configuration files.
  3. Use one of several techniques to alter the MySQL configuration so that it loads a malicious shared library.
    The techniques currently described require FILE or SUPER privileges, but also include the currently undisclosed CVE-2016-6663 (which demonstrates how to alter the configuration without FILE privileges).
  4. Have that malicious shared library loaded when MySQL restarts, which includes the code that allows privilege escalation.

Fixed versions

MySQL fixes

MySQL seems to have already released versions that include the security fixes.

This is coming from the release notes in MySQL 5.6.33:

  • For mysqld_safe, the argument to --malloc-lib now must be one of the directories /usr/lib, /usr/lib64, /usr/lib/i386-linux-gnu, or /usr/lib/x86_64-linux-gnu. In addition, the --mysqld and --mysqld-version options can be used only on the command line and not in an option file. (Bug #24464380)
  • It was possible to write log files ending with .ini or .cnf that later could be parsed as option files. The general query log and slow query log can no longer be written to a file ending with .ini or .cnf. (Bug #24388753)
  • Privilege escalation was possible by exploiting the way REPAIR TABLE used temporary files. (Bug #24388746)

You aren’t affected if you use version 5.5.52, 5.6.33 or 5.7.15.

Release notes: 5.5.52, 5.6.33, 5.7.15

Percona Server

The way Percona increased security was by limiting which libraries are allowed to be loaded with LD_PRELOAD (including --malloc-lib), and limiting them to /usr/lib/, /usr/lib64 and the MySQL installation base directory.

This means only locations that are accessible by root users can load shared libraries.

The following Percona Server versions have this fix:

We are working on releasing new Percona XtraDB Cluster versions as well.

Future Percona Server releases will include all fixes from MySQL.

MariaDB

MariaDB has fixed the issue in 5.5.51, 10.1.17 and 10.0.27.

I have an older MySQL version, what should I do now?

It is possible to change the database configuration so that it isn’t affected anymore (without changing your MySQL versions and restarting your database). There are several options, each of them focusing on one of the conditions required for the vulnerability to work.

Patch mysqld_safe Manually

Just before publishing this, a blog post came out with an alternative way to patch your server: https://www.psce.com/blog/2016/09/12/how-to-quickly-patch-mysql-server-against-cve-2016-6662/.

Database user permissions

One way to avoid the vulnerability is making sure no remote user has SUPER or FILE privileges.

However, CVE-2016-6663 mentions there is a way to do this without any FILE privileges (likely related to the REPAIR TABLE issue mentioned in MySQL release notes).

Configuration files permissions

The vulnerability needs to be able to write to some MySQL configuration files. Prevent that and you are secure.

Make sure you configure permissions for various config files as follows:

  • MySQL reads configuration files from different paths, including from your datadir
    • Create an (empty) my.cnf and .my.cnf in the datadir (usually /var/lib/mysql) and make root the owner/group with 0644 permissions (see the sketch after this list).
    • Other locations to look into: /etc/my.cnf, /etc/mysql/my.cnf, /usr/etc/my.cnf, ~/.my.cnf (mysqld --help --verbose shows you where mysqld will look)
  • This also includes !includedir paths defined in your current configurations; make sure they are not writable by the mysql user either
  • No config files should be writable by the mysql user (change ownership and permissions)
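
As a rough sketch of the datadir part of that checklist (the /var/lib/mysql path is an assumption for the default datadir; adjust it and verify the other locations with mysqld --help --verbose on your system):

# Hedged sketch: create empty my.cnf / .my.cnf placeholders in the datadir, owned by
# root:root with mode 0644, so the mysql user cannot create or modify them.
# The datadir path is an assumption; run this as root.
import os

DATADIR = "/var/lib/mysql"  # assumption: default datadir location

for name in ("my.cnf", ".my.cnf"):
    path = os.path.join(DATADIR, name)
    if not os.path.exists(path):
        open(path, "a").close()      # create an empty placeholder file
    os.chown(path, 0, 0)             # owner and group: root
    os.chmod(path, 0o644)            # writable by root only, world-readable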

Percona Live Europe featured talk with Ronald Bradford — Securing your MySQL/MariaDB data

September 12, 2016 - 8:59am

Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Ronald Bradford, Founder & CEO of EffectiveMySQL. His talk will be on Securing your MySQL/MariaDB data. This talk will answer questions like:

  • How do you apply the appropriate filesystem permissions?
  • How do you use TLS/SSL for connections, and are they good for replication?
  • Encrypting all your data at rest
  • How to monitor your database with the audit plugin

. . . and more. I had a chance to speak with Ronald and learn a bit more about database security:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it?

Ronald: My first introduction to relational theory and databases was with the writings of C.J. Date and Michael Stonebraker while using the Ingres RDBMS in 1988. For 28 years, my industry experience in the database field has covered a number of relational and non-relational products, including MySQL – which I started using at my first startup in 1999. For the last 17 years, I have enjoyed contributing to the MySQL ecosystem in many ways. I’ve consulted with hundreds of organizations, both small and large, that rely on MySQL to deliver strategic value to their business customers. I have given over 130 presentations in the past ten years across six continents and published a number of books and blog articles from my experiences with MySQL and open source. I am also the organizer of the MySQL Meetup group in New York City.

My goals have always been to help educate the current generation of software engineers to appreciate, use and maximize the right product for the job. I always hope that MySQL is the right solution, but recommend other options when it is not.

I am presently looking for my next opportunity to help organizations develop a strategic and robust data infrastructure that ensures business continuity for growing needs – ensuring a reliable and consistent user experience.

Percona: Your talk is called “Securing your MySQL/MariaDB data.” Why is securing your database important, and what are the real-world ramifications for a database security breach?

Ronald: We secure the belongings in our home, we secure the passengers in our car, we secure the possessions we carry on us. Data is a valuable asset for many organizations, and for some it is the only asset of value for continued operation. Should such important business information not have the same value as people or possessions?

Within any industry, you want to be the disruptor and not the disrupted. The press coverage on any alleged or actual data breach generally leads to a loss of customer confidence. This in turn can directly affect your present and future business viability – enabling competitors to take advantage of the situation. Data security should be as important as data recovery and system performance. Today we hear about data breaches on a weekly basis – everything from government departments to large retail stores. We often do not hear of the data breaches that can occur with smaller organizations, who also have your information: your local medical provider, or a school or university that holds your personal information.

A data breach can be much more impactful than data loss. It can be harder to detect and assess the long-term impact of a security breach because there might be unauthorized access over a longer time period. Often there are insufficient audit trails and logs to validate the impact of any security breach. Inadequate access controls can also lead to unauthorized data access both internally and externally. Many organizations fail to manage risk by not providing a “least privileges required approach” for any access to valuable data by applications or staff.

Any recent real-world example highlights the potential of insufficient data security, and therefore the increased risk of your personal information being used illegally. What is your level of confidence about security when you register with a new service and then you receive an email with your login and password in clear text? If your password is not secure, your personal data is also not secure and now it’s almost impossible for your address, phone number and other information to be permanently removed from this insecure site.

Percona: Are there significant differences between security for on-premise and cloud-based databases? What are they?

Ronald: There should be no differences in protecting your data within MySQL regardless of where this is stored.  When using a cloud-based database there is the additional need to have a shared responsibility with your cloud provider ensuring their IaaS and provided services have adequate trust and verification. For example, you need to ensure that provisioned disk and memory is adequately zeroed after use, and also ensure that adequate separation exists between hosts and clients on dedicated equipment in a virtualized environment. While many providers state these security and compliance processes, there have been instances where data has not been adequately protected.

Just as you may trust an internal department with additional security in the physical and electronic access to the systems that hold your data, you should “trust but verify” your cloud provider’s capacity to protect your data and that these providers continue to assess risk regularly and respond appropriately.

Percona: What is changing in database security that keeps you awake at night? What things does the market need to address immediately?

Ronald: A discussion with a CTO recently indicated he was worried about how their infrastructure would support high availability: what is the impact of any outage, and how does the organization know if he is prepared enough? Many companies, regardless of their size, are not prepared for either a lack of availability or a security breach.

Ronald: The recent Delta outage is an example of an availability incident that cost the company many millions of dollars. Data security should be considered with the exact same concern, however it is often the poor cousin to availability. Disaster recovery is a commonly used term for addressing the potential loss of access to data, but there is not a well-known term or common process for addressing data protection.

You monitor the performance of your system for increased load and for slow queries. When did you last monitor the volume of access to secure data to look for unexpected patterns or anomalies? A data breach can be a single SQL statement that is not an expected application traffic pattern. How can you protect your data in this situation? We ask developers to write unit tests to improve code coverage. Does your organization ask developers to write tests to perform SQL injection, or write SQL statements that should not be acceptable to manipulate data and are therefore correctly identified, alerted and actioned? Many organizations run load and volume testing regularly, but few organizations run security drills as regularly.

As organizations continue to address the growing data needs in the digital age, ongoing education and awareness are very important. There is often very little information in the MySQL ecosystem about validating data security, determining what is applicable security monitoring, and what is the validation and verification of authorized and unauthorized data access. What also needs to be addressed is the use (and abuse) of available security in current and prior MySQL versions. The key advancements in MySQL 5.6 and MySQL 5.7, combined with a lack of a migration path for organizations, is a sign that ongoing security improvements are not considered as important as other features.

Percona: What are you looking forward to the most at Percona Live Europe this year?

Ronald: Percona Live Europe is a chance for all attendees, including myself, to see, hear and share in the wide industry use of MySQL today (and the possibilities tomorrow).

With eight sessions per time slot, I often wish for the ability to be in multiple places at  once! Of particular interest to myself are new features that drive innovation of the product, such as MySQL group replication.

I am also following efforts related to deploying your application stack in containers using Docker. Solving the state and persistence needs of a database is very different to providing application micro-services. I hope to get a better appreciation for finding a balance between the use of containers, VMs and dedicated hardware in a MySQL stack that promotes accelerated development, performance, business continuity and security.

You can read more about Ronald and his thoughts on database security at ronaldbradford.com.

Want to find out more about Ronald, MySQL/MariaDB and security? Register for Percona Live Europe 2016, and come see his talk Securing your MySQL/MariaDB data.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

Percona Server Critical Update CVE-2016-6662

September 12, 2016 - 7:06am

This blog is an announcement for a Percona Server update with regards to CVE-2016-6662.

We have added a fix for CVE-2016-6662 in the following releases:

From seclist.org:

An independent research has revealed multiple severe MySQL vulnerabilities. This advisory focuses on a critical vulnerability with a CVEID of CVE-2016-6662. The vulnerability affects MySQL servers in all version branches (5.7, 5.6, and 5.5) including the latest versions, and could be exploited by both local and remote attackers.

Both the authenticated access to MySQL database (via network connection or web interfaces such as phpMyAdmin) and SQL Injection could be used as exploitation vectors. Successful exploitation could allow attackers to execute arbitrary code with root privileges which would then allow them to fully compromise the server on which an affected version of MySQL is running.

This is a CRITICAL update, and the fix mitigates the potential for remote root code execution.

We encourage our users to update to the latest version of their particular fork as soon as possible, ensuring they have appropriate change management procedures in place beforehand so they can test the update before placing it into production.

Percona would like to thank Dawid Golunski of http://legalhackers.com/ for disclosing this vulnerability in the MySQL software, and working with us to resolve this problem.

Don’t Spin Your Data, Use SSDs!

September 9, 2016 - 1:49pm

This blog post discusses the advantages of SSDs over HDDs for database environments.

For years now, I’ve been telling audiences for my MySQL Performance talk the following: if you are running an I/O-intensive database on spinning disks you’re doing it wrong. But there are still a surprising number of laggards who aren’t embracing SSD storage (whether it’s for cost or reliability reasons).

Let’s look at cost first. As I write this now (September 2016), high-performance server-grade spinning hard drives run for about $240 for 600GB (or $0.40 per GB). Of course, you can get an 8TB archive drive at about the same price (about $0.03 per GB), but it isn’t likely you’d use something like that for your operational database. At the same time, you can get a Samsung 850 EVO drive for approximately $300 (or $0.30 per GB), which is cheaper than the server-grade spinning drive!

While it’s not the best drive money can buy, it is certainly an order of magnitude faster than any spinning disk drive!

(I’m focusing on the cost per GB rather than the cost per IOPS, as SSDs overtook HDDs years ago when it comes to IOPS/$.)

If we take a look at Amazon EBS pricing, we will find that Amazon has moved to SSD volumes by default as “General Purpose” storage (gp2). Prices for this volume type run about 2x higher per GB than high-performance HDD-based volumes (st1) and provisioned IOPs volumes. The best volumes for databases will likely run you 4x higher than HDD.

This appears to be a significant cost difference, but keep in mind you can get much more IOPS at much better latency from these volumes. They also handle IO spikes better, which is very important for real workloads.

Whether we’re looking at a cloud or private environment, it is wrong just to look at the cost of the storage alone – you must look at the whole server cost. When using an SSD, you might not need to buy a RAID card with battery-backed-up (BBU) cache, as many SSDs have similar functions built in.

(For some entry-level SSDs, there might be an advantage to purchasing a RAID with BBU, but it doesn’t affect performance nearly as much as for HDDs. This works out well, however, as entry level SSDs aren’t going to cost that much to begin with and won’t make this setup particularly costly, relative to a higher-end SSD.)  

Some vendors can charge insane prices for SSDs, but this is where you should use your negotiation and alternative-vendor-choice powers.

Some folks are concerned that they can’t get as much storage per server with SSDs because they are smaller. This was the case a few years back, but not anymore. You can easily find a 2TB 2.5” SSD, which is larger than the available 2.5” spinning drives. You can go as high as 13TB in the 2.5” form factor.

There is a bit of a challenge if you’re looking at the NVMe (PCI-E) cards, as you typically can’t have as many of those per server as you could using spinning disks, but the situation is changing here as well with the 6.4TB SX300 from Sandisk/FusionIO or the PM1725 from Samsung. Directly attached storage provides extremely high performance and 10TB-class sizes.

To get multiple storage units together, you can use hardware RAID, software RAID, LVM striping or some file systems (such as ZFS) can take care of it for you.    

Where do we stand with SSD reliability? In my experience, modern SSDs (even inexpensive ones) are pretty reliable, particularly for online data storage. The shelf life of unpowered SSDs is likely to be less than HDDs, but we do not really keep servers off for long periods of time when running database workloads. Most SSDs also do something like RAID internally (it’s called RAIN) in addition to error correction codes that protect your data from a full single flash chip.

In truth, focusing on storage-level redundancy is overrated for databases. We want to protect most critical applications from complete database server failure, which means using some form of replication, storing several copies of data. In this case, you don’t need bulletproof storage on a single server – just a replication setup where you won’t lose the data and any server loss is easy to handle. For MySQL, solutions like Percona XtraDB Cluster come handy. You can use external tools such as Orchestrator or MHA to make MySQL replication work.  

When it comes to comparing SSD vs. HDD performance, whatever you do with SSDs they will likely still perform better than HDDs. Your RAID5 and RAID6 arrays made from SSDs will beat your RAID10 and RAID0 made from HDDs (unless your RAID card is doing something nasty).

Another concern with SSD reliability is write endurance. SSDs indeed have a specified amount of writes they can handle (after which they are likely to fail). If you’re thinking about replacing HDDs with SSDs, examine how long SSDs would endure under a comparable write load.  

If we’re looking at a high HDD write workload, a single device is likely to handle 200 write IOPS of 16KB (when running InnoDB). Let’s double that. That comes to 6.4MB/sec, which gives us  527GB/day (doing this 24/7). Even with the inexpensive Samsung 850 Pro we get 300TB of official write endurance – enough for 1.5 years. And in reality, drives tend to last well beyond their official specs.    
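
A back-of-the-envelope version of that calculation, as a sketch (the IOPS, page size and endurance figures are the ones from the paragraph above, used for illustration only):

# Rough endurance math for the example above; all inputs are illustrative, not measurements.
write_iops = 200 * 2                                 # doubled HDD-class write rate
page_size_kb = 16                                    # InnoDB page size
mb_per_sec = write_iops * page_size_kb / 1024.0      # roughly 6.4 MB/s of writes
gib_per_day = mb_per_sec * 86400 / 1024.0            # roughly 527 GiB/day, writing 24/7
endurance_tb = 300                                   # official endurance of the example drive
days = endurance_tb * 1024 / gib_per_day
print("%.0f GiB/day of writes -> endurance lasts ~%.1f years" % (gib_per_day, days / 365))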

If you don’t like living on the edge, more expensive server-grade storage options have much better endurance. For example, the 6.4TB SX300 offers almost 100x more endurance, at 22 petabytes written.

In my experience, people often overestimate how many writes their application performs on a sustained basis. The best approach is to do the math, but also to monitor the drive status with a SMART utility or a vendor tool. These tools can alert you in advance as the drive wears out.
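
If you want to automate that monitoring, something along these lines can work as a starting point. This is a hypothetical helper: it assumes smartmontools is installed, that the device is /dev/sda, and that the drive reports a Total_LBAs_Written attribute with 512-byte units (attribute names and units vary by vendor and model, so verify against your hardware).

# Hedged sketch: read SMART attributes with smartctl and estimate total data written.
# Device path and the Total_LBAs_Written attribute (512-byte sectors) are assumptions.
import subprocess

output = subprocess.check_output(["smartctl", "-A", "/dev/sda"]).decode()

for line in output.splitlines():
    if "Total_LBAs_Written" in line:
        lbas_written = int(line.split()[-1])           # raw value, in 512-byte sectors on many drives
        tb_written = lbas_written * 512 / 1024.0 ** 4
        print("Approximately %.1f TB written so far" % tb_written)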

Whatever your workload is, you will likely find an SSD solution that offers you enough endurance while significantly exceeding the performance of an HDD-based solution.

Finally, there is a third and very important component of SSD reliability for operational database workloads: not losing your data during a power failure. Many “consumer-grade” SSDs come with drive write cache enabled by default, but without proper power loss protection. This means you can lose some writes during a power failure, causing data loss or database corruption.

Disabling write cache is one option, though it can severely reduce write performance and does not guarantee data won’t be lost. Using enterprise-grade SSDs from a reputable vendor is another option, and testing SSDs yourself might be a good idea if you’re on a budget.  

Conclusion

When it comes to operational databases, whether your workload is on-premises or in the cloud, don’t spin your data: use SSDs. There are choices and options for almost any budget and every workload.

Basic Housekeeping for MySQL Indexes

September 9, 2016 - 10:44am

In this blog post, we’ll look at some of the basic housekeeping steps for MySQL indexes.

We all know that indexes can be the difference between a high-performance database and a bad/slow/painful query ride. It’s a critical part that deserves some housekeeping once in a while. So, what should you check? In no particular order, here are some things to look at:

1. Unused indexes

With the sys schema, it is pretty easy to find unused indexes: use the schema_unused_indexes view.

mysql> select * from sys.schema_unused_indexes;
+---------------+-----------------+-------------+
| object_schema | object_name     | index_name  |
+---------------+-----------------+-------------+
| world         | City            | CountryCode |
| world         | CountryLanguage | CountryCode |
+---------------+-----------------+-------------+
2 rows in set (0.01 sec)

This view is based on the performance_schema.table_io_waits_summary_by_index_usage table, which will require enabling the Performance Schema, the events_waits_current consumer and the wait/io/table/sql/handler instrument. PRIMARY (key) indexes are ignored.

If you don’t have them enabled, just execute these queries:

update performance_schema.setup_consumers set enabled = 'yes' where name = 'events_waits_current';
update performance_schema.setup_instruments set enabled = 'yes' where name = 'wait/io/table/sql/handler';

Quoting the documentation:

“To trust whether the data from this view is representative of your workload, you should ensure that the server has been up for a representative amount of time before using it.”

And by representative amount, I mean representative: 

  • Do you have a weekly job? Wait at least one week
  • Do you have monthly reports? Wait at least one month
  • Don’t rush!

Once you’ve found unused indexes, remove them.

2. Duplicated indexes

You have two options here:

  • pt-duplicate-key-checker
  • the schema_redundant_indexes view from sys_schema

The pt-duplicate-key-checker is part of Percona Toolkit. The basic usage is pretty straightforward:

[root@e51d333b1fbe mysql-sys]# pt-duplicate-key-checker
# ########################################################################
# world.CountryLanguage
# ########################################################################
#
# CountryCode is a left-prefix of PRIMARY
# Key definitions:
#   KEY `CountryCode` (`CountryCode`),
#   PRIMARY KEY (`CountryCode`,`Language`),
# Column types:
#   `countrycode` char(3) not null default ''
#   `language` char(30) not null default ''
# To remove this duplicate index, execute:
ALTER TABLE `world`.`CountryLanguage` DROP INDEX `CountryCode`;
# ########################################################################
# Summary of indexes
# ########################################################################
# Size Duplicate Indexes   2952
# Total Duplicate Indexes  1
# Total Indexes            37

Now, the schema_redundant_indexes view is also easy to use once you have sys schema installed. The difference is that it is based on the information_schema.statistics table:

mysql> select * from schema_redundant_indexes\G
*************************** 1. row ***************************
              table_schema: world
                table_name: CountryLanguage
      redundant_index_name: CountryCode
   redundant_index_columns: CountryCode
redundant_index_non_unique: 1
       dominant_index_name: PRIMARY
    dominant_index_columns: CountryCode,Language
 dominant_index_non_unique: 0
            subpart_exists: 0
            sql_drop_index: ALTER TABLE `world`.`CountryLanguage` DROP INDEX `CountryCode`
1 row in set (0.00 sec)

Again, once you find the redundant index, remove it.

3. Potentially missing indexes

The statements summary tables from the performance schema have several interesting fields. For our case, two of them are pretty important: NO_INDEX_USED (means that the statement performed a table scan without using an index) and NO_GOOD_INDEX_USED (“1” if the server found no good index to use for the statement, “0” otherwise).

Sys schema has one view that is based on the performance_schema.events_statements_summary_by_digest table, and is useful for this purpose: statements_with_full_table_scans, which lists all normalized statements that have done a table scan.

For example:

mysql> select * from world.CountryLanguage where isOfficial = 'F';
...
746 rows in set (0.00 sec)

mysql> select * from statements_with_full_table_scans\G
*************************** 1. row ***************************
                   query: SELECT * FROM `world` . `Count ... guage` WHERE `isOfficial` = ?
                      db: world
              exec_count: 1
           total_latency: 739.87 us
     no_index_used_count: 1
no_good_index_used_count: 0
       no_index_used_pct: 100
               rows_sent: 746
           rows_examined: 984
           rows_sent_avg: 746
       rows_examined_avg: 984
              first_seen: 2016-09-05 19:51:31
               last_seen: 2016-09-05 19:51:31
                  digest: aa637cf0867616c591251fac39e23261
1 row in set (0.01 sec)

The above query doesn’t use an index because there was no good index to use, and thus was reported. See the explain output:

mysql> explain select * from world.CountryLanguage where isOfficial = 'F'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryLanguage
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 984
        Extra: Using where

Note that the “query” field reports the query digest (more like a fingerprint) instead of the actual query.

In this case, the CountryLanguage table is missing an index over the “isOfficial” field. It is your job to decide whether it is worth it to add the index or not.

4. Multiple column indexes order

It was explained before that a multiple-column index beats an index merge in all cases where such an index can be used, even if sometimes you might have to use index hints to make it work.

But when using them, don’t forget that the order matters. MySQL will only use a multi-column index if at least one value is specified for the first column in the index.

For example, consider this table:

mysql> show create table CountryLanguage\G
*************************** 1. row ***************************
       Table: CountryLanguage
Create Table: CREATE TABLE `CountryLanguage` (
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `Language` char(30) NOT NULL DEFAULT '',
  `IsOfficial` enum('T','F') NOT NULL DEFAULT 'F',
  `Percentage` float(4,1) NOT NULL DEFAULT '0.0',
  PRIMARY KEY (`CountryCode`,`Language`),
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `countryLanguage_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `Country` (`Code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

A query against the field “Language” won’t use an index:

mysql> explain select * from CountryLanguage where Language = 'English'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryLanguage
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 984
        Extra: Using where

Simply because it is not the leftmost prefix for the Primary Key. If we add the “CountryCode” field, now the index will be used:

mysql> explain select * from CountryLanguage where Language = 'English' and CountryCode = 'CAN'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: CountryLanguage
         type: const
possible_keys: PRIMARY,CountryCode
          key: PRIMARY
      key_len: 33
          ref: const,const
         rows: 1
        Extra: NULL

Now, you’ll have to also consider the selectivity of the fields involved. Which is the preferred order?

In this case, the “Language” field has a higher selectivity than “CountryCode”:

mysql> select count(distinct CountryCode)/count(*), count(distinct Language)/count(*) from CountryLanguage;
+--------------------------------------+-----------------------------------+
| count(distinct CountryCode)/count(*) | count(distinct Language)/count(*) |
+--------------------------------------+-----------------------------------+
|                               0.2368 |                            0.4644 |
+--------------------------------------+-----------------------------------+

So in this case, if we create a multi-column index, the preferred order will be (Language, CountryCode).
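For example, creating that index would look like this (the index name is chosen here only for illustration):

mysql> ALTER TABLE CountryLanguage ADD INDEX idx_language_country (Language, CountryCode);

With this index in place, queries filtering on Language alone, or on Language plus CountryCode, can use it; queries filtering only on CountryCode cannot (although they can still use the existing CountryCode key).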

Placing the most selective columns first is a good rule of thumb when there is no sorting or grouping to consider, and the purpose of the index is only to optimize WHERE lookups. Choose a column order that is as selective as possible for the queries you run most often.

Now, is this good enough? Not really. What about special cases where the table doesn’t have an even distribution? When a single value is present way more times than all the others? In that case, no index will be good enough. Be careful not to assume that average-case performance is representative of special-case performance. Special cases can wreck performance for the whole application.

In conclusion, we depend heavily on proper indexes. Give them some love and care once in a while, and the database will be very grateful.

All the examples were done with the following MySQL and Sys Schema version:

mysql> select * from sys.version;
+-------------+-----------------+
| sys_version | mysql_version   |
+-------------+-----------------+
| 1.5.1       | 5.6.31-77.0-log |
+-------------+-----------------+

MySQL Replication Troubleshooting: Q & A

September 8, 2016 - 11:53am

In this blog, I will provide answers to the Q & A for the MySQL Replication Troubleshooting webinar.

First, I want to thank everybody for attending the August 25 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: Hi Sveta. One question: how is it possible to get N previous events using the SHOW BINLOG EVENTS command? For example, the position is 999 and I want to analyze the previous five events. Is it possible?

A: No, there is no such option. You cannot get the previous five events using SHOW BINLOG EVENTS. However, you can use mysqlbinlog with the option --stop-position and tail its output.
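For example, using the position from the question (the binary log file name is a placeholder, and the number of lines to tail depends on how large the events are):

mysqlbinlog -vvv --stop-position=999 /var/lib/mysql/master.000001 | tail -n 100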

Q: We are having issues with inconsistencies over time. We also have a lot of “waiting for table lock” statuses during high volume usage. Would changing these tables to InnoDB help the replicated database remain consistent?

A: Do you use MyISAM? Switching to InnoDB might help, but it depends on what types of queries you use. For example, if you often use the LOCK TABLE command, you will see "waiting for table lock" for InnoDB too. Regarding data consistency between the master and slave, you need to use row-based replication.
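As a minimal sketch of both changes (the table name is a placeholder, and binlog_format is best also set in my.cnf so the change survives a restart):

mysql> ALTER TABLE mydb.mytable ENGINE=InnoDB;
mysql> SET GLOBAL binlog_format = 'ROW';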

Q: For semi-sync replication, what’s the master’s behavior when the master never received ACK from any of the slaves?

A: It will time out after rpl_semi_sync_master_timeout milliseconds and then switch to asynchronous replication.
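You can check and change the timeout at runtime; for example (the 3000 ms value is only an illustration):

mysql> SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_master_timeout';
mysql> SET GLOBAL rpl_semi_sync_master_timeout = 3000;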

Q: We’re using MySQL on r3.4xlarge EC2 instances (16 CPU). We use RBR. innodb_read_io_threads and innodb_write_io_threads =4. We often experience lags. Would increasing these to eight offer better IO for slaves? What other parameters could boost slave IO?

A: Yes, an increased number of IO threads would most likely improve performance. Other parameters that could help are similar to the ones discussed in the "InnoDB Troubleshooting" and "Introduction to Troubleshooting Performance: What Affects Query Execution?" webinars. You need to pay attention to InnoDB options that affect IO (innodb_thread_concurrency, innodb_flush_method, innodb_flush_log_at_trx_commit, innodb_flush_log_at_timeout) and general IO options, such as sync_binlog.

Q: How many masters can I have working together?

A: What do you mean by "how many masters can [you] have working together"? Do you mean circular replication or a multi-master setup? In either case, the only limitation is hardware. For a multi-master setup, you should ensure that the slave has enough resources to process all requests. For circular replication, ensure that each of the masters in the chain can handle the increasing number of writes as they replicate down the chain, and that this does not lead to permanently increasing slave lag.

Q: What’s the best way to handle auto_increment?

A: Follow the advice in the user manual: set auto_increment_offset to a unique value on each server, set auto_increment_increment to the number of servers, and never update auto-incremented columns manually.
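For example, on the first of two servers the settings would look like this (also put them in my.cnf so they persist across restarts):

mysql> SET GLOBAL auto_increment_increment = 2; -- number of servers
mysql> SET GLOBAL auto_increment_offset = 1;    -- use 2 on the second server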

Q: I configured multi-threaded replication. Sometimes the replication lag keeps increasing while the slave is doing "invalidating query cache entries (table)". How should I fine-tune this?

A: The status "invalidating query cache entries (table)" means that the query cache is invalidating entries for tables that have been changed by a command currently being executed by the slave SQL thread. To avoid this issue, you need to keep the query cache small (not larger than 512 MB) and de-fragment it from time to time using the FLUSH QUERY CACHE command.
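A minimal sketch (the 256MB value is only one example of a "small" cache):

mysql> SET GLOBAL query_cache_size = 268435456; -- 256MB
mysql> FLUSH QUERY CACHE; -- de-fragments the cache without removing cached queries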

Q: Sometimes when IO is slow and replication lags, we see the slave reading an event from the relay log in the state "Waiting for master to send event". How do we troubleshoot this to get more details?

A: The "Waiting for master to send event" state shows that the slave IO thread sent a request for a new event, and is waiting for the event from the master. If you believe it hasn’t received the event in a timely fashion, check the error log files on both the master and slave for connection errors. If there is no error message, or if the message doesn’t provide enough information to solve the issue, use the network troubleshooting methods discussed in the “Troubleshooting hardware resource usage” webinar.


Percona is Hiring: Director of Platform Engineering

September 8, 2016 - 11:20am

Percona is hiring a Director of Platform Engineering. Find out more!

At Percona, we recognize you need much more than just a database server to successfully run a database-powered infrastructure. You also need strong tools that deploy, manage and monitor the software. Percona’s Platform Engineering group is responsible just for that. They build next-generation open source solutions for the deployment, monitoring and management of open source databases.

This team is currently responsible for products such as Percona Toolkit, Percona Monitoring Plugins and Percona Monitoring and Management.

Percona builds products that advance state-of-the-art open source software. Our products help our customers monitor and manage their databases. They help our services team serve customers faster, better and more effectively.

The leader of the Platform Engineering group needs a strong vision, as well as an understanding of market trends and of best practices for automation, monitoring and management, both in the cloud and on premises. This person must have a technical operations background and experience building and leading engineering teams that have efficiently delivered high-quality software. The ideal candidate will also understand the nature of open source software development and have experience working with distributed teams.

This position is for a "player coach": you will get your hands dirty writing code, performing quality assurance, writing great documentation and assisting customers with troubleshooting.

We are not looking for extensive experience with a particular programming language, but qualified candidates should be adept at learning new programming languages. Currently, our teams use a combination of Perl, Python, Go and JavaScript.

The Director of Platform Engineering reports to Vadim Tkachenko, CTO and VP of Engineering. They will also work closely with me, other senior managers and experts at Percona.

Interested? Please apply here on Percona’s website.

Percona Live Europe featured talk with Igor Canadi — Everything you wanted to know about MongoRocks

September 7, 2016 - 10:47am

Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we'll meet Igor Canadi, Software Engineer at Facebook, Inc. His talk will be on Everything you wanted to know about MongoRocks. MongoRocks is MongoDB with the RocksDB storage engine. It was developed by Facebook, where it's used to power Parse, a mobile backend-as-a-service provider.

I had a chance to speak with Igor and learn a bit more about these questions:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it?

Igor: After I finished my undergrad at the University of Zagreb in Croatia, I joined the University of Wisconsin-Madison's Masters program. Even though UW-M is famous for its work on databases, during my two years there I worked in a different area. However, when I joined Facebook after school, I heard of a cool new project called RocksDB. Everything about building a new storage engine sounded exciting to me, although I had zero idea how thrilling the ride would actually be. The best part was working with and getting to know amazing people from Facebook, Parse, MongoDB, Percona, and many other companies that are using or experimenting with RocksDB.

Percona: Your talk is called “Everything you wanted to know about MongoRocks.” Briefly, what is MongoRocks and why did it get developed?

Igor: Back in 2014, MongoDB announced that they were building a pluggable storage engine API, which would enable MongoDB users to seamlessly choose the storage engine that works best for their workload. Their first prototype actually used RocksDB as a storage engine, which was very exciting for us. However, they soon after acquired WiredTiger, another great storage engine, and decided to abandon the MongoDB+RocksDB project. At the same time, Parse was running into scaling challenges with their MongoDB deployment. We decided to help out and take over the development of MongoRocks. We started rolling it out at Parse in March 2015 and completed the rollout in October. Running MongoRocks instead of MongoDB with the MMap storage engine resulted in much greater efficiency and lower latencies in some scenarios. Some of the experiences are captured in Parse's blog posts: http://blog.parse.com/announcements/mongodb-rocksdb-parse/ and http://blog.parse.com/learn/engineering/mongodb-rocksdb-writing-so-fast-it-makes-your-head-spin/

Percona: What workloads and database environments are best suited for a MongoRocks deployment? Do you see an expansion of the solution to encompass other scenarios?

Igor: Generally speaking, MongoRocks should compress really well. Over the years of using LSM engines, we learned that their compression rates are hard to beat. The difference can sometimes be substantial. For example, many benchmarks of MyRocks, which is MySQL with the RocksDB storage engine, have shown that compressed InnoDB uses twice as much space as compressed RocksDB. With better compression, more of your data fits in memory, which can also improve read latencies and lower the stress on storage media. However, this is a tricky question to answer generally. It really depends on the metrics you care about. One great thing about Mongo and different storage engines is that the replication format is the same across all of them, so it's simple to try it out and see how it performs under your workload. You can just add an additional node to your replica set that's using RocksDB and monitor the metric you care about on that node.
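A rough sketch of adding such a node, assuming a MongoRocks build where the engine is registered as rocksdb (replica set name, host, port and data path are placeholders):

# start the new replica set member with the RocksDB storage engine
mongod --replSet rs0 --storageEngine rocksdb --dbpath /data/mongorocks --port 27018
# then add it from the mongo shell on the current primary: rs.add("newhost:27018")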

Percona: What are the unique database requirements at Facebook that keep you awake at night? What would you most like to see feature-wise in MongoDB in the near future (or any database technology)?

Igor: One of the most exciting database projects that we’re working on at Facebook is MyRocks, which I mentioned previously. Currently, we use MySQL with InnoDB to store our Facebook graph and we are experimenting with replacing that with MyRocks. The main motivation behind the project is 2x better compression rates, but we also see better performance in some areas. If you’re attending Percona Live Europe I encourage you to attend either Mark Callaghan’s talk on MyRocks, or Yoshinori’s 3-hour tutorial to learn more.

Percona: What are you looking forward to the most at Percona Live Europe this year?

Igor: The best part of attending conferences is the people. I am looking forward to seeing old friends and meeting new ones. If you like to talk storage engines, hit me up!

You can read more about Igor’s thoughts on MongoRocks at his twitter feed.

Want to find out more about Igor, Facebook and MongoRocks? Register for Percona Live Europe 2016, and come see his talk Everything you wanted to know about MongoRocks.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBA’s, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

MyRocks Docker images

September 6, 2016 - 1:28pm

In this post, I’ll point you to MyRocks Docker images with binaries, allowing you to install and play with the software.

During the @Scale conference, Facebook announced that MyRocks is mature enough that it has been installed on 5% of Facebook’s MySQL slaves. This has saved 50% of the space on these slaves, which allows them to decrease the number of servers by half. Check out the announcement here:  https://code.facebook.com/posts/190251048047090/myrocks-a-space-and-write-optimized-mysql-database/

Those are pretty impressive numbers, so I decided to take a serious look at MyRocks. The biggest showstopper is usually binary availability, since Facebook only provides the source code: https://github.com/facebook/mysql-5.6.

To make it easier to get started, we packaged MyRocks binaries into a Docker image. You can get the image from https://hub.docker.com/r/perconalab/myrocks/.

To start MyRocks:

docker run -d --name myr -P perconalab/myrocks

To access it, use a regular MySQL client:

mysql -h127.0.0.1
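Keep in mind that -P publishes the container's MySQL port on a random host port, so you may need to look up the mapping first (the port number below is only an example):

docker port myr 3306
# 0.0.0.0:32768
mysql -h127.0.0.1 -P32768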

From there you should see RocksDB installed:

show engines;
+------------+---------+----------------------------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                                        | Transactions | XA   | Savepoints |
+------------+---------+----------------------------------------------------------------+--------------+------+------------+
| ROCKSDB    | DEFAULT | RocksDB storage engine                                         | YES          | YES  | YES        |
...
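To confirm the engine works end to end, you can create a throwaway table that uses it explicitly (table and database names are just examples):

mysql> CREATE TABLE test.t_rocks (id INT PRIMARY KEY, v VARCHAR(32)) ENGINE=ROCKSDB;
mysql> INSERT INTO test.t_rocks VALUES (1, 'hello');
mysql> SELECT * FROM test.t_rocks;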

I hope this makes it easier to start experimenting with MyRocks!

MongoDB at Percona Live Europe

September 6, 2016 - 8:28am

This year, you will find a great deal about MongoDB at Percona Live Europe.

As we continue to work on growing the independent MongoDB ecosystem, this year’s Percona Live Europe in Amsterdam includes many talks about MongoDB. If your company uses MongoDB technologies, is focused exclusively on developing with MongoDB or MongoDB operations, or is just evaluating MongoDB, attending Percona Live Europe will prove a valuable experience.  

As always with Percona Live conferences, the focus is squarely on the technical content — not sales pitches. We encourage our speakers to tell the truth: the good, the bad and the ugly. There is never a “silver bullet” when it comes to technology — only tradeoffs between different solution options.

As someone who has worked in database operations for more than 15 years, I recognize and respect the value of “negative information.” I like knowing what does not work, what you should not do and where trouble lies. Negative information often proves more valuable than knowing how great the features of a specific technology work — especially since the product’s marketing team tends to highlight those very well (and they seldom require independent coverage).

For MongoDB at this year’s Percona Live Europe:
  • We have talks about MongoRocks, a RocksDB powered storage engine for MongoDB — the one you absolutely need to know about if you’re looking to run the most efficient MongoDB deployment at scale!  
  • We will cover MongoDB backup best practices, as well as several talks about MongoDB monitoring and management (1, 2, 3), all of them based on MongoDB Community Edition and Percona Server for MongoDB (so they don't require a MongoDB Enterprise subscription).

There will also be a number of talks about how MongoDB interfaces with other technologies. We show how ToroDB can use the MongoDB protocol while storing data in a relational database (and why that might be a good idea), we contrast and compare MySQL and MongoDB geospatial features, and we examine MongoDB from a MySQL DBA's point of view.

We also show how to use Apache Spark to unify data from MongoDB, MySQL, and Redis, and discuss best practices for choosing databases for different application needs.

Finally, if you’re just starting with MongoDB and would like a jump start before attending more detailed MongoDB talks, we’ve got a full day MongoDB 101 tutorial for you.

Join us for the full conference, or register for just one day if that is all your schedule allows. But come to Percona Live Europe in Amsterdam on October 3-5 to get the best and latest MongoDB information.

InnoDB Troubleshooting: Q & A

September 2, 2016 - 2:12pm

In this blog, I will provide answers to the Q & A for the InnoDB Troubleshooting webinar.

First, I want to thank everybody for attending the August 11 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: What’s a good speed for buffer pool speed/size for maximum query performance?

A: I am sorry, I don't quite understand the question. The InnoDB buffer pool is an in-memory buffer. Ideally, your whole active dataset (the rows that are regularly accessed by the application) should fit in the buffer pool. There is a good blog post by Peter Zaitsev describing how to find the best size for the buffer pool.
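One rough way to check whether your active dataset fits in the buffer pool is to compare how many logical read requests had to go to disk; a high Innodb_buffer_pool_reads relative to Innodb_buffer_pool_read_requests suggests the pool is too small for the workload:

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';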

Q: Any maximum range for these InnoDB options?

A: I am again sorry, I only see questions after the webinar and don't know which slide you were on when you asked about options. Generally speaking, though, the maximum ranges are limited by hardware: the size of the InnoDB buffer pool is limited by the amount of physical memory you have, innodb_io_capacity is limited by the number of IOPS your disks can handle, and the number of concurrent threads is limited by the number of CPU cores.

Q: On a AWS r3.4xlarge, 16 CPU, 119GB of RAM, EBS volumes, what innodb_thread_concurrency, innodb_read_io_threads, innodb_write_io_threads would you recommend? and innodb_read_io_capacity?

A: innodb_thread_concurrency = 16, innodb_read_io_threads = 8, innodb_write_io_threads = 8. The right innodb_io_capacity depends on the speed of your disks; as far as I know, AWS offers disks with different speeds. Check how many IOPS your disks can sustain when setting innodb_io_capacity, and their maximum IOPS when setting innodb_io_capacity_max.
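Put together as a my.cnf sketch (the io_capacity numbers are placeholders; replace them with the provisioned and maximum IOPS of your EBS volumes):

[mysqld]
innodb_thread_concurrency = 16
innodb_read_io_threads    = 8
innodb_write_io_threads   = 8
innodb_io_capacity        = 2000  # provisioned IOPS of the volume
innodb_io_capacity_max    = 4000  # burst/maximum IOPS of the volume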

Q: About InnoDB structures and parallelism: Are there InnoDB settings that can prevent or reduce latching (causes semaphore locks and shutdown after 600s) that occur trying to add an index object to memory but only DML queries on the primary key are running?

A: Unfortunately, semaphore waits during the CREATE INDEX command are not avoidable. You can only affect other factors that speed up index creation: for example, how fast you write records to disk or how many concurrent queries you run, and you can kill queries that have been waiting for a lock too long. There is an old feature request asking to handle long semaphore waits gracefully. Consider clicking the "Affects Me" button to bring it to the developers' attention.

Q: How can we check these threads?

A: I assume you are asking about InnoDB threads? You can find information about running threads in SHOW ENGINE INNODB STATUS:

--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 1; buffer pool: 0
529 OS file reads, 252 OS file writes, 251 OS fsyncs
0.74 reads/s, 16384 avg bytes/read, 7.97 writes/s, 7.94 fsyncs/s

And in the Performance Schema THREADS table:

mysql> select thread_id, name, type from performance_schema.threads where name like '%innodb%';
+-----------+----------------------------------------+------------+
| thread_id | name                                   | type       |
+-----------+----------------------------------------+------------+
|         2 | thread/innodb/io_handler_thread        | BACKGROUND |
|         3 | thread/innodb/io_handler_thread        | BACKGROUND |
|         4 | thread/innodb/io_handler_thread        | BACKGROUND |
|         5 | thread/innodb/io_handler_thread        | BACKGROUND |
|         6 | thread/innodb/io_handler_thread        | BACKGROUND |
|         7 | thread/innodb/io_handler_thread        | BACKGROUND |
|         8 | thread/innodb/io_handler_thread        | BACKGROUND |
|         9 | thread/innodb/io_handler_thread        | BACKGROUND |
|        10 | thread/innodb/io_handler_thread        | BACKGROUND |
|        11 | thread/innodb/io_handler_thread        | BACKGROUND |
|        13 | thread/innodb/srv_lock_timeout_thread  | BACKGROUND |
|        14 | thread/innodb/srv_monitor_thread       | BACKGROUND |
|        15 | thread/innodb/srv_error_monitor_thread | BACKGROUND |
|        16 | thread/innodb/srv_master_thread        | BACKGROUND |
|        17 | thread/innodb/srv_purge_thread         | BACKGROUND |
|        18 | thread/innodb/page_cleaner_thread      | BACKGROUND |
|        19 | thread/innodb/lru_manager_thread       | BACKGROUND |
+-----------+----------------------------------------+------------+
17 rows in set (0.00 sec)

Q: Can you briefly explain how an InnoDB thread differs from a connection thread?

A: MySQL creates a connection thread each time a client connects to the server. Generally, the lifetime of this thread is the same as the connection (I won't discuss the thread cache and thread pool plugin here to avoid unnecessary complexity). This way, if you have 100 connections you have 100 connection threads. But not all of these threads do something. Some are actively querying MySQL, but others are sleeping. You can find the number of threads actively doing something by examining the status variable Threads_running. InnoDB doesn't need to create as many threads as there are connections to perform its job effectively. It creates fewer threads (ideally, the same number as there are CPU cores). So, for example, just 16 InnoDB threads can handle 100 or more connection threads effectively.
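To see how many of your connection threads are actually doing work at any given moment, compare Threads_running with Threads_connected:

mysql> SHOW GLOBAL STATUS LIKE 'Threads_%';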

Q: How can we delete bulk data in Percona XtraDB Cluster without affecting production? It is nearly 6 million records in a table around 40GB in size.

A: You can use the pt-archiver utility. It deletes rows in chunks. While your database will still have to handle all these writes, the --max-flow-ctl option pauses the purge job if the cluster spends too much time paused for flow control.
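A rough sketch of such a purge job (the DSN, table and WHERE condition are placeholders for your own case):

pt-archiver \
  --source h=localhost,D=mydb,t=big_table \
  --purge \
  --where "created_at < NOW() - INTERVAL 90 DAY" \
  --limit 1000 --txn-size 1000 \
  --max-flow-ctl 1 \
  --statistics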

Q: Why do we sometimes get “–tc-heuristic-recover” message in error logs? Especially when we recover after a crash? What does this indicate? And should we commit or rollback?

A: This means you used two transactional engines that support XA in the same transaction, and mysqld crashed in the middle of the transaction. Now mysqld cannot determine which strategy to use when recovering transactions: either COMMIT or ROLLBACK. Strangely, this option is documented as "not used". It certainly is used, however; the test case for bug #70860 proves it. I reported documentation bug #82780.
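If you hit this, the option is passed to mysqld at startup and accepts either COMMIT or ROLLBACK; which one is right depends on your data, so treat the following only as a syntax sketch:

mysqld --tc-heuristic-recover=COMMIT
# or
mysqld --tc-heuristic-recover=ROLLBACK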

Q: Which parameter controls the InnoDB thread count?

A: The main parameter is innodb_thread_concurrency. For fine tuning, use innodb_read_io_threads, innodb_write_io_threads, innodb_purge_threads and innodb_page_cleaners.
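You can check the current values of all of these at once:

mysql> SHOW GLOBAL VARIABLES WHERE Variable_name IN ('innodb_thread_concurrency', 'innodb_read_io_threads', 'innodb_write_io_threads', 'innodb_purge_threads', 'innodb_page_cleaners');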

Q: At what frequency will the InnoDB status be dumped in a file by using innodb-status-file?

A: Approximately every 15 seconds, but it can vary slightly depending on the server load.

Q: I faced an issue where a disk got detached from a running server due to some issue on AWS EC2. MySQL went into default mode. After MySQL stopped and started, we observed that the slave skipped around 15 minutes of data. We found this out through a foreign key relationship issue. Can you please explain why the slave skipped data?

A: Amazon Aurora supports two kinds of replication: physical as implemented by Amazon (this is the default for replicas in the same region), and the regular asynchronous replication for cross-region replication. If you use the former, I cannot help you because this is a closed-source Amazon feature. You need to report a bug to Amazon. If you use the latter, this looks like a bug too; in my experience, it should not happen. With regular replication, you need to check which transactions were applied (ideally using GTIDs, or at least the log-slave-updates option) and which were not. If you find a gap, report a bug at bugs.mysql.com.

Q: Can you explain more about adaptive hash index?

A: InnoDB stores its indexes on disk as B-Trees. While B-Tree indexes are effective in general, some queries can take advantage of much simpler hash indexes. While your server is in use, InnoDB analyzes the queries it is currently processing and builds an in-memory hash index inside the buffer pool (using a prefix of the B-Tree key). The adaptive hash index generally works well: as the manual puts it, "with some workloads, the speedup from hash index lookups greatly outweighs the extra work to monitor index lookups and maintain the hash index structure." Another issue with the adaptive hash index is that, until version 5.7.8, it was protected by a single latch, which could become a contention point under heavy workloads. Since 5.7.8, the adaptive hash index can be partitioned; the number of parts is controlled by the option innodb_adaptive_hash_index_parts.
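For example, to see the current settings and switch the feature off at runtime (innodb_adaptive_hash_index_parts itself can only be set at server startup):

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_hash_index%';
mysql> SET GLOBAL innodb_adaptive_hash_index = OFF;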


Percona Live Europe featured talk with Manyi Lu — MySQL 8.0: what’s new in Optimizer

September 1, 2016 - 2:09pm

Welcome to a new Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we'll meet Manyi Lu, Director Software Development at Oracle. Her talk will be on MySQL 8.0: what's new in Optimizer. There are substantial improvements in the optimizer in MySQL 5.7 and MySQL 8.0. Most notably, users can now combine relational data with NoSQL using the new JSON features. I had a chance to speak with Manyi and learn a bit more about MySQL 8.0:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Manyi: Oh, my interest in database development goes way back to university almost twenty years ago. After graduation, I joined local startup Clustra and worked on the development of a highly available distributed database system for the telecom sector. Since then, I have worked on various aspects of the database, kernel, and replication. Lately I am heading the MySQL optimizer and GIS team.

What I love most about my work are the talented and dedicated people I am surrounded by, both within the team and in the MySQL community. We are passionate about building a database used by millions.

Percona: Your talk is called “MySQL 8.0: what’s new in Optimizer.” So, obvious question, what is new in the MySQL 8.0 Optimizer?

Manyi: There are a number of interesting features in 8.0. CTE, or Common Table Expression, has been one of the most demanded SQL features. MySQL 8.0 will support both the WITH and WITH RECURSIVE clauses. A recursive CTE is quite useful for producing reports based on hierarchical data. For DBAs, Invisible Index should make life easier. They can mark an index invisible to the optimizer, check the performance and then decide to either drop it or keep it. On the performance side, we have improved the performance of table scans, range scans and similar queries by batching up records read from the storage engine into the server. We have significant work happening in the cost model area. In order to produce more optimal query plans, we have started the work on adding support for histograms, and for taking into account whether data is already in memory or needs to be read from disk.

Besides the optimizer, my team is also putting a major effort into utf8mb4 support. We have added a large set of utf8mb4 collations based on the latest Unicode standard. These collations have better support for emojis and languages. Utf8 is the dominant character encoding for the web, and this move will make life easier for the vast majority of MySQL users. We also plan to add support for accent- and case-sensitive collations.

Please keep in mind that 8.0.0 is the first milestone release. There are quite a few features in the pipeline down the road.

Percona: How are some of the bridges between relational and NoSQL environments (like JSON support) of benefit to database deployments?

Manyi: The JSON support that we introduced in 5.7 has been immensely popular because it solves some very basic day-to-day problems. A relational database forces you to have a fixed schema, while the JSON datatype gives you the flexibility to store data without one. In the past, people stored relational data in MySQL and had to install yet another datastore to handle unstructured or semi-structured data that is schema-less in nature. With JSON support, you can store both relational and non-relational data in the same database, which makes database deployment much simpler. And not only that, but you can also perform queries across the boundaries of relational and non-relational data.

Clients that communicate with a MySQL Server using the newly introduced X Protocol can use the X DevAPI to develop applications. Developers do not even need to understand SQL if they do not want to. There are a number of connectors that support the X protocol, so you can use X DevApi in your preferred programming language. We have made MySQL more appealing to a larger range of developers.

Percona: What is the latest on the refactoring of the MySQL Optimizer and Parser?

Manyi: The codebase of the optimizer and parser used to be quite messy. The parsing, optimizing and execution stages were intermingled, and the code was hard to maintain. We have had a long-running effort to clean up the codebase. In 5.7, the optimization stage was separated from the execution stage. In 8.0, the focus is on refactoring the prepare stage and a complete rewrite of the parser.

We have already seen the benefits of the refactoring work. Development time on new features has been reduced. CTE is a good example. Without refactoring done previously, it would have taken much longer to implement CTE. With a cleaner codebase, we also managed to reduce the bug count, which means more development resources can be allocated to new features instead of maintenance.

Percona: Where do you see MySQL heading in order to deal with some of the database trends that keep you awake at night?

Manyi: One industry trend is cloud computing and Database as a Service becoming viable alternatives to in-house databases. In particular, they speed up technology deployments and reduce initial investments for smaller organizations. MySQL, being the most popular open source database, fits well into the cloud data management trend.

What we can do is make MySQL even better in the cloud setting. E.g., better support for horizontal scaling, fail-over, sharding, cross-shard queries and the like.

Percona: What are you looking forward to the most at Percona Live Europe this year?

Manyi: I like to speak and get feedback from MySQL users. Their input has a big impact on our roadmap. I also look forward to learning more about innovations by web-scale players like Facebook, Alibaba and others. I always feel more energized after talking to people who are passionate about MySQL and databases in general.

You can learn more about Manyi and her thoughts on MySQL 8.0 here: http://mysqlserverteam.com/

Want to find out more about Manyi, MySQL and Oracle? Register for Percona Live Europe 2016, and see her talk MySQL 8.0: what’s new in Optimizer.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBA’s, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.


