EmergencyEMERGENCY? Get 24/7 Help Now!
Subscribe to Latest MySQL Performance Blog posts feed
Updated: 1 day 18 hours ago

Virtual columns in MySQL and MariaDB

March 4, 2016 - 12:14pm

In this blog post, we’ll compare virtual columns in MySQL and MariaDB.

Virtual columns are one of my top features in MySQL 5.7: they can store a value that is derived from one or several other fields in the same table in a new field. It’s a very good way to build a functional index. This feature has been available in MariaDB for some time, so let’s compare the two and see if they are equivalent. We’ll look at different aspects for this comparison.


The MariaDB documentation is very easy to find.

Finding the documentation for virtual columns in 5.7 is a bit more challenging. Here is the best link I’ve found.

The MariaDB documentation isn’t  clear when you should use a persistent column rather than a virtual one. If you read carefully, you’ll see that indexes are only supported on persistent columns, but the pros and cons of both options could have been better presented.

For MySQL there is one interesting paragraph listing the potential use cases for stored columns and virtual columns. This paragraph is not super visible, but the gist of it is “always use a virtual column except if the value is too expensive to evaluate on the fly.” Note that you don’t need to use a stored column to index it in 5.7.


Creating a virtual column is very similar in both systems:

ALTER TABLE sbtest1 ADD reverse_pad char(60) GENERATED ALWAYS AS (reverse(pad)) VIRTUAL;

Note that NOT NULL is not supported with MariaDB while it’s allowed in 5.7:

# MariaDB 10.0 MariaDB [db1]> ALTER TABLE sbtest1 ADD reverse_pad char(60) GENERATED ALWAYS AS (reverse(pad)) VIRTUAL NOT NULL; ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'NOT NULL' at line 1 # 5.7 mysql> ALTER TABLE sbtest1 ADD reverse_pad char(60) GENERATED ALWAYS AS (reverse(pad)) VIRTUAL NOT NULL; Query OK, 0 rows affected (0.07 sec) Records: 0 Duplicates: 0 Warnings: 0

When creating a materialized virtual column, the syntax is unfortunately not identical: MariaDB has PERSISTENT columns while 5.7 has STORED columns. It doesn’t look like a big deal, but it’s another item to add to a check list before a migration.

Adding a virtual column

# 5.7 ALTER TABLE sbtest1 ADD reverse_pad char(60) GENERATED ALWAYS AS (reverse(pad)) VIRTUAL NOT NULL; Query OK, 0 rows affected (0.03 sec)

Great! Creating the column is only a metadata change, so it runs nearly instantly whatever the size of the table is.

With MariaDB, it’s quite different:

# MariaDB 10.0 ALTER TABLE sbtest1 ADD reverse_pad char(60) GENERATED ALWAYS AS (reverse(pad)) VIRTUAL; Query OK, 0 rows affected (7 min 8.50 sec)

Yes, a full table rebuild was needed. And if we are running some sysbench insert workload, we can easily see that this is not an online rebuild – for around 1/3 of the schema change, writes were stalled:


That’s probably one of the most striking differences: with MariaDB a column must be PERSISTENT for it to be indexed. This is not necessary in MySQL 5.7. The only situation when an indexed column in 5.7 must be STORED is when it’s a primary key.

When it comes to adding an index on several columns, some being regular columns and some being virtual columns, both versions allow this action:

# 5.7 mysql> ALTER TABLE sbtest1 ADD INDEX k_rev (k, reverse_pad); Query OK, 0 rows affected (2 min 38.14 sec) # MariaDB 10.0 MariaDB [db1]> ALTER TABLE sbtest1 ADD INDEX k_rev (k, reverse_pad); Query OK, 10187085 rows affected (4 min 43.76 sec)

The big difference though is that adding the index is performed online in 5.7, while it’s a blocking operation in MariaDB 10.


While at first sight MariaDB 10 and MySQL 5.7 offer very similar features with virtual columns, the reality is quite different: for virtual columns in MySQL and MariaDB the syntax is not exactly the same, adding a virtual column is not done the same way and indexing sets different constraints. The MySQL 5.7 implementation seems more polished for a production usage with large tables and/or heavy traffic.

Percona Server 5.5.48-37.8 is now available

March 4, 2016 - 7:18am

Percona is glad to announce the release of Percona Server 5.5.48-37.8 on March 4, 2016. Based on MySQL 5.5.48, including all the bug fixes in it, Percona Server 5.5.48-37.8 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. Details of the release can be found in the 5.5.48-37.8 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:

  • With Expanded Fast Index Creation enabled, DDL queries involving InnoDB temporary tables would cause later queries on the same tables to produce warnings that their indexes were not found in the index translation table. Bug fixed #1233431.
  • Package upgrade on Ubuntu would run mysql_install_db even though data directory already existed. Bug fixed #1457614.
  • Starting MySQL with systemctl would fail with a timeout if the socket was specified with a custom path. Bug fixed #1534825.
  • mysqldumpslow script has been removed because it was not compatible with Percona Server extended slow query log format. Please use pt-query-digest from Percona Toolkit instead. Bug fixed #856910.
  • When cmake/make/make_binary_distribution workflow was used to produce binary tarballs it would produce tarballs with mysql-... naming instead of percona-server-.... Bug fixed #1540385.

Other bugs fixed: #1521120 and #1534246.

Release notes for Percona Server 5.5.48-37.8 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

Deals of a “Live” time: $5 Percona Live Keynote Passes!

March 3, 2016 - 9:55am

The Percona Live Data Performance Conference 2016 in Santa Clara, California is just around the corner: April 18-21. We’re busy getting things ready to make sure everybody gets the most out of their time there. As part of that, we have some news and a couple of outstanding offers for you! $5 Percona Live keynote passes and $101 101 crash courses.

Percona Live MySQL Conference 2016 Keynote Speakers

We are extremely pleased with this year’s keynote speakers. We have a wide assortment of industry leaders and experts presenting a great set of topics, as well as special guest speaker Bill Nye!

The schedule is as follows:

Day 1

9:00 AM – Peter Zaitsev, Percona, opens Percona Live Data Performance Conference 2016.

9:20 AM – Chad Jones, Deep Information Science will present “Transcending database tuning problems: How machine learning helps DBAs play more ping pong,” which will show how unsupervised machine learning based on resource, workload and information modeling can predictively and continuously tune databases.

9:50 AM – Bill Nye, Scientist and Media Personality, will present “Things to Keep a Nye On,” where he’ll discuss the importance of science, critical thinking, and reason.

Day 2

9:00 AM – Patrick McFadin, DataStax, will present “Take back the power in your cloud applications with Apache Cassandra,” where he’ll talk about Apache Cassandra cloud applications.

9:25 AM – Experts from Deep Information Science, RocksDB, Red Hat, Intel and Percona will present “Data in the Cloud Keynote Panel: Cloudy with a chance of running out of disk space?  Or Sunny times ahead?” where they will discuss how new technologies, revamped products and a never ending stream of idea’s aim to improve the performance and manageability of cloud-based data, and where the industry is heading.

9:50 AM – Tomas Ulin, Oracle, will present “MySQL: Combining SQL and NoSQL,” which will explore how MySQL can deliver on the promises of NoSQL, while keeping all the proven benefits of SQL.

10:15 AM – Mark Callaghan, RocksDB, will present “MyRocks, MongoRocks and RocksDB,” and explain how and why MyRocks provides better performance, efficiency and compression using real and synthetic workloads

Day 3

9:00 AM – Peter Zaitsev, Percona, will present “Winning with Open Source Databases,” and will demonstrate how Percona is a true open source partner that helps you optimize your database performance to better run your business.

9:30 AM – Anurag Gupta, Amazon Web Services, will present “AWS Big Data Services:  Reimagining Big Data Processing in the Cloud,” which will provide an under-the-hood view of some of the most popular Big Data services at AWS including Amazon DynamoDB, Amazon Redshift, Amazon EMR and Amazon Aurora.

9:55 AM – Community Award Ceremony

$5 Keynote Passes

To help make the keynotes and the community events accessible to the greatest number of community members, we are once again offering  $5 Percona Live keynote passes for the Percona Live Data Performance Conference 2016. A keynote pass provides access to the keynote addresses, Birds of a Feather sessions, the exhibit floor, and the Community Networking Reception on Thursday night. The first 100 people who register for an Expo-Only pass (new registrations only) using the discount code “KEY” will be able to register for just $5.

$101 101 Passes

For a limited time, you can get access to the 101 Crash Courses for only $101!

Percona Live is once again hosting Crash Courses for developers, systems administrators, and other technical resources. This year, we’ve compacted the training into a single day, and are offering two options: MySQL 101 and MongoDB 101!

If you use code Single101, you can get either the MySQL or MongoDB crash course track for $101. If you want to get both tracks for $202, use code Double202.

This deal expires soon, so reserve your spot now!

End of Advanced Rate Pricing

Just a quick note: advanced rate pricing is set to expire March 6th! If you haven’t secured your reservation, do now before rates go up! It’s a fantastic deal, but it won’t last for much longer.

Reserve your place here.

Percona Live featured talk with Mark Callaghan – Keynote speaker: RocksDB

March 2, 2016 - 3:43pm

Welcome to the next installment of our talks with Percona Live Data Performance Conference 2016 speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference, as well as discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live registration bonus!

In this installment, we’ll meet Mark Callaghan, MTS at Facebook. He will be a keynote speaker at Percona Live, discussing Facebook and RocksDB. I had a chance to speak with Mark and learn a bit more about his experiences at Facebook:

PerconaGive me a brief history of yourself: how you got into database development, where you work, what you love about it.

Mark: Database development wasn’t an area I had much interest in until I finished Graduate school at the University of Wisconsin-Madison, with a Master’s degree in Computer Science. I wanted to live in Portland, Oregon and was offered two jobs – one at Informix and the other at Intel. I decided to work on database internals at Informix mainly because there was an office downtown. After a year, I was offered a job at Oracle and stayed there for eight years working on the query execution engine with a focus on bitmap indexes. The last ten years of my career have been devoted to web-scale MySQL at Google and Facebook.

I’m nearing my seven year anniversary at Facebook and am excited for many more years on my team. Our team focuses on using MySQL and RocksDB for transaction processing and closely aligns with other teams here at Facebook. Our team is highly productive and is privileged enough to work on many interesting challenges. The best part is that we get to share much of our work via open source projects. This is a great place to be.

I am most proud of the improvements we have made to quality-of-service and availability for MySQL. Many people have delivered high-impact projects, we have more opportunities to make things better and the MyRocks effort is an example of that.

Percona: You’re giving a keynote lecture on RocksDB. RocksDB was famously engineered at Facebook. What was it about the Facebook environment that required its own, specially-built storage engine?

Mark: The initial use case for RocksDB was to create a database that provides low-latency and high throughput with fast storage devices. RocksDB is embedded so it avoids network latency. It excels at consuming more of the I/O capacity available from fast storage devices. For some workloads, it can sustain millions of operations per second on our servers.

Over time, there have been additional use cases, and RocksDB has become popular within Facebook where it handles more than 4B QPS. It is also being used by a number of other web-scale companies.

Percona: Why would somebody choose RocksDB over a different storage engine? Where does it fit best (outside of Facebook, of course)?

Mark: MyRocks is the name of the RocksDB storage engine for MySQL. Storage efficiency is the primary motivation for MyRocks. It has better compression and less write amplification than InnoDB. For interesting workloads, we get 2X better compression and one-half the write-amplification compared to InnoDB. This means we can use less storage for the same workload and not decrease device lifetime when the storage is SSD.

It turns out that in some cases we can demonstrate better performance than InnoDB. This is an unexpected bonus and I spend a lot of time explaining when and why this occurs.

Percona: What do you see as an issue that we the database community need to be on top of with regard development? What keeps you up at night with regard to the future of RocksDB?

Mark: It takes a community to make an open source project successful. We’d like to continue building our community for MyRocks, and are excited for its future! I hope to see the MyRocks engine in Percona Server and MariaDB, and look forward to attending other talks at Percona Live in 2017.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016.

Mark: I look forward to learning more about key data performance challenges faced in the community, and sharing more about how MyRocks can help address various needs. People from the MyRocks team, including myself, will be in a booth to answer questions. Yoshinori Matsunobu will give a three-hour tutorial on MyRocks. I look forward to learning how other people solve problems with MySQL. Finally, I get to see many friends.

Percona: Any blogs you want to promote?

I focus on performance and efficiency. My favorite blogs for that are from Percona, Dimitri Kravtchuk, Vivid Cortex and Domas Mituzas.

To hear Mark’s keynote speech, register for Percona Live Data Performance Conference 2016. Use the code “FeaturedTalk” and receive $100 off the current registration price!

The Percona Live Data Performance Conference is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 18-21 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

How MaxScale monitors servers

March 2, 2016 - 7:56am

In this post, we’ll address how MaxScale monitors servers. We saw in the

We saw in the previous post how we could deal with high availability (HA) and read-write split using MaxScale.

If you remember from the previous post, we used this section to monitor replication:

[Replication Monitor] type=monitor module=mysqlmon servers=percona1, percona2, percona3 user=maxscale passwd=264D375EC77998F13F4D0EC739AABAD4 monitor_interval=1000 script=/usr/local/bin/failover.sh events=master_down

But what are we monitoring? We are monitoring the assignment of master and slave roles inside MaxScale according to the actual replication tree in the cluster using the default check from the mysqlmon monitoring modules.

There are other monitoring modules available with MaxScale:

So back to our setup. MaxScale monitors the roles of our servers involved in replication. We can see the status of every server like this:

# maxadmin -pmariadb show server percona2 Server 0x1cace90 (percona2) Server: Status: Slave, Running Protocol: MySQLBackend Port: 3306 Server Version: 5.6.28-76.1-log Node Id: 2 Master Id: 1 Slave Ids: Repl Depth: 1 Number of connections: 0 Current no. of conns: 0 Current no. of operations: 0

Now if we stop the slave, we can see:

# maxadmin -pmariadb show server percona2 Server 0x1cace90 (percona2) Server: Status: Running Protocol: MySQLBackend Port: 3306 Server Version: 5.6.28-76.1-log Node Id: 2 Master Id: -1 Slave Ids: Repl Depth: -1 Number of connections: 40 Current no. of conns: 0 Current no. of operations: 0 # maxadmin -pmariadb list servers Servers. -------------------+-----------------+-------+-------------+-------------------- Server | Address | Port | Connections | Status -------------------+-----------------+-------+-------------+-------------------- percona1 | | 3306 | 0 | Master, Running percona2 | | 3306 | 0 | Running percona3 | | 3306 | 0 | Slave, Running -------------------+-----------------+-------+-------------+--------------------

and in the MaxScale logs:

2016-02-23 14:29:09 notice : Server changed state: percona2[]: lost_slave

Now if the slave is lagging, nothing happens, and we will then keep sending reads to a slave that is not up to date

To avoid that situation, we can add to the “[Replication Monitor]” section the following parameter:


If we do so, MaxScale (if it has enough privileges) will create a schema maxscale_schema  with a table replication_heartbeat . This table will be used to verify the replication lag like pt-heartbeat does.

When enabled, after we restart MaxScale, we can see the slave lag:

# maxadmin -pmariadb show server percona2 Server 0x2784f00 (percona2) Server: Status: Slave, Running Protocol: MySQLBackend Port: 3306 Server Version: 5.6.28-76.1-log Node Id: 2 Master Id: 1 Slave Ids: Repl Depth: 1 Slave delay: 670 Last Repl Heartbeat: Tue Feb 23 14:25:24 2016 Number of connections: 0 Current no. of conns: 0 Current no. of operations: 0

Does this mean that now the node won’t be reached (no queries will be routed to it)?

Let’s check:

percona3 mysql> select @@hostname; +------------+ | @@hostname | +------------+ | percona2 | +------------+

That doesn’t sound good…

# maxadmin -pmariadb show server percona2 Server 0x2784f00 (percona2) Server: Status: Slave, Running Protocol: MySQLBackend Port: 3306 Server Version: 5.6.28-76.1-log Node Id: 2 Master Id: 1 Slave Ids: Repl Depth: 1 Slave delay: 1099 Last Repl Heartbeat: Tue Feb 23 14:25:24 2016 Number of connections: 1 Current no. of conns: 1 Current no. of operations: 0

We can see that there is 1 current connection .

How come? The monitoring actually works as expected, but we didn’t configure our Splitter Service  to not use that lagging slave.

We need to configure it like this:

[Splitter Service] type=service router=readwritesplit servers=percona1, percona2 max_slave_replication_lag=30 ...

And now, if the slave lags for 30 seconds or more, it won’t be used.

But what happen if for any reason we need to stop all the slaves (or if replication breaks)?

To find out, I performed a STOP SLAVE;  on percona2 and percona3. This what we see in the logs:

2016-02-23 22:55:16 notice : Server changed state: percona2[]: lost_slave 2016-02-23 22:55:34 notice : Server changed state: percona1[]: lost_master 2016-02-23 22:55:34 notice : Server changed state: percona3[]: lost_slave 2016-02-23 22:55:34 error : No Master can be determined. Last known was 2016-02-23 22:55:45 error : Couldn't find suitable Master from 2 candidates. 2016-02-23 22:55:45 error : 140003532506880 [session_alloc] Error : Failed to create Splitter Service session because routercould not establish a new router session, see earlier error. 2016-02-23 22:55:46 error : Couldn't find suitable Master from 2 candidates. 2016-02-23 22:55:46 error : 140003542996736 [session_alloc] Error : Failed to create Splitter Service session because routercould not establish a new router session, see earlier error.

If there are no more slaves, the master is not a master anymore, and the routing doesn’t work. The service is unavailable!

As soon as we start a slave, the service is back:

2016-02-23 22:59:17 notice : Server changed state: percona3[]: new_slave 2016-02-23 22:59:17 notice : A Master Server is now available:

Can we avoid this situation when all slaves are stopped?

Yes we can, but we need to add into the monitoring section the following line:


If we stop  the two slaves again, in MaxScale’s log we can now read:

2016-02-23 23:02:19 notice : Server changed state: percona2[]: lost_slave 2016-02-23 23:02:46 warning: [mysql_mon]: root server [] is no longer Master, let's use it again even if it could be a stale master, you have been warned! 2016-02-23 23:02:46 notice : Server changed state: percona3[]: lost_slave

And we can still connect to our service and use the single master.

Next time we will see how the read-write split works.

EXPLAIN FORMAT=JSON: nested_loop makes JOIN hierarchy transparent

February 29, 2016 - 2:38pm

Once again it’s time for another EXPLAIN FORMAT=JSON is cool! post. This post will discuss how EXPLAIN FORMAT=JSON allows the nested_loop command to make the JOIN operation hierarchy transparent.

The regular EXPLAIN  command lists each table that participates in a JOIN  operation on a single row. This works perfectly for simple queries:

mysql> explain select * from employees join titles join salariesG *************************** 1. row *************************** id: 1 select_type: SIMPLE table: employees partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 299379 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: SIMPLE table: titles partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 442724 filtered: 100.00 Extra: Using join buffer (Block Nested Loop) *************************** 3. row *************************** id: 1 select_type: SIMPLE table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2745434 filtered: 100.00 Extra: Using join buffer (Block Nested Loop) 3 rows in set, 1 warning (0.00 sec)

You can see that the first accessed table was employees, then titles  and finally salaries. Everything is clear.

EXPLAIN FORMAT=JSON in this case puts everything into the nested_loop array (even if “MySQL isn’t limited to nested-loop joins”):

mysql> explain format=json select * from employees join titles join salariesG *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "7.277755124e16" }, "nested_loop": [ { "table": { "table_name": "employees", "access_type": "ALL", "rows_examined_per_scan": 299379, "rows_produced_per_join": 299379, "filtered": "100.00", "cost_info": { "read_cost": "929.00", "eval_cost": "59875.80", "prefix_cost": "60804.80", "data_read_per_join": "13M" }, "used_columns": [ "emp_no", "birth_date", "first_name", "last_name", "gender", "hire_date" ] } }, { "table": { "table_name": "titles", "access_type": "ALL", "rows_examined_per_scan": 442724, "rows_produced_per_join": 132542268396, "filtered": "100.00", "using_join_buffer": "Block Nested Loop", "cost_info": { "read_cost": "62734.88", "eval_cost": "26508453679.20", "prefix_cost": "26508577218.88", "data_read_per_join": "7T" }, "used_columns": [ "emp_no", "title", "from_date", "to_date" ] } }, { "table": { "table_name": "salaries", "access_type": "ALL", "rows_examined_per_scan": 2745434, "rows_produced_per_join": 363886050091503872, "filtered": "100.00", "using_join_buffer": "Block Nested Loop", "cost_info": { "read_cost": "314711040856.92", "eval_cost": "7.277721002e16", "prefix_cost": "7.277755124e16", "data_read_per_join": "5171P" }, "used_columns": [ "emp_no", "salary", "from_date", "to_date" ] } } ] } } 1 row in set, 1 warning (0.00 sec)

For a simple query this output does not add much. Except cost info and information about used columns and efficiency of composite indexes.

But what if you not only join tables, but use the other SQL language options? For example, for the query below, which has two JOIN operations and two subqueries, a regular EXPLAIN returns this plan:

mysql> explain select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: <subquery2> partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: PRIMARY table: dept_manager partitions: NULL type: ref possible_keys: PRIMARY,emp_no key: PRIMARY key_len: 4 ref: <subquery2>.emp_no rows: 1 filtered: 100.00 Extra: NULL *************************** 3. row *************************** id: 1 select_type: PRIMARY table: employees partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: <subquery2>.emp_no rows: 1 filtered: 100.00 Extra: NULL *************************** 4. row *************************** id: 2 select_type: MATERIALIZED table: <derived3> partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 9 filtered: 100.00 Extra: NULL *************************** 5. row *************************** id: 3 select_type: DERIVED table: titles partitions: NULL type: index possible_keys: PRIMARY,emp_no key: emp_no key_len: 4 ref: NULL rows: 442724 filtered: 7.51 Extra: Using where; Using index; Using temporary; Using filesort; LooseScan *************************** 6. row *************************** id: 3 select_type: DERIVED table: salaries partitions: NULL type: ref possible_keys: PRIMARY,emp_no key: PRIMARY key_len: 4 ref: employees.titles.emp_no rows: 9 filtered: 100.00 Extra: NULL 6 rows in set, 1 warning (0.00 sec)

It’s pretty hard to understand which part is a subquery and which is not. It’s also it is hard to find out if DERIVED belongs to the first JOIN or to the second. And I am not quite sure why <subquery2>  was marked as PRIMARY, which is supposed to indicate “Outermost SELECT”.

The real issue here is that the internal representation of JOIN is hierarchical, and MySQL Server (like in the case for UNION) has trouble representing an object as a “flat” table. EXPLAIN FORMAT=JSON, with its hierarchical nature, can help us in this case.

mysql> explain format=json select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "39.45" }, "nested_loop": [ { "table": { "table_name": "<subquery2>", "access_type": "ALL", "materialized_from_subquery": { "using_temporary_table": true, "query_block": { "table": { "table_name": "t", "access_type": "ALL", "rows_examined_per_scan": 9, "rows_produced_per_join": 9, "filtered": "100.00", "cost_info": { "read_cost": "10.45", "eval_cost": "1.80", "prefix_cost": "12.25", "data_read_per_join": "144" }, "used_columns": [ "emp_no", "salary" ], "materialized_from_subquery": { "using_temporary_table": true, "dependent": false, "cacheable": true, "query_block": { "select_id": 3, "cost_info": { "query_cost": "176246.11" }, "grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "9.54" }, "nested_loop": [ { "table": { "table_name": "titles", "access_type": "index", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "emp_no", "used_key_parts": [ "emp_no" ], "key_length": "4", "rows_examined_per_scan": 442724, "rows_produced_per_join": 33229, "filtered": "7.51", "using_index": true, "loosescan": true, "cost_info": { "read_cost": "3380.56", "eval_cost": "6645.94", "prefix_cost": "63199.96", "data_read_per_join": "2M" }, "used_columns": [ "emp_no", "title", "from_date" ], "attached_condition": "(`employees`.`titles`.`title` like '%manager%')" } }, { "table": { "table_name": "salaries", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "employees.titles.emp_no" ], "rows_examined_per_scan": 9, "rows_produced_per_join": 9, "filtered": "100.00", "cost_info": { "read_cost": "49622.62", "eval_cost": "1.91", "prefix_cost": "176236.57", "data_read_per_join": "152" }, "used_columns": [ "emp_no", "salary", "from_date" ] } } ] } } } } } } } }, { "table": { "table_name": "dept_manager", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "<subquery2>.emp_no" ], "rows_examined_per_scan": 1, "rows_produced_per_join": 9, "filtered": "100.00", "cost_info": { "read_cost": "9.00", "eval_cost": "1.80", "prefix_cost": "23.05", "data_read_per_join": "144" }, "used_columns": [ "dept_no", "emp_no", "from_date", "to_date" ] } }, { "table": { "table_name": "employees", "access_type": "eq_ref", "possible_keys": [ "PRIMARY" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "<subquery2>.emp_no" ], "rows_examined_per_scan": 1, "rows_produced_per_join": 1, "filtered": "100.00", "cost_info": { "read_cost": "1.00", "eval_cost": "0.20", "prefix_cost": "39.45", "data_read_per_join": "48" }, "used_columns": [ "emp_no", "birth_date", "first_name", "last_name", "gender", "hire_date" ] } } ] } } 1 row in set, 1 warning (0.01 sec)

At first we see that all our tables, JOIN operations and subqueries are in the nested_loop array:

"nested_loop": [ { "table": { "table_name": "<subquery2>", ... { "table": { "table_name": "dept_manager", ... { "table": { "table_name": "employees", ... } ]

Then we see that the first table, <subquery2>, was materialized_from_subquery:

"table": { "table_name": "<subquery2>", "access_type": "ALL", "materialized_from_subquery": { ...

Which, in its turn, was materialized_from_subquery too:

"table": { "table_name": "<subquery2>", "access_type": "ALL", "materialized_from_subquery": { ... "materialized_from_subquery": { ...

This last subquery performs grouping_operation  on the other nested_loop  (JOIN) of tables titles  and salaries:

"grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "9.54" }, "nested_loop": [ { "table": { "table_name": "titles", ... }, { "table": { "table_name": "salaries", ...

Now we have a better picture of how the query was optimized: tables titles and salaries  were joined first, then GROUP BY was executed on the result, then the result was materialized and queried. The result of the query select emp_no from <materialized> t  was materialized again as <subquery2>, and only after it joined with two other tables.

Conclusion: EXPLAIN FORMAT=JSON  helps to understand how complex queries are optimized.

Graphing MySQL performance with Prometheus and Grafana

February 28, 2016 - 11:49pm

This post explains how you can quickly start using such trending tools as Prometheus and Grafana for monitoring and graphing of MySQL and system performance.

I will try to keep this blog as short as possible, so you can quickly set things up before getting bored. I plan to cover the details in the next few posts. I am going to go through the installation process here in order to get some really useful and good-looking graphs in the end.


Prometheus is an open-source service monitoring system and time series database. In short, the quite efficient daemon scrapes metrics from remote machines using HTTP protocol and stores data in the local time-series database. Prometheus provides a simple web interface, a very powerful query language, HTTP API etc. However, the storage is not designed to be durable for the time being.

The remote machines need to run exporters to expose metrics to Prometheus. We will be using the following two:

Grafana is an open source, feature-rich metrics dashboard and graph editor for Graphite, Elasticsearch, OpenTSDB, Prometheus and InfluxDB. It is a powerful tool for visualizing large-scale measurement data and designed to work with time-series. Grafana supports different types of graphs, allows for custom representation of individual metrics on the graph and various methods of authentication including LDAP.


Here is a diagram of the setup we are going to use:

Prometheus setup

To install on the monitor host.

Get the latest tarball from Github.

wget https://github.com/prometheus/prometheus/releases/download/0.17.0rc2/prometheus-0.17.0rc2.linux-amd64.tar.gz mkdir /opt/prometheus tar zxf prometheus-0.17.0rc2.linux-amd64.tar.gz -C /opt/prometheus --strip-components=1

Create a simple config:

cat << EOF > /opt/prometheus/prometheus.yml global: scrape_interval: 5s evaluation_interval: 5s scrape_configs: - job_name: linux target_groups: - targets: [''] labels: alias: db1 - job_name: mysql target_groups: - targets: [''] labels: alias: db1 EOF

where is the IP address of the db host we are going to monitor and db1 is its short name. Note, the “alias” label is important here because we rely on it in the predefined dashboards below to get per host graphs.

Start Prometheus in foreground:

[root@centos7 ~]# cd /opt/prometheus [root@centos7 prometheus]# ./prometheus prometheus, version 0.17.0rc2 (branch: release-0.17, revision: 667c221) build user: fabianreinartz@macpro build date: 20160205-13:35:53 go version: 1.5.3 INFO[0000] Loading configuration file prometheus.yml source=main.go:201 INFO[0000] Loading series map and head chunks... source=storage.go:297 INFO[0000] 0 series loaded. source=storage.go:302 WARN[0000] No AlertManager configured, not dispatching any alerts source=notification.go:165 INFO[0000] Starting target manager... source=targetmanager.go:114 INFO[0000] Target manager started. source=targetmanager.go:168 INFO[0000] Listening on :9090 source=web.go:239

Now we can access Prometheus’ built-in web interface by http://monitor_host:9090

If you look at the Status page from the top menu, you will see that our monitoring targets are down so far. Now let’s setup them – prometheus exporters.

Prometheus exporters setup

Install on the db host. Of course, you can use the same monitor host for the experiment. Obviously, this node must run MySQL.

Download exporters from here and there.

wget https://github.com/prometheus/node_exporter/releases/download/0.12.0rc3/node_exporter-0.12.0rc3.linux-amd64.tar.gz wget https://github.com/prometheus/mysqld_exporter/releases/download/0.7.1/mysqld_exporter-0.7.1.linux-amd64.tar.gz mkdir /opt/prometheus_exporters tar zxf node_exporter-0.12.0rc3.linux-amd64.tar.gz -C /opt/prometheus_exporters tar zxf mysqld_exporter-0.7.1.linux-amd64.tar.gz -C /opt/prometheus_exporters

Start node_exporter in foreground:

[root@centos7 ~]# cd /opt/prometheus_exporters [root@centos7 prometheus_exporters]# ./node_exporter INFO[0000] No directory specified, see --collector.textfile.directory source=textfile.go:57 INFO[0000] Enabled collectors: source=node_exporter.go:146 INFO[0000] - filesystem source=node_exporter.go:148 INFO[0000] - loadavg source=node_exporter.go:148 INFO[0000] - time source=node_exporter.go:148 INFO[0000] - vmstat source=node_exporter.go:148 INFO[0000] - diskstats source=node_exporter.go:148 INFO[0000] - filefd source=node_exporter.go:148 INFO[0000] - mdadm source=node_exporter.go:148 INFO[0000] - meminfo source=node_exporter.go:148 INFO[0000] - netdev source=node_exporter.go:148 INFO[0000] - textfile source=node_exporter.go:148 INFO[0000] - entropy source=node_exporter.go:148 INFO[0000] - stat source=node_exporter.go:148 INFO[0000] - uname source=node_exporter.go:148 INFO[0000] - conntrack source=node_exporter.go:148 INFO[0000] - netstat source=node_exporter.go:148 INFO[0000] - sockstat source=node_exporter.go:148 INFO[0000] - version source=node_exporter.go:148 INFO[0000] Starting node_exporter v0.12.0rc3 at :9100 source=node_exporter.go:167

Unlike node_exporter, mysqld_exporter wants MySQL credentials. Those privileges should be sufficient:

mysql> GRANT REPLICATION CLIENT, PROCESS ON *.* TO 'prom'@'localhost' identified by 'abc123'; mysql> GRANT SELECT ON performance_schema.* TO 'prom'@'localhost';

Create .my.cnf and start mysqld_exporter in foreground:

[root@centos7 ~]# cd /opt/prometheus_exporters [root@centos7 prometheus_exporters]# cat << EOF > .my.cnf [client] user=prom password=abc123 EOF [root@centos7 prometheus_exporters]# [root@centos7 prometheus_exporters]# ./mysqld_exporter -config.my-cnf=".my.cnf" INFO[0000] Starting Server: :9104 file=mysqld_exporter.go line=1997

At this point we should see our endpoints are up and running on the Prometheus Status page:

Grafana setup

Install on the monitor host.

Grafana has RPM and DEB packages. The installation is as simple as installing one package.
RPM-based system:

yum install https://grafanarel.s3.amazonaws.com/builds/grafana-2.6.0-1.x86_64.rpm

or APT-based one:

wget https://grafanarel.s3.amazonaws.com/builds/grafana_2.6.0_amd64.deb apt-get install -y adduser libfontconfig dpkg -i grafana_2.6.0_amd64.deb

Open and edit the last section of /etc/grafana/grafana.ini resulting in the following ending:

[dashboards.json] enabled = true path = /var/lib/grafana/dashboards

Percona has built the predefined dashboards for Grafana with Prometheus for you.

Let’s get them deployed:

git clone https://github.com/percona/grafana-dashboards.git cp -r grafana-dashboards/dashboards /var/lib/grafana

Finally, start Grafana:

service grafana-server start

At this point, we are one step before being done. Login into Grafana web interface http://monitor_host:3000 (admin/admin).

Go to Data Sources and add one for Prometheus:

Now check out the dashboards and graphs. Say choose “System Overview” and period “Last 5 minutes” on top-right. You should see something similar:

If your graphs are not populating ensure the system time is correct on the monitor host.


Here are some real-world samples (images are clickable and scrollable):



Prometheus and Grafana is a great tandem for enabling monitoring and graphing capabilities for MySQL. The tools are pretty easy to deploy, they are designed for time series with high efficiency in mind. In the next blog posts I will talk more about technical aspects, problems and related stuff.

Monitoring MongoDB Response Time

February 26, 2016 - 3:52pm

In this blog post, we’ll discuss how using Prometheus can help with monitoring MongoDB response time. I am currently comparing the performance of different storage engines on Percona Server for MongoDB, using a slightly customized version of Tim Callaghan’s sysbench-mongodb. Since I’m interested in measuring response time for database operations, I created a very simple exporter of response time data for Prometheus.

My first approach to measuring MongoDB response time was inspired by Ignacio Nin’s work on tcprstat, some years ago – and by the way the VividCortex query agent works (which is not surprising, since, to the best of my knowledge, Baron inspired tcprstat in the first place).

With this in mind, I created mongo-response-time, which performs the only function of printing to stdout the response time of every mongodb query seen on the wire, along with a timestamp up to the second. My thanks go to the programmers of Facebook’s Go projects, as their code helped me hit the ground running.

As a first approach this was useful enough for my needs, and here is an example of a basic graph created from data generated by it: 

I had to use a log scale as otherwise the graph was just a thick bar near the bottom, and a few outliers above. This is already useful, but it does not scale well. As an example, a sysbench-mongodb run of about an hour produced a csv file with a little over eight million data points. Larger rounds (like 24 hours) are just too difficult to handle with R (in one case, even though I had enough memory in my box to hold more than three copies of the file, read.csv aborted after running out of memory – if this happens to you, I suggest the use of RMySQL instead, which seems more memory-efficient than read.csv for ingesting large amounts of data).

For a second approach, I decided to live with less fidelity and settled for some quantiles and a max. For this, I created a simple Prometheus exporter that exposes 0.5, 0.9 and 0.99 quantiles, and also the max response time for every five second period.

With it, I was able to visualize the MongoDB response time data in Grafana in a way that is affordable and good enough for my needs, as can be seen in the following graphs: 

The quantiles are calculated on the client side, using the Summary type from Prometheus’ Go client. The exporter also provides the same quantiles, but through a Histogram, which has the advantage of being more lightweight on clients. I decided to use the Summary as a source for this graph as the impact on the machine seems negligible for now, and I do find its data a bit more reliable (if I compare to calculating quantiles per periods analyzing the full data set in R). You can see how the max (a Gauge, glad you asked!) is useful to have, as it lets you find out about outliers that even the .99 quantile misses (which is expected, by the way).

If you want to try this out, you can find darwin and linux binaries here, and if you hit any problems, please reply here or email me directly at fernando (dot) ipar at Percona’s domain.

Happy monitoring!

Percona Back to Basics: MongoDB updates

February 25, 2016 - 2:25pm

Welcome to the first in a new series of MongoDB blogs. These blogs will cover a selection of topics, including:

  • How-tos
  • New release and new features
  • Getting back to the basics
  • Solutions from the field

In this first blog, we’ll discuss MongoDB updates. You can use the update method to update documents in a collection. MongoDB updates are well-covered in the MongoDB documentation, but there are some cases we should review for clarity, and to understand how and when to use them.

In this blog post, we’ll cover:

  • To $set or $inc 
  • What about the engine differences?
  • What is a move?
  • How can I limit updates when a customer wants to make massive changes and remain online?

$set vs $inc

The $set and $inc options are a throwback to the MMAPv1 storage engine, but are a consideration for optimization rather than a rule. If we know we want to add 100 to some value in a document that is 200k in size, it could cost many more times the disk IO to update the entire document (using $set). The question is how much more efficient is $inc? The manual talks about it being faster because it writes less, and that moves are more costly (we’ll cover them in a second). However, it doesn’t give the technical logic behind this argument.

$set could update 3200 to 3300 with no issue, and would not initiate a move (in MMAP). However, anything adding an entry to an array, adding a subdoc, adding characters to a string, adding new fields, etc., might cause a move. The larger issue at hand is that $set requires you to fetch the data first to be able to set it, while $inc lets you blindly increment the data. In practice, this might look something like:

db.logins.update ({"user": user},{"login_count": { "$inc": 1}});

Replacing the whole document might look like this:

user_data = db.logins.findOne({"user": user}) db.logins.update(              {"user":user_data.user},              {"$set":                           {"login_count": user_data.login_count+1}               } )

With regards to incrementing data, BSON is designed to advertise the length of a field at the start of each field, making it easy to skip over bytes you don’t need to read, parse and consider. As the cursor is at a very specific offset, it can change a number since it will still take the same storage size – meaning nothing else in the document needs to be shifted around. The important point is the number of seeks we need to make on a document. With BSON, if we want to update the 900th field, we would make 900 “jumps” to get to the correct position. JSON on the other hand, would read the whole document into memory and then parse each and every bracket. This requires significantly more CPU.

For BSON, the application must spend some application CPU to move between BSON and native types – but this isn’t a deal breaker: the CPU on apps is more scalable.

What about engines?

There are cases where $set  could be optimal – especially if the storage engine uses a fast-update concept (this is also known as “read-less”). What this means is we can just write blindly to the document space, making the changes we want. If the space needed is the same as what is available, we might even be able to avoid a move or restructure of the document. This is true in TokuMX, PerconaFT, or MMAPv1. However in other engines – such as systems built on LSM structures like WiredTiger and RocksDB – this is impossible (you can read more about LSM’s later, but the way an insert or update works is largely the same). It will append a new copy of the full record to the end of a file, which is very fast because it doesn’t need to look for a free item in a free list of the right size.

The downside is that using $set to append a field, or $inc to increase a counter, is much more expensive as it executes a full document read and a complete document write. This is why the type of storage engine is critically important when explaining methods for updating documents and the expected overhead.

What is a move?

A move occurs when a document is using 64k, but an update would make it 65k. Since this is larger, the new document will not fit in the existing location. This means from a storage perspective an update becomes a read, an insert, and delete. In some engines, this might be fine (for example, RocksDB will just mark the delete for later), but in other engines (i.e., LSM-based engines) too many reads can force the engine to clean up when the history list gets too long. This forced overhead is one of the reasons that LSM read operations can get bogged down, while writes are very fast.

It could be said that the default LSM state is that it needs to perform a read in any case. For the memory map, however, this means the write lock has escalated and could be many times more expensive than a non-move.

Limiting the effect of massive updates

Let’s assume that we are a shopping cart system, and we have following document structure in a collection with 100 million documents:

{  _id : ObjectId(),  accountName: "dmurphy",  // unique, immutable id, enforced in app  v : 1,  address: "123 nowhere",  state: "tx",  zip: 78634,  phone: "123-124-1234",  password: "9dd4e461268c8034f5c8564e155c67a6" }

This has worked well for tracking, but now we want to support users having multiple addresses. We have a million users, and we want to force them to a new form, as having mixed types for a long time could be an issue. (There are cases and designs to help a client be intelligent and self-updating, however, that is out of the scope of this blog.)

The document now should be:

{    _id : ObjectId(),    accountName: "dmurphy",  // unique, immutable id, enforced in app    v : 2,    addresses: [        {address: "123 nowhere",state: "tx",zip: 78634,phone1: "123-124-1234"}    ],    password: "9dd4e461268c8034f5c8564e155c67a6" }

There are a couple of reasons for this selection. You should NEVER reuse a field with different data types that are indexed. MongoDB can technically store both times, however far in the past; the index could return incorrect data or due to scan order, causing user confusion by not matching types you might think it would. In MongoDB  “123” is not anything like 123. Therefore, depending on your query you might not get all expected results. Also, we incremented the version to “2”, so that if you were programmatically checking and fixing versions in your application, you would know if it needs to be done. That model does not work for inactive users, however, which is more relevant to this example. This means we have two ways we could make our update

Option 1:


Option 2:

{addresses:{$exists: 0}}

Option 1 is much more secure and exact, while option 2 is based more on the outcome. We would want to use option 1 for clarity and repeatability, but how do we ensure it doesn’t update all 100 million documents (as the IO needed and impact on the system would be far too expensive – such as filling the oplog so much it could make a restore impossible):

function parseNS(ns){    //Expects we are forcing people to not violate the rules and not doing "foodb.foocollection.month.day.year" if they do they need to use an array.    if (ns instanceof Array){        database =  ns[0];        collection = ns[1];    }    else{        tNS =  ns.split(".");        if (tNS.length > 2){            print('ERROR: NS had more than 1 period in it, please pass as an [ "dbname","coll.name.with.dots"] !');            return false;        }        database = tNS[0];        collection = tNS[1];    }    return {database: database,collection: collection}; } function buildAddressDelta(doc){    addresses = [];    addresses[0] = { address: doc.address , state: doc.state, zip: doc.zip , phone1:doc.phone };    delta = {"$set": {"addresses" : addresses,"version": 2},"$unset" : { "address":"","state":"","zip":"","phone":""}};    return delta } function updateWithPauses(ns, query, delta , batchSize,pauseMS){    count = 0;    objNS = parseNS(ns);    if (tNS == false){ return false; }         totalToProcess = db.getDB(objNS.database).getCollection(objNS.collection).find(query).count();    while(db.getDB(objNS.database).getCollection(objNS.collection).find(query).count() > 0){        db.getDB(objNS.database).getCollection(objNS.collection).find(query).limit(batchSize).forEach(function(doc){            active_error = false;            try:{                db.getDB(objNS.database).getCollection(objNS.collectioin).update({_id:doc._id},buildAddressDelta(doc)});            }            catch(err){                print("Found error when updating _id: "+doc._id+"t: +"err.message);                active_error = false;            }            if (! active_error) {count++;}            if (count % batchSize == 0) {                print("Processed "+count+" of "+totalToProcess+" with "+(totalToProcess-count)+" to go approximately.");                sleep(pauseMS);            }        });    } }

In this example, the specific bit was “buildAddressDelta”, and the more generic part was “updateWithPauses”. A future improvement would be to make the “buildAddressDelta” become “buildDelta”, and pass it an array of deltas to apply. As you can see, the delta is adding the new array of addresses with the current as a member, updating the version, and unsetting the old fields – which should be pretty straightforward. Our focus here is more on the “updateWithPauses” script, which is doing a few things:

  1. Splitting and setting up an NS object for ease-of-use
  2. Finding out if we still have documents to change (and quitting when it’s done)
  3. Getting one document at a time and updating it, we could up a bulk op per batch also.
  4. Forcing a pause and reporting each time we hit a batchSize (% in JS means modulo)

It is possible to do more, but this is safe and has a natural slow down by not batching while still doing a forced yield from time to time. You can also safely bail out of the shell to stop the process if it is impacting the system too much, and restart it again later as it will try and find documents it needs to change just in time for each loop.


Hopefully, this blog has helped to demonstrate some of the finer points of MongoDB updates. The MongoDB documentation has comprehensive coverage of the processes, but don’t hesitate to ping us for specific questions!


Percona Live featured talk with Kat Styons – Shifting the Paradigm: MongoDB and the MEAN Stack

February 24, 2016 - 12:23pm

Welcome to the next installment of our talks with Percona Live Data Performance Conference 2016 speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference, as well as discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live registration bonus!

In this installment we’ll meet Kat Styons, full stack engineer at the Washington Post. Her presented talk is Shifting the Paradigm: MongoDB and the MEAN Stack. I had a chance to speak with Kat and learn a bit more about the database environment at the Washington Post:

Percona: Give me brief history of yourself: how you got into database development, where you work, what you love about it.

Kat: I’m a full stack engineer at the Washington Post. I started there as a Java developer, and in the beginning all of my projects’ data tiers were in MySQL and Oracle. About two years ago my team began developing a form building tool called the Submission Platform. We created a prototype of the tool using a full Javascript stack and MongoDB (the MEAN stack) and it was so quick to develop and fit the use case so well that we decided to give it a shot for real. While we were learning to work with a full JS stack and MongoDB, the Post was undergoing a shift too. We’ve adopted a Dev Ops model that lets our developers have more control over the environments their code runs in.

Percona: Your talk is going to be on “Shifting the Paradigm: MongoDB and the MEAN Stack.” In relation to media, what are the challenges you faced that caused you to develop and implement the Submission Platform? And why MongoDB and MEAN?

Kat: We realized we wanted to create the Submission Platform when we heard that editors in the newsroom were spending hours curating user submitted content. One story in particular, the annual Peeps Contest, received hundreds of submissions from readers, each containing photos of a “Peeps Diorama.” No third party tool was suited to a newsroom’s curation process. No one was handling photos in submissions in a way that was efficient to view, vote on and rank. We could do better.

When we started thinking about how our data should be structured, MongoDB became the obvious choice. If we had stored our submissions in a relational database, in order for submissions for different forms to have the structural flexibility we needed, each submission would have had to be stored across six tables. Then we would have to reference all six tables when retrieving a submission. We had a working model for this, but when we tried our prototype in MongoDB and saw that each submission could be stored in a single collection, it was clear that Mongo’s flexible schema-less structure was a more natural fit for our data.

Percona: I’m assuming that accessing data from the cloud is a big deal for your field . . . what is the implications of easy access to data? How do data cloud technologies impact you and your customers’ needs (internal or external)?

Kat: We recently moved nearly all of our projects to AWS. As a developer newly adopting the Dev Ops model I found AWS’ tools and documentation incredibly helpful. Developers at the Post have more control than ever over the environments their code runs in and that’s shortening the time it takes us to get a project from conception to deployment.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?

Kat: I’m excited to hear the problems faced by other developers, architects and DBAs and how they were able to solve those problems. Some of the best interactions I’ve had at conferences in the past have been when a speaker or attendee tells me a real world problem they’ve faced and what they did to solve it. When I’m trying to learn, nothing beats a real world example.


Want to find out more about Kat Styons and the Washington Post MongoDB database environment? Subscribe to the Washington Post Developers Blog: https://developer.washingtonpost.com/pb/blog/.

To hear Kat’s talk on Shifting the Paradigm: MongoDB and the MEAN Stack, register for Percona Live Data Performance Conference 2016. Use the code “FeaturedTalk” and receive $100 off the current registration price!

For even more information about MondoDB and its applications, register for Percona’s MongoDB Crash Course at Percona Live 2016. Crash Courses are led by Percona MySQL experts who will show you the fundamentals of MySQL or MongoDB tools and techniques.

The Percona Live Data Performance Conference is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 18-21 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Percona Server for MongoDB 3.0.8-1.3 is now available

February 24, 2016 - 6:26am

Percona is pleased to announce the release of Percona Server for MongoDB 3.0.8-1.3 on February 24, 2016. Download the latest version from the Percona web site or from the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.0 protocol and drivers. Based on MongoDB 3.0.8, it extends MongoDB with MongoRocks and PerconaFT storage engines, as well as features like external authentication and audit logging. Percona Server for MongoDB requires no changes to MongoDB applications or code.

NOTE: MongoRocks has been labeled experimental. This is not related to any changes, it is a reminder that the storage engine is still under development. There is currently no officially released version of MongoRocks that can be recommended for production.

New Features

  • The mongod daemon now logs stdout and stderr to the following files:
    • On Debian and Ubuntu:
      • /var/log/mongodb/mongod.stdout
      • /var/log/mongodb/mongod.stderr
    • On RHEL and CentOS:
      • /var/log/mongo/mongod.stdout
      • /var/log/mongo/mongod.stderr

When you upgrade, make sure that the defaults file contains the STDOUT and STDERR variables.

On Debian and Ubuntu: /etc/default/mongod

On RHEL and CentOS: /etc/sysconfig/mongod

If you did not manually modify the defaults file before upgrading, the installer should correctly replace the defaults file. Otherwise, add the STDOUT and STDERR variables after you install the new version of Percona Server for MongoDB.

On Ubuntu 14.04 LTS (Trusty Tahr), the post-install script cannot correctly start mongod with stdout and stderr not directed to /dev/null. To enable the new logging functionality, you need to add the -C option to the start-stop-daemon command in /etc/init.d/mongod. The line that starts mongod should look similar to the following:

start-stop-daemon -C --background --start --quiet --pidfile $PIDFILEPATH

After you make necessary changes to the script, restart the daemon:

$ service mongod restart

  • Changed the default value of the processManagement.fork parameter in mongod.conf from false to true. This does not affect normal operation, only PID tracking in the OS.

Before upgrading on Debian 8 (“jessie”), Ubuntu 15.04 (Vivid Vervet), or Ubuntu 15.10 (Wily Werewolf), you will need to manually change the processManagement.fork parameter to true.

  • PSMDB-30: Added SELinux support for Red Hat Enterprise Linux and CentOS.

Bugs Fixed

  • PSMDB-39: Fixed issue that caused incorrect behavior during hot backup when the --logpathoption was set.

Percona Server for MongoDB 3.0.8-1.3 release notes are available in the official documentation.

Percona Server 5.7.10-3 GA is now available

February 23, 2016 - 5:33am

Percona is glad to announce the first GA release of Percona Server 5.7.10-3 on February 23, 2016. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.7.10, including all the bug fixes in it, Percona Server 5.7.10-3 is the current GA release in the Percona Server 5.7 series. All of Percona’s software is open-source and free, all the details of the release can be found in the 5.7.10-3 milestone at Launchpad.

New Features:

Known Issues:

  • In Percona Server 5.7 super_read_only feature has been replaced with the upstream implementation. There are currently two known issues compared to Percona Server 5.6 implementation:
    • Bug #78963, super_read_only aborts STOP SLAVE if variable relay_log_info_repository is set to TABLE which could lead to a server crash in Debug builds.
    • Bug #79328, super_read_only set as a server option has no effect.
  • InnoDB crash recovery might fail if innodb_flush_method is set to ALL_O_DIRECT. The workaround is to set this variable to a different value before starting up the crashed instance (bug #1529885).

Bugs Fixed:

  • Percona Server 5.7.10-1 didn’t write the initial root password into the log file /var/log/mysqld.log during the installation on CentOS 6. Bug fixed #1541769.
  • Cardinality of partitioned TokuDB tables became inaccurate after the changes introduced by TokuDB Background ANALYZE TABLE feature in Percona Server 5.7.10-1. Bug fixed #925.
  • Running the TRUNCATE TABLE while TokuDB Background ANALYZE TABLE is enabled could lead to a server crash once analyze job tries to access the truncated table. Bug fixed #938.
  • Percona TokuBackup would fail with an unclear error if backup process found mysqld_safe.pid file (owned by root) inside the datadir. Fixed by excluding the pid file by default. Bug fixed #125.
  • PAM Authentication Plugin build warning has been fixed. Bug fixed #1541601.

Release notes for Percona Server 5.7.10-3 are available in the online documentation. Please report any bugs on the launchpad bug tracker .

Percona XtraBackup 2.4.1 GA is now available

February 23, 2016 - 5:30am

Percona is glad to announce the first GA release of Percona XtraBackup 2.4.1 on February 23rd 2016. Downloads are available from our download site and from apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

New Features:

  • Percona XtraBackup has implemented basic support for MySQL 5.7 and Percona Server 5.7.

Bugs Fixed:

  • Percona XtraBackup didn’t respect innodb_log_file_size variable stored in backup-my.cnf. Bug fixed #1527081.
  • If server would run out of space while backups were taken with innobackupex –rsync option backup process would fail but innobackupex would still complete with completed OK! message. Bug fixed #1537256.
  • Percona XtraBackup was silently skipping extra arguments. Bug fixed #1533542 (Fungo Wang).

Other bugs fixed: #1544671 and #1535535.

Release notes with all the bugfixes for Percona XtraBackup 2.4.1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

EXPLAIN FORMAT=JSON: cost_info knows why optimizer prefers one index to another

February 22, 2016 - 8:41am

Time for another entry in the EXPLAIN FORMAT=JSON is cool! series of blog posts. This time we’ll discuss how using EXPLAIN FORMAT=JSON allows you to see that cost_info  knows why the optimizer prefers one index to another.

Tables often have more than one index. Any of these indexes can be used to resolve query. The optimizer has to make a choice in this case. One of the metrics that can be used to help make the choice is the potential cost of the query evaluation.

For example, let’s take the table titles  from the standard employees database:

mysql> show create table titlesG *************************** 1. row *************************** Table: titles Create Table: CREATE TABLE `titles` ( `emp_no` int(11) NOT NULL, `title` varchar(50) NOT NULL, `from_date` date NOT NULL, `to_date` date DEFAULT NULL, PRIMARY KEY (`emp_no`,`title`,`from_date`), KEY `emp_no` (`emp_no`), CONSTRAINT `titles_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`) ON DELETE CASCADE ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.01 sec)

It has two indexes: emp_no  and PRIMARY, each of  which could be used to resolve query:

select distinct title from titles where year(from_date) > '1990';

At first glance, emp_no  doesn’t really fit for this query. PRIMARY  does fit, because it contains both the title  and from_date  fields. Unfortunately, it cannot be used to resolve the query, because we don’t limit the search by emp_no  and title .  It can, however, be used to select rows from the index. When we use EXPLAIN , though, it shows us that the optimizer has chosen index emp_no  (every secondary index in InnoDB contains a link to the clustered index anyway):

mysql> explain select distinct title from titles where year(from_date) > '1990'G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: titles partitions: NULL type: index possible_keys: PRIMARY,emp_no key: emp_no key_len: 4 ref: NULL rows: 442724 filtered: 100.00 Extra: Using where; Using index; Using temporary 1 row in set, 1 warning (0.00 sec)

PRIMARY KEY  exists in the field possible_keys, but was not chosen. EXPLAIN FORMAT=JSON  can show us why.

First let’s run it on the original query:

mysql> explain format=json select distinct title from titles where year(from_date) > '1990'G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "89796.80" }, "duplicates_removal": { "using_temporary_table": true, "using_filesort": false, "table": { "table_name": "titles", "access_type": "index", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "emp_no", "used_key_parts": [ "emp_no" ], "key_length": "4", "rows_examined_per_scan": 442724, "rows_produced_per_join": 442724, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "1252.00", "eval_cost": "88544.80", "prefix_cost": "89796.80", "data_read_per_join": "27M" }, "used_columns": [ "emp_no", "title", "from_date" ], "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')" } } } } 1 row in set, 1 warning (0.01 sec)

The important part here is:

"cost_info": { "query_cost": "89796.80" },

Which shows that the overall query_cost  is 89796.80. We don’t really know what the units are for this cost, or how it is actually measured. It isn’t important; the only thing that is important for now is that smaller is better. (Think of it like shopping for a product: it doesn’t matter which you buy it from, just that you buy it at the lowest price.)

Another important member of the index is cost_info, which belongs to the table itself:

"cost_info": { "read_cost": "1252.00", "eval_cost": "88544.80", "prefix_cost": "89796.80", "data_read_per_join": "27M" },

Here we get even more details, such as cost of read operation and evaluation. prefix_cost  is not useful for this example, because it contains the cost of joining to the next table in JOIN. Since we don’t join the table titles  with any other value of prefix_cost, is equivalent to the cost of the full query. data_read_per_join  contains the amount of data that should be read for each JOIN  operation. In our case it is once again the same as how much data we should read to fully evaluate the query.

Now let’s force index PRIMARY  and examine the EXPLAIN FORMAT=JSON  output:

mysql> explain format=json select distinct title from titles force index(primary) where year(from_date) > '1990'G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "531269.80" }, "duplicates_removal": { "using_temporary_table": true, "using_filesort": false, "table": { "table_name": "titles", "access_type": "index", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no", "title", "from_date" ], "key_length": "59", "rows_examined_per_scan": 442724, "rows_produced_per_join": 442724, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "442725.00", "eval_cost": "88544.80", "prefix_cost": "531269.80", "data_read_per_join": "27M" }, "used_columns": [ "emp_no", "title", "from_date" ], "attached_condition": "(year(`employees`.`titles`.`from_date`) > '1990')" } } } } 1 row in set, 1 warning (0.01 sec)

Notice the numbers are different this time. The total query cost is 531269.80, which is about 6 times greater than 89796.80:

"cost_info": { "query_cost": "531269.80" },

read_cost  is 442725.00, which is 353 times greater than 1252.00. However, the eval_cost  and data_read_per_join  are the same as the query that uses index emp_no :

"cost_info": { "read_cost": "442725.00", "eval_cost": "88544.80", "prefix_cost": "531269.80", "data_read_per_join": "27M" },

These numbers clearly explain why the optimizer prefers the index emp_no  to PRIMARY KEY.

In our example above this behavior is correct. In a real life scenario, if the optimizer’s choice is wrong. these numbers can show either that there is a bug in the optimizer or  that the table’s statistics are outdated and need to be updated.

Conclusion: EXPLAIN FORMAT=JSON  can be used together with FORCE INDEX  to find out why the optimizer prefers one index to another.

Upcoming Webinar Wednesday: Using Grafana for MySQL Monitoring

February 21, 2016 - 7:30am

Grafana is the leading graph and dashboard builder for visualizing time series, which is a great tool for visual monitoring of MySQL databases. Come learn how to use Grafana for MySQL monitoring.

In this webinar, Torkel Odegaard, Grafana Founder and Creator, will provide an introduction to Grafana and talk about adding data sources, creating dashboards and getting the most out of your data visualization. Then Peter Zaitsev, Percona Co-Founder and CEO, will demonstrate how to set up Grafana and Prometheus for in-depth, completely open source monitoring for MySQL troubleshooting and capacity planning. They will look into some features Grafana has to offer, explain why different graphs are important and how you specifically can use them to analyze MySQL performance and troubleshoot operational issues.

  • When: Wednesday, February 24, 2016
  • Time: 10:00am Pacific Time

Register here.

Peter Zaitsev

Peter co-founded Percona in 2006, assuming the role of CEO. Percona helps companies of all sizes maximize their success with MySQL. Percona was named to the Inc. 5000 in 2013. Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. As CEO of Percona, Peter enjoys mixing business leadership with hands on technical expertise. Peter is co-author of High Performance MySQL published by O’Reilly, one of the most popular books on MySQL performance. Peter blogs regularly on MySQLPerformanceBlog.com and speaks frequently at conferences. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

Torkel Odegaard

Creator of Grafana and now co-founder of Raintank, a company that is building and proving a metrics and monitoring platform based on open source software. Torkel was a hobby C/C++ developer in his early teens and university years which later turned into a carrier as a .NET developer with strong focus on maintainability, TDD and agile. The last couple of years has been a love affair with Go, nodejs, messaging architectures, writing deploy automation and other DevOps related tools. He is currently working full time on the open source project Grafana, the leading graph and dashboard builder for visualizing time series metrics.

Myth Busting: MongoDB Scalability (it scales!)

February 19, 2016 - 10:15am

Recently I was talking with Tim Sharp, one of my colleagues from our Technical Account Manager team about MongoDB’s scalability. While doing some quick training with some of the Percona team, Tim brought something to my attention: he mentioned to me that several customers he has been in contact with have been telling him that “MongoDB doesn’t scale!” MongoDB’s scalability was in question? My response was, “Is that a joke?”

Coming from MySQL users, I was confounded given MongoDB’s native ability to easily scale horizontally. However, I quickly realized that they may not have been referencing its ability to shard. So I decided to take a step back and look at the history of MongoDB, not just its present form, to get to the bottom of this. Here’s my analysis:

The History

From MongoDB’s inception, it was obvious that it was a developers database – and thus a vast majority of the development resources went into functionality (not performance). Nowhere was this more apparent than with the MMAP storage engine. From what I can tell, this is the root of the belief that MongoDB doesn’t scale. And, for good reason too! From MongoDB v1.0, which was released on August 27, 2009 (according to Wikipedia) and stretching for three years (all the way until v2.2 released on August 28, 2012), there was a per-process lock. That means that you could only have one operation being performed on a single MongoDB server at a time. To put that into context, this is useful for a website that has only one user on it, but not something that is generally worthy of my least favorite buzzword, “webscale.” Obviously, they changed that with v2.2 and introduced an extremely mature database-level lock (note the sarcasm). In most users eyes, including mine, this was hardly a revelation. The lack of “useability” of a database-level lock is especially apparent in MongoDB, due the fact that it doesn’t support the ability to join data from separate collections (think tables in RDBMS), let alone databases. Don’t get me wrong, there were/are thousands of users making MMAP work for them – but, IMO, at a significant cost!

Finally, at MongoDB World in June of 2014, it looked like there was finally going to be some major vertical scalability changes. These were to come in the form of a document-level locking scheme for the MMAP storage engine and a new storage engine API. There was even a demo to show MMAPs new thread scaling and throughput capabilities. To make a long story short, that version of MMAP never made it out of the cage, but through a key acquisition, WiredTiger did. And with it, the dawn of the MongoDB document-level lock had come.

To wrap up the last two paragraphs in a simple statement: MongoDB prior version 3.0 relied much too heavily on horizontal scalability and, almost completely, ignored the need for increased single server efficiency. For a database that was developed to service the changing needs of the web/application driven businesses (i.e. highly concurrent), this was confusing, if not maddening. Furthermore, during a critical period of new user attraction, potential long term users of MongoDB were disenfranchised because they viewed the lack of vertical scalability as frustratingly inefficient; a multiplier of infrastructure complexity. Now that I look back, it’s no wonder MongoDB has a reputation for scaling problems.

The Present

As I previously mentioned, MongoDB, Inc., acquired a company called WiredTiger, and with it the WiredTiger storage engine. This finally ushered in a modern locking scheme and a greatly enhanced throughput (vertical scalability). WiredTiger also has the ability to support compression for greater space efficiency (the MMAP engine is also a storage hog). However, the storage engine API lets anyone build a storage engine, which leads to more options . . .

A Contender Enters the Ring

To start at the end, á la the movie Memento, on December 14th, 2015 Percona released the GA version of Percona Server for MongoDB, an enhanced, high-performance distribution of MongoDB. It was designed to build on the success of their purely FOSS (free and open source software) Percona Server for MySQL. Now that I’ve spoiled it, let’s take a brief look at how we got there.

Enter Tokutek

In the beginning of 2015, Percona acquired the company that I was working for: Tokutek. With it (and me) came a “futuristic” fork of MongoDB called TokuMX. TokuMX was, as far as I know, the first implementation of a document-level lock and write optimized storage engine (Fractal Tree) in MongoDB (not to mention compression and advanced features). The effect was impressive, with customers regularly reporting upwards of 10-20x the performance with compression ratios of 6:1 on low end and 32:1 for highly repetitive data – talk about enhanced scale! However, maintaining a fork with such an advanced feature set (on top of what MongoDB included) proved difficult to maintain given that a new version of MongoDB was being released approximately every six months, and code inter-dependencies for TokuMX were complex. Our customers began asking for greater parity with upstream . . .

When the storage engine API came along, it was a great revelation for our customers. If we went back to the conceptual stages of TokuMX at Tokutek, we would have preferred to hook into a storage engine API just like we did with our first product, TokuDB for MySQL (now on Percona Server for MySQL). However it was not something that MongoDB, Inc. (then 10gen) was interested in investing in. With the birth of the MongoDB v3.0 storage engine API, we can plug the Fractal Tree in and maintain greater parity with upstream, while providing high performance and high compression all with the latest and greatest MongoDB features.

Social Networking Pushes Open-Source

As you may know, Facebook has a significant investment in MongoDB for their infrastructure and a database engine that they’ve developed called RocksDB. RocksDB was developed off of LevelDB and, if memory serves me right, first implemented as an engine for MySQL. Facebook later saw the value in implementing RocksDB in MongoDB so that they could increase the efficiency of their infrastructure. This is also the first implementation of an LSM data structure in MongoDB. LSM storage engines can provide impressive performance and compression for high throughput, insert-heavy workloads.

MongoRocks (as it’s called for MongoDB), has the full weight of the Facebook development team and is vetted through what we know to be a serious workload. Also, Percona is the only company partnering with Facebook to offer commercial level support, pre-built and tested binaries/packages and consulting for MongoRocks.

The latest version of Percona Server for MongoDB (full MongoDB Community Code plus PerconaFT and MongoRocks pre-compiled in, along with Enterprise features) can be downloaded here.

The Proof is in the Pudding

Enough with the talk. Numbers don’t lie, so let me show you how much the vertical scalability has improved from MongoDB v2.2 to MongoDB v3.0. I’ll even break it out by storage engine. These numbers are meant to be relative, and meant to show the relative difference in performance between the versions (and storage engines for v 3.0). However, for the sake of transparency, they were done with iibench simulating a pure insert workload.

The graph below shows the huge space savings one can achieve with the new storage engines.  

What Should You Conclude?

To put it bluntly, MongoDB’s performance has improved dramatically. The great thing is, the storage engine API hasn’t even matured yet. Currently, it’s one of the biggest performance limiters of the storage engines. As it matures, storage engines, and thus MongoDB, should get faster. As always, please do not take these numbers as gospel, they were created with a simple workload meant to only show the difference in vertical scalability. Your evaluation should be done with a workload that simulates your application’s behavior in order to fully understand how MongoDB will perform in your environment. For a more detailed discussion about the features and workload suitability of each storage engine, take a look at the blog series that Dave Avery and I co-wrote: MongoDB revs you up (follow the links to see all four posts).

In short: If you tried MongoDB prior to v 2.6, you should circle back and re-evaluate it (try PSMDB for more choice).

For a more detailed look at the performance of the storage engines available in Percona Server for MongoDB, take a look at Vadim Tkachenko’s blog post.

MongoDB, security, and how not to get stung

February 18, 2016 - 4:49pm

In a recent article over at The Inquirer, Microsoft got roundly savaged for a database security leak. Apparently their third-party database operator didn’t correctly secure the backdoor to their “careers “database, leaving it open for anybody to ransack for personal information. When it was discovered (by a user), it was quickly acknowledged and remedied. So is MongoDB security any good? Short answer, of course it is.

In the back and forth, MongoDB’s database security got caught in the crossfire:

“All indications are that the database, a MongoDB instance, was not write-protected. You probably see where this is going. During the exposed timeframe an attacker could have modified the database, and thus the HTML code, of job listing pages being served through m.careersatmicrosoft.com.”*

While it’s true that MongoDB was the database system in question, and it’s true that the security was left open, it’s pretty unfair to imply that somehow this is a MongoDB problem – rather than the all-too-familiar “operator error.” MongoDB can be secured (as it currently is now in this particular instance), you just need to have someone experienced with the database system security to verify sure it is done correctly.

As MongoDB put it:

“Recently a blog post was published that claimed a user had not properly secured their instance of MongoDB and was therefore at risk. As the article explains, the potential issue is a result of how a user might configure their deployment without security enabled. There is no security issue with MongoDB – extensive security capabilities are included with MongoDB,” said Kelly Stirman, VP of Strategy at MongoDB.*

MongoDB is correct, but let’s face it: in the current real world, unfortunately, the idea of foolproof security is a unicorn. If somebody wants to get past your security badly enough, they’re most likely going to do it. Coding errors happen, patches are released, updates are recommended – the real crucible for any software is the cold, hard light of day. We expect this in the industry.

This doesn’t mean give up, however! It means be vigilant about understanding how to properly configure security settings, and making sure to monitor them consistently enough to know when malfeasance is afoot. If you’re a third-party operator running a database, you’d better know what the security settings are, and how to turn them on properly. The larger concern for anybody running software, database or otherwise, is making sure that the administrator or support is an expert on how to properly configure, maintain, and monitor it in a production environment, so as to minimize the impact of breaches. Filing fix-it tickets after the fact isn’t really a good plan (see the article in question).

MongoDB is as secure as any database option, as long as the security is correctly setup. MongoDB has a fairly comprehensive list of settings and procedures that explain how their security works.

Let’s go ahead and link to them:

  • Enable Access ControlTutorials for enabling access control.
  • Authentication MechanismsTutorials for specifying various authentication mechanisms supported by MongoDB.
  • Configure Users and RolesTutorials for managing users and roles.
  • NetworkTutorials for securing your network via TLS/SSL and firewall configuration.
  • EncryptionTutorials for storage encryption. (Another encryption option is file system encryption, which is available with most operating systems. File system encryption is fully compliant with PCI.)
  • AuditingTutorials for configuring auditing. (Percona has an excellent resource on auditing, here, that comes free with Percona Server.)
  • MiscellaneousTutorial illustrating field-level redaction or instructions for reporting a security vulnerability to MongoDB.

Something in one of these sections most likely fixed Microsoft’s issue.

So somehow, Microsoft’s database security was left unexamined – a mistake that could be pretty fatal for a company that doesn’t have the resources that Microsoft does. If you rely on your database’s security (and performance) to run your business, you need a guarantee that the administrators can effectively solve problems and protect you from security gaffes (like not turning it on). This is where regular, expert oversight is a crucial part of any database plan. For MongoDB to work as expected, you need to have a support team that configures it correctly and monitors it regularly – and knows what to be monitoring.

This is where being a MongoDB expert comes in handy!

*Quotes taken from the original Inquirer article, here.

Percona Live featured talk with Geir Hoydalsvik: What’s New in MySQL

February 16, 2016 - 9:32am

Welcome to the next installment of our talks with Percona Live Data Performance Conference 2016 speakers! In this series of blogs, we’ll highlight some of the talks that will happen at this year’s conference, as well as discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live registration bonus!

In this installment we’ll meet Geir Høydalsvik, Oracle Senior Software Development Director. His talk, presented with Simon Mudd, booking.com DBA, is What’s New in MySQL. This will cover some key enhancements to MySQL version 5.7.

I had a chance to speak with Geir and learn a bit more about MySQL 5.7:

Percona: Give me brief history: how you got into database development, what you work on, and why you love it.

Geir: My interest in databases grew while working in banking in the late nineties. Back then I implemented back-end ATM servers using HP-UX and Sybase as the development platform. I remember we had an allowed maintenance window from 2am-5am and struggled with finishing a blocking create index operation on our main table with 30 million rows. I remember thinking “Why can’t this be done online?”

Then by chance the local database startup Clustra started an “online schema change” project and asked me to join. Since then it has been all about developing databases. A DBMS is a combined communication system, an operating system, a file system, a compiler. Each and every aspect of computer science is relevant: maintainability, observability, performance, availability, reliability, security. The quality of developers you meet in this area is stunning. Customers design their business around their data. So, why should I consider doing anything else?

Percona: Your talk is going to be on “What’s New in MySQL.” So the obvious question is what’s new?!? But more specifically, what is the feature you think will have the most affect in the community, and why?

Geir: MySQL 5.7 is the result of almost three years of development, and brings improvements to all areas of the product. Just take a look at http://www.thecompletelistoffeatures.com/. In terms of features that will have the most affect in the community, I would say the JSON support, the JSON datatype, JSON functions, and virtual columns. Virtual columns can be used together with JSON functions to index the content in a JSON document. This functionality is opening up new choices for developers. Next would be the Query Rewrite capabilities, a feature that can help solve many legacy problems in today’s complex software stacks. Finally, I would highlight the much improved monitoring capabilities with SYS Schema, on top of an extended Performance Schema, and close to 600 instrumentation points.

Percona: How will it affect application performance? What might prevent people from immediately adopting the new version? Are there some caveats that need mentioning?

Geir: There are many scenarios where applications will see much improved performance: multi-core scalability, parallel replication slaves, connect/disconnect speed, temporary tables, GIS queries, a better optimizer that provides better JOIN ordering, a better use of indexes, etc. A smooth upgrade was one of our main design objectives, and so upgrading to 5.7 should be easy in most cases. The challenge comes when there is a need to balance 100% backward compatibility with improved data safety or improved security.

Percona: What do you see as an issue that we the open source database community needs to be on top of with regard to MySQL development? What keeps you up at night with regard to the future of MySQL?

Geir: I think the future of MySQL is bright, and if I don’t sleep well at night it’s because of excitement. I see a big and complex eco-system bubbling with ideas and products – with MySQL being one component mixed with others. The challenge is to find the right mix, but I feel we are on the right track.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?

Geir: I look forward to collecting feedback from our users trying out new versions of our products. We collected some great ideas at Percona Amsterdam last year! Then, as always, I look forward to the buzzing crowd of people, highly-voiced opinions, and exciting technologies.

You can read more of Geir’s thoughts, and about MySQL, at the News from the MySQL Server Team blog. Geir contributes as often as he can. You can also read specifically about MySQL 5.7 in his blog post here: http://mysqlserverteam.com/whats-new-in-mysql-5-7-generally-available/.

Want to find out more about Geir, Simon, and MySQL 5.7? Register for Percona Live Data Performance Conference 2016, and come see his talk What’s New in MySQL. Use the code “FeaturedTalk” and receive $100 off the current registration price!

The Percona Live Data Performance Conference is the premier open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Data Performance Conference will be April 18-21 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

MySQL Auditing with MariaDB Auditing Plugin

February 15, 2016 - 3:49pm

This blog will address how the MariaDB Auditing Plugin can help monitor database activity to help with security, accountability and troubleshooting.

Why Audit Your Databases?

Auditing is an essential task for monitoring your database environment. By auditing your database, you can achieve accountability for actions taken or content accessed within your environment. You will also deter users (or others) from inappropriate actions.

If there is any bad behavior, you can investigate suspicious activity. For example, if a user is deleting data from tables, the admins could audit all connections to the database and all deletions of rows. You can also use auditing to notify admins when an unauthorized user manipulates or deletes data or that a user has more privileges than expected.

Auditing Plugins Available for MySQL

As Sergei Glushchenko said in a previous blog, MySQL version 5.5.3 and later provides the Audit Plugin API, which can be used to write an audit plugin. The API provides notification for the following events:

  • messages written to general log (LOG)
  • messages written to error log (ERROR)
  • query results sent to client (RESULT)
  • logins (including failed) and disconnects (CONNECT)

All current audit plugins for MySQL provide an audit log as result of their work. They differ in record format, filtering capabilities and verbosity of log records.

  • MySQL Enterprise Audit Plugin – This plugin is not open source and is only available with MySQL Enterprise, which has a significant cost attached to it. It is the most stable and robust.
  • Percona Audit Log Plugin – Percona provides an open source auditing solution that installs with Percona Server 5.5.37+ and 5.6.17+. This plugin has quite a few output features as it outputs XML, JSON and to syslog. Percona’s implementation is the first to be a drop-in replacement for MySQL Enterprise Audit Plugin. As it has some internal hooks to the server to be feature-compatible with Oracle’s plugin, it is not available as a standalone for other versions of MySQL. This plugin is actively maintained by Percona.
  • McAfee MySQL Audit Plugin – Around the longest and has been used widely. It is open source and robust, while not using the official auditing API. It isn’t updated as often as one may like. There hasn’t been any new features in some time. It was recently updated to support MySQL 5.7.
  • MariaDB Audit Plugin – The only plugin that claims to support MySQL, Percona Server and MariaDB. It is open source and constantly upgraded with new versions of MariaDB. Versions starting at 1.2 are most stable, and it may be risky to use versions below that in your production environment. Versions below 1.2 may be unstable and I have seen it crash production servers. Older versions also log clear text passwords.

About the MariaDB Auditing Plugin

The MariaDB Auditing Plugin provides auditing functionality for not only MariaDB, but Percona Server and MySQL as well. It is installed with MariaDB or available as a plugin for Percona Server and MySQL.

I worked with the MariaDB Auditing Plugin because I was using MySQL community, without an enterprise license, which means the Enterprise Plugin and Percona’s plugin are off the table. We wanted to use a plugin that used MySQL’s built in auditing API, not a custom one that reads known memory blocks and is sensitive to upgrades such as McAfee’s plugin.

Get the Plugin

To get the MariaDB Auditing Plugin, download the .so from here: https://mariadb.com/products/connectors-plugins.

You can manually install the .so file to your plugin directory (ie /usr/lib/mysql/plugin on debian):


I highly recommend packaging it if you intend to do any automation (chef, puppet) or upgrades in the future.


Similar steps can be performed with fpm.

Create a directory structure for the debian package:

$ mkdir mariadb-server-audit-plugin-1.2.0 $ cd mariadb-server-audit-plugin-1.2.0 $ mkdir -p usr/lib/mysql/plugin

Copy plugin into package directory:

$ cp /path/to/server_audit.so usr/lib/mysql/plugin

Debianize the package directory:

$ dh_make --createorig

Delete example files:

$ cd debian/ ; rm -f *.ex

Configure the package:

$ echo "usr/lib/mysql/plugin/server_audit.so" > debian/install $ echo "usr/lib/mysql/plugin/server_audit.so" > debian/source/include-binaries

Build the .deb:

$ dpkg-buildpackage -us -uc

Verify package version:

$ dpkg-deb -W mariadb-server-audit-plugin_1.2.0-1_amd64.deb mariadb-server-audit-plugin 1.2.0-1


Not required but highly recommended (INSTALL PLUGIN and UNINSTALL PLUGIN tend to fail for this plugin depending on what else is happening within your environment):

$ service mysql stop

Install with dpkg:

$ dpkg -i mariadb-server-audit-plugin_1.2.0-1_amd64.deb


Reference https://mariadb.com/kb/en/mariadb/server_audit-system-variables/ for more information on configuration.

Add to my.cnf (if you didn’t restart, you can set these in sql with SET GLOBAL):

# load plugin plugin-load=server_audit=server_audit.so # do not allow users to uninstall plugin server_audit=FORCE_PLUS_PERMANENT # only audit connections and DDL queries server_audit_events=CONNECT,QUERY_DDL # enable logging server_audit_logging=ON # any users who don’t need auditing (csv) server_audit_excl_users=’root’ # or can use server_audit_incl_users=’jayj’

Log destination

When selecting the log destination, you want to use one method. It is dangerous to configure both, so decide ahead of time on your logging strategy.

# flat file server_audit_output_type=FILE server_audit_file_path=/var/log/mysql/audit.log server_audit_file_rotate_size=1000000 server_audit_file_rotations=9 # syslog server_audit_output_type=SYSLOG server_audit_syslog_facility=LOG_LOCAL6 server_audit_syslog_ident=mysql_audit server_audit_syslog_info=this-host.name server_audit_syslog_priority=LOG_INFO

Verify Install

$ service mysql start $ mysql mysql> SHOW PLUGINS; +-------------------------+----------+-----------------+-----------------+---------+ | Name | Status | Type | Library | License | +-------------------------+----------+-----------------+-----------------+---------+ ... | SERVER_AUDIT | ACTIVE | AUDIT | server_audit.so| GPL | +-------------------------+----------+-----------------+------------------+--------+ 24 rows in set (0.00 sec) mysql> SELECT * FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME='SERVER_AUDIT'G *************************** 1. row *************************** PLUGIN_NAME: SERVER_AUDIT PLUGIN_VERSION: 1.2 PLUGIN_STATUS: ACTIVE PLUGIN_TYPE: AUDIT PLUGIN_TYPE_VERSION: 3.2 PLUGIN_LIBRARY: server_audit.so PLUGIN_LIBRARY_VERSION: 1.3 PLUGIN_AUTHOR: Alexey Botchkov (MariaDB Corporation) PLUGIN_DESCRIPTION: Audit the server activity PLUGIN_LICENSE: GPL LOAD_OPTION: FORCE_PLUS_PERMANENT 1 row in set (0.01 sec)

Check the logs

$ tail server_audit.log 20130927 01:00:00,localhost.localdomain,root,localhost,1,1,QUERY,,'SET GLOBAL server_audit_logging=ON',0

Rsyslog config

I recommend starting here and setting up an elasticsearch cluster with logstash and kibana, also known as the ELK stack. This allows you to aggregate and search your logs to find problems. Here is a sample rsyslog configuration:

$ cat /etc/rsyslog.d/10-mysqlaudit.conf # keep in /var/log as syslog user can’t access /var/log/mysql usually /var/log/mysql-audit.log { daily rotate 7 missingok create 640 syslog adm compress sharedscripts postrotate reload rsyslog >/dev/null 2>&1 || true endscript }


The MariaDB Auditing Plugin is quick and easy to install and bring into your current logging or auditing solution.

Once you have installed auditing you can detect problems with an authorization or access control implementation. It allows you to create audit policies that you expect will never generate an audit record because the data is protected. If these policies do generate audit records, then you know that the other security controls are not properly implemented.

Auditing information can help you troubleshoot performance or application issues and lets you see exactly what SQL queries are being processed.

Measuring Docker IO overhead

February 11, 2016 - 12:38pm

This will be another post on using Percona Server via a Docker image. I want to follow up on my previous post regarding CPU/Network overhead in Docker “Measuring Percona Server Docker CPU/network overhead” by measuring  if there is any docker IO overhead on operations.

After running several tests, it appears (spoiler alert) that there is no Docker IO overhead. I still think it is useful to understand the different ways Docker can be used with data volumes, however. Docker’s philosophy is to provide ephemeral containers, but ephemeral does not work well for data – we do not want our data to disappear.

So, the first pattern is to create data inside a docker container. This is the default mode:

docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 --name ps13 -p 3306:3306 -v /data/flash/my.cnf:/etc/my.cnf --net=host percona/percona-server:5.6.28

(I am using --net=host to avoid network overhead; check the previous post for more information.)

The second pattern is to use an external data volume, there we need to substitute the data volume with -v /data/flash/d1/:/var/lib/mysql. The full command is:

docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 --name ps13 -p 3306:3306 -v /data/flash/d1/:/var/lib/mysql -v /data/flash/my.cnf:/etc/my.cnf --net=host percona/percona-server:5.6.28

Finally, there is third pattern: using data volume containers. For this example, I created a dummy container:

docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 --name ps13-data-volume -v /var/lib/mysql -v /data/flash/my.cnf:/etc/my.cnf --net=host percona/percona-server:5.6.28

After stopping the ps13-data-volume container, we can start a real one using the data volume from ps13-data-volume  as:

docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 --name ps14 --volumes-from ps13-data-volume -v /data/flash/my.cnf:/etc/my.cnf --net=host percona/percona-server:5.6.28

I compared all these modes with Percona Server running on a bare metal box, and direct mounted in sysbench, for both read-intensive and write-intensive IO workloads. For the reference, sysbench command is:

./sysbench --test=tests/db/oltp.lua --oltp_tables_count=16 --oltp_table_size=10000000 --num-threads=16 --mysql-host= --mysql-user=root --oltp-read-only=off --max-time=1800 --max-requests=0 --report-interval=10 run

I’m not going to show the final numbers or charts, as the results are identical for all docker modes and for the bare metal case. So I can confidently say there is NO IO overhead for any docker data volume pattern described above.

As next experiment, I want to measure the Docker container overhead in a multi-host network environment.

General Inquiries

For general inquiries, please send us your question and someone will contact you.