Percona's Data performance and scalability blog

Percona XtraBackup 2.4.0-rc1 is now available

February 8, 2016 - 9:35am

Percona is glad to announce the first release candidate of Percona XtraBackup 2.4.0 (2.4.0-rc1), released on February 8th, 2016. Downloads are available from our download site and from apt and yum repositories.

This is a Release Candidate quality release and is not intended for production. If you want a high-quality, Generally Available release, use the current Stable version (currently 2.3.3 in the 2.3 series at the time of writing).

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

This release contains all of the features and bug fixes in Percona XtraBackup 2.3.3, plus the following:

New Features:

  • Percona XtraBackup has implemented basic support for MySQL 5.7 and Percona Server 5.7.

Known Issues:

  • Backed-up table data could not be recovered if backup was taken while running OPTIMIZE TABLE (bug #1541763) or ALTER TABLE ... TABLESPACE (bug #1532878) on that table.
  • Compact Backups currently don’t work due to bug #1192834.

Release notes with all the bugfixes for Percona XtraBackup 2.4.0-rc1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

MySQL 5.7: Introduction for Operational DBAs

February 8, 2016 - 7:18am

Join us Tuesday, February 16, 2016 9:00am PST (UTC-8) for a webinar on the operational and monitoring features of MySQL 5.7 with Percona CEO Peter Zaitsev.

MySQL 5.7 is a great release, providing valuable features for both daily operations and ongoing development. In this, part two of our ongoing 5.7 webinar series, we will look into the new MySQL 5.7 features and enhancements that improve operations, with a specific focus on monitoring. These include:

  • An improved optimizer, including updates to EXPLAIN and enhanced JSON support
  • Performance and scalability improvements for the InnoDB storage engine, including temp tables, spatial types and full text search parsing
  • Security improvements, such as a password expiration policy
  • Performance and sys schema improvements, such as memory usage, metadata locks, monitoring capabilities and reduced footprint/overhead
  • Better online server management
  • Improved replication functions, including new statements for multi-source replication and better monitoring

MySQL 5.7 promises to be faster, more robust, and more secure. We look forward to seeing you at the webinar!

Peter Zaitsev co-founded Percona in 2006, assuming the role of CEO. Percona helps companies of all sizes maximize their success with MySQL. Percona was named to the Inc. 5000 in 2013. Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University, where he majored in Computer Science. As CEO of Percona, Peter enjoys mixing business leadership with hands-on technical expertise. Peter is co-author of High Performance MySQL, published by O’Reilly, one of the most popular books on MySQL performance. Peter blogs regularly on MySQLPerformanceBlog.com and speaks frequently at conferences. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

Measuring Percona Server Docker CPU/network overhead

February 5, 2016 - 10:55am

Now that we have our Percona Server Docker images, I wanted to measure the performance overhead when we run the database in the container. Since Docker promises to use a lightweight container, in theory there should be very light overhead. We need to verify that claim, however. In this post I will show the numbers for CPU and network intensive workloads, and later I will take a look at IO.

For the CPU-bound load, I will use a sysbench OLTP read-only workload with data fitting into memory (so there is no IO performed, and the execution path only goes through the network and CPU).

My server has 24 cores (including hyper-threads): Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz, 256GB of RAM, running Ubuntu 14.04. The Docker version is the latest on the day of publishing: 1.9.1.

First, I measured the throughput on a bare server, without containers – this will be the baseline. For reference, the command I used is the following:

/opt/sysbench/sysbench --test=/opt/tests/db/oltp.lua --oltp_tables_count=8 --oltp_table_size=10000000 --num-threads=16 --mysql-host=172.18.0.2 --mysql-user=root --oltp-read-only=on --max-time=1800 --max-requests=0 --report-interval=10 run

On the bare metal system, the throughput is 7100 transactions per second (tps).

In the next experiment, I started Percona Server in a Docker container and connected to it from the host:

docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 --name ps13 -p 3306:3306 -v /data/flash/d1/:/var/lib/mysql -v /data/flash/my.cnf:/etc/my.cnf percona/percona-server:5.6.28

In this case, the container exposed port 3306 to the host, and we used that as an access point in sysbench.

The throughput in this scenario is 2200 tps!!! That is a significant overhead. I suspect it comes from the Docker gateway, which is added to the execution path when we connect through port forwarding.
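
To confirm that connections really pass through Docker's NAT and proxy rather than going straight to mysqld, you can inspect the published port and the NAT rules Docker installs (a diagnostic sketch; the post itself doesn't show this step):

docker port ps13             # shows the mapping, e.g. "3306/tcp -> 0.0.0.0:3306"
iptables -t nat -nL DOCKER   # lists the DNAT rules Docker added for published ports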

So to avoid the Docker gateway, in the next run I used the host network by running the container with --net=host:

docker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 --name ps13 -v /data/flash/d1/:/var/lib/mysql -v /data/flash/my.cnf:/etc/my.cnf --net=host percona/percona-server:5.6.28

In this case the container ran directly in the host network stack, so this should exclude any Docker network overhead. In this case, the throughput is basically back to 7100 tps.

From these tests, I can make an important conclusion. There is NO measurable CPU overhead when running Percona Server in a Docker container. But the network path raises some questions.

So in the next experiment I ran both sysbench and MySQL in two different containers, connected over the Docker network bridge.

I created a sysbench container, which you can get from:

 https://hub.docker.com/r/percona/sysbench/

To run sysbench:

docker run --name sb -t percona/sysbench

Just for reference, I created a Docker network:

docker network create sysbenchnet

and connected both containers to the same network:

docker network connect sysbenchnet ps13; docker network connect sysbenchnet sb;
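
For reference, the sysbench container needs the MySQL container's address on that bridge network. The post doesn't show this step; one way to obtain it (a sketch using the container name from above) is:

docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' ps13   # prints e.g. 172.18.0.2

That address is then passed to sysbench via --mysql-host, as in the reference command earlier in this post.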

In this configuration, the throughput I’ve observed is 6300 tps.

So there is still some network overhead, but not as significant as with the port gateway case.

For the last example, I again excluded the network path and ran the sysbench container inside the MySQL container network stack using the following command:

docker run --name sb --net container:ps13 -t percona/sysbench

The throughput in this configuration is back to 7100 tps. 

And the conclusion, again, is that there is no CPU overhead even if we run both client and server inside containers, but there is some network overhead – even when running on the same host. It will be interesting to measure the network overhead when the containers are on different physical hosts.

To summarize the results:

  • Bare metal (baseline): 7100 tps
  • Container, connecting via the forwarded port (-p 3306:3306): 2200 tps
  • Container with --net=host: 7100 tps
  • Client and server containers on a Docker bridge network: 6300 tps
  • Client container inside the server container’s network stack (--net container): 7100 tps

Next time I will try to measure IO overhead in Docker containers.

MySQL password expiration features to help you comply with PCI-DSS

February 4, 2016 - 7:12am

PCI compliance (section 8.2.4) requires users to change their passwords every 90 days. Until MySQL 5.6.6 there wasn’t a built-in way to comply with this requirement.

Since MySQL version 5.6.6 there’s a password_expired feature which allows you to mark a user’s password as expired.
This has been added to the mysql.user table, and its default value is “N”. You can change it to “Y” using the ALTER USER statement.

Here’s a quick example of how to manually expire the password for a MySQL user account:

mysql> ALTER USER 'testuser'@'localhost' PASSWORD EXPIRE;

Once this is set to “Y”, the user will still be able to log in to the MySQL server, but will not be able to run any queries before setting a new password. Instead, it will get an ERROR 1820 message:

mysql> SHOW DATABASES;
ERROR 1820 (HY000): You must SET PASSWORD before executing this statement

Keep in mind that this does not affect any current connections the account has open.

After setting a new password, all operations performed using the account will be allowed (according to the account privileges):

mysql> SET PASSWORD=PASSWORD('mechipoderranen');
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| data               |
| logs               |
| mysql              |
| performance_schema |
| test               |
+--------------------+
6 rows in set (0.00 sec)
mysql>

This allows administrators to perform password expiration by scheduling the ALTER USER statement via cron.
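
For example, a crontab entry along these lines would expire the password every quarter (a hypothetical sketch: the account, schedule, and credentials file are all illustrative):

# m h dom mon dow: run at 03:00 on the 1st of every third month
0 3 1 */3 * mysql --defaults-extra-file=/root/.my.cnf -e "ALTER USER 'testuser'@'localhost' PASSWORD EXPIRE;"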

Since MySQL 5.7.4, this has been improved: there’s a new global variable, default_password_lifetime, which allows you to set a global automatic password expiration policy.

Example usage:

Setting a default value in our configuration file. This will make all account passwords expire every 90 days, counting from the day the variable took effect on your MySQL server:

[mysqld]
default_password_lifetime=90

Setting a global policy for passwords to never expire. Note that this is the default value (so it is not strictly necessary to declare it in the configuration file):

[mysqld]
default_password_lifetime=0

This variable can also be changed at runtime if the user has the SUPER privilege:

mysql> SET GLOBAL default_password_lifetime = 90;
Query OK, 0 rows affected (0.00 sec)

You can also set specific values for each user account using ALTER USER. This will override the global password expiration policy. Please note that ALTER USER only understands INTERVAL expressed in DAY:

ALTER USER 'testuser'@'localhost' PASSWORD EXPIRE INTERVAL 30 DAY;

Disable password expiration:

ALTER USER 'testuser'@'localhost' PASSWORD EXPIRE NEVER;

Set it to the default policy, which is the current value of default_password_lifetime:

ALTER USER 'testuser'@'localhost' PASSWORD EXPIRE DEFAULT;

Since MySQL 5.7.6, you can use ALTER USER to change the user’s password:

mysql> ALTER USER USER() IDENTIFIED BY '637h1m27h36r33K';
Query OK, 0 rows affected (0.00 sec)

For more information on this variable, please refer to the documentation page: https://dev.mysql.com/doc/refman/5.7/en/password-expiration-policy.html

Bonus:

Another new user management feature in MySQL 5.7.8 is the ability to lock and unlock user accounts, either when creating the user with CREATE USER or later with the ALTER USER statement.

In this example, we will first create a user with the ACCOUNT LOCK clause:

mysql> CREATE USER 'furrywall'@'localhost' IDENTIFIED BY '71m32ch4n6317' ACCOUNT LOCK;
Query OK, 0 rows affected (0.00 sec)

As you can see below, the newly created user gets an ERROR 3118 message when trying to log in:

$ mysql -ufurrywall -p
Enter password:
ERROR 3118 (HY000): Access denied for user 'furrywall'@'localhost'. Account is locked.

We can unlock the account using the ALTER USER ... ACCOUNT UNLOCK; statement:

mysql> ALTER USER 'furrywall'@'localhost' ACCOUNT UNLOCK;
Query OK, 0 rows affected (0.00 sec)

Now the user account is unlocked and accessible:

$ mysql -ufurrywall -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 17
Server version: 5.7.8-rc MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

If necessary, you can lock it again:

mysql> ALTER USER 'furrywall'@'localhost' ACCOUNT LOCK;
Query OK, 0 rows affected (0.00 sec)

Please check the following documentation for more details: https://dev.mysql.com/doc/refman/5.7/en/account-locking.html

New GIS Features in MySQL 5.7

February 3, 2016 - 7:25am

MySQL 5.7 has been released, and there are some exciting new features now available that I’m going to discuss in this blog — specifically around geographic information system (GIS).

I’ve used GIS features in MySQL for a long time. In my previous blog entries I’ve shown how to create geo-enabled applications with MySQL 5.6 and use MySQL 5.6 geo-spatial functions. In this blog post, I’ll look into what is new in MySQL 5.7 and how we can use those new features in practice for geo-enabled applications.

New in MySQL 5.7

MySQL 5.7 introduces the following major improvements and features for GIS:

  1. Spatial indexes for InnoDB. Finally it is here! This was a long overdue feature, which also prevented many companies from converting all tables to InnoDB.
  2. st_distance_sphere: a native function to calculate the distance between two points on earth. Finally it is here as well! Like many others, I’ve created my own stored procedure to calculate the distance between points on earth using the haversine formula. The native function is ~20x faster than the stored procedure (in an artificial benchmark, see below). This is not surprising, as stored procedures are computationally slow – especially for trigonometric functions.
  3. New functions: GeoHash and GeoJSON. With GeoJSON we can generate the results that are ready for visualizing on Google Maps.
  4. New GIS implementation based on the Boost.Geometry library. This is great news, as originally GIS was implemented independently from scratch with a very limited set of features. Manyi Lu from the MySQL server team provides more reasoning behind the choice of Boost.Geometry.

That is the great news. The bad news is that except for st_distance_sphere, all other functions use planar geometry (no change since MySQL 5.6) and do not support a Spatial Reference System Identifier (SRID). That means that if I want to calculate the distance of my favorite bike path in miles or kilometers, I’ll still have to use a stored function (see below for an example) or write application code for that. The native st_distance function ignores the SRID for now and returns a value representing a distance on a plane – not very useful for our purposes (though it may be useful for ORDER BY / comparisons).

Distance on Sphere

MySQL 5.7 introduces the function st_distance_sphere, which uses a haversine formula to calculate distance. Here is an example:

mysql> select st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954));
+--------------------------------------------------------------------------------+
| st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954)) |
+--------------------------------------------------------------------------------+
|                                                             3855600.7928957273 |
+--------------------------------------------------------------------------------+
1 row in set (0.00 sec)

The distance is in meters by default (you can also override the earth radius, expressed in meters, using the optional third parameter; the default is 6,370,986). Although the earth is really an oblate spheroid, all practical applications use the distance on a sphere: the difference between the haversine formula and more precise (and much slower) functions is negligible for our purposes.
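
For example, to use the WGS 84 equatorial radius of 6,378,137 meters instead of the default (an illustrative choice; any radius in meters works):

mysql> select st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954), 6378137);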

st_distance_sphere is much faster than a stored routine. Here is the artificial benchmark:

mysql> select benchmark(1000000, haversine_distance_sp(37.60954, -122.38657, 35.890334, -78.7698947));
+-----------------------------------------------------------------------------------------+
| benchmark(1000000, haversine_distance_sp(37.60954, -122.38657, 35.890334, -78.7698947)) |
+-----------------------------------------------------------------------------------------+
|                                                                                       0 |
+-----------------------------------------------------------------------------------------+
1 row in set (22.55 sec)

mysql> select benchmark(1000000, st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954)));
+----------------------------------------------------------------------------------------------------+
| benchmark(1000000, st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954))) |
+----------------------------------------------------------------------------------------------------+
|                                                                                                  0 |
+----------------------------------------------------------------------------------------------------+
1 row in set (0.77 sec)

haversine_distance_sp is a stored routine implementation of the same algorithm.
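
The post doesn't include the routine itself; a minimal sketch of what such a function might look like (argument order matching the benchmark call above):

DELIMITER //
CREATE FUNCTION haversine_distance_sp(lat1 DOUBLE, lon1 DOUBLE, lat2 DOUBLE, lon2 DOUBLE)
RETURNS DOUBLE DETERMINISTIC
BEGIN
  -- earth radius in meters, matching st_distance_sphere's default
  DECLARE r DOUBLE DEFAULT 6370986;
  DECLARE a DOUBLE;
  -- haversine formula
  SET a = POW(SIN(RADIANS(lat2 - lat1) / 2), 2)
        + COS(RADIANS(lat1)) * COS(RADIANS(lat2)) * POW(SIN(RADIANS(lon2 - lon1) / 2), 2);
  RETURN 2 * r * ASIN(SQRT(a));
END //
DELIMITER ;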

InnoDB GIS example: find 10 restaurants near me 

In my previous blog post I demonstrated how to use the st_within function to find restaurants inside my zipcode (US postal code) and sort by distance. In MySQL 5.7 there are two changes:

  1. We can use InnoDB table
  2. We can use st_distance_sphere function

For this example, I’ve converted Open Street Map data to MySQL and then created a new InnoDB table:

CREATE TABLE `points_new` (
  `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
  `SHAPE` geometry NOT NULL,
  `osm_id` text,
  `name` text,
  `barrier` text,
  `highway` text,
  `ref` text,
  `address` text,
  `is_in` text,
  `place` text,
  `man_made` text,
  `other_tags` text,
  UNIQUE KEY `OGR_FID` (`OGR_FID`),
  SPATIAL KEY `SHAPE` (`SHAPE`)
) ENGINE=InnoDB AUTO_INCREMENT=13660668 DEFAULT CHARSET=latin1

SHAPE is declared as geometry (and stores points in this table). We also have a SPATIAL KEY on SHAPE in this InnoDB table.
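
As a quick illustration (with hypothetical values), adding a new POI to this table is a plain INSERT with a geometry literal:

INSERT INTO points_new (SHAPE, name, other_tags)
VALUES (ST_GeomFromText('POINT(-78.9064543 35.9975194)'), 'Hypothetical Cafe', '"amenity"=>"cafe"');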

The following query will find all cafes or restaurants in Durham, NC (zipcode: 27701):

SELECT osm_id, name,
       round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist
  FROM points_new
 WHERE st_within(shape,
       (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
   and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
   and name is not null
 ORDER BY dist asc
 LIMIT 10;

Table tl_2013_us_zcta510 stores the shapes of polygons for all US zipcodes. (It needs to be converted to MySQL.) In this example I’m using st_within to filter only the POIs I need, and st_distance_sphere to get the distance from my location (-78.9064543 35.9975194 are the coordinates of Percona’s office in Durham) to the restaurants.

Explain plan:

mysql> EXPLAIN
    -> SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist
    -> FROM points_new
    -> WHERE st_within(shape,
    ->       (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
    -> and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
    -> and name is not null
    -> ORDER BY dist asc LIMIT 10\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: points_new
   partitions: NULL
         type: range
possible_keys: SHAPE
          key: SHAPE
      key_len: 34
          ref: NULL
         rows: 21
     filtered: 18.89
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: tl_2013_us_zcta510
   partitions: NULL
         type: ref
possible_keys: zcta5ce10
          key: zcta5ce10
      key_len: 8
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
2 rows in set, 1 warning (0.00 sec)

That looks pretty good: MySQL is using an index on the SHAPE field (even with the subquery, btw).

Results:

mysql> SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist, st_astext(shape)
    -> FROM points_new
    -> WHERE st_within(shape,
    ->       (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
    -> and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
    -> and name is not null
    -> ORDER BY dist asc LIMIT 10;
+------------+----------------------------+--------+--------------------------------------+
| osm_id     | name                       | dist   | st_astext(shape)                     |
+------------+----------------------------+--------+--------------------------------------+
| 880747417  | Pop's                      | 127.16 | POINT(-78.9071795 35.998501)         |
| 1520441350 | toast                      | 240.55 | POINT(-78.9039761 35.9967069)        |
| 2012463902 | Pizzeria Toro              | 256.44 | POINT(-78.9036457 35.997125)         |
| 398941519  | Parker & Otis              | 273.39 | POINT(-78.9088833 35.998997)         |
| 881029843  | Torero's                   | 279.96 | POINT(-78.90829140000001 35.9995516) |
| 299540833  | Fishmonger's               | 300.01 | POINT(-78.90850250000001 35.9996487) |
| 1801595418 | Lilly's Pizza              | 319.83 | POINT(-78.9094462 35.9990732)        |
| 1598401100 | Dame's Chicken and Waffles | 323.82 | POINT(-78.9031929 35.9962871)        |
| 685493947  | El Rodeo                   | 379.18 | POINT(-78.909865 35.999523)          |
| 685504784  | Piazza Italia              | 389.06 | POINT(-78.9096472 35.9998794)        |
+------------+----------------------------+--------+--------------------------------------+
10 rows in set (0.13 sec)

A 0.13 second response time on an AWS t2.medium box sounds reasonable to me. The same query on a MyISAM table shows about the same response time: 0.14 seconds.

GeoJSON feature and Google Maps

Another nice feature of MySQL 5.7 GIS is the GeoJSON function: you can convert your result set to GeoJSON, which can be used with other applications (for example, the Google Maps API).

Let’s say I want to visualize the above result set on Google Maps. As the API requires a specific format, I can use concat / group_concat to apply the format inside the SQL:

SELECT CONCAT('{ "type": "FeatureCollection", "features": [ ',
       GROUP_CONCAT('{ "type": "Feature", "geometry": ', ST_AsGeoJSON(shape), ', "properties": {} }'),
       '] }') as j
  FROM points_new
 WHERE st_within(shape,
       (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
   and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
   and name is not null

I will get all the restaurants and cafes in zipcode 27701. Here I’m using ST_AsGeoJSON(shape) to convert to GeoJSON, and concat/group_concat to “nest” the whole result into the format suitable for Google Maps.

Result:

mysql> set group_concat_max_len = 1000000;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT CONCAT('{
    '> "type": "FeatureCollection",
    '> "features": [
    '> ',
    -> GROUP_CONCAT('{
    '> "type": "Feature",
    '> "geometry": ', ST_AsGeoJSON(shape), ',
    '> "properties": {}
    '> }'),
    -> ']
    '> }') as j
    -> FROM points_new
    -> WHERE st_within(shape,
    ->       (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
    -> and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
    -> and name is not null\G
*************************** 1. row ***************************
j: { "type": "FeatureCollection", "features": [ { "type": "Feature", "geometry": {"type": "Point", "coordinates": [-78.890852, 35.9903403]}, "properties": {} },{ "type": "Feature", "geometry": {"type": "Point", "coordinates": [-78.8980807, 35.9933562]}, "properties": {} },{ "type": "Feature", "geometry": {"type": "Point", "coordinates": [-78.89972490000001, 35.995879]}, "properties": {} } ... ,{ "type": "Feature", "geometry": {"type": "Point", "coordinates": [-78.9103211, 35.9998494]}, "properties": {} },{ "type": "Feature", "geometry": {"type": "Point", "coordinates": [-78.9158326, 35.9967114]}, "properties": {} }] }
1 row in set (0.14 sec)

I did not include the full result set for lack of space. I also had to increase group_concat_max_len, as otherwise MySQL would truncate the result of the group_concat function.

Now I can visualize it on Google Maps.

Example: Find the longest bike path

MySQL 5.7 (as well as older versions) supports the st_length function to calculate the length of a linestring. However, even in MySQL 5.7, st_length can’t calculate distances on the earth's surface. To find the length of a linestring in meters, I’ve created a very simple stored function:

DELIMITER //
CREATE DEFINER=CURRENT_USER() FUNCTION `ls_distance_sphere`(ls GEOMETRY) RETURNS DECIMAL(20,8) DETERMINISTIC
BEGIN
  DECLARE i, n INT DEFAULT 0;
  DECLARE len DECIMAL(20,8) DEFAULT 0;
  SET i = 1;
  SET n = ST_NumPoints(ls);
  -- walk the linestring and sum the spherical distance of each consecutive segment
  WHILE i < n DO
    SET len = len + st_distance_sphere(st_pointN(ls, i), st_pointN(ls, i+1));
    SET i = i + 1;
  END WHILE;
  RETURN len;
END //
DELIMITER ;

As the Open Street Map data has information about roads in North America, we can use this function to calculate the length (in meters) of every road it stores:

mysql> select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null limit 10;
+---------------------------------------+---------------------------+
| name                                  | ls_distance_sphere(shape) |
+---------------------------------------+---------------------------+
| Highbury Park Drive Bypass            |                0.97386664 |
| Ygnacio Canal Trail                   |                0.86093199 |
| South Marion Parkway                  |                1.06723424 |
| New River Greenway                    |                1.65705401 |
| Northern Diversion Trail              |                2.08269808 |
| Gary L. Haller Trail;Mill Creek Trail |                2.09988209 |
| Loop 1                                |                2.05297129 |
| Bay Farm Island Bicycle Bridge        |                2.51141623 |
| Burrard Street                        |                1.84810259 |
| West 8th Avenue                       |                1.76338236 |
+---------------------------------------+---------------------------+
10 rows in set (0.00 sec)

Index the polygon/area distance using MySQL 5.7 virtual fields

To really answer the question “what is the longest bike path (cycleway) in North America?” we would have to order by the stored function's result. That would cause a full table scan and a filesort, which would be extremely slow for 30 million rows. The standard way to fix this is to materialize the road distance: add an additional field to the table and store the distance there.

In MySQL 5.7 we can actually use the Generated (Virtual) Columns feature:

CREATE TABLE `lines_new` (
  `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
  `SHAPE` geometry NOT NULL,
  `osm_id` int(11) DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  `highway` varchar(60) DEFAULT NULL,
  `waterway` text,
  `aerialway` text,
  `barrier` text,
  `man_made` text,
  `other_tags` text,
  `linestring_length` decimal(15,8) GENERATED ALWAYS AS (st_length(shape)) VIRTUAL,
  PRIMARY KEY (`OGR_FID`),
  SPATIAL KEY `SHAPE` (`SHAPE`),
  KEY `linestring_length` (`linestring_length`),
  KEY `highway_len` (`highway`,`linestring_length`)
) ENGINE=InnoDB AUTO_INCREMENT=27077492 DEFAULT CHARSET=latin1

Unfortunately, MySQL 5.7 does not support non-native functions (stored functions or UDFs) in generated columns, so I have to use st_length in this example. Ordering by the value of st_length may be OK though:

mysql> select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null order by linestring_length desc limit 10;
+-----------------------------+---------------------------+
| name                        | ls_distance_sphere(shape) |
+-----------------------------+---------------------------+
| Confederation Trail         |            55086.92572725 |
| Cowboy Trail                |            43432.06768706 |
| Down East Sunrise Trail     |            42347.39791330 |
| Confederation Trail         |            29844.91038542 |
| Confederation Trail         |            26141.04655981 |
| Longleaf Trace              |            29527.66063726 |
| Cardinal Greenway           |            30613.24487294 |
| Lincoln Prairie Grass Trail |            19648.26787218 |
| Ghost Town Trail            |            25610.52158647 |
| Confederation Trail         |            27086.54829531 |
+-----------------------------+---------------------------+
10 rows in set (0.02 sec)

The query is very fast as it uses the composite index on highway and linestring_length:

mysql> explain select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null order by linestring_length desc limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lines_new
   partitions: NULL
         type: ref
possible_keys: highway_len
          key: highway_len
      key_len: 63
          ref: const
         rows: 119392
     filtered: 90.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
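
If we needed to order by the spherical distance itself, one alternative (a sketch with hypothetical column and index names) is to materialize ls_distance_sphere into a plain column once, and index that instead:

ALTER TABLE lines_new
  ADD COLUMN sphere_length DECIMAL(20,8) DEFAULT NULL,
  ADD KEY highway_sphere_len (highway, sphere_length);

-- one-time (slow) pass to populate the materialized length
UPDATE lines_new SET sphere_length = ls_distance_sphere(SHAPE) WHERE highway = 'cycleway';

After that, ORDER BY sphere_length DESC can use the new index the same way the generated column does above.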

Conclusion

MySQL 5.7 contains a great set of features to work with geospatial data. Finally, spatial indexes are supported in InnoDB, and st_distance_sphere as a native function is very useful. Unfortunately, other spatial functions only work with planar coordinates and do not support SRID. I hope this will be fixed in future releases.

Percona Live Crash Courses: for MySQL and MongoDB!

February 2, 2016 - 11:45am

Percona Live Crash Courses for MySQL and MongoDB

The database community constantly tells us how hard it is to find someone with MySQL and MongoDB DBA skills who can help with the day-to-day management of their databases. This is especially difficult when companies don’t have a full-time requirement for a DBA. Developers, system administrators and IT staff spend too much time trying to solve basic database problems that keep them from doing their day job. Eventually the little problems and performance inefficiencies that pile up lead to big problems.

In answer to this growing need, Percona Live is once again hosting Crash Courses for developers, systems administrators, and other technical resources. This year, we’ve compacted the training into a single day, and are offering two options: MySQL 101 and MongoDB 101!

Don’t let the name fool you: these courses are led by Percona experts who will show you the fundamentals of MySQL or MongoDB tools and techniques.

And it’s not just for DBAs: developers are encouraged to attend to hone their database skills. Developers who create code that can scale to match the demands of the online community are both a resource and an investment.

Below is a list of the topics covered in each course:

MySQL 101 Topics

  • Schema Review 101: How and What You Should Be Looking at…
  • Choosing a MySQL High Availability Solution Today
  • MySQL Performance Troubleshooting Best Practices
  • Comparing Synchronous Replication Solutions in the Cloud
  • Cost Optimizations Through MySQL Performance Optimizations

MongoDB 101 Topics

  • SQL with MySQL or NoSQL with MongoDB?
  • MongoDB for MySQL DBA’s
  • MongoDB Storage Engine Comparison
  • MongoDB 3.2: New Features Overview

Attendees will return ready to quickly and correctly take care of the day-to-day and week-to-week management of their MySQL or MongoDB environment.

The schedule and non-conference cost for the 101 courses are:

  • MySQL 101: Tuesday April 19th ($400)
  • MongoDB 101: Wednesday April 20th ($400)
  • Both MySQL and MongoDB 101 sessions ($700)

(Tickets to the 101 sessions do not grant access to the main Percona Live breakout sessions. Full Percona Live conference passes do grant admission to the 101 sessions. 101 Crash Course attendees will have full access to the Percona Live keynote speakers, the exhibit hall, and receptions.)

As a special promo, the first 101 people to purchase the 101 talks receive a $299.00 discount off the ticket price! Each session only costs $101! Get both sessions for a mere $202! Register now, and use the following codes for your first 101 discount:

  • Single101= $299 off of either the MySQL or MongoDB tickets
  • Double101= $498 off of the combined MySQL/MongoDB ticket

Sign up now for special track pricing. Click here to register.

Birds of a Feather

Birds of a Feather (BOF) sessions enable attendees with interests in the same project or topic to enjoy some quality face time. BOFs can be organized for individual projects or broader topics (e.g., best practices, open data, standards). Any attendee or conference speaker can propose and moderate an engaging BOF. Percona will post the selected topics and moderators online and provide a meeting space and time. The BOF sessions will be held Tuesday, April 19, 2016 at 6:00 p.m. The deadline for BOF submissions is February 7.

Lightning Talks

Lightning Talks provide an opportunity for attendees to propose, explain, exhort, or rant on any MySQL, NoSQL or Data in the Cloud-related topic for five minutes. Topics might include a new idea, successful project, cautionary story, quick tip, or demonstration. All submissions will be reviewed, and the top 10 will be selected to present during one of the scheduled breakout sessions during the week. Lighthearted, fun or otherwise entertaining submissions are highly welcome. The deadline for submitting a Lightning Talk topic is February 7, 2016.

Experimental Percona Docker images for Percona Server

February 2, 2016 - 9:02am

Docker is an incredibly popular tool for deploying software, so we decided to provide Percona Docker images for both Percona Server (MySQL) and Percona Server for MongoDB.

We want to create an easy way to try our products.

There are actually some images available from https://hub.docker.com/_/percona/, but those images are provided by Docker itself, not by Percona.

In our images, we provide all the varieties of storage engines available in Percona Server (MySQL/MongoDB).

Our images are available from https://hub.docker.com/r/percona/.

The simplest way to get going is to run the following:

docker run --name ps -e MYSQL_ROOT_PASSWORD=secret -d percona/percona-server:latest

for Percona Server/MySQL, and:

docker run --name psmdb -d percona/percona-server-mongodb:latest

for Percona Server/MongoDB.
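
To quickly verify that the MySQL container is up and accepting connections (a sketch; the password matches the MYSQL_ROOT_PASSWORD used above):

docker exec -it ps mysql -uroot -psecret -e 'SELECT @@version, @@version_comment;'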

It is very easy to try the different storage engines that come with Percona Server for MongoDB. For example, to use RocksDB, run:

docker run --name psmdbrocks -d percona/percona-server-mongodb:latest --storageEngine=RocksDB

or PerconaFT:

docker run --name psmdbperconaft -d percona/percona-server-mongodb:latest --storageEngine=PerconaFT
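
To verify which engine a given container is actually running, you can query serverStatus through the mongo shell bundled in the image (a sketch):

docker exec -it psmdbrocks mongo --eval 'printjson(db.serverStatus().storageEngine)'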

We are looking for any feedback you’d like to provide: whether this is useful, and what improvements we could make.

InnoDB and TokuDB on AWS

February 1, 2016 - 8:38am

In a recent post, Vadim compared the performance of Amazon Aurora and Percona Server on AWS. This time, I am comparing write throughput for InnoDB and TokuDB, using the same workload (sysbench oltp/update/update_non_index) and a similar set-up (r3.xlarge instance, with general purpose SSD, io2000, and io3000 volumes) to his experiments.

All the runs used 16 threads for sysbench, and the following MySQL configuration files for InnoDB and TokuDB respectively:

[mysqld]
table-open-cache-instances=32
table_open_cache=8000
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 16G
innodb-flush-log-at-trx-commit = 1
innodb_log_compressed_pages = 0
innodb-file-per-table = 1
innodb-buffer-pool-size = 20G
innodb_write_io_threads = 8
innodb_read_io_threads = 32
innodb_open_files = 1024
innodb_old_blocks_pct = 10
innodb_old_blocks_time = 2000
innodb_checksum_algorithm = crc32
innodb_file_format = Barracuda
innodb_io_capacity=1500
innodb_io_capacity_max=2000
metadata_locks_hash_instances=256
innodb_max_dirty_pages_pct=90
innodb_flush_neighbors=1
innodb_buffer_pool_instances=8
innodb_lru_scan_depth=4096
innodb_sync_spin_loops=30
innodb-purge-threads=16

[mysqld]
tokudb_read_block_size=16K
tokudb_fanout=128
table-open-cache-instances=32
table_open_cache=8000
metadata_locks_hash_instances=256

[mysqld_safe]
thp-setting=never
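
The post doesn't list the exact sysbench command lines; assuming the same sysbench 0.5 layout as in the Docker benchmark above, an update_non_index run might look like this (paths and host are illustrative):

/opt/sysbench/sysbench --test=/opt/tests/db/update_non_index.lua --oltp_tables_count=8 --oltp_table_size=50000000 --num-threads=16 --mysql-host=127.0.0.1 --mysql-user=root --max-time=1800 --max-requests=0 --report-interval=10 run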

You can see the full set of graphs here, and the complete results here.

Let me start illustrating the results with this summary graph for the io2000 volume, showing how write throughput varies over time, per engine and workload (for all graphs, size is in 1k rows, so 1000 is actually 1M):

We can see a few things already:

  • InnoDB has better throughput for smaller table sizes.
  • The reverse is true as size becomes big enough (after 10M rows here).
  • TokuDB’s advantage (at bigger sizes) is not noticeable on the oltp workload, though InnoDB’s (at smaller sizes) is.

Let’s dig in a bit more and look at the extreme ends in terms of table size, starting with 1M rows:

and ending in 50M:

In the first case, we can see that not only does InnoDB show better write throughput, it also shows less variance. In the second case, we can confirm that the difference does not seem significant for oltp, but it is for the other workloads.

This should come as no surprise, as one of the big differences between TokuDB’s Fractal trees and InnoDB’s B-tree implementation is the addition of message buffers to nodes to handle writes (the other big difference, for me, is node size). For write-intensive workloads, TokuDB needs to do a lot less tree traversing than InnoDB: in fact, the tree is traversed only to validate uniqueness constraints when required; otherwise writes are just injected into the message buffer, and the buffer is flushed to lower levels of the tree asynchronously. I refer you to this post for more details.

For oltp, InnoDB is at an advantage at smaller table sizes, as it does not need to scan message buffers all across the search path when reading (nothing is free in life, and this is the cost of TokuDB’s advantage for writes). I suspect this advantage is lost at high enough table sizes because, at that point, either engine will be I/O bound anyway.

My focus here was write throughput, but as a small example see how this is reflected on response time if we pick the 50M table size and drop oltp from the mix:

At this point, you may be wondering why I focused on the io2000 results (and if you’re not, bear with me please!). The reason is the results for io3000 and the general purpose ssd showed characteristics that I attribute to latency on the volumes. You can see what I mean by looking at the io3000 graph:

I say “I attribute” because, unfortunately, I do not have any metrics other than sysbench’s output to go with (an error I will amend on future benchmarks!). I have seen the same pattern while working on production systems on AWS, and in those cases I was able to correlate it with increases in stime and/or qtime on diskstats. The fact that this is seen on the lower and higher capacity volumes for the same workload, but not the io2000 one, increases my confidence in this assumption.

Conclusion

I would not consider TokuDB a general purpose replacement for InnoDB, by which I mean I would never blindly suggest that someone migrate from one to the other, as the performance characteristics are different enough to make this risky without a proper assessment.

That said, I believe TokuDB has great advantages for the right scenarios, and this test highlights some of its strengths:

  • It has a significant advantage over InnoDB on slower devices and bigger data sets.
  • For big enough data sets, this is even the case on fast devices and write-intensive workloads, as the B-tree becomes I/O bound much faster.

Other advantages of TokuDB over InnoDB, not directly evidenced from these results, are:

  • Better compression (helped by the much larger block size).
  • Better SSD lifetime, due to fewer and more sequential writes (sequential writes have, in theory at least, no write amplification compared to random ones; so even though the sequential/random difference shouldn't matter for SSD performance, it does for lifetime).

EXPLAIN FORMAT=JSON knows everything about UNIONs: union_result and query_specifications

January 29, 2016 - 11:09am

Ready for another post in the EXPLAIN FORMAT=JSON is Cool series! Great! This post will discuss how to see all the information contained in optimized queries with UNION, using the union_result and query_specifications members.

When optimizing complicated queries with UNION, it is easy to get lost in the regular EXPLAIN output trying to identify which part of the output belongs to which part of the UNION.

Let’s consider the following example:

mysql> explain
    -> select emp_no, last_name, 'low_salary' from employees
    -> where emp_no in (select emp_no from salaries
    -> where salary < (select avg(salary) from salaries))
    -> union
    -> select emp_no, last_name, 'high salary' from employees
    -> where emp_no in (select emp_no from salaries
    -> where salary >= (select avg(salary) from salaries))\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: employees
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299778
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.employees.emp_no
         rows: 9
     filtered: 33.33
        Extra: Using where; FirstMatch(employees)
*************************** 3. row ***************************
           id: 3
  select_type: SUBQUERY
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: NULL
*************************** 4. row ***************************
           id: 4
  select_type: UNION
        table: employees
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299778
     filtered: 100.00
        Extra: NULL
*************************** 5. row ***************************
           id: 4
  select_type: UNION
        table: salaries
   partitions: NULL
         type: ref
possible_keys: PRIMARY,emp_no
          key: PRIMARY
      key_len: 4
          ref: employees.employees.emp_no
         rows: 9
     filtered: 33.33
        Extra: Using where; FirstMatch(employees)
*************************** 6. row ***************************
           id: 6
  select_type: SUBQUERY
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2557022
     filtered: 100.00
        Extra: NULL
*************************** 7. row ***************************
           id: NULL
  select_type: UNION RESULT
        table: <union1,4>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Using temporary
7 rows in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'low_salary' AS `low_salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))) union /* select#4 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'high salary' AS `high salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`)))

While we can guess that subquery 3 belongs to the first query of the union, and subquery 6 belongs to the second (which has number 4 for some reason), we have to be very careful (especially in our case) when queries use the same tables in both parts of the UNION.

The main issue with the regular EXPLAIN for UNION is that it has to represent the hierarchical structure as a flat table. The same issue occurs when you want to store objects created in a programming language, such as Java, in a relational database.

EXPLAIN FORMAT=JSON, on the other hand, has a hierarchical structure and more clearly displays how the UNION was optimized:

mysql> explain format=json select emp_no, last_name, 'low_salary' from employees where emp_no in (select emp_no from salaries where salary < (select avg(salary) from salaries)) union select emp_no, last_name, 'high salary' from employees where emp_no in (select emp_no from salaries where salary >= (select avg(salary) from salaries))G *************************** 1. row *************************** EXPLAIN: { "query_block": { "union_result": { "using_temporary_table": true, "table_name": "<union1,4>", "access_type": "ALL", "query_specifications": [ { "dependent": false, "cacheable": true, "query_block": { "select_id": 1, "cost_info": { "query_cost": "921684.48" }, "nested_loop": [ { "table": { "table_name": "employees", "access_type": "ALL", "possible_keys": [ "PRIMARY" ], "rows_examined_per_scan": 299778, "rows_produced_per_join": 299778, "filtered": "100.00", "cost_info": { "read_cost": "929.00", "eval_cost": "59955.60", "prefix_cost": "60884.60", "data_read_per_join": "13M" }, "used_columns": [ "emp_no", "last_name" ] } }, { "table": { "table_name": "salaries", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "employees.employees.emp_no" ], "rows_examined_per_scan": 9, "rows_produced_per_join": 299778, "filtered": "33.33", "first_match": "employees", "cost_info": { "read_cost": "302445.97", "eval_cost": "59955.60", "prefix_cost": "921684.48", "data_read_per_join": "4M" }, "used_columns": [ "emp_no", "salary" ], "attached_condition": "(`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))", "attached_subqueries": [ { "dependent": false, "cacheable": true, "query_block": { "select_id": 3, "cost_info": { "query_cost": "516948.40" }, "table": { "table_name": "salaries", "access_type": "ALL", "rows_examined_per_scan": 2557022, "rows_produced_per_join": 2557022, "filtered": "100.00", "cost_info": { "read_cost": "5544.00", "eval_cost": "511404.40", "prefix_cost": "516948.40", "data_read_per_join": "39M" }, "used_columns": [ "salary" ] } } } ] } } ] } }, { "dependent": false, "cacheable": true, "query_block": { "select_id": 4, "cost_info": { "query_cost": "921684.48" }, "nested_loop": [ { "table": { "table_name": "employees", "access_type": "ALL", "possible_keys": [ "PRIMARY" ], "rows_examined_per_scan": 299778, "rows_produced_per_join": 299778, "filtered": "100.00", "cost_info": { "read_cost": "929.00", "eval_cost": "59955.60", "prefix_cost": "60884.60", "data_read_per_join": "13M" }, "used_columns": [ "emp_no", "last_name" ] } }, { "table": { "table_name": "salaries", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "employees.employees.emp_no" ], "rows_examined_per_scan": 9, "rows_produced_per_join": 299778, "filtered": "33.33", "first_match": "employees", "cost_info": { "read_cost": "302445.97", "eval_cost": "59955.60", "prefix_cost": "921684.48", "data_read_per_join": "4M" }, "used_columns": [ "emp_no", "salary" ], "attached_condition": "(`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))", "attached_subqueries": [ { "dependent": false, "cacheable": true, "query_block": { "select_id": 6, "cost_info": { "query_cost": "516948.40" }, "table": { "table_name": "salaries", "access_type": "ALL", "rows_examined_per_scan": 2557022, "rows_produced_per_join": 2557022, "filtered": 
"100.00", "cost_info": { "read_cost": "5544.00", "eval_cost": "511404.40", "prefix_cost": "516948.40", "data_read_per_join": "39M" }, "used_columns": [ "salary" ] } } } ] } } ] } } ] } } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'low_salary' AS `low_salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` < (/* select#3 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))) union /* select#4 */ select `employees`.`employees`.`emp_no` AS `emp_no`,`employees`.`employees`.`last_name` AS `last_name`,'high salary' AS `high salary` from `employees`.`employees` semi join (`employees`.`salaries`) where ((`employees`.`salaries`.`emp_no` = `employees`.`employees`.`emp_no`) and (`employees`.`salaries`.`salary` >= (/* select#6 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`)))

First, it puts the union_result member in the query_block at the very top level:

EXPLAIN: { "query_block": { "union_result": {

The union_result object contains information about how the result set of the UNION was processed:

"using_temporary_table": true, "table_name": "<union1,4>", "access_type": "ALL",

It also contains the query_specifications array, which holds all the details about the queries in the UNION:

"query_specifications": [ { "dependent": false, "cacheable": true, "query_block": { "select_id": 1, <skipped> { "dependent": false, "cacheable": true, "query_block": { "select_id": 4,

This representation is much clearer, and also contains all the details that the regular EXPLAIN misses for regular queries.

Conclusion: EXPLAIN FORMAT=JSON not only contains additional optimization information for each query in the UNION, but also has a hierarchical structure that is more suitable for the hierarchical nature of the UNION.

Percona XtraDB Cluster 5.6.28-25.14 is now available

January 29, 2016 - 5:34am

Percona is glad to announce the new release of Percona XtraDB Cluster 5.6 on January 29, 2016. Binaries are available from the downloads area or from our software repositories.

Percona XtraDB Cluster 5.6.28-25.14 is now the current release. It is based on Percona Server 5.6.28 and Galera Replicator 3.14 (wsrep API version 25), as reflected in the version number.

All of Percona's software is open-source and free, and all the details of the release can be found in the 5.6.28-25.14 milestone at Launchpad.

For more information about relevant Codership releases, see this announcement.

Bugs Fixed:

  • 1494399: Fixed an issue caused by replication of events on certain system tables (for example, mysql.slave_master_info, mysql.slave_relay_log_info). Replication in the Galera ecosystem is now avoided when bin-logging is disabled for those tables.
    NOTE: As part of this fix, when bin-logging is enabled, replication in the Galera ecosystem will happen only if BINLOG_FORMAT is set to either ROW or STATEMENT. The recommended format is ROW, while STATEMENT is required only for the pt-table-checksum tool to operate correctly. If BINLOG_FORMAT is set to MIXED, replication of events in the Galera ecosystem tables will not happen even with bin-logging enabled for those tables.
  • 1522385: Fixed GTID holes caused by skipped replication. A slave might ignore an event replicated from master, if the same event has already been executed on the slave. Such events are now propagated in the form of special GTID events to maintain consistency.
  • 1532857: The installer now creates a /var/lib/galera/ directory (assigned to user nobody), which can be used by garbd in the event it is started from a directory that garbd cannot write to.

Known Issues:

  • 1531842: Two instances of garbd cannot be started from the same working directory. This happens because each instance creates a state file (gvwstate.dat) in the current working directory by default. Although garbd is configured to use the base_dir variable, it was not registered due to a bug. Until garbd is fixed, you should start each instance from a separate working directory.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Vote Percona Server in LinuxQuestions.org Members Choice Awards

January 28, 2016 - 1:13pm

Percona is calling on you! Vote Percona for Database of the Year in LinuxQuestions.org Members Choice Awards 2015. Help our Percona Server get recognized as one of the best database options for data performance. Percona Server is a free, fully compatible, enhanced, open source drop-in replacement for MySQL® that provides superior performance, scalability and instrumentation.

LinuxQuestions.org, or LQ for short, is a community-driven, self-help web site for Linux users. Each year, LinuxQuestions.org holds an annual competition to recognize the year’s best-in-breed technologies. The winners of each category are determined by the online Linux community!

You can vote now for your favorite products of 2015 (Percona, of course!). This is your chance to be heard!

Voting ends on February 10th, 2016. You must be a registered member of LinuxQuestions.org with at least one post on their forums to vote.

Setup a MongoDB replica/sharding set in seconds

January 28, 2016 - 11:09am

In the MySQL world, we’re used to playing with MySQL Sandbox. It allows us to deploy a testing replication environment in seconds, without a great deal of effort or navigating multiple virtual machines. It is a tool that we couldn’t live without in Support.

In this post I am going to walk through the different ways we have to deploy a MongoDB replica set or sharded cluster for testing in a similar way. It is important to mention that this is not intended for production, but to be used for troubleshooting, learning, or just playing around with replication.

Replica Set regression test’s diagnostic commands

MongoDB includes a .js test framework that allows us to deploy a replica set from the MongoDB shell. Just run the following:

# mongo --nodb
> var rstest = new ReplSetTest( { name: 'replicaSetTest', nodes: 3 } )
> rstest.startSet()
ReplSetTest Starting Set
ReplSetTest n is : 0
ReplSetTest n: 0 ports: [ 31000, 31001, 31002 ] 31000 number
{
    "useHostName" : true,
    "oplogSize" : 40,
    "keyFile" : undefined,
    "port" : 31000,
    "noprealloc" : "",
    "smallfiles" : "",
    "rest" : "",
    "replSet" : "replicaSetTest",
    "dbpath" : "$set-$node",
    "restart" : undefined,
    "pathOpts" : {
        "node" : 0,
        "set" : "replicaSetTest"
    }
}
ReplSetTest Starting....
[...]

At some point our mongod daemons will be running, each with its own data directory and port:

2133 pts/0 Sl+ 0:01 mongod --oplogSize 40 --port 31000 --noprealloc --smallfiles --rest --replSet replicaSetTest --dbpath /data/db/replicaSetTest-0 --setParameter enableTestCommands=1
2174 pts/0 Sl+ 0:01 mongod --oplogSize 40 --port 31001 --noprealloc --smallfiles --rest --replSet replicaSetTest --dbpath /data/db/replicaSetTest-1 --setParameter enableTestCommands=1
2213 pts/0 Sl+ 0:01 mongod --oplogSize 40 --port 31002 --noprealloc --smallfiles --rest --replSet replicaSetTest --dbpath /data/db/replicaSetTest-2 --setParameter enableTestCommands=1

Perfect. Now we need to initialize the replica set:

> rstest.initiate()
{
    "replSetInitiate" : {
        "_id" : "replicaSetTest",
        "members" : [
            { "_id" : 0, "host" : "debian:31000" },
            { "_id" : 1, "host" : "debian:31001" },
            { "_id" : 2, "host" : "debian:31002" }
        ]
    }
}
m31000| 2016-01-24T10:42:36.639+0100 I REPL [ReplicationExecutor] Member debian:31001 is now in state SECONDARY
m31000| 2016-01-24T10:42:36.639+0100 I REPL [ReplicationExecutor] Member debian:31002 is now in state SECONDARY
[...]

and it is done!

> rstest.status() { "set" : "replicaSetTest", "date" : ISODate("2016-01-24T09:43:41.261Z"), "myState" : 1, "members" : [ { "_id" : 0, "name" : "debian:31000", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 329, "optime" : Timestamp(1453628552, 1), "optimeDate" : ISODate("2016-01-24T09:42:32Z"), "electionTime" : Timestamp(1453628554, 1), "electionDate" : ISODate("2016-01-24T09:42:34Z"), "configVersion" : 1, "self" : true }, { "_id" : 1, "name" : "debian:31001", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 68, "optime" : Timestamp(1453628552, 1), "optimeDate" : ISODate("2016-01-24T09:42:32Z"), "lastHeartbeat" : ISODate("2016-01-24T09:43:40.671Z"), "lastHeartbeatRecv" : ISODate("2016-01-24T09:43:40.677Z"), "pingMs" : 0, "configVersion" : 1 }, { "_id" : 2, "name" : "debian:31002", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 68, "optime" : Timestamp(1453628552, 1), "optimeDate" : ISODate("2016-01-24T09:42:32Z"), "lastHeartbeat" : ISODate("2016-01-24T09:43:40.672Z"), "lastHeartbeatRecv" : ISODate("2016-01-24T09:43:40.690Z"), "pingMs" : 0, "configVersion" : 1 } ], "ok" : 1 }

There are many more commands you can run, just type rstest. and then press Tab twice to get the list. Follow this link if you need more info:

http://api.mongodb.org/js/current/symbols/_global_.html#ReplSetTest

What about sharding? Pretty similar:

> var shtest = new ShardingTest({ shards: 2, mongos: 1 })

This is the documentation link if you need more info:

http://api.mongodb.org/js/current/symbols/_global_.html#ShardingTest

It is important to mention that if you close the mongo shell where you ran the commands, all the spawned mongod processes will also shut down.
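
If you want to shut the set down explicitly instead, ReplSetTest provides a method for that:

> rstest.stopSet()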

mtools

mtools is a collection of tools and scripts that make MongoDB DBAs’ lives much easier. It includes mlaunch, which can be used to start replica sets and sharded systems for testing.

https://github.com/rueckstiess/mtools

The mlaunch tool requires pymongo, so you need to install it:

# pip install pymongo

You can also use pip to install mtools:

# pip install mtools

Then, we can just start our replica set. In this case, with two nodes and one arbiter:

# mlaunch --replicaset --nodes 2 --arbiter --name "replicaSetTest" --port 3000
launching: mongod on port 3000
launching: mongod on port 3001
launching: mongod on port 3002
replica set 'replicaSetTest' initialized.

# ps -x | grep mongod
10246 ? Sl 0:03 mongod --replSet replicaSetTest --dbpath /root/data/replicaSetTest/rs1/db --logpath /root/data/replicaSetTest/rs1/mongod.log --port 3000 --logappend --fork
10257 ? Sl 0:03 mongod --replSet replicaSetTest --dbpath /root/data/replicaSetTest/rs2/db --logpath /root/data/replicaSetTest/rs2/mongod.log --port 3001 --logappend --fork
10274 ? Sl 0:03 mongod --replSet replicaSetTest --dbpath /root/data/replicaSetTest/arb/db --logpath /root/data/replicaSetTest/arb/mongod.log --port 3002 --logappend --fork

Done. You can also deploy a sharded cluster, or a sharded replica set. More information can be found at the following link:

https://github.com/rueckstiess/mtools/wiki/mlaunch
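For example, a sharded cluster of replica sets is just a couple of flags away (a sketch; check mlaunch --help for the options your version supports):

# mlaunch --sharded 2 --replicaset --nodes 3 --config 1 --mongos 1 --port 4000
# mlaunch list    # show the processes of the current environment
# mlaunch stop    # shut them all down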

Ognom Toolkit

“It is a set of utilities, functions and tests with the goal of making the life of MongoDB/TokuMX administrators easier.”

This toolkit was created by Fernando Ipar and Sveta Smirnova, and includes a set of scripts that allow us to deploy a testing environment for both sharding and replication configurations. The main difference is that you can specify which storage engine will be the default, something you cannot do with the other two methods.

https://github.com/Percona-Lab/ognom-toolkit

The tools we need are under the “lab” directory. Most of the names are pretty self-explanatory:

~/ognom-toolkit/lab# ls
README.md  start_multi_dc_simulation  start_sharded_test  stop_all_mongo    stop_sharded_test
common.sh  start_replica_set          start_single        stop_replica_set  stop_single

So, let’s say we want a replication cluster with four nodes that will use the PerconaFT storage engine. We have to do the following:

Set a variable with the storage engine we want to use:

# export MONGODB_ENGINE=PerconaFT

Specify where our mongod binary is:

# export MONGOD=/usr/bin/mongod

Start our four-node replica set:

# ./start_replica_set
Starting 4 mongod instances
2016-01-25T12:36:04.812+0100 I STORAGE  Compression: snappy
2016-01-25T12:36:04.812+0100 I STORAGE  MaxWriteMBPerSec: 1024
2016-01-25T12:36:04.813+0100 I STORAGE  Crash safe counters: 0
about to fork child process, waiting until server is ready for connections.
forked process: 1086
child process started successfully, parent exiting
[...]
MongoDB shell version: 3.0.8
connecting to: 127.0.0.1:27001/test
{
    "set" : "rsTest",
    "date" : ISODate("2016-01-25T11:36:09.039Z"),
    "myState" : 1,
    "members" : [
        { "_id" : 0, "name" : "debian:27001", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 5, "optime" : Timestamp(1453721767, 5), "optimeDate" : ISODate("2016-01-25T11:36:07Z"), "electionTime" : Timestamp(1453721767, 2), "electionDate" : ISODate("2016-01-25T11:36:07Z"), "configVersion" : 4, "self" : true },
        { "_id" : 1, "name" : "debian:27002", "health" : 1, "state" : 5, "stateStr" : "STARTUP2", "uptime" : 1, "optime" : Timestamp(0, 0), "optimeDate" : ISODate("1970-01-01T00:00:00Z"), "lastHeartbeat" : ISODate("2016-01-25T11:36:07.991Z"), "lastHeartbeatRecv" : ISODate("2016-01-25T11:36:08.093Z"), "pingMs" : 0, "configVersion" : 2 },
        { "_id" : 2, "name" : "debian:27003", "health" : 1, "state" : 0, "stateStr" : "STARTUP", "uptime" : 1, "optime" : Timestamp(0, 0), "optimeDate" : ISODate("1970-01-01T00:00:00Z"), "lastHeartbeat" : ISODate("2016-01-25T11:36:07.991Z"), "lastHeartbeatRecv" : ISODate("2016-01-25T11:36:08.110Z"), "pingMs" : 2, "configVersion" : -2 },
        { "_id" : 3, "name" : "debian:27004", "health" : 1, "state" : 0, "stateStr" : "STARTUP", "uptime" : 1, "optime" : Timestamp(0, 0), "optimeDate" : ISODate("1970-01-01T00:00:00Z"), "lastHeartbeat" : ISODate("2016-01-25T11:36:08.010Z"), "lastHeartbeatRecv" : ISODate("2016-01-25T11:36:08.060Z"), "pingMs" : 18, "configVersion" : -2 }
    ],
    "ok" : 1
}

Now, just start using it:

rsTest:PRIMARY> db.names.insert({ "a" : "Miguel" })
rsTest:PRIMARY> db.names.stats()
{
    "ns" : "mydb.names",
    "count" : 1,
    "size" : 36,
    "avgObjSize" : 36,
    "storageSize" : 16384,
    "capped" : false,
    "PerconaFT" : {
[...]
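When you are done, the matching stop script from the “lab” listing above tears everything down:

~/ognom-toolkit/lab# ./stop_replica_set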

Conclusion

When dealing with bugs, troubleshooting, or testing some application that needs a complex MongoDB infrastructure, these procedures can save us a lot of time. There is no need to set up multiple virtual machines, deal with networking, or risk human mistakes. Just say “I want a sharded cluster, do it for me.”

MongoDB revs you up: What storage engine is right for you? (Part 4)

January 27, 2016 - 12:13pm
Differentiating Between MongoDB Storage Engines: PerconaFT

In this series of posts, we discussed what a storage engine is, and how you can determine the characteristics of one versus the other:

“A database storage engine is the underlying software that a DBMS uses to create, read, update and delete data from a database. The storage engine should be thought of as a “bolt on” to the database (server daemon), which controls the database’s interaction with memory and storage subsystems.”

Generally speaking, it’s important to understand what type of work environment the database is going to interact with, and to select a storage engine that is tailored to that environment.

The first post looked at MMAPv1, the original default engine for MongoDB (through release 3.0). The second post examined WiredTiger, the new default MongoDB engine. The third post reviewed RocksDB, an engine developed for the Facebook environment.

This post will cover PerconaFT. PerconaFT was developed out of Percona’s acquisition of Tokutek, from their TokuDB product.

PerconaFT

Find it in: Percona Builds

PerconaFT is the newest version of the Fractal Tree storage engine that was designed and implemented by Tokutek, which was acquired by Percona in April of 2015. Designed at MIT, SUNY Stony Brook and Rutgers, the Fractal Tree is a data structure that aims to remove disk bottlenecks from databases that use B-trees with datasets several times larger than cache.

PerconaFT is arguably the most “mature” storage engine for MongoDB, with support for document level concurrency and compression. The Fractal Tree was first commercially implemented in June of 2013 in TokuMX, a fork of MongoDB, with an advanced feature set.

As described previously, the Fractal Tree (which is available for MongoDB in the PerconaFT storage engine) is a write-optimized data structure utilizing many log-like “queues” called message buffers, but with an arrangement like that of a read-optimized data structure. With the combination of these properties, PerconaFT can provide high performance for applications with high insert rates, while also providing very efficient lookups for update/query-based applications. This should theoretically provide very predictable and consistent performance as the database grows. Furthermore, PerconaFT typically provides, comparatively, the deepest compression rates of any of the engines we’ve discussed in this series.

An ideal fit for the PerconaFT storage engine is a system with varied workloads, where predictable vertical scaling is required in addition to the horizontal scaling provided by MongoDB. Furthermore, the ability of PerconaFT to maintain performance while compressing – along with support for multiple compression algorithms (snappy, quicklz, zlib and lzma) – makes it one of the best options for users looking to optimize their data footprint.
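If you are running a build that ships the engine (such as the Percona builds linked above), selecting it at startup is as simple as this sketch (treat any further tuning parameters as build-specific and check the build's documentation):

# mongod --dbpath /data/db --storageEngine PerconaFT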

Conclusion

Most people don’t know that they have a choice when it comes to storage engines, and that the choice should be based on what the database workload will look like. Percona’s Vadim Tkachenko performed an excellent benchmark test comparing the performances of PerconaFT and WiredTiger to help specifically differentiate between these engines.

Part 1: Intro and the MMAPv1 storage engine.

Part 2: WiredTiger storage engine.

Part 3: RocksDB storage engine.

Percona CEO Peter Zaitsev discusses working remotely with Fortune Magazine

January 27, 2016 - 11:33am

As a company that believes in and supports the open source community, embracing innovation and change is par for the course at Percona. We wouldn’t be the company we are today without fostering a culture that rewards creative thinking and rapid evolution.

Part of this culture is making sure that Percona is a place where people love to work, and can translate their passion for technology into tangible rewards – both personally and financially. One of the interesting facts about Percona’s culture is that almost 95 percent of its employees work remotely. Engineers, support, marketing, even executive staff – most of these people interact daily via electronic media rather than in person. Percona’s staff is spread worldwide across 29 countries and 19 U.S. states. How does that work? How do you make sure that the staff is happy, committed, and engaged enough to stay on? How do you attract prospective employees with this unusual model?

It turns out that not only does it work, but it works very well. It can be challenging to manage the needs of such a geographically diverse group, but the rewards (and the results) outweigh the effort.

The secret is, of course, good communication and an environment of respect and personal empowerment.

Percona’s CEO Peter Zaitsev recently shared some of his thoughts with Fortune magazine about how our business model helps not only to foster incredible dedication and innovation, but also to create a work environment that encourages passion, commitment and teamwork.

Read about his ideas on Percona’s work model here.

Oh, and by the way, Percona is currently hiring! Perhaps a career here might fit in with your plans . . .

Finding MySQL Table Size on Disk

January 26, 2016 - 2:16pm

So you want to know how much space a given MySQL table takes on disk. Looks trivial, right? Shouldn’t this information be readily available in INFORMATION_SCHEMA.TABLES? Not so fast!
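The query most of us would reach for looks something like this (shown here against the sysbench table that appears later in this post):

mysql> SELECT data_length, index_length, data_length + index_length AS total_size
       FROM INFORMATION_SCHEMA.TABLES
       WHERE table_schema = 'sbinnodb' AND table_name = 'sbtest1';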

This simple question actually is quite complicated in MySQL. MySQL supports many storage engines (some of which don’t store data on disk at all) and these storage engines often each store data in different layouts. For example, there are three “basic” layouts that the InnoDB storage engine supports for MySQL 5.7, with multiple variations for row_formats and two types of available compression.

So let’s simplify the situation: instead of a general question, let’s ask how to find the table size on disk for an InnoDB table stored in its own tablespace (as the parameter innodb_file_per_table=1 provides).

Before we get to the answer, let me show you the table size graph that I get by running sysbench prepare (basically populating tables with multi-value inserts):

This graph shows the table size, defined as data_length plus index_length, captured from INFORMATION_SCHEMA.TABLES. You would expect gradual table growth as data is inserted, rather than a flat table size followed by jumps (sometimes of 10GB or more).

The graph does not match how data is changing on disk, where it is growing gradually (as expected):

-rw-r----- 1 mysql mysql 220293234688 Jan 25 17:03 sbtest1.ibd
-rw-r----- 1 mysql mysql 220310011904 Jan 25 17:03 sbtest1.ibd
-rw-r----- 1 mysql mysql 222499438592 Jan 25 17:07 sbtest1.ibd

As we see from this experiment, MySQL does not really maintain live data_length and index_length values, but rather refreshes them periodically, and rather irregularly. The later part of the graph is especially surprising, where we see a couple of the refreshes becoming more regular. This is different from the first part of the graph, which seems to be in line with statistics being updated when 10 percent of the rows are changed (see the manual).

What makes it especially confusing is that other values, such as table_rows, data_free or update_time, are updated in real time (even though I can’t imagine why the table size values would be any more difficult to maintain in real time!).

Is there a way to get real-time data_length and index_length updates as we query information_schema? There is, but it is costly.

To get information_schema to provide accurate information in MySQL 5.7, you need to do two things: disable innodb_stats_persistent and enable innodb_stats_on_metadata – both of which come with significant side effects.

Disabling persistent statistics means InnoDB has to refresh the statistics each time the server starts, which is expensive and can produce volatile query plans between restarts. Enabling innodb_stats_on_metadata makes access to information_schema slower, much slower, as I wrote a few years ago.
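For reference, both are dynamic global variables, so flipping them looks like this (keeping the side effects above in mind before trying it on a production server):

mysql> SET GLOBAL innodb_stats_persistent = OFF;
mysql> SET GLOBAL innodb_stats_on_metadata = ON;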

Is there a better way? It turns out there is. You can look into the INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES table to see the actual file size. Unlike index_length and data_length, INNODB_SYS_TABLESPACES is updated in real time, with no special configuration required:

mysql> select * from INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES where name='sbinnodb/sbtest1'\G
*************************** 1. row ***************************
         SPACE: 42
          NAME: sbinnodb/sbtest1
          FLAG: 33
   FILE_FORMAT: Barracuda
    ROW_FORMAT: Dynamic
     PAGE_SIZE: 16384
 ZIP_PAGE_SIZE: 0
    SPACE_TYPE: Single
 FS_BLOCK_SIZE: 4096
     FILE_SIZE: 245937209344
ALLOCATED_SIZE: 245937266688
1 row in set (0.00 sec)

The great thing about using this table is that it also handles the new “InnoDB Page Compression” properly, showing the difference between file_size (the logical file size on disk) and allocated_size (the space actually allocated for this file, which can be significantly smaller):

mysql> select * from INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES where name='sbinnodb/testcomp'\G
*************************** 1. row ***************************
         SPACE: 48
          NAME: sbinnodb/testcomp
          FLAG: 33
   FILE_FORMAT: Barracuda
    ROW_FORMAT: Dynamic
     PAGE_SIZE: 16384
 ZIP_PAGE_SIZE: 0
    SPACE_TYPE: Single
 FS_BLOCK_SIZE: 4096
     FILE_SIZE: 285212672
ALLOCATED_SIZE: 113004544
1 row in set (0.00 sec)

Finally, let’s look at how the different InnoDB compression variants impact the information provided in information_schema.

If you use the old InnoDB compression (InnoDB Table Compression), you will see the compressed data size shown in data_length and index_length. As a result, avg_row_length will be much lower than you would expect.

If you use the new InnoDB compression in MySQL 5.7 (InnoDB Page Compression), information_schema will show values corresponding to the file size, not the allocated size.
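For reference, the two variants are enabled with different DDL (table and column names here are placeholders):

-- Old InnoDB Table Compression: data_length/index_length show the compressed size
CREATE TABLE t_table_comp (id INT PRIMARY KEY, payload VARCHAR(255))
    ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

-- New 5.7 InnoDB Page Compression: sizes reflect file size, not allocated size
CREATE TABLE t_page_comp (id INT PRIMARY KEY, payload VARCHAR(255))
    COMPRESSION='zlib';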

Conclusion
Answering the trivial question “How much space does this table take on disk?” is really not simple in MySQL: look in the obvious place and you’re likely to get the wrong value. Look at INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES to get the actual file size for InnoDB tables.

EXPLAIN FORMAT=JSON has details for subqueries in HAVING, nested selects and subqueries that update values

January 25, 2016 - 2:18pm

Over several previous blog posts, we’ve already discussed what information the EXPLAIN FORMAT=JSON output provides for some subqueries. You can review those discussions here, here and here. EXPLAIN FORMAT=JSON shows many details that you can’t get with other commands. Let’s now finish this topic and discuss the output for the rest of the subquery types.

First, let’s look at the subquery in the HAVING clause, such as in the following example:

select count(emp_no), salary
from salaries
group by salary
having salary > ALL (select avg(s)
                     from (select dept_no, sum(salary) as s
                           from salaries
                           join dept_emp using (emp_no)
                           group by dept_no) t)

This example prints the number of employees and their salaries, if their salary is greater than the average salary in their department. EXPLAIN FORMAT=JSON provides a lot of details on how this subquery is optimized:

mysql> explain format=json select count(emp_no), salary from salaries group by salary having salary > ALL (select avg(s) from (select dept_no, sum(salary) as s from salaries join dept_emp using (emp_no) group by dept_no) t)\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": { "query_cost": "3073970.40" },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": { "sort_cost": "2557022.00" },
      "table": {
        "table_name": "salaries",
        "access_type": "ALL",
        "rows_examined_per_scan": 2557022,
        "rows_produced_per_join": 2557022,
        "filtered": "100.00",
        "cost_info": { "read_cost": "5544.00", "eval_cost": "511404.40", "prefix_cost": "516948.40", "data_read_per_join": "39M" },
        "used_columns": [ "emp_no", "salary", "from_date" ]
      },
      "having_subqueries": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "cost_info": { "query_cost": "771970.25" },
            "table": {
              "table_name": "t",
              "access_type": "ALL",
              "rows_examined_per_scan": 3087841,
              "rows_produced_per_join": 3087841,
              "filtered": "100.00",
              "cost_info": { "read_cost": "154402.05", "eval_cost": "617568.20", "prefix_cost": "771970.25", "data_read_per_join": "94M" },
              "used_columns": [ "dept_no", "s" ],
              "materialized_from_subquery": {
                "using_temporary_table": true,
                "dependent": false,
                "cacheable": true,
                "query_block": {
                  "select_id": 3,
                  "cost_info": { "query_cost": "1019140.27" },
                  "grouping_operation": {
                    "using_filesort": false,
                    "nested_loop": [
                      {
                        "table": {
                          "table_name": "dept_emp",
                          "access_type": "index",
                          "possible_keys": [ "PRIMARY", "emp_no", "dept_no" ],
                          "key": "dept_no",
                          "used_key_parts": [ "dept_no" ],
                          "key_length": "4",
                          "rows_examined_per_scan": 331570,
                          "rows_produced_per_join": 331570,
                          "filtered": "100.00",
                          "using_index": true,
                          "cost_info": { "read_cost": "737.00", "eval_cost": "66314.00", "prefix_cost": "67051.00", "data_read_per_join": "5M" },
                          "used_columns": [ "emp_no", "dept_no" ]
                        }
                      },
                      {
                        "table": {
                          "table_name": "salaries",
                          "access_type": "ref",
                          "possible_keys": [ "PRIMARY", "emp_no" ],
                          "key": "PRIMARY",
                          "used_key_parts": [ "emp_no" ],
                          "key_length": "4",
                          "ref": [ "employees.dept_emp.emp_no" ],
                          "rows_examined_per_scan": 9,
                          "rows_produced_per_join": 3087841,
                          "filtered": "100.00",
                          "cost_info": { "read_cost": "334520.92", "eval_cost": "617568.35", "prefix_cost": "1019140.27", "data_read_per_join": "47M" },
                          "used_columns": [ "emp_no", "salary", "from_date" ]
                        }
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)`,`employees`.`salaries`.`salary` AS `salary` from `employees`.`salaries` group by `employees`.`salaries`.`salary` having <not>((`employees`.`salaries`.`salary` <= <max>(/* select#2 */ select avg(`t`.`s`) from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t`)))

We see that the subquery in the HAVING clause is not dependent, but cacheable:

"having_subqueries": [ { "dependent": false, "cacheable": true,

It has its own query block:

"query_block": { "select_id": 2,

Which accesses table “t”:

"table": { "table_name": "t", "access_type": "ALL", "rows_examined_per_scan": 3087841, "rows_produced_per_join": 3087841, "filtered": "100.00", "cost_info": { "read_cost": "154402.05", "eval_cost": "617568.20", "prefix_cost": "771970.25", "data_read_per_join": "94M" }, "used_columns": [ "dept_no", "s" ],

Table “t” was also materialized from the subquery:

], "materialized_from_subquery": { "using_temporary_table": true, "dependent": false, "cacheable": true, "query_block": { "select_id": 3,

Another kind of subquery is one in the SELECT list. If we want to compare the salary of an employee with the average salary in the company, for example, we can use the query select emp_no, salary, (select avg(salary) from salaries) from salaries. Let’s examine the EXPLAIN output:

mysql> explain format=json select emp_no, salary, (select avg(salary) from salaries) from salaries\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": { "query_cost": "516948.40" },
    "table": {
      "table_name": "salaries",
      "access_type": "ALL",
      "rows_examined_per_scan": 2557022,
      "rows_produced_per_join": 2557022,
      "filtered": "100.00",
      "cost_info": { "read_cost": "5544.00", "eval_cost": "511404.40", "prefix_cost": "516948.40", "data_read_per_join": "39M" },
      "used_columns": [ "emp_no", "salary" ]
    },
    "select_list_subqueries": [
      {
        "dependent": false,
        "cacheable": true,
        "query_block": {
          "select_id": 2,
          "cost_info": { "query_cost": "516948.40" },
          "table": {
            "table_name": "salaries",
            "access_type": "ALL",
            "rows_examined_per_scan": 2557022,
            "rows_produced_per_join": 2557022,
            "filtered": "100.00",
            "cost_info": { "read_cost": "5544.00", "eval_cost": "511404.40", "prefix_cost": "516948.40", "data_read_per_join": "39M" },
            "used_columns": [ "salary" ]
          }
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary`,(/* select#2 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`) AS `(select avg(salary) from salaries)` from `employees`.`salaries`

EXPLAIN FORMAT=JSON in this case shows that the subquery is part of the first query_block, not dependent and cacheable.

The last type of subquery I want to discuss is the subquery that updates values. For example, I added a new column to the titles table of the standard employees database:

mysql> alter table titles add column full_title varchar(100);
Query OK, 0 rows affected (24.42 sec)
Records: 0  Duplicates: 0  Warnings: 0

Now I want full_title to contain both the department’s name and title, separated by a space. I can use UPDATE with the subquery to achieve this:

update titles
set full_title = concat((select dept_name
                         from departments
                         join dept_emp using(dept_no)
                         where dept_emp.emp_no = titles.emp_no
                           and dept_emp.to_date = '9999-01-01'),
                        ' ', title)
where to_date = '9999-01-01';

To find out how it is optimized, we can use EXPLAIN FORMAT=JSON:

mysql> explain format=json update titles set full_title=concat((select dept_name from departments join dept_emp using(dept_no) where dept_emp.emp_no=titles.emp_no and dept_emp.to_date='9999-01-01') ,' ', title) where to_date = '9999-01-01'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "update": true,
      "table_name": "titles",
      "access_type": "index",
      "key": "PRIMARY",
      "used_key_parts": [ "emp_no", "title", "from_date" ],
      "key_length": "59",
      "rows_examined_per_scan": 442843,
      "filtered": "100.00",
      "using_temporary_table": "for update",
      "attached_condition": "(`employees`.`titles`.`to_date` = '9999-01-01')"
    },
    "update_value_subqueries": [
      {
        "dependent": true,
        "cacheable": false,
        "query_block": {
          "select_id": 2,
          "cost_info": { "query_cost": "1.35" },
          "nested_loop": [
            {
              "table": {
                "table_name": "dept_emp",
                "access_type": "ref",
                "possible_keys": [ "PRIMARY", "emp_no", "dept_no" ],
                "key": "PRIMARY",
                "used_key_parts": [ "emp_no" ],
                "key_length": "4",
                "ref": [ "employees.titles.emp_no" ],
                "rows_examined_per_scan": 1,
                "rows_produced_per_join": 0,
                "filtered": "10.00",
                "cost_info": { "read_cost": "1.00", "eval_cost": "0.02", "prefix_cost": "1.22", "data_read_per_join": "1" },
                "used_columns": [ "emp_no", "dept_no", "to_date" ],
                "attached_condition": "(`employees`.`dept_emp`.`to_date` = '9999-01-01')"
              }
            },
            {
              "table": {
                "table_name": "departments",
                "access_type": "eq_ref",
                "possible_keys": [ "PRIMARY" ],
                "key": "PRIMARY",
                "used_key_parts": [ "dept_no" ],
                "key_length": "4",
                "ref": [ "employees.dept_emp.dept_no" ],
                "rows_examined_per_scan": 1,
                "rows_produced_per_join": 0,
                "filtered": "100.00",
                "cost_info": { "read_cost": "0.11", "eval_cost": "0.02", "prefix_cost": "1.35", "data_read_per_join": "5" },
                "used_columns": [ "dept_no", "dept_name" ]
              }
            }
          ]
        }
      }
    ]
  }
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1276): Field or reference 'employees.titles.emp_no' of SELECT #2 was resolved in SELECT #1

We can see in this output that the subquery is dependent, not cacheable, and will be executed for each row that needs to be updated.

Conclusion: EXPLAIN FORMAT=JSON provides rich information about all kinds of subqueries.

Peter Zaitsev webinar January 27th: Compression In Open Source Databases

January 22, 2016 - 3:44pm

Percona invites you to attend a webinar Wednesday, January 27th, with CEO Peter Zaitsev: Compression In Open Source Databases. Register now!

Data growth has been tremendous in the last decade and shows no signs of stopping. To deal with this trend, database technologies have implemented a number of approaches, and data compression is by far the most common and important. Compression in open source databases is complicated, and there are a lot of different approaches, each with their own implications.

In this talk we will survey compression in some of the most popular open source database engines, including InnoDB, TokuDB, MongoDB, WiredTiger, RocksDB, and PostgreSQL.

Important information:

Webinar: Compression In Open Source Databases

Presenter: Peter Zaitsev, CEO, Percona

Date: Wednesday, January 27, 2016

Time: 10:00am PST (UTC – 8)

Register now, and we look forward to seeing you there!

About Peter Zaitsev, CEO Percona:

Peter co-founded Percona in 2006, assuming the role of CEO. Percona helps companies of all sizes maximize their success with MySQL. Percona was named to the Inc. 5000 in 2013. Peter was an early employee at MySQL AB, eventually leading the company’s High Performance Group. A serial entrepreneur, Peter co-founded his first startup while attending Moscow State University where he majored in Computer Science. As CEO of Percona, Peter enjoys mixing business leadership with hands on technical expertise. Peter is co-author of High Performance MySQL published by O’Reilly, one of the most popular books on MySQL performance. Peter blogs regularly on MySQLPerformanceBlog.com and speaks frequently at conferences. Peter lives in North Carolina with his wife and two children. In his spare time, Peter enjoys travel and spending time outdoors.

Percona Live Data Performance Conference 2016: news you need to know!

January 22, 2016 - 9:08am

The Percona Live Data Performance Conference 2016 is rapidly approaching, and we’re looking forward to providing an outstanding experience April 18-21 for all who attend.

Percona Live is the premier event for the rich and diverse open source community and businesses that thrive in the MySQL and NoSQL marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, and CEOs representing organizations from industry giants such as Oracle to start-ups. Vendors increasingly rely on the conference as a major opportunity to connect with potential high-value customers from around the world.

Below are some highlights for the upcoming conference regarding the conference schedule, Tutorial sessions, Birds of a Feather talks, and Lightning talks.

Conference Schedule

Percona Live is packed with engaging sessions, helpful tutorials, and brief talks that will both enlighten and entertain attendees — featuring the best and brightest from the database and open source communities! Below are just a few of the exciting talks that will happen at the Percona Live Data Performance Conference 2016 (you can find the full schedule here):

Session: Dirty Little Secrets

Speakers:   Jeremy Tinley, Sr. MySQL Operations Engineer, Etsy

Jenni Snyder, MySQL DBA, Yelp

Jonah Berquist, Database Infrastructure Engineer, GitHub

Geoffrey Anderson, Database Operations Engineer, Box

Silvia Botros, Sr. MySQL DBA, SendGrid

Shlomi Noach, Sr. Systems Engineer, GitHub

Session: What’s New in MySQL

Speakers:   Geir Høydalsvik, Sr. Software Development Director, Oracle

Simon Mudd, DBA, booking.com

Session: Espresso: LinkedIn’s distributed document store on top of MySQL

Speakers:   Yun Sun, Staff Software Engineer, LinkedIn

Eun-Gyu Kim, Staff Software Engineer, LinkedIn

Davi Arnaut Software Engineer, LinkedIn

Session: Shifting the Paradigm: MongoDB and the MEAN Stack

Speakers:   Kat Styons, Senior Full Stack Developer, The Washington Post

Sruti Cheedalla, Senior Web Developer, The Washington Post

 

The full schedule can be found here.

Tutorials

Percona Live tutorial sessions provide expert insight into various technology topics. The Percona Live tutorial schedule is also up; you can find it here.

Birds of a Feather

Birds of a Feather (BOF) sessions enable attendees with interests in the same project or topic to enjoy some quality face time. BOFs can be organized for individual projects or broader topics (e.g., best practices, open data, standards). Any attendee or conference speaker can propose and moderate an engaging BOF. Percona will post the selected topics and moderators online and provide a meeting space and time. The BOF sessions will be held Tuesday, April 19, 2016 at 6:00 p.m. The deadline for BOF submissions is February 7.

Lightning Talks

Lightning Talks provide an opportunity for attendees to propose, explain, exhort, or rant on any MySQL, NoSQL or data-in-the-cloud-related topic for five minutes. Topics might include a new idea, successful project, cautionary story, quick tip, or demonstration.   The deadline for submitting a Lightning Talk topic is February 7, 2016.

All submissions will be reviewed, and the top ten will be selected to present during one of the scheduled breakout sessions during the week. Lighthearted, fun or otherwise entertaining submissions are highly welcome.

 

We’re looking forward to seeing you at Percona Live!

 

Tired of MySQL Making You Wait? Webinar: Questions and Answers

January 21, 2016 - 7:33am

We’d like to thank everybody for joining us on January 7th for our “Tired of MySQL Making You Wait?” webinar with Percona’s Alexander Rubin, Principal Consultant, and SolarWinds’ Janis Griffin, Database Evangelist.

Too often, developers and DBAs struggle to pinpoint the root cause of performance issues and then spend too much time trying to fix them. In the webinar, we discussed how you can significantly increase the performance of your applications while also reducing database response time.

You can find an archived version of the webinar here.

Below are the questions that were asked during the webinar, with responses from Alex and Janis. If you need further clarification, please respond in the comments.

Thanks again, and we look forward to you joining us at our next webinar (with Percona CEO Peter Zaitsev), Compression In Open Source Databases!

 

Q: Are there special tuning tips for Galera Cluster?

A: Since Galera Cluster (Percona XtraDB Cluster) is based on MySQL, all query tuning tips apply as well. There are a number of Galera Cluster configuration tips available; for example, the blog post at this link talks about tuning PXC for maximum write throughput: https://www.percona.com/blog/2015/06/03/optimizing-percona-xtradb-cluster-write-hotspots/

 

Q: Does DPA support Galera Cluster?

A: Yes, DPA has the ability to group the cluster together to see load balancing, top 15 SQLs across the cluster, plus the top wait states.

 

Q: Can I create a covered index when I have “group by” and “order by” instructions together?

A: Yes, you can create a covered index and MySQL will use it to satisfy the query (you will see “using index”). If you have “group by” and “order by” on different columns, however, MySQL will still have to perform a filesort and create a temp table. To create this index, specify all of the following fields from your query in the index (see the sketch after the list):

  1. All fields in the “where” condition
  2. The “group by” fields
  3. The “order by” fields
  4. The fields that the query is selecting.

Please note the limitations of such an approach described here: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
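As a hedged illustration with made-up table and column names, an index covering a query like select customer_id, sum(total) from orders where status = 'shipped' group by customer_id order by customer_id could look like this:

ALTER TABLE orders ADD INDEX idx_covered (status, customer_id, total);
-- status      -> the "where" condition
-- customer_id -> the "group by" (and, here, also the "order by") field
-- total       -> the selected field, making the index covered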

 

Q: Can we use DPA with Azure MySQL?

A: Yes, DPA will monitor, tune and analyze the SQL server performance running on Microsoft Azure.

 

Q: Do you know if MariaDB has or is planning to follow with these virtual fields and/or SYS schema enhancements from MySQL 5.7?

A: MariaDB has had virtual or computed columns since version 5.2. I don’t believe MariaDB comes with the sys schema already installed, but you can download and install it.

 

Q: Does DPA support PostgreSQL? If not, is it in the roadmap?

A: Currently, DPA does not support PostgreSQL. However, we continually re-evaluate support with each new release.

 

Q: Does DPA support RDS instances?

A: Yes, DPA supports the monitoring of RDS instances.

 

Q: Does the performance schema show any information about how the load data is performing?

A: The performance_schema first became available in MySQL 5.5.3, and in 5.5 it has only 11 tables. Most of the tables deal with wait events and file information. In addition, you would need to turn on the consumers and enable the instrumentation of the wait events. Once you’ve done that, you will be able to see the threads and what they are waiting on.
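Enabling the wait-event instrumentation and its consumers is a sketch along these lines:

mysql> UPDATE performance_schema.setup_instruments
       SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME LIKE 'wait/%';
mysql> UPDATE performance_schema.setup_consumers
       SET ENABLED = 'YES' WHERE NAME LIKE 'events_waits%';
mysql> SELECT * FROM performance_schema.events_waits_current LIMIT 5;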

 

Q: I didn’t understand the reasoning that leads to the index on ORDER BY. I can’t link it to the previous slide query.

A: I assume this question is about the ORDER BY + LIMIT optimization. When you create an index on the ORDER BY field only, MySQL can start reading the whole table in the order of the index. As the index is sorted, it can start fetching rows and filtering out those that don’t match the WHERE condition. As there is a LIMIT N on the query, MySQL will stop after fetching N rows.
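A sketch with hypothetical names: with an index on created_at, the following query walks the index in order and stops after ten matching rows:

ALTER TABLE events ADD INDEX idx_created (created_at);

SELECT *
FROM events
WHERE status = 'new'    -- non-matching rows are skipped while scanning in index order
ORDER BY created_at
LIMIT 10;               -- MySQL stops after finding 10 matching rows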

 

Q: How can I analyze parts of a stored procedure that runs nightly to see where the bottlenecks are? It has 100+ update queries that it performs every night to build a table with one million plus rows.

A: You can do it using the slow query log in Percona Server (5.5/5.6) and/or Performance Schema in MySQL 5.7. If you are running Percona Server, you can enable extended stored procedures logging as described here: https://www.percona.com/doc/percona-server/5.6/diagnostics/slow_extended.html. Another way is to use the deprecated “show profile” method, as described here: https://www.percona.com/blog/2009/01/19/profiling-mysql-stored-routines/

 

Q: How will DPA use the index when there are more than five columns in the “where” conditions? How would you create indexes?

A: I would suggest checking the “cardinality” of the fields (the number of unique values). Usually (unless you are creating a covered index or optimizing a “group by”) it makes much more sense to limit the number of fields in an index, and only include the fields with high cardinality. For example, a PRIMARY KEY or UNIQUE INDEX works best, whereas a “gender” field (with only two unique values, “male” and “female”) would not be very useful.
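You can check the estimated cardinality of existing indexes, or measure it directly, with queries such as these (the employees table here is from the standard employees test database):

mysql> SHOW INDEX FROM employees;   -- see the Cardinality column
mysql> SELECT COUNT(DISTINCT gender), COUNT(DISTINCT emp_no) FROM employees;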

 

Q: How would the analytics tool work in an OpenStack VM environment, where we have 100 database servers?

A: One installation of DPA can monitor hundreds of database servers. In fact, we have several very large companies that monitor thousands of servers worldwide.

 

Q: If you have a small table with only 100 records, is it worth creating indexes on specific fields or just do a table scan?

A: If the table only has 100 records and you are not joining it with other tables, it usually does not make sense to add indexes. But because the table is so small, it doesn’t really matter either way.

 

Q: Is the SolarWinds tool better than MONyog, and how expensive is the license cost for this?

A: MONyog is also a monitoring tool, but it doesn’t have the advisors, alarms, granularity, history, or customizations that DPA gives you. The retail cost per server is currently $1,995 per monitored server, but is heavily discounted the more you purchase.

 

Q: In many cases, due to the randomness and complexity of queries thrown at various tables, I end up creating a lot of indexes. At what point would there be too many indexes? Should I then create MySQL views instead of indexes? Should one use MySQL views at all to optimize searches?

A: First of all, there are no “materialized views” in MySQL, so views are not a useful replacement for indexes. You can create “summary” tables manually, which will usually help a lot. Although it is hard to say when you have too many indexes, lots of indexes can decrease the performance of your insert/update/delete operations, as well as confuse MySQL, so a great many indexes might cause MySQL to start choosing the wrong index when doing selects.
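A minimal sketch of a manual “summary” table, with hypothetical names (it has to be refreshed by the application or a scheduled job):

CREATE TABLE orders_daily_summary AS
SELECT DATE(created_at) AS day,
       COUNT(*)         AS orders,
       SUM(total)       AS revenue
FROM orders
GROUP BY DATE(created_at);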

 

Q: Sometimes we need to add indexes for different queries against the same table. Eventually, the table has too many indexes. Any suggestions for such cases?

A: See the response to the previous question.

 

Q: Is there a way in DPA to see what queries are currently running? In other words, to know about slow queries as they run rather than only knowing about them historically?

A: Yes. In the “Current” dashboard, click the “Currently Active Sessions” box. With this option, you can sort by longest running, etc.

 

Q: Why is the delay column included in the composite key? It only makes the query covered; couldn’t the temp table already be avoided with the first two fields?

A: You are referring to this example:

mysql> alter table ontime_2012 add key covered(dayofweek, Carrier, DepDelayMinutes);
mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2012 where dayofweek = 7 group by Carrier\G
...
possible_keys: DayOfWeek,covered
          key: covered
      key_len: 2
          ref: const
         rows: 905138
        Extra: Using where; Using index

The reason we add DepDelayMinutes is to make the index covered, so MySQL will be able to satisfy the query with an index only.

 

MongoDB revs you up: What storage engine is right for you? (Part 3)

January 20, 2016 - 6:22am
Differentiating Between MongoDB Storage Engines: RocksDB

In this series of posts, we discussed what a storage engine is, and how you can determine the characteristics of one versus the other:

“A database storage engine is the underlying software that a DBMS uses to create, read, update and delete data from a database. The storage engine should be thought of as a “bolt on” to the database (server daemon), which controls the database’s interaction with memory and storage subsystems.”

Generally speaking, it’s important to understand what type of work environment the database is going to interact with, and to select a storage engine that is tailored to that environment.

The first post looked at MMAPv1, the original default engine for MongoDB (through release 3.0). The second post examined WiredTiger, the new default MongoDB engine.

This post will cover RocksDB. RocksDB builds on LevelDB, Google’s open source key-value database library. It was designed to address several scenarios:

  1. Scale to run on servers with many CPU cores.
  2. Use fast storage efficiently.
  3. Be flexible to allow for innovation.
  4. Support IO-bound, in-memory, and write-once workloads.

RocksDB

Find it in: Percona Builds

RocksDB, originally designed at Facebook, uses LSM trees to store data, unlike most other storage engines, which use B-trees.

LSM trees are designed to amortize the cost of writes: data is written to log files that are sequentially written to disk and never modified. A background thread then merges the log files (compaction) into a tree-like structure. With this design, a single I/O can flush tens or hundreds of write operations to disk.

The tradeoff is that reading a document is more complex, and therefore slower, than with a B-tree: because we don’t know in advance which log file holds the latest version of the data, we may need to read multiple files to perform a single read. RocksDB uses bloom filters and fractional cascading to minimize the impact of these issues.

As far as workload fit, RocksDB can provide very good insert and query performance while providing compression ratios that are typically better than WiredTiger and slightly worse than PerconaFT. Also, RocksDB is theoretically better than PerconaFT at keeping up with the frequent and heavy delete workloads that accompany TTL indexes in high insert workloads.

Percona is excited to offer enterprise support for RocksDB as part of our MongoDB support options: https://www.percona.com/services/support/rocksdb-support.

Conclusion

Most people don’t know that they have a choice when it comes to storage engines, and that the choice should be based on what the database workload will look like. Percona’s Vadim Tkachenko performed an excellent benchmark test comparing the performance of PerconaFT and WiredTiger to help specifically differentiate between these engines.

In Part Four of this blog series, we’ll take a closer look at Percona’s MongoDB storage engine: PerconaFT.

Part 1: Intro and the MMAPv1 storage engine.

Part 2: WiredTiger storage engine.


