
Percona Monitoring Plugins 1.1.7 release

December 9, 2016 - 11:03am

Percona announces the release of Percona Monitoring Plugins 1.1.7.

Changelog

  • New Nagios script pmp-check-mongo.py for MongoDB.
  • Added MySQL socket and flag options to Cacti PHP script.
  • Added disk volume check on “Mounted on” in addition to “Filesystem” to Cacti PHP script to allow monitoring of tmpfs mounts.
  • Allow delayed slave to have SQL thread stopped on pmp-check-mysql-replication-delay check.
  • Fix for --unconfigured flag of pmp-check-mysql-replication-delay.
  • Fix for max_duration check of pmp-check-mysql-innodb when system and MySQL timezones mismatch.
  • Fix rare nrpe broken pipe error on pmp-check-unix-memory check.
  • Updated package spec files.

A new tarball is available from the downloads area or in packages from our software repositories. The plugins are fully supported for customers with a Percona Support contract, and free installation services are provided as part of some contracts. You can find links to the documentation, forums and more at the project homepage.

About Percona Monitoring Plugins
Percona Monitoring Plugins are monitoring and graphing components designed to integrate seamlessly with widely deployed solutions such as Nagios, Cacti and Zabbix.

Percona Toolkit 2.2.20 is now available

December 9, 2016 - 10:32am

Percona announces the availability of Percona Toolkit 2.2.20. Released December 9, 2016, Percona Toolkit is a collection of advanced command-line tools that perform a variety of MySQL server and system tasks that DBAs find too difficult or complex to perform manually. Percona Toolkit, like all Percona software, is free and open source.

This release is the current GA (Generally Available) stable release in the 2.2 series. Downloads are available here and from the Percona Software Repositories.

New Features:
  • 1636068: A new --pause-file option has been implemented for pt-online-schema-change. When used, pt-online-schema-change pauses while the specified file exists (see the example after this list).
  • 1638293 and 1642364: pt-online-schema-change now supports adding and removing the DATA DIRECTORY to a new table with the --data-dir and --remove-data-dir options.
  • 1642994: Following schemas/tables have been added to the default ignore list: mysql.gtid_execution, sys.sys_config, mysql.proc, mysql.inventory, mysql.plugin, percona.* (including checksums, DSNs table), test.*, and percona_schema.*
  • 1643940: pt-summary now provides information about Transparent huge pages.
  • 1604834: New --preserve-embedded-numbers option was implemented for pt-query-digest which can be used to preserve numbers in database/table names when fingerprinting queries.
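By way of illustration, the new pause option might be invoked like this (the database, table, alteration and file path are all hypothetical):

$ pt-online-schema-change --alter "ADD COLUMN notes TEXT" \
    --pause-file /tmp/pt-osc.pause \
    D=mydb,t=mytable --execute
$ touch /tmp/pt-osc.pause   # the tool pauses while this file exists
$ rm /tmp/pt-osc.pause      # the tool resumes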
Bugs Fixed:
  • 1613915: pt-online-schema-change could miss the data due to the way ENUM values are sorted.
  • 1625005: pt-online-schema-change didn’t apply underscores to foreign keys individually.
  • 1566556: pt-show-grants didn’t work correctly with MariaDB 10 (Daniël van Eeden).
  • 1634900: pt-upgrade would fail when log contained SELECT...INTO queries.
  • 1639052: pt-table-checksum now automatically excludes checking schemas named percona and percona_schema which aren’t consistent across the replication hierarchy.
  • 1635734: pt-slave-restart --config did not recognize = as a separator.
  • 1362942: pt-slave-restart would fail on MariaDB 10.0.13.

Find release details in the release notes and the 2.2.20 milestone at Launchpad. Report bugs on the Percona Toolkit Launchpad bug tracker.

Tuning Linux for MongoDB: Automated Tuning on Redhat and CentOS

December 8, 2016 - 2:34pm

In a previous blog post, “Tuning Linux for MongoDB,” I covered several tunings for an efficient MongoDB deployment on Linux in production. This post expands on that one.

While I felt that post was very useful and produced a good baseline tuning, it bugged me how much effort and how many touch-points were required to achieve an efficient Linux installation for MongoDB. More importantly, I noticed some cases where the tunings (for example, changes to the disk I/O scheduler via rules in /etc/udev/rules.d) were ignored on some recent RedHat and CentOS versions. With these issues in mind, I started to investigate better solutions for achieving the tuned baseline.

Tuned

In RedHat (and thus CentOS) 7.0, a daemon called “tuned” was introduced as a unified system for applying tunings to Linux. tuned operates with simple, file-based tuning “profiles” and provides an admin command-line interface named “tuned-adm” for applying, listing and even recommending tuned profiles.
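For example, a typical tuned-adm session looks like this (output abbreviated):

$ tuned-adm list                     # list the available profiles
$ tuned-adm recommend                # ask tuned which profile it suggests for this host
$ tuned-adm profile network-latency  # apply a profile
$ tuned-adm active
Current active profile: network-latency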

Some operational benefits of tuned:

  • File-based configuration – Profile tunings are contained in simple, consolidated files
  • Swappable profiles – Profiles are easily changed back/forth
  • Standards compliance – Using tuned profiles ensures tunings are not overridden or ignored

Note: If you use configuration management systems like Puppet, Chef, Salt, Ansible, etc., I suggest you configure those systems to deploy tunings via tuned profiles instead of applying tunings directly, as tuned will likely start to fight this automation, overriding the changes.

The default available tuned profiles (as of RedHat 7.2.1511) are:

  • balanced
  • desktop
  • latency-performance
  • network-latency
  • network-throughput
  • powersave
  • throughput-performance
  • virtual-guest
  • virtual-host

The profiles that are generally interesting for database usage are:

  • latency-performance

    “A server profile for typical latency performance tuning. This profile disables dynamic tuning mechanisms and transparent hugepages. It uses the performance governor for p-states through cpuspeed, and sets the I/O scheduler to deadline.”

  • throughput-performance

    “A server profile for typical throughput performance tuning. It disables tuned and ktune power saving mechanisms, enables sysctl settings that improve the throughput performance of your disk and network I/O, and switches to the deadline scheduler. CPU governor is set to performance.”

  • network-latency – Includes “latency-performance,” disables transparent_hugepages, disables NUMA balancing and enables some latency-based network tunings.
  • network-throughput – Includes “throughput-performance” and increases network stack buffer sizes.

I find “network-latency” is the closest match to our recommended tunings, but some additional changes are still required.

The good news is that tuned was designed to be flexible, so I decided to make a MongoDB-specific profile: enter “tuned-percona-mongodb”.

tuned-percona-mongodb

tuned-percona-mongodb: https://github.com/Percona-Lab/tuned-percona-mongodb

“tuned-percona-mongodb” is a performance-focused tuned profile for MongoDB on Linux, and is currently considered experimental (no guarantees/warranties). It’s hosted in our Percona-Lab GitHub repo.

tuned-percona-mongodb applies the following tunings (from the previous tuning article) on a Redhat/CentOS 7+ host:

  • Disabling of transparent huge pages
  • Kernel network tunings (sysctls)
  • Virtual memory dirty ratio changes (sysctls)
  • Virtual memory “swappiness” (sysctls)
  • Block-device readahead settings (on all disks except /dev/sda by default)
  • Block-device I/O scheduler (on all disks except /dev/sda by default)
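To give a feel for the profile format, here is a hypothetical sketch of what a tuned.conf for such a profile might contain (section names follow tuned’s plugin conventions; the actual file in the repository may differ, and the specific values below are illustrative assumptions; percona-mongodb.sh is the helper script shipped with the profile):

[main]
include=network-latency

[sysctl]
vm.swappiness=1
vm.dirty_ratio=15
vm.dirty_background_ratio=5

[vm]
transparent_hugepages=never

[script]
script=percona-mongodb.sh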

The profile also applies a few tunings that our previous tuning article didn’t cover.

After a successful deployment of this profile, only these recommendations are outstanding:

  1. Filesystem type and mount options:
    Tuned does not handle filesystem mount options; these need to be set manually in /etc/fstab (a sample entry appears after this list). To quickly summarize: we recommend the XFS or EXT4 filesystem type for MongoDB data when using the MMAPv1 or RocksDB storage engines, and XFS ONLY when using WiredTiger. For all filesystems, the mount options “rw,noatime” will reduce some activity.
  2. NUMA disabling or interleaving:
    Tuned does not handle NUMA settings and these still need to be handled via the MongoDB init script or the BIOS on/off switch.
  3. Linux ulimits:
    Tuned does not set Linux ulimit settings. However, Percona Server for MongoDB RPM packages do this for you at startup! See “LimitNOFILE” and “LimitNPROC” in “/usr/lib/systemd/system/mongod.service” for more information.
  4. NTP server:
    Tuned does not handle installation of RPM packages or enabling of services. You will need to install the “ntp” package and enable/start the “ntpd” service manually:

    sudo yum install ntp
    sudo systemctl enable ntpd
    sudo systemctl start ntpd
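As an illustration of items 1 and 2, a hypothetical /etc/fstab entry for an XFS-formatted MongoDB data volume, and a NUMA-interleaved mongod start (the device, mount point and config path are assumptions):

/dev/sdb1  /var/lib/mongo  xfs  rw,noatime  0 0

$ numactl --interleave=all mongod -f /etc/mongod.conf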

tuned-percona-mongodb: Installation

The installation of this profile is as simple as checking out the repository with git and running “sudo make enable”. The full output:

$ git clone https://github.com/Percona-Lab/tuned-percona-mongodb
$ cd tuned-percona-mongodb
$ sudo make enable
if [ -d /etc/tuned ]; then cp -dpR percona-mongodb /etc/tuned/percona-mongodb; echo "### 'tuned-percona-mongodb' is installed. Enable with 'make enable'."; else echo "### ERROR: cannot find tuned config dir at /etc/tuned!"; exit 1; fi
### 'tuned-percona-mongodb' is installed. Enable with 'make enable'.
tuned-adm profile percona-mongodb
tuned-adm active
Current active profile: percona-mongodb

In the example above, you can see that “percona-mongodb” is now the active tuned profile on the system (shown on the last line of output).

The tuned profile files are installed to “/etc/tuned/percona-mongodb”, as seen here:

$ ls -alh /etc/tuned/percona-mongodb/*.*
-rwxrwxr-x. 1 root root  677 Nov 22 20:00 percona-mongodb.sh
-rw-rw-r--. 1 root root 1.4K Nov 22 20:00 tuned.conf

Let’s check that the “deadline” I/O scheduler is now the current scheduler on any disk that isn’t /dev/sda (“sdb” is used below):

$ cat /sys/block/sdb/queue/scheduler
noop [deadline] cfq

Transparent huge pages should be disabled (it is!):

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
$ cat /sys/kernel/mm/transparent_hugepage/defrag
always madvise [never]

Block-device readahead should be 32 (16kb) on /dev/sdb (looks good!):

$ blockdev --getra /dev/sdb
32

That was easy!

tuned-percona-mongodb: Uninstallation

To uninstall the profile, run “sudo make uninstall” in the github checkout directory:

if [ -d /etc/tuned/percona-mongodb ]; then echo "### Disabling tuned profile 'tuned-percona-mongodb'"; echo "### Changing tuned profile to 'latency-performance', adjust if necessary after!"; tuned-adm profile latency-performance; tuned-adm active; else echo "tuned-percona-mongodb profile not installed!"; fi
### Disabling tuned profile 'tuned-percona-mongodb'
### Changing tuned profile to 'latency-performance', adjust if necessary after!
Current active profile: latency-performance
if [ -d /etc/tuned/percona-mongodb ]; then rm -rf /etc/tuned/percona-mongodb; fi

Note: the uninstallation enables the “latency-performance” tuned profile; change this after the uninstall if needed.

To confirm the uninstallation, let’s check whether the block-device readahead is set back to the default of 256 sectors (128KB):

$ sudo blockdev --getra /dev/sdb
256

Uninstall complete.

Conclusion

So far, tuned shows a lot of promise for tuning Linux for MongoDB, providing a single, consistent interface for tuning the operating system. I would like to see tuned’s documentation improve, although its simplicity means detailed documentation is rarely needed.

As mentioned, after applying “tuned-percona-mongodb” you still need to configure an NTP server, NUMA (in some cases) and the filesystem type and mount options manually. Even so, this method greatly reduces the time, effort and room for mistakes.

If you have any issues with this profile for tuning Linux for MongoDB, or have any questions, please create a Github issue at this URL: https://github.com/Percona-Lab/tuned-percona-mongodb/issues/new.


First MongoDB replica-set Configuration for MySQL DBAs

December 7, 2016 - 3:23pm

In this blog post, we will work through a first replica-set configuration for MySQL DBAs. We will map as many names as possible between the two databases and compare how they work.

Replica-sets are the most common MongoDB deployment nowadays. One of the most frequent questions is: how do you deploy a replica-set? In this blog, the setup we use compares a MongoDB replica-set to standard MySQL master-slave replication without GTID.

replica-set

A replica-set usually consists of three or more instances on different hosts that communicate with each other through both dedicated connections and heartbeat packets; the heartbeats check the other instances’ health in order to maintain the high availability of the replica-set. The names are slightly different: “primary” corresponds to “master” in MySQL, while “secondary” corresponds to “slave.” MongoDB supports only a single primary – different from MySQL, which can have more than one master depending on how you set it up.

master-slave

Unlike MySQL, MongoDB does not use files to replicate between instances (such as binary log or relay log files). All the statements to be replicated are in the oplog.rs collection. This is a capped collection, which means it holds a limited amount of data; when it becomes full, new content replaces the oldest documents. The amount of data that oplog.rs can keep is called the “oplog window,” and it is measured in seconds. If a secondary node is delayed for longer than the oplog window covers, a new initial sync is needed. The same happens in MySQL when a slave tries to read binary logs that have already been deleted.

When the replica-set is initialized, all the inserts, updates and deletes are saved in a database called “local” in a collection called oplog.rs. The replica-set initialization can be compared to enabling bin logs in the MySQL configuration.
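As a quick sketch of how to peek at the oplog from the shell (the replica-set name rs01 is just an example):

rs01:PRIMARY> use local
rs01:PRIMARY> db.oplog.rs.find().sort({$natural: -1}).limit(1)  // newest replicated operation
rs01:PRIMARY> db.printReplicationInfo()                         // oplog size and window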

Now let’s point out the most important differences between such databases: the way they handle replication, and how they keep high availability.

For standard MySQL replication we need to enable the binlog in the config file, perform a backup, note the binlog position, restore the backup on a server with a different server ID, and finally start the slave thread on the slave. In MongoDB, on the other hand, you only need a primary that has been configured with the replSet parameter, and then you add the new secondaries with the same replSet parameter. No backup needed, no restore needed, no oplog position needed.

Unlike MySQL, MongoDB is capable of electing a new primary when the current primary fails. This process is called an election, and each instance votes for a new primary based on how up-to-date it is – without human intervention. This is why at least three instances are necessary for a reliable production replica-set. The election is based on votes, and for a secondary to become primary it needs the majority of votes – at least two out of three votes/boxes in a three-node set. We can also have an arbiter dedicated to voting only: it does not hold any data, but only votes in elections. Most drivers are capable of following a primary change automatically: we pass the replica-set name in the connection string, and with this information the drivers map primary and secondaries on the fly using the result of rs.config().
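As a sketch, a replica-set-aware connection string passed to the mongo shell (or a driver) might look like this, with hypothetical hostnames:

$ mongo "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/test?replicaSet=rs01"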

Note: There are a few tools capable of emulating this behavior in MySQL. One example is: https://www.percona.com/blog/2016/09/02/mha-quickstart-guide/

Maintaining Replica-sets

After deploying a replica-set, we should monitor it. There are a couple of commands that identify not only the available hosts, but also the replication status; they can be used to edit the replication configuration as well.

The command rs.status() will show all the details of the replication, such as the replica-set name, all the hosts that belong to this replica-set, and their status. This command is similar to “show slave hosts” in MySQL.

In addition, the command rs.printSlaveReplicationInfo() shows how delayed the secondaries are. It can be compared to “show slave status” in MySQL.

Replica-sets can be managed online with the command rs.config(). Passing the replica-set name as a parameter to the mongod process, or in the config file, is the only action necessary to start a replica-set. All the other settings can be managed through rs.config().
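For example, an online change such as lowering a member’s priority can be done by editing the configuration document and re-applying it with rs.reconfig(); member index 1 here is just an illustration:

rs01:PRIMARY> cfg = rs.config()
rs01:PRIMARY> cfg.members[1].priority = 0.5
rs01:PRIMARY> rs.reconfig(cfg)
{ "ok" : 1 }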

Step-by-Step How to Start Your First Replica-Set:

Follow the instructions below to start a test replica-set with three nodes, using all the commands we’ve talked about.

For a production installation, please follow instructions on how to use our repositories here.

Download Percona Server for MongoDB:

$ cd ~
$ wget https://www.percona.com/downloads/percona-server-mongodb-3.2/percona-server-mongodb-3.2.10-3.0/binary/tarball/percona-server-mongodb-3.2.10-3.0-trusty-x86_64.tar.gz
$ tar -xvzf percona-server-mongodb-3.2.10-3.0-trusty-x86_64.tar.gz
$ mv percona-server-mongodb-3.2.10-3.0 mongodb

Create folders:

$ cd mongodb/bin
$ mkdir data1 data2 data3

Generate the config files:

(These are simple config files with almost all parameters at their defaults; edit the database directory first if needed.)

for i in {1..3}; do
  echo 'storage:
  dbPath: "'$(pwd)'/data'$i'"
systemLog:
  destination: file
  path: "'$(pwd)'/data'$i'/mongodb.log"
  logAppend: true
processManagement:
  fork: true
net:
  port: '$(( 27017 + $i - 1 ))'
replication:
  replSetName: "rs01"' > config$i.cfg
done

Starting the mongod processes:

  • Before initializing any MongoDB instance, confirm that the config files exist:

percona@mongo32:~/mongodb/bin$ ls -lah *.cfg
config1.cfg  config2.cfg  config3.cfg

  • Then start each mongod process in turn:

percona@mongo32:~/mongodb/bin$ ./mongod -f config1.cfg
2016-11-10T16:56:12.854-0200 I STORAGE [main] Counters: 0
2016-11-10T16:56:12.855-0200 I STORAGE [main] Use SingleDelete in index: 0
about to fork child process, waiting until server is ready for connections.
forked process: 1263
child process started successfully, parent exiting
percona@mongo32:~/mongodb/bin$ ./mongod -f config2.cfg
2016-11-10T16:56:21.992-0200 I STORAGE [main] Counters: 0
2016-11-10T16:56:21.993-0200 I STORAGE [main] Use SingleDelete in index: 0
about to fork child process, waiting until server is ready for connections.
forked process: 1287
child process started successfully, parent exiting
percona@mongo32:~/mongodb/bin$ ./mongod -f config3.cfg
2016-11-10T16:56:24.250-0200 I STORAGE [main] Counters: 0
2016-11-10T16:56:24.250-0200 I STORAGE [main] Use SingleDelete in index: 0
about to fork child process, waiting until server is ready for connections.
forked process: 1310
child process started successfully, parent exiting

Initializing a replica-set:

  • Connect to the first MongoDB:

$ ./mongo
> rs.initiate()
{
	"info2" : "no configuration specified. Using a default configuration for the set",
	"me" : "mongo32:27017",
	"ok" : 1
}

  • Add the new members:

rs01:PRIMARY> rs.add('mongo32:27018') // replace with your hostname; localhost is not allowed
{ "ok" : 1 }
rs01:PRIMARY> rs.add('mongo32:27019')
{ "ok" : 1 }
rs01:PRIMARY> rs.status()
{
	"set" : "rs01",
	"date" : ISODate("2016-11-10T19:40:08.190Z"),
	"myState" : 1,
	"term" : NumberLong(1),
	"heartbeatIntervalMillis" : NumberLong(2000),
	"members" : [
		{ "_id" : 0, "name" : "mongo32:27017", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 2636, "optime" : { "ts" : Timestamp(1478806805, 1), "t" : NumberLong(1) }, "optimeDate" : ISODate("2016-11-10T19:40:05Z"), "electionTime" : Timestamp(1478804218, 2), "electionDate" : ISODate("2016-11-10T18:56:58Z"), "configVersion" : 3, "self" : true },
		{ "_id" : 1, "name" : "mongo32:27018", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 44, "optime" : { "ts" : Timestamp(1478806805, 1), "t" : NumberLong(1) }, "optimeDate" : ISODate("2016-11-10T19:40:05Z"), "lastHeartbeat" : ISODate("2016-11-10T19:40:07.129Z"), "lastHeartbeatRecv" : ISODate("2016-11-10T19:40:05.132Z"), "pingMs" : NumberLong(0), "syncingTo" : "mongo32:27017", "configVersion" : 3 },
		{ "_id" : 2, "name" : "mongo32:27019", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 3, "optime" : { "ts" : Timestamp(1478806805, 1), "t" : NumberLong(1) }, "optimeDate" : ISODate("2016-11-10T19:40:05Z"), "lastHeartbeat" : ISODate("2016-11-10T19:40:07.130Z"), "lastHeartbeatRecv" : ISODate("2016-11-10T19:40:06.239Z"), "pingMs" : NumberLong(0), "configVersion" : 3 }
	],
	"ok" : 1
}

  • Check replication lag:

$ mongo
rs01:PRIMARY> rs.printSlaveReplicationInfo()
source: mongo32:27018
	syncedTo: Thu Nov 10 2016 17:40:05 GMT-0200 (BRST)
	0 secs (0 hrs) behind the primary
source: mongo32:27019
	syncedTo: Thu Nov 10 2016 17:40:05 GMT-0200 (BRST)
	0 secs (0 hrs) behind the primary

  • Start an election:

$ mongo
rs01:PRIMARY> rs.stepDown()
2016-11-10T17:41:27.271-0200 E QUERY [thread1] Error: error doing query: failed: network error while attempting to run command 'replSetStepDown' on host '127.0.0.1:27017':
DB.prototype.runCommand@src/mongo/shell/db.js:135:1
DB.prototype.adminCommand@src/mongo/shell/db.js:153:16
rs.stepDown@src/mongo/shell/utils.js:1182:12
@(shell):1:1
2016-11-10T17:41:27.274-0200 I NETWORK [thread1] trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed
2016-11-10T17:41:27.275-0200 I NETWORK [thread1] reconnect 127.0.0.1:27017 (127.0.0.1) ok
rs01:SECONDARY>
rs01:SECONDARY> rs.status()
{
	"set" : "rs01",
	"date" : ISODate("2016-11-10T19:41:39.280Z"),
	"myState" : 2,
	"term" : NumberLong(2),
	"heartbeatIntervalMillis" : NumberLong(2000),
	"members" : [
		{ "_id" : 0, "name" : "mongo32:27017", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 2727, "optime" : { "ts" : Timestamp(1478806805, 1), "t" : NumberLong(1) }, "optimeDate" : ISODate("2016-11-10T19:40:05Z"), "configVersion" : 3, "self" : true },
		{ "_id" : 1, "name" : "mongo32:27018", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 135, "optime" : { "ts" : Timestamp(1478806805, 1), "t" : NumberLong(1) }, "optimeDate" : ISODate("2016-11-10T19:40:05Z"), "lastHeartbeat" : ISODate("2016-11-10T19:41:37.155Z"), "lastHeartbeatRecv" : ISODate("2016-11-10T19:41:37.155Z"), "pingMs" : NumberLong(0), "configVersion" : 3 },
		{ "_id" : 2, "name" : "mongo32:27019", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 94, "optime" : { "ts" : Timestamp(1478806897, 1), "t" : NumberLong(2) }, "optimeDate" : ISODate("2016-11-10T19:41:37Z"), "lastHeartbeat" : ISODate("2016-11-10T19:41:39.151Z"), "lastHeartbeatRecv" : ISODate("2016-11-10T19:41:38.354Z"), "pingMs" : NumberLong(0), "electionTime" : Timestamp(1478806896, 1), "electionDate" : ISODate("2016-11-10T19:41:36Z"), "configVersion" : 3 }
	],
	"ok" : 1
}
rs01:SECONDARY> exit

Shut down instances:

$ killall mongod

Hopefully, this was helpful. Please post any questions in the comments section.

Percona Server for MongoDB 3.2.11-3.1 is now available

December 7, 2016 - 9:07am

Percona announces the release of Percona Server for MongoDB 3.2.11-3.1 on December 7, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.2.11-3.1 is an enhanced, open-source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.2 protocol and drivers. It extends MongoDB with MongoRocks, Percona Memory Engine, and PerconaFT storage engine, as well as enterprise-grade features like external authentication and audit logging at no extra cost. Percona Server for MongoDB requires no changes to MongoDB applications or code.

NOTE: We deprecated the PerconaFT storage engine. It will not be available in future releases.

This release is based on MongoDB 3.2.11 and includes the following additional fixes:

  • PSMDB-93: Fixed hang during shutdown of mongod when started with the --storageEngine=PerconaFT and --nojournal options
  • PSMDB-92: Added Hot Backup to Ubuntu/Debian packages
  • PSMDB-83: Updated default configuration file to include recommended settings templates for various storage engines
  • Added support for Ubuntu 16.10 (Yakkety Yak)
  • Added binary tarballs for Ubuntu 16.04 LTS (Xenial Xerus)

The release notes are available in the official documentation.

 

Webinar Thursday, December 8: Virtual Columns in MySQL and MariaDB

December 6, 2016 - 4:14pm

Please join Federico Razzoli, Consultant at Percona, on Thursday, December 8, 2016, at 8 AM PT / 11 AM ET (UTC – 8) as he presents Virtual Columns in MySQL and MariaDB.

MariaDB 5.2 and MySQL 5.7 introduced virtual columns, with different implementations. Their features and limitations are similar, but not identical. The main difference is that only MySQL allows you to build an index on a non-persistent column.

In this talk, we’ll present some use cases for virtual columns. These cases include query simplification and UNIQUE constraints based on an SQL expression. In particular, we will see how to use them to index JSON data in MySQL, or dynamic columns in MariaDB.
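As a small taste of the MySQL 5.7 case (the table and column names here are made up), a virtual column extracted from a JSON field can be indexed like this:

mysql> ALTER TABLE users
    ->   ADD COLUMN email VARCHAR(100)
    ->     GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$.email'))) VIRTUAL,
    ->   ADD INDEX idx_email (email);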

Performance and limitations will also be discussed.

Sign up for the webinar here.

Federico Razzoli is a relational databases lover and open source supporter. He is a MariaDB Community Ambassador and wrote “Mastering MariaDB” in 2014. Currently, he works for Percona as a consultant.

Percona Live 2017 Open Source Database Conference Tutorial Schedule is Live!

December 5, 2016 - 12:43pm

We are excited to announce that the tutorial schedule for the Percona Live 2017 Open Source Database Conference is up!

The Percona Live Open Source Database Conference 2017 is April 24th – 27th, at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.

Click through to the tutorial list right now, look the tutorials over, and pick which sessions you want to attend. Discounted passes are available below!

Tutorial List

Early Bird Discounts

Just a reminder to everyone out there: our Early Bird discount rate for the Percona Live Open Source Database Conference 2017 is only available ‘til January 8, 2017, 11:30 pm PST! This rate gets you all the excellent and amazing opportunities that Percona Live offers, at a very reasonable price!

Sponsor Percona Live

Become a conference sponsor! We have sponsorship opportunities available for this annual MySQL, MongoDB and open source database event. Sponsors become a part of a dynamic and growing ecosystem and interact with hundreds of DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who attend the event.

MongoDB Troubleshooting: My Top 5

December 5, 2016 - 11:38am

In this blog post, I’ll discuss my top five go-to tips for MongoDB troubleshooting.

Every DBA has a war chest of their go-to solutions for any support issues they run into for a specific technology. MongoDB is no different. Even if you have picked it because it’s a good fit and it runs well for you, things will change. When things change – sometimes there is a new version of your application, or a new version of the database itself – you need to have a solid starting place.

To help new DBAs, I like to point out my top five things that cover the bulk of requests a DBA might need to work on.


Common greps to use

This one is all about ways to pare down the error log and make it a bit more manageable. The error log is a slew of information, and sometimes, without grep, it’s challenging to correlate some events.

Is an index being built?

As a DBA, you will often get a call saying the database has “stopped,” while the developer says, “I didn’t change anything.” Looking at the error log is a great first port of call. With this particular grep, you just want to see whether all index builds finished, whether a new index is still building, or whether an index was removed. This catches all of the cases in question.

$ grep -i index mongod.log
2016-11-11T17:08:53.731+0000 I INDEX [conn458] build index on: samples.col1 properties: { v: 1, key: { friends: 1.0 }, name: "friends_1", ns: "samples.col1" }
2016-11-11T17:08:53.733+0000 I INDEX [conn458] building index using bulk method
2016-11-11T17:08:56.045+0000 I - [conn458] Index Build: 24700/1000000 2%
2016-11-11T17:08:59.004+0000 I - [conn458] Index Build: 61000/1000000 6%
2016-11-11T17:09:02.001+0000 I - [conn458] Index Build: 103200/1000000 10%
2016-11-11T17:09:05.013+0000 I - [conn458] Index Build: 130800/1000000 13%
2016-11-11T17:09:08.013+0000 I - [conn458] Index Build: 160300/1000000 16%
2016-11-11T17:09:11.039+0000 I - [conn458] Index Build: 183100/1000000 18%
2016-11-11T17:09:14.009+0000 I - [conn458] Index Build: 209400/1000000 20%
2016-11-11T17:09:17.007+0000 I - [conn458] Index Build: 239400/1000000 23%
2016-11-11T17:09:20.010+0000 I - [conn458] Index Build: 264100/1000000 26%
2016-11-11T17:09:23.001+0000 I - [conn458] Index Build: 286800/1000000 28%
2016-11-11T17:09:30.783+0000 I - [conn458] Index Build: 298900/1000000 29%
2016-11-11T17:09:33.015+0000 I - [conn458] Index Build: 323900/1000000 32%
2016-11-11T17:09:36.000+0000 I - [conn458] Index Build: 336600/1000000 33%
2016-11-11T17:09:39.000+0000 I - [conn458] Index Build: 397000/1000000 39%
2016-11-11T17:09:42.000+0000 I - [conn458] Index Build: 431900/1000000 43%
2016-11-11T17:09:45.002+0000 I - [conn458] Index Build: 489100/1000000 48%
2016-11-11T17:09:48.003+0000 I - [conn458] Index Build: 551200/1000000 55%
2016-11-11T17:09:51.004+0000 I - [conn458] Index Build: 567700/1000000 56%
2016-11-11T17:09:54.004+0000 I - [conn458] Index Build: 589600/1000000 58%
2016-11-11T17:10:00.929+0000 I - [conn458] Index Build: 597800/1000000 59%
2016-11-11T17:10:03.008+0000 I - [conn458] Index Build: 633100/1000000 63%
2016-11-11T17:10:06.001+0000 I - [conn458] Index Build: 647200/1000000 64%
2016-11-11T17:10:09.008+0000 I - [conn458] Index Build: 660000/1000000 66%
2016-11-11T17:10:12.001+0000 I - [conn458] Index Build: 672300/1000000 67%
2016-11-11T17:10:15.009+0000 I - [conn458] Index Build: 686000/1000000 68%
2016-11-11T17:10:18.001+0000 I - [conn458] Index Build: 706100/1000000 70%
2016-11-11T17:10:21.006+0000 I - [conn458] Index Build: 731400/1000000 73%
2016-11-11T17:10:24.006+0000 I - [conn458] Index Build: 750900/1000000 75%
2016-11-11T17:10:27.000+0000 I - [conn458] Index Build: 773900/1000000 77%
2016-11-11T17:10:30.000+0000 I - [conn458] Index Build: 821800/1000000 82%
2016-11-11T17:10:33.026+0000 I - [conn458] Index Build: 843800/1000000 84%
2016-11-11T17:10:36.008+0000 I - [conn458] Index Build: 874000/1000000 87%
2016-11-11T17:10:43.854+0000 I - [conn458] Index Build: 896600/1000000 89%
2016-11-11T17:10:46.009+0000 I - [conn458] Index Build: 921800/1000000 92%
2016-11-11T17:10:49.000+0000 I - [conn458] Index Build: 941600/1000000 94%
2016-11-11T17:10:52.011+0000 I - [conn458] Index Build: 955700/1000000 95%
2016-11-11T17:10:55.007+0000 I - [conn458] Index Build: 965500/1000000 96%
2016-11-11T17:10:58.046+0000 I - [conn458] Index Build: 985200/1000000 98%
2016-11-11T17:11:01.002+0000 I - [conn458] Index Build: 995000/1000000 99%
2016-11-11T17:11:13.000+0000 I - [conn458] Index: (2/3) BTree Bottom Up Progress: 8216900/8996322 91%
2016-11-11T17:11:14.021+0000 I INDEX [conn458] done building bottom layer, going to commit
2016-11-11T17:11:14.023+0000 I INDEX [conn458] build index done. scanned 1000000 total records. 140 secs
2016-11-11T17:11:14.035+0000 I COMMAND [conn458] command samples.$cmd command: createIndexes { createIndexes: "col1", indexes: [ { ns: "samples.col1", key: { friends: 1.0 }, name: "friends_1" } ] } keyUpdates:0 writeConflicts:0 numYields:0 reslen:173 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, MMAPV1Journal: { acquireCount: { w: 9996326 }, acquireWaitCount: { w: 1054 }, timeAcquiringMicros: { w: 811319 } }, Database: { acquireCount: { w: 1, W: 1 } }, Collection: { acquireCount: { W: 1 } }, Metadata: { acquireCount: { W: 12 } }, oplog: { acquireCount: { w: 1 } } } 140306ms

What’s happening right now?

As with the index example above, this helps you remove many of the messages you might not care about or want to block off. MongoDB does have some useful sub-component tags in the logs, such as “ReplicationExecutor” and “connXXX,” but I find it more helpful to remove the noisy lines than to filter on the log facility types. In this example, I opted not to add “| grep -v connection” – typically I look at the log with connections first to see if they are acting oddly, and then filter them out to see the core of what is happening. If you only want to see the long queries and commands, replace “ms” with “connection” to make them easier to find.

$ grep -v conn mongod.log | grep -v auth | grep -vi health | grep -v ms
2016-11-11T14:41:06.376+0000 I REPL [ReplicationExecutor] This node is localhost:28001 in the config
2016-11-11T14:41:06.377+0000 I REPL [ReplicationExecutor] transition to STARTUP2
2016-11-11T14:41:06.379+0000 I REPL [ReplicationExecutor] Member localhost:28003 is now in state STARTUP
2016-11-11T14:41:06.383+0000 I REPL [ReplicationExecutor] Member localhost:28002 is now in state STARTUP
2016-11-11T14:41:06.385+0000 I STORAGE [FileAllocator] allocating new datafile /Users/dmurphy/Github/dbmurphy/MongoDB32Labs/labs/rs2-1/local.1, filling with zeroes...
2016-11-11T14:41:06.586+0000 I STORAGE [FileAllocator] done allocating datafile /Users/dmurphy/Github/dbmurphy/MongoDB32Labs/labs/rs2-1/local.1, size: 256MB, took 0.196 secs
2016-11-11T14:41:06.610+0000 I REPL [ReplicationExecutor] transition to RECOVERING
2016-11-11T14:41:06.614+0000 I REPL [ReplicationExecutor] transition to SECONDARY
2016-11-11T14:41:08.384+0000 I REPL [ReplicationExecutor] Member localhost:28003 is now in state STARTUP2
2016-11-11T14:41:08.386+0000 I REPL [ReplicationExecutor] Standing for election
2016-11-11T14:41:08.388+0000 I REPL [ReplicationExecutor] Member localhost:28002 is now in state STARTUP2
2016-11-11T14:41:08.390+0000 I REPL [ReplicationExecutor] not electing self, localhost:28002 would veto with 'I don't think localhost:28001 is electable because the member is not currently a secondary (mask 0x8)'
2016-11-11T14:41:08.391+0000 I REPL [ReplicationExecutor] not electing self, we are not freshest
2016-11-11T14:41:10.387+0000 I REPL [ReplicationExecutor] Standing for election
2016-11-11T14:41:10.389+0000 I REPL [ReplicationExecutor] replSet info electSelf
2016-11-11T14:41:10.393+0000 I REPL [ReplicationExecutor] received vote: 1 votes from localhost:28003
2016-11-11T14:41:10.395+0000 I REPL [ReplicationExecutor] replSet election succeeded, assuming primary role
2016-11-11T14:41:10.396+0000 I REPL [ReplicationExecutor] transition to PRIMARY
2016-11-11T14:41:10.631+0000 I REPL [rsSync] transition to primary complete; database writes are now permitted
2016-11-11T14:41:12.390+0000 I REPL [ReplicationExecutor] Member localhost:28003 is now in state SECONDARY
2016-11-11T14:41:12.393+0000 I REPL [ReplicationExecutor] Member localhost:28002 is now in state SECONDARY

versus

2016-11-11T14:41:12.393+0000 I REPL [ReplicationExecutor] Member localhost:28002 is now in state SECONDARY
2016-11-11T14:41:36.433+0000 I NETWORK [conn3] end connection 127.0.0.1:65497 (1 connection now open)
2016-11-11T14:41:36.433+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49191 #8 (3 connections now open)
2016-11-11T14:41:36.490+0000 I NETWORK [conn2] end connection 127.0.0.1:65496 (1 connection now open)
2016-11-11T14:41:36.490+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49192 #9 (3 connections now open)
2016-11-11T14:41:54.480+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49257 #10 (3 connections now open)
2016-11-11T14:41:54.486+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49258 #11 (4 connections now open)
2016-11-11T14:42:06.493+0000 I NETWORK [conn8] end connection 127.0.0.1:49191 (3 connections now open)
2016-11-11T14:42:06.494+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49290 #12 (5 connections now open)
2016-11-11T14:42:06.550+0000 I NETWORK [conn9] end connection 127.0.0.1:49192 (3 connections now open)
2016-11-11T14:42:06.550+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49294 #13 (5 connections now open)
2016-11-11T14:42:36.550+0000 I NETWORK [conn12] end connection 127.0.0.1:49290 (3 connections now open)
2016-11-11T14:42:36.550+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49336 #14 (5 connections now open)
2016-11-11T14:42:36.601+0000 I NETWORK [conn13] end connection 127.0.0.1:49294 (3 connections now open)
2016-11-11T14:42:36.601+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49339 #15 (5 connections now open)
2016-11-11T14:43:06.607+0000 I NETWORK [conn14] end connection 127.0.0.1:49336 (3 connections now open)
2016-11-11T14:43:06.608+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49387 #16 (5 connections now open)
2016-11-11T14:43:06.663+0000 I NETWORK [conn15] end connection 127.0.0.1:49339 (3 connections now open)
2016-11-11T14:43:06.663+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49389 #17 (5 connections now open)
2016-11-11T14:43:36.655+0000 I NETWORK [conn16] end connection 127.0.0.1:49387 (3 connections now open)
2016-11-11T14:43:36.656+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49436 #18 (5 connections now open)
2016-11-11T14:43:36.718+0000 I NETWORK [conn17] end connection 127.0.0.1:49389 (3 connections now open)
2016-11-11T14:43:36.719+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49439 #19 (5 connections now open)
2016-11-11T14:44:06.705+0000 I NETWORK [conn18] end connection 127.0.0.1:49436 (3 connections now open)
2016-11-11T14:44:06.705+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49481 #20 (5 connections now open)
2016-11-11T14:44:06.786+0000 I NETWORK [conn19] end connection 127.0.0.1:49439 (3 connections now open)
2016-11-11T14:44:06.786+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49483 #21 (5 connections now open)
2016-11-11T14:44:36.757+0000 I NETWORK [conn20] end connection 127.0.0.1:49481 (3 connections now open)
2016-11-11T14:44:36.757+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49526 #22 (5 connections now open)
2016-11-11T14:44:36.850+0000 I NETWORK [conn21] end connection 127.0.0.1:49483 (3 connections now open)

Did any elections happen? Why did they happen?

While this isn’t the most common command to run, it is very helpful if you aren’t using Percona Monitoring and Management (PMM) to track the historical frequency of elections. In this example, we want up to 20 lines before and after the word “SECONDARY,” which typically brackets when a step-down or election takes place. Around that point you can then see whether a command was issued, a network error occurred, a heartbeat failed, or some other scenario took place.

$ grep -i SECONDARY -A20 -B20 mongod.log
2016-11-11T14:44:38.622+0000 I COMMAND [conn22] Attempting to step down in response to replSetStepDown command
2016-11-11T14:44:38.625+0000 I REPL [ReplicationExecutor] transition to SECONDARY
2016-11-11T14:44:38.627+0000 I NETWORK [conn10] end connection 127.0.0.1:49253 (4 connections now open)
2016-11-11T14:44:38.627+0000 I NETWORK [conn11] end connection 127.0.0.1:49254 (4 connections now open)
2016-11-11T14:44:38.630+0000 I NETWORK [thread1] trying reconnect to localhost:27001 (127.0.0.1) failed
2016-11-11T14:44:38.628+0000 I NETWORK [conn22] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:49506]
2016-11-11T14:44:38.630+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49529 #25 (5 connections now open)
2016-11-11T14:44:38.633+0000 I NETWORK [thread1] reconnect localhost:27001 (127.0.0.1) ok
2016-11-11T14:44:40.567+0000 I REPL [ReplicationExecutor] replSetElect voting yea for localhost:27002 (1)
2016-11-11T14:44:42.223+0000 I REPL [ReplicationExecutor] Member localhost:27002 is now in state PRIMARY
2016-11-11T14:44:44.314+0000 I NETWORK [initandlisten] connection accepted from 127.0.0.1:49538 #26 (4 connections now open)

Is replication lagged, and do I have enough oplog?

Always write a single test document just to ensure replication has a recent write:

db.getSiblingDB('repltest').col.insert({x:1});
db.getSiblingDB('repltest').dropDatabase();

Checking lag information:

rs1:PRIMARY> db.printSlaveReplicationInfo()
source: localhost:27002
	syncedTo: Fri Nov 11 2016 17:11:14 GMT+0000 (GMT)
	0 secs (0 hrs) behind the primary
source: localhost:27003
	syncedTo: Fri Nov 11 2016 17:11:14 GMT+0000 (GMT)
	0 secs (0 hrs) behind the primary

Oplog Size and Range:

rs1:PRIMARY> db.printReplicationInfo()
configured oplog size:   192MB
log length start to end: 2154secs (0.6hrs)
oplog first event time:  Fri Nov 11 2016 16:35:20 GMT+0000 (GMT)
oplog last event time:   Fri Nov 11 2016 17:11:14 GMT+0000 (GMT)
now:                     Fri Nov 11 2016 17:16:46 GMT+0000 (GMT)

Taming the profiler

MongoDB’s profiler and explain output are filled with tons of data. I have highlighted some key fields to know:

{ "queryPlanner" : { "mongosPlannerVersion" : 1, "winningPlan" : { "stage" : "SINGLE_SHARD", "shards" : [ { "shardName" : "rs3", "connectionString" : "rs3/localhost:29001,localhost:29002,localhost:29003", "serverInfo" : { "host" : "Davids-MacBook-Pro-2.local", "port" : 29001, "version" : "3.0.11", "gitVersion" : "48f8b49dc30cc2485c6c1f3db31b723258fcbf39" }, "plannerVersion" : 1, "namespace" : "blah.foo", "indexFilterSet" : false, "parsedQuery" : { "name" : { "$eq" : "Bob" } }, "winningPlan" : { "stage" : "COLLSCAN", "filter" : { "name" : { "$eq" : "Bob" } }, "direction" : "forward" }, "rejectedPlans" : [ ] } ] } }, "executionStats" : { "nReturned" : 0, "executionTimeMillis" : 0, "totalKeysExamined" : 0, "totalDocsExamined" : 1, "executionStages" : { "stage" : "SINGLE_SHARD", "nReturned" : 0, "executionTimeMillis" : 0, "totalKeysExamined" : 0, "totalDocsExamined" : 1, "totalChildMillis" : NumberLong(0), "shards" : [ { "shardName" : "rs3", "executionSuccess" : true, "executionStages" : { "stage" : "COLLSCAN", "filter" : { "name" : { "$eq" : "Bob" } }, "nReturned" : 0, "executionTimeMillisEstimate" : 0, "works" : 3, "advanced" : 0, "needTime" : 2, "needFetch" : 0, "saveState" : 0, "restoreState" : 0, "isEOF" : 1, "invalidates" : 0, "direction" : "forward", "docsExamined" : 1 } } ] }, "allPlansExecution" : [ { "shardName" : "rs3", "allPlans" : [ ] } ] }, "ok" : 1 }

  • Filter – The formulated query that was run. Right above it you can find the parsed query; these should be the same. It’s useful to know what the engine was sent in the end.
  • nReturned – The number of documents returned via the cursor to the client running the query/command.
  • executionTimeMillis – This used to be called “ms”; it is how long the operation took. Typically you would measure this like a slow query in any database.
  • total(Keys|Docs)Examined – Unlike nReturned, this is what was scanned to answer the operation; not all indexes have perfect coverage, and sometimes you scan many documents to find no results.
  • stage – While poorly named, this tells you whether a collection scan (table scan) or an index was used to answer a given operation. In the case of an index, it will say the name.

 

CurrentOp and killOp explained

When using db.currentOp() to see what is running, I frequently call db.currentOp(true) so that I can see everything, not just a limited set of operations. This makes currentOp look and act much more like SELECT * FROM information_schema.PROCESSLIST in MySQL. One significant difference that commonly catches a new DBA off guard is how killing operations differs between MySQL and MongoDB. While Mongo does have a handy db.killOp(<op_id>) function, it is important to know that unlike MySQL – which immediately kills the thread running the query – MongoDB takes a different approach. When you run killOp(), MongoDB appends “killed: true” to the operation’s document structure. At the next yield (if one occurs), the operation is told to quit. This is also how a shutdown works: if the server seems not to be shutting down, it might be waiting for an operation to yield and notice the shutdown request.
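A rough sketch of the workflow (the opid shown is of course hypothetical):

> db.currentOp(true).inprog.length   // everything, like SELECT * FROM PROCESSLIST
> db.currentOp().inprog.forEach(function(op) {
...   if (op.secs_running > 5) printjson({ opid: op.opid, secs: op.secs_running });
... })
> db.killOp(1234)   // flags the op as killed; it terminates at its next yield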

I’m not arguing that this is bad or good, just that it is different from MySQL and something you should be aware of. One thing to note, however, is that MongoDB has great built-in HA. Sometimes it is better to trigger an election and let the drivers gracefully handle things, rather than running the killOp command (unless it’s a write – then you should always try to use killOp).

Conclusion

I hope you have found some of this insightful. Look for future posts from the MongoDB team around other MongoDB areas we like to look at (or in different parts of the system) to help ourselves and clients get to the root of an issue.

Business Continuity and MySQL Backups

December 2, 2016 - 2:30pm

This blog post discusses the business continuity plan around MySQL backups, and how organizations should think about them.

During the years I’ve worked in IT, I’ve learned that backups are sometimes a conceptual subject in organizations. Many companies have them, but don’t document the associated business continuity plan for them. I experienced this the hard way many years ago, somewhere around the time MySQL 5.0 was still widely used.

In most organizations, there are a couple of business continuity concepts that should be described internally: the recovery time objective and the recovery point objective. Let’s go a bit deeper into both:

Recovery Point Objective

A recovery point objective describes the maximum amount of data, measured in time, that can be lost during a major incident – for example, a massive data center failure. One of the questions you should ask before such a situation arises is: how much lost information is tolerable?

If you have a recovery point objective of over a day, your daily backup routines might cover it. However, if your recovery point objective is more stringent, you might be forced to add tools like binary log streaming or incremental backups.
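One common way to tighten the recovery point, assuming binary logging is enabled, is to stream binary logs off-host with mysqlbinlog (available since MySQL 5.6); the host, user and starting log name here are placeholders:

$ mysqlbinlog --read-from-remote-server --host=db1 --user=backup -p \
    --raw --stop-never mysql-bin.000001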

Recovery Time Objective

This second concept is also essential in building a business continuity plan. Your environment has to remain active to generate traffic and, potentially, revenue.

What requirements have been promised to your customers? Are there SLAs agreed with the customer, or is it best effort? If it’s best effort, what would be the tipping point for your users to start using an alternative service from your competitor? These are all factors to consider while determining your RTO.

In Short

If the recovery point objective and recovery time objective are stringent, additional costs may be required when buying hardware, or perhaps a secondary data center becomes mandatory. However, it’s a cost/value discussion: what makes your company lose revenue, and what is acceptable during a crisis?

Based on your business continuity requirements, you can then build your DR plans. Make sure your business continuity requirements drive the DR plan, and not vice versa.

What tools do you have at your disposal to create sensible MySQL backups?

Logical backups

mysqldump. Remember mysqldump, the original tool included with MySQL? The good thing about mysqldump is that you can actually read, and even edit, the output of the backup before potentially restoring data, which can prove interesting during development work.

mysqldump’s biggest negative is that it is neither scalable nor fast for backing up large amounts of data. Additionally, restoring data is even slower, as you must replay the complete dataset on your new MySQL database servers (rebuilding indexes, heavy IO, etc.).

mysqldump’s advantages include the convenience and flexibility of viewing or even editing the output before restoring. It gives you the ability to clone databases for development, and produce slight variations of an existing database for testing.

mydumper. This tool is comparable to mysqldump, but works in parallel, which provides significant benefits in backup and restoration times.
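By way of example, a consistent InnoDB dump and a parallel mydumper run might look like this (connection options are omitted, and mydumper flags can vary between versions):

$ mysqldump --single-transaction --routines --triggers --all-databases > full_dump.sql
$ mydumper --threads=8 --outputdir=/backups/mydumper --compress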

Binary backups

Binary backups refer to copies made of the entire MySQL dataset. Binary backups are typically faster than logical backups, especially on larger datasets. Several tools come to mind here.

Percona XtraBackup. An open-source binary backup solution for InnoDB. The good thing about XtraBackup is that it is non-locking when using MySQL with the InnoDB storage engine.

MySQL Enterprise Backup. An InnoDB hot backup solution that is included in the subscription level of MySQL Enterprise.

These tools can offer you incremental and daily backups, but they still don’t give you point-in-time recovery on their own. If your recovery point objective is very tight, you might need to externally store (back up) your binary logs and replay them on your restored database. Keep in mind that this potentially impacts your recovery time objective.
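As a sketch using XtraBackup 2.4 syntax (the paths are placeholders), a full base backup plus an incremental taken on top of it looks like:

$ xtrabackup --backup --target-dir=/data/backups/base
$ xtrabackup --backup --target-dir=/data/backups/inc1 \
    --incremental-basedir=/data/backups/base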

Delayed Slaves

This concept is not a backup, but the technique can help you recover your database and limit the recovery time significantly.

Conclusion

We’ve discussed having a business continuity requirements list, and some potential tools that might assist you in covering it (at least on the MySQL level). One last important item is actual testing. Far too many companies need data recovery only to discover that their backups are corrupted.

Make sure your organization tests its backups regularly. Are you sure they work properly? Also make sure you perform regression tests for new code – for example, against a restored copy of the backups.

If you make sure you can trust your backups, you might sleep better at night! ;-)

Make MySQL 8.0 Better Through Better Benchmarking

December 2, 2016 - 11:58am

This blog post discusses how better MySQL 8.0 benchmarks can improve MySQL in general.

Like many in the MySQL community, I’m very excited about what MySQL 8.0 offers. There are a lot of great features and architecture improvements. Also like many in the MySQL community, I would like to see MySQL 8.0 perform better. Better performance is what we always want (and expect) from new database software releases.

Rarely do performance improvements happen by accident – they require running benchmarks, finding bottlenecks and eliminating them. This is the area where I think things could use improvement.

If you come to the MySQL Keynote at Oracle OpenWorld, or if you go to the MySQL Benchmarks Page, you’ll find a very limited set of benchmarks. They mostly focus on sysbench performance with large numbers of connections and large numbers of cores. I’m not convinced this effort is the best use of our time these days.

Don’t get me wrong: as one of the original designers of sysbench, I think it is a great and simple tool that helps spot many bottlenecks. I still use it to find performance issues. But it is only one tool, and it by no means provides full coverage of real-world MySQL workloads.

I agree with Mark Callaghan (see discussion here): we need to run more benchmarks using a wider set of circumstances, to ensure there are no regressions in new releases. This will help move MySQL performance forward for real users.      

Here are some specific ideas on how I think we could benchmark MySQL 8.0 better:

  1. Focus on production-recommended settings. Way too often we see benchmarks run with the doublewrite buffer disabled, InnoDB checksums disabled and no binary log (like in this benchmark). While they produce some excitingly high numbers, they have little practical value for real workloads. At the very least, I would like to see separate numbers for the “street legal car” versus one designed to set a record on the salt flats (a sample of such settings appears after this list).
  2. Go beyond sysbench. Sysbench focuses on PK-only based access for very simple tables, and does not even do JOINs as part of its workload. I would like to see more benchmarks that have tables with many indexes, using secondary key lookups and joins, involving rows with many fields, and medium and large size blobs that are common bottlenecks. We also need more database features covered. Are foreign keys or triggers getting faster or slower? What about stored procedure execution performance? I would love to see these get covered. Mark Callaghan suggests LinkBench, which I think is a fine benchmark to add, but it shouldn’t be the only one.
  3. Do more with sysbench. Sysbench could get more done and cover more important workloads. Workloads with data fitting in memory and not fitting in memory should be shown separately. Testing performance with large numbers of tables is also very important – many MySQL installations for SaaS applications run tens of thousands of tables (sometimes going into millions). I would also suggest running more injection benchmarks with sysbench, as they are more representative of the real world.
  4. Look at latency, not just throughput. The benchmark results we commonly see focus on throughput over a long period of time, without looking at latency or how performance changes over time. Stalls and performance dips are well known in the MySQL space – especially the famous InnoDB checkpointing woes (though this issue has gotten a lot better). There are other cases and circumstances where stalls and slowdowns can happen.
  5. Measure resource consumption. I very much like how Mark Callaghan shows the CPU usage and IO usage per transaction/operation, so we can get a better idea of efficiency.
  6. Concurrency. Recently, the focus has been on very high concurrency in terms of connections and active connections, typically on very big iron (using as many as 72 cores). And as much as this is important to “future-proofing” MySQL as we get more and more cores per socket every year, it should not be the only focus. In fact, it is extremely rare for me to see sustained loads of more than 20-40 “threads running” for well-configured systems. With modern solutions like ProxySQL, you can restrict concurrency to the most optimal levels for your server through multiplexing. Not to mention the thread pool, which is available in MySQL Enterprise, Percona Server and MariaDB. I would like to see a much more focused benchmark at medium-to-low concurrency. The fact that single-thread performance has gotten slower in every major MySQL version is not a good thing. As MySQL currently runs a single query in a single thread, it impacts query latencies in many real-world situations.
  7. Virtualization. We need more benchmarks in virtualized environments, as virtualization and the cloud are where most workloads are these days (by number). Yes, big iron and bare metal are where you get the best performance, but it’s not where most users are running MySQL. Whenever you are looking at full blown virtualization or containers, the performance profile can be substantially different from bare metal. Virtualized instances often have smaller CPU cores – getting the best performance with 8-16 virtual cores might be a more relevant data set for many than the performance with 100+ cores.
  8. SSL and encryption. MySQL 5.7 was all about security. We’re supposed to be able to enable SSL easily, but was any work done on making it cheap? The benchmark Ernie Souhrada did a few years back showed a pretty high overhead (in MySQL 5.6). We need more focus on SSL performance, and getting it would allow more people to run MySQL with SSL. I would also love to see more benchmarks with encryption enabled, to understand better how much it costs to have your data encrypted “at rest,” and in what cases.
  9. Protocol X and MySQL Doc Store. These were added after MySQL 5.7 GA, so it would be unfair to complain about the lack of benchmarks comparing the performance of those versus previous versions. But if Protocol X is the future, some benchmarks are in order. It would be great to have official numbers on the amount of overhead using MySQL Doc Store has compared to SQL (especially since we know that queries are converted to SQL for execution).
  10. Replication benchmarks. There are a lot of great replication features in newer MySQL versions: statement/row/mixed, GTID or no GTID, chose multiple formats for row events, enable various forms of semi-sync replication, two ways of parallel replication and multi-source replication. Additionally, MySQL group replication is on the way. There seems to be very little comprehensive benchmarks for these features, however. We really need to understand how they scale and perform under various workloads.
  11. Mixed workloads.  Perhaps one of the biggest differences between benchmarks and real production environments is that in benchmarks the same workload often is used over and over, while in the real world there is a mix of “application workloads.” The real world also has additional tasks such as backups, reporting or running “online” ALTER TABLE operations. Practical performance is performance you can count on while also serving these types of background activities. Sometimes you can get a big surprise from the severity of impact from such background activities.
  12. Compression benchmarks. There have been some InnoDB compression benchmarks (both for new and old methods), but they are a completely separate set of benchmarks that are hard to put in context with everything else. For example, do they scale well with high numbers of connections and large numbers of cores?
  13. Long-running benchmarks. A lot of the benchmarks run are rather short. Many of the things that affect performance take time to accumulate: memory fragmentation on the process (or OS kernel) side, disk fragmentation and database fragmentation. For a database that is expected to run many months without restarting, it would be great to see some benchmark runs that last several days or weeks, to check long-term stability and catch regressions or resource leaks.
  14. Complex queries. While MySQL is not an analytical database, it can still run complex queries with JOINs, and the MySQL optimizer team provides constant improvements to the optimizer. It would be quite valuable to see how optimizer improvements affect query execution, and how they affect scalability with hardware and concurrency as well.

These are just some ideas on what could be done. Of course, there are only so many things the performance engineering team can focus on at a time: one can't boil the ocean! My main suggestion is this: we have done enough deep optimizing of primary key lookups with sysbench on high concurrency and monster hardware, and it's time to go wider. This ensures that MySQL doesn't falter with poor performance on commonly run workloads. Benchmarks like these have much more practical value than beating one million primary key selects a second on a single server.

Managing Replication with Percona XtraDB Cluster

December 1, 2016 - 2:56pm

This blog post discusses managing replication with Percona XtraDB Cluster.

Recently a customer asked me to set up replication between two distinct Percona XtraDB Clusters located in geographically separate data centers. The customer's goal was to use one of the clusters only in case of disaster recovery. They tried extending the cluster, but because of the WAN latency impact on their writes and the requirement of a node in a third data center for quorum, they walked away from that setup. Since they were not concerned about losing a few transactions in case of a major disaster, they were OK with regular MySQL replication using GTIDs.
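For reference, with GTIDs the asynchronous link between the clusters boils down to something like the following, run on one node of the disaster recovery cluster (the hostname and credentials below are placeholders, not the customer's actual setup):

-- Sketch: run on a node of the DR cluster; 'dc1-node1' and the
-- 'repl' account are placeholders for the real master and user.
CHANGE MASTER TO
  MASTER_HOST = 'dc1-node1',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '<secret>',
  MASTER_AUTO_POSITION = 1;
START SLAVE;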

Easy enough, right? Both clusters are cloud-based, of course, and the provider can stop/restart any node on short notice. This setup caused some concern for the customer around how to handle replication. Since they don't have dedicated personnel to monitor replication, or at least handle alerts, they asked if we could find a way to automate the process. So, here we go!

We all try to solve the problems with the tools we know. In my case, I like Pacemaker a lot. So using Pacemaker was my first thought. In a cloud environment, a Pacemaker setup is not easy (wouldn’t that be a cluster in a cluster… a bit heavy). But wait! Percona XtraDB Cluster with Galera replication is already handling quorum, and it provides a means of exchanging information between the nodes. Why not use that?

We can detect quorum status the same way the clustercheck scripts do it. To exchange messages, why not simply write to a table? Galera replication will propagate it to the other nodes. I went on and wrote a bash script that is called by cron every minute. The script monitors the node state and the content of the table. If all is right, it updates the table to report its presence (and whether it is acting as a slave or not). The script validates the presence of a slave in the cluster. If no reporting slave is found, the script proceeds to the “election” of a new slave, based on the wsrep_local_index value. Basically, the script is a big bunch of “if” statements. The script is here, and the basic documentation on how to set it up is here.
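To give a rough idea of the mechanism, a signaling table could look like the sketch below. This is a simplified, hypothetical schema – the actual script linked above defines its own:

-- Hypothetical signaling table; Galera replicates it to every node.
-- ENGINE=InnoDB matters: Galera only replicates InnoDB tables.
CREATE TABLE IF NOT EXISTS percona.cluster_replication_status (
  host        VARCHAR(64) NOT NULL PRIMARY KEY,  -- node reporting in
  is_slave    TINYINT     NOT NULL DEFAULT 0,    -- 1 if this node runs the slave threads
  last_update TIMESTAMP   NOT NULL DEFAULT CURRENT_TIMESTAMP
                          ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB;

-- Each node's cron job reports its presence once a minute
-- (a node acting as slave writes 1 instead of 0):
REPLACE INTO percona.cluster_replication_status (host, is_slave)
VALUES (@@hostname, 0);

-- The check the script performs: did any node report as slave recently?
SELECT COUNT(*) AS live_slaves
FROM percona.cluster_replication_status
WHERE is_slave = 1 AND last_update > NOW() - INTERVAL 2 MINUTE;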

Of course, if it works for one cluster, it can work for two. I have configured my customer's two Percona XtraDB Clusters in a master-to-master relationship using this script. I ran through a bunch of failure scenarios. The script survived all of them! But of course, this is new. If you are going to implement this solution, run your own set of tests! If you find any problem, file an issue on GitHub. I'll be happy to fix it!

Database Daily Ops Series: GTID Replication and Binary Logs Purge

December 1, 2016 - 9:43am

This blog continues the ongoing series on daily operations and GTID replication.

In this blog, I'm going to investigate why the error below has been appearing in a special environment I've been working with over the last few days:

Last_IO_Errno: 1236 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

The error provides the right message, and explains what is going on. But sometimes, it can be a bit tricky to solve this issue: you need additional information, discovered through testing and reading. We try to keep Managed Services scripted, in the sense that our advice and best practices are repeatable and consistent. However, some additional features and practices can be added depending on the customer situation.

Some time ago one of our customer's database servers presented the above message. At that point, we could see the binary log files on the master in compressed (gzipped) form. Of course, MySQL can't identify a compressed file with a .gz extension as a binary log. We uncompressed the file, but replication presented the same problem – even after making sure the UUID of the current master and the transaction IDs were there. Obviously, I needed to go and investigate the problem to see what was going on.

After some research, I re-read the following passage from the manual:

When the server starts, the global value of gtid_purged (previously called gtid_lost) is initialized to the set of GTIDs contained by the Previous_gtid_log_event of the oldest binary log. When a binary log is purged, gtid_purged is re-read from the binary log that has now become the oldest one.

=> https://dev.mysql.com/doc/refman/5.6/en/replication-options-gtids.html#sysvar_gtid_purged

That makes me think: if something is compressing binlogs on the master without purging them the way the GTID mechanism expects, the server is not going to be able to re-read the existing GTIDs on disk. When the slave replication threads restart, or the DBA issues commands like reset slave and reset master (to clean out the accumulated GTID sets in Executed_Gtid_Set from the SHOW SLAVE STATUS command, for example), this error can occur. But if I compress the file:

  • Will the slave get lost and not find all the needed GTIDs on the master after a reset slave/reset master?
  • If I purge the logs correctly, using PURGE BINARY LOGS, will the slave be OK when restarting replication threads?

Test 1: Compressing the oldest binary log file on master, restarting slave threads

I would like to test this very methodically. We'll create one GTID per binary log, and then I will compress the oldest binary log file in order to make it unavailable to the slaves. I'm working with three virtual machines: one master and two slaves. On the second slave, I'm going to run the following sequence: stop slave, reset slave, reset master, start slave, and then check the results. Let's see what happens.

On master (tool01):

tool01 [(none)]:> show master logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| mysqld-bin.000001 |       341 |
| mysqld-bin.000002 |       381 |
| mysqld-bin.000003 |       333 |
+-------------------+-----------+
3 rows in set (0.00 sec)

tool01 [(none)]:> show binlog events in 'mysqld-bin.000001';
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| Log_name          | Pos | Event_type     | Server_id | End_log_pos | Info                                                              |
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| mysqld-bin.000001 |   4 | Format_desc    |         1 |         120 | Server ver: 5.6.32-log, Binlog ver: 4                             |
| mysqld-bin.000001 | 120 | Previous_gtids |         1 |         151 |                                                                   |
| mysqld-bin.000001 | 151 | Gtid           |         1 |         199 | SET @@SESSION.GTID_NEXT= '4fbe2d57-5843-11e6-9268-0800274fb806:1' |
| mysqld-bin.000001 | 199 | Query          |         1 |         293 | create database wb01                                              |
| mysqld-bin.000001 | 293 | Rotate         |         1 |         341 | mysqld-bin.000002;pos=4                                           |
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
5 rows in set (0.00 sec)

tool01 [(none)]:> show binlog events in 'mysqld-bin.000002';
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| Log_name          | Pos | Event_type     | Server_id | End_log_pos | Info                                                              |
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| mysqld-bin.000002 |   4 | Format_desc    |         1 |         120 | Server ver: 5.6.32-log, Binlog ver: 4                             |
| mysqld-bin.000002 | 120 | Previous_gtids |         1 |         191 | 4fbe2d57-5843-11e6-9268-0800274fb806:1                            |
| mysqld-bin.000002 | 191 | Gtid           |         1 |         239 | SET @@SESSION.GTID_NEXT= '4fbe2d57-5843-11e6-9268-0800274fb806:2' |
| mysqld-bin.000002 | 239 | Query          |         1 |         333 | create database wb02                                              |
| mysqld-bin.000002 | 333 | Rotate         |         1 |         381 | mysqld-bin.000003;pos=4                                           |
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
5 rows in set (0.00 sec)

tool01 [(none)]:> show binlog events in 'mysqld-bin.000003';
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| Log_name          | Pos | Event_type     | Server_id | End_log_pos | Info                                                              |
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| mysqld-bin.000003 |   4 | Format_desc    |         1 |         120 | Server ver: 5.6.32-log, Binlog ver: 4                             |
| mysqld-bin.000003 | 120 | Previous_gtids |         1 |         191 | 4fbe2d57-5843-11e6-9268-0800274fb806:1-2                          |
| mysqld-bin.000003 | 191 | Gtid           |         1 |         239 | SET @@SESSION.GTID_NEXT= '4fbe2d57-5843-11e6-9268-0800274fb806:3' |
| mysqld-bin.000003 | 239 | Query          |         1 |         333 | create database wb03                                              |
+-------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
4 rows in set (0.00 sec)

Here we see that each existing binary log file contains just one transaction. That will make it easier to compress the oldest binary log and make part of the existing GTIDs disappear. When a slave connects to a master, it first sends its Executed_Gtid_Set, and then the master sends all the missing IDs to the slave. As Stephane Combaudon said, we will force it to happen! Both slave database servers are currently in the same position:

tool02 [(none)]:> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.10
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000003
          Read_Master_Log_Pos: 333
               Relay_Log_File: mysqld-relay-bin.000006
                Relay_Log_Pos: 545
        Relay_Master_Log_File: mysqld-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
            ...
           Retrieved_Gtid_Set: 4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 4fbe2d57-5843-11e6-9268-0800274fb806:1-3

tool03 [(none)]:> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.10
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000003
          Read_Master_Log_Pos: 333
               Relay_Log_File: mysqld-relay-bin.000008
                Relay_Log_Pos: 451
        Relay_Master_Log_File: mysqld-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
            ...
           Retrieved_Gtid_Set: 4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 4fbe2d57-5843-11e6-9268-0800274fb806:1-3

Now, we’ll compress the oldest binary log on master:

[root@tool01 mysql]# ls -lh | grep mysqld-bin.
-rw-rw---- 1 mysql mysql  262 Nov 11 13:55 mysqld-bin.000001.gz #: this is the file containing 4fbe2d57-5843-11e6-9268-0800274fb806:1
-rw-rw---- 1 mysql mysql  381 Nov 11 13:55 mysqld-bin.000002
-rw-rw---- 1 mysql mysql  333 Nov 11 13:55 mysqld-bin.000003
-rw-rw---- 1 mysql mysql   60 Nov 11 13:55 mysqld-bin.index

On tool03, the database server we'll use for the test, we will reload replication:

tool03 [(none)]:> stop slave; reset slave; reset master; start slave;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.03 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.02 sec)

tool03 [(none)]:> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 192.168.0.10
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File:
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 4
        Relay_Master_Log_File:
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 151
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 4fbe2d57-5843-11e6-9268-0800274fb806
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp: 161111 14:47:13
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 1
1 row in set (0.00 sec)

Bingo! We broke replication on the slave. Now we know that the GTID missing on the master lived in the compressed file and couldn't be passed along to the connecting slave during their negotiation. Additionally, @@GTID_PURGED was not reloaded, contrary to what the online manual says. The test is done, and we confirmed the theory (if you have additional comments, leave them at the end of the blog).

Test 2: Purge the oldest file on master and reload replication on slave

Let's make it as straightforward as possible. The purge can be done manually using the PURGE BINARY LOGS command, which does it the proper way: the binary log index file is treated as part of the purge operation as well (the file name entry is removed from the index together with the log file on disk). I'm going to execute the same steps as before, but purge the oldest file with that command first.

tool01 [(none)]:> show master logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| mysqld-bin.000001 |       341 |
| mysqld-bin.000002 |       381 |
| mysqld-bin.000003 |       333 |
+-------------------+-----------+
3 rows in set (0.00 sec)

tool01 [(none)]:> purge binary logs to 'mysqld-bin.000002';
Query OK, 0 rows affected (0.01 sec)

tool01 [(none)]:> show master logs;
+-------------------+-----------+
| Log_name          | File_size |
+-------------------+-----------+
| mysqld-bin.000002 |       381 |
| mysqld-bin.000003 |       333 |
+-------------------+-----------+
2 rows in set (0.00 sec)

Now, we’ll execute the commands to check how it goes:

tool03 [(none)]:> stop slave; reset slave; reset master; start slave;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.02 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.02 sec)

tool03 [(none)]:> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: 192.168.0.10
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File:
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 4
        Relay_Master_Log_File:
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 151
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 4fbe2d57-5843-11e6-9268-0800274fb806
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp: 161111 16:35:02
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 1
1 row in set (0.00 sec)

The GTID in the purged file is needed by the slave. In both cases, we can set @@GTID_PURGED, as below, with the transaction that we know was purged, and move forward with replication:

tool03 [(none)]:> stop slave; set global gtid_purged='4fbe2d57-5843-11e6-9268-0800274fb806:1';
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.01 sec)

tool03 [(none)]:> start slave;
Query OK, 0 rows affected (0.01 sec)

The above sets @@GTID_PURGED so the slave only requests GTIDs that still exist on the master: everything up to and including the purged transaction is marked as already applied, which makes the slave start replication from the oldest existing GTID. In our scenario, the replica restarts replication from 4fbe2d57-5843-11e6-9268-0800274fb806:2, which lives in binary log file mysqld-bin.000002. Replication is fixed, as its threads can resume processing the data stream coming from the master.
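When the master purged its logs properly (as in test 2), you don't have to work the value out by hand. A sketch, assuming the master's own bookkeeping can be trusted:

-- On the master: the set of GTIDs no longer available in its binary logs.
SELECT @@GLOBAL.gtid_purged;

-- On the slave (its own sets were cleared by RESET MASTER above):
STOP SLAVE;
SET GLOBAL gtid_purged = '4fbe2d57-5843-11e6-9268-0800274fb806:1'; -- value read from the master
START SLAVE;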

You will need to execute additional checksum and sync steps for the set of transactions that were skipped when setting a new value for @@GTID_PURGED. If replication continues to break after restarting, I advise you to rebuild the slave (possibly the subject of a future blog).

Good explanations of this can be found in the bugs below, reported by the Facebook guys and by Laurynas Biveinis, the Percona Server Lead (who clarified the issue):

  • MySQL Bugs: #72635: Data inconsistencies when master has truncated binary log with GTID after crash;
  • MySQL Bugs: #73032: Setting gtid_purged may break auto_position and thus slaves;

Conclusion

Be careful when purging binary logs or manipulating them manually, because @@GTID_PURGED needs to be updated automatically when binary logs are purged. This seems to happen only when expire_logs_days is set to purge binary logs. Yet you need to be careful trusting this variable too, because it doesn't consider fractions of days: depending on the number of writes on a database server, disks can fill up within minutes. This blog showed that even housekeeping scripts and the PURGE BINARY LOGS command were able to make it happen.
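As a rule of thumb, let the server purge its own logs so that @@GTID_PURGED stays consistent. For example:

-- Automatic purging (whole days only in MySQL 5.6 - no fractions):
SET GLOBAL expire_logs_days = 7;

-- Manual purging, done through the server rather than the filesystem:
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);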

Galera Cache (gcache) is finally recoverable on restart

November 30, 2016 - 2:38pm

This post describes how to recover Galera Cache (or gcache) on restart.

Recently, Codership introduced a very important and long-awaited feature (with Galera 3.19): users can now recover the Galera cache on restart.

Need

If you gracefully shut down cluster nodes one after another, with some lag time between nodes, then the last node to shut down holds the latest data. The next time you restart the cluster, that node should be the first one to boot. Any follow-up nodes that join the cluster after the first node will demand an SST.

Why SST, when these nodes already have the data and only a few write-sets are missing? The DONOR node caches missing write-sets in the Galera cache, but on restart this cache is wiped clean and started fresh. So the DONOR node doesn't have a Galera cache from which to donate the missing write-sets.

This painful setup made it necessary for users to think and plan before gracefully taking down the cluster. With the introduction of this new feature, the user can retain the Galera cache across restarts.

How does this help?

On restart, the node will revive the galera-cache. This means the node can act as a DONOR and service missing write-sets (facilitating IST instead of SST). Retaining the galera-cache is controlled by the gcache.recover=yes/no option. The default is no (the Galera cache is not retained). The user can set this option on all nodes, or on selected nodes, based on disk usage.
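On a Percona XtraDB Cluster or Galera node, the option is passed through wsrep_provider_options in my.cnf. A minimal sketch, shown alone for brevity (merge it with any provider options you already set on that line):

[mysqld]
wsrep_provider_options="gcache.recover=yes"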

gcache.recover in action

The example below demonstrates how to use this option:

  • Let’s say the user has a three node cluster (n1, n2, n3), with all in sync.
  • The user gracefully shutdown n2 and n3.
  • n1 is still up and running, and processes some workload, so now n1 has latest data.
  • n1 is eventually shutdown.
  • Now the user decides to restart the cluster. Obviously, the user needs to start n1 first, followed by n2/n3.
  • n1 boots up, forming a new cluster.
  • n2 boots up, joins the cluster, finds there are missing write-sets and demands IST, but given that n1 doesn't have a gcache, it falls back to SST.

n2 (JOINER node log):

2016-11-18 13:11:06 3277 [Note] WSREP: State transfer required:
    Group state: 839028c7-ad61-11e6-9055-fe766a1886c3:4680
    Local state: 839028c7-ad61-11e6-9055-fe766a1886c3:3893

n1 (DONOR node log), gcache.recover=no:

2016-11-18 13:11:06 3245 [Note] WSREP: IST request: 839028c7-ad61-11e6-9055-fe766a1886c3:3893-4680|tcp://192.168.1.3:5031
2016-11-18 13:11:06 3245 [Note] WSREP: IST first seqno 3894 not found from cache, falling back to SST

Now let’s re-execute this scenario with gcache.recover=yes.

n2 (JOINER node log):

2016-11-18 13:24:38 4603 [Note] WSREP: State transfer required:
    Group state: ee8ef398-ad63-11e6-92ed-d6c0646c9f13:1495
    Local state: ee8ef398-ad63-11e6-92ed-d6c0646c9f13:769
....
2016-11-18 13:24:41 4603 [Note] WSREP: Receiving IST: 726 writesets, seqnos 769-1495
....
2016-11-18 13:24:49 4603 [Note] WSREP: IST received: ee8ef398-ad63-11e6-92ed-d6c0646c9f13:1495

n1 (DONOR node log):

2016-11-18 13:24:38 4573 [Note] WSREP: IST request: ee8ef398-ad63-11e6-92ed-d6c0646c9f13:769-1495|tcp://192.168.1.3:5031
2016-11-18 13:24:38 4573 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

You can also validate this by checking the lowest write-set available in gcache on the DONOR node.

mysql> show status like 'wsrep_local_cached_downto';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| wsrep_local_cached_downto | 1     |
+---------------------------+-------+
1 row in set (0.00 sec)

So as you can see, gcache.recover could restore the cache on restart and help service IST over SST. This is a major resource saver for most of those graceful shutdowns.

gcache revive doesn’t work if . . .

If gcache pages are involved, the revive doesn't help: gcache pages are still removed on shutdown, and the gcache write-sets up to that point also get cleared.

Again, let's see an example:

  • Let’s assume the same configuration and workflow as mentioned above. We will just change the workload pattern.
  • n1, n2, n3 are in sync and an average-size workload is executed, such that the write-set fits in the gcache. (seqno=1-x)
  • n2 and n3 are shutdown.
  • n1 continues to operate and executes some average-size workload followed by a huge transaction that results in the creation of a gcache page. (1-x-a-b-c-h) [h represents the transaction's seqno]
  • Now n1 is shutdown. During shutdown, gcache pages are purged (irrespective of the keep_page_sizes setting).
  • The purge ensures that all the write-sets that have seqnos smaller than the gcache-page-residing write-set are purged, too. This effectively means everything in (1-h) is removed, including (a,b,c).
  • On restart, even though n1 can revive the gcache, there is nothing left to revive, as all the write-sets were purged.
  • When n2 boots up, it requests IST, but n1 can't service the missing write-sets (a,b,c,h). This causes SST to take place.

Summing it up

Needless to say, gcache.recover is a much-needed feature, given that it saves SST pain. (Thanks, Codership.) It would be good to see whether the feature can be optimized to work with gcache pages.

And yes, Percona XtraDB Cluster inherits this feature in its upcoming release.

Using the InnoDB Buffer Pool Pre-Load Feature in MySQL 5.7

November 30, 2016 - 1:16pm

In this blog post, I'll discuss how to use the InnoDB buffer pool pre-load feature in MySQL 5.7.

Starting with MySQL 5.6, you can configure MySQL to save the contents of your InnoDB buffer pool and load it on startup. Starting with MySQL 5.7, this is the default behavior. Without any special effort, MySQL saves and restores a portion of the buffer pool in the default configuration. We made a similar feature available in Percona Server 5.5, so the concept has been around for quite a while.

Frankly, time has reduced the need for this feature. Five years ago, we would typically store databases on spinning disks. These disks often took quite a long time to warm up with normal database workloads, which could lead to many hours of poor performance after a restart. With the rise of SSDs, warm-up happens faster, which reduces the penalty of not having data in the buffer pool. Typically, a system reaches 90% of its fully warmed-up performance in 10 minutes or less. But since it takes virtually no effort to use, saving the contents of the InnoDB buffer pool is a great feature to enable by default.

This blog post looks into some issues with this feature that might not be totally obvious from its name or documentation.

#1 

By default, MySQL saves only the 25% most actively accessed pages (by the LRU) in the InnoDB buffer pool, not the whole buffer pool.

This is a reasonable choice for many use cases: it saves the most valuable pages, which can then be loaded faster than if you try to load every page in the buffer pool (many of which might not be relevant for continuing workload).

You can change this number by setting the innodb_buffer_pool_dump_pct variable. If you’re using InnoDB essentially as an in-memory database, and want to ensure all data is memory resident and can be accessed without a disk read, set it to 100.     
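For example (a runtime change is shown below; put the same setting in my.cnf to persist it across restarts):

-- Save the full list of buffer pool pages on shutdown:
SET GLOBAL innodb_buffer_pool_dump_pct = 100;

-- A dump or load can also be triggered on demand:
SET GLOBAL innodb_buffer_pool_dump_now = ON;
SET GLOBAL innodb_buffer_pool_load_now = ON;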

Note that this variable is based on the actual amount of data present in memory, not the buffer pool size. For example, if you have a 100GB buffer pool but it only contains 10GB of data, by default only 25% of 10GB (2.5GB) gets saved. (As the manual explains, the dump takes far less space on disk than that, as only the page identifiers are stored, not the full page contents.)

#2

MySQL starts and becomes accessible through the network before the buffer pool load on startup is complete. Immediately after the start, a lot of resources are used to fetch buffer pool contents from disk as quickly as possible, possibly affecting performance. If you have multiple MySQL nodes – such as with MySQL replication or Percona XtraDB Cluster – you might consider bringing them back into production traffic only after the buffer pool load operation completes. You can monitor the buffer pool load progress by watching the GLOBAL STATUS variable:

Buffer pool load is in progress:

| Innodb_buffer_pool_load_status          | Loaded 403457/419487 pages         |

Buffer pool load is complete:

| Innodb_buffer_pool_load_status          | Buffer pool(s) load completed at 161123  9:18:57 |
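Both rows above come from the same status variable, which you can poll with:

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_load_status';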

As a side note, it would be great if MySQL would provide a clearer concept of the “State” of the node: being UP versus being READY to serve the traffic in an optimal way are often not the same.

#3

InnoDB's buffer pool preload is not very efficient, at least with fast storage. In my test environment, with rather capable NVMe storage, I get a warm-up rate of more than 400MB/sec if I run a read-only sysbench workload, while InnoDB's buffer pool preload warms up at around 100MB/sec. I would guess the problem is that it doesn't drive as many parallel IO requests as SSD storage needs to perform optimally. I did not investigate it further.

#4

InnoDB buffer pool save/restore only stores the buffer pool contents on a clean shutdown. If the server crashes, MySQL still does a buffer pool preload, but with the page list saved at the last clean shutdown (stored in the ib_buffer_pool file). This might end up wasting time loading data that is not relevant for the current workload. Periodically running the following ensures a fresh set of pages is available for a quick warm-up, even if MySQL crashed:

SET GLOBAL innodb_buffer_pool_dump_now=ON;

This preserves the current list of buffer pool pages.
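If you prefer to keep this inside MySQL rather than in cron, the event scheduler can do it. A sketch, assuming the event scheduler is enabled and the event's definer has the privileges required for SET GLOBAL (the event name is arbitrary):

SET GLOBAL event_scheduler = ON;

-- Dump the current buffer pool page list every hour:
CREATE EVENT IF NOT EXISTS hourly_buffer_pool_dump
ON SCHEDULE EVERY 1 HOUR
DO SET GLOBAL innodb_buffer_pool_dump_now = ON;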

Note that while you (hopefully) do not see your MySQL crash that often, the same issue exists with backups, MySQL slave cloning with Percona XtraBackup, or LVM snapshots: the restored instance warms up from a stale page list, which makes these operations less efficient.

I hope the observations in this blog help you put this feature to better use!

Percona XtraBackup 2.4.5 is now available

November 29, 2016 - 10:26am

Percona announces the GA release of Percona XtraBackup 2.4.5 on November 29th, 2016. You can download it from our download site and from apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

New features:
  • Percona XtraBackup now supports SHA256 passwords. Using the SHA256 algorithm requires either an SSL-encrypted connection, or public key encryption for password exchange, which is only available when both client and server are linked with OpenSSL.
  • Percona XtraBackup now supports Command Options for Secure Connections.
  • NOTE: Due to xbcrypt format changes, backups encrypted with this Percona XtraBackup version will not be recoverable by older versions.
Bugs fixed:
  • Percona XtraBackup would crash while preparing the backup, during the shutdown, when the master thread was performing a checkpoint and the purge thread was expecting all other threads to have completed or be idle. Bug fixed #1618555.
  • The safe slave backup algorithm used too-short delays between retries, which could cause backups to fail on a busy server. Bug fixed #1624473.
  • Percona XtraBackup didn’t check the log block checksums. Bug fixed #1633448.
  • Fixed new compilation warnings with GCC 6. Bug fixed #1641612.
  • xbcrypt was not setting the Initialization Vector (IV) correctly (and thus was not using an IV). This was causing the same ciphertext to be generated across different runs (for the same message/same key). The IV provides the extra randomness that ensures the same ciphertext is not generated across runs. Bug fixed #1643949.
  • target-dir was no longer relative to the current directory but to datadir instead. Bug fixed #1611568.
  • Backup would still succeed even if xtrabackup would fail to write the metadata. Bug fixed #1623210.
  • xbcloud now supports EMC ECS Swift API Authorization requests. Bugs fixed #1638017 and #1638020 (Txomin Barturen).
  • Some older versions of MySQL did not bother to initialize the page type field for pages which are not index pages (see upstream #76262 for more information). Having this page type uninitialized could cause xtrabackup to crash on prepare. Bug fixed #1641426.
  • Percona XtraBackup would fail to backup MariaDB 10.2 with the unsupported server version error message. Bug fixed #1602842.

Other bugs fixed: #1639764, #1639767, #1641596, and #1641601.

Release notes with all the bugfixes for Percona XtraBackup 2.4.5 are available in our online documentation. Please report any bugs to the launchpad bug tracker.

Percona XtraBackup 2.3.6 is now available

November 29, 2016 - 9:55am

Percona announces the release of Percona XtraBackup 2.3.6 on November 29, 2016. Downloads are available from our download site or Percona Software Repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

This release is the current GA (Generally Available) stable release in the 2.3 series.

New Features
  • Percona XtraBackup now supports SHA256 passwords. Using the SHA256 algorithm requires either an SSL-encrypted connection, or public key encryption for password exchange, which is only available when both client and server are linked with OpenSSL.
  • Percona XtraBackup now supports Command Options for Secure Connections.
  • NOTE: Due to xbcrypt format changes, backups encrypted with this Percona XtraBackup version will not be recoverable by older versions.
Bugs Fixed:
  • Fixed intermittent assertion failures that were happening when Percona XtraBackup couldn’t correctly identify server version. Bug fixed #1568009.
  • The safe slave backup algorithm used too-short delays between retries, which could cause backups to fail on a busy server. Bug fixed #1624473.
  • Fixed new compilation warnings with GCC 6. Bug fixed #1641612.
  • xbcrypt was not setting the Initialization Vector (IV) correctly (and thus was not using an IV). This was causing the same ciphertext to be generated across different runs (for the same message/same key). The IV provides the extra randomness that ensures the same ciphertext is not generated across runs. Bug fixed #1643949.
  • Backup would still succeed even if XtraBackup would fail to write the metadata. Bug fixed #1623210.
  • xbcloud now supports EMC ECS Swift API Authorization requests. Bugs fixed #1638017 and #1638020 (Txomin Barturen).
  • Percona XtraBackup would fail to backup MariaDB 10.2 with the unsupported server version error message. Bug fixed #1602842.

Other bugs fixed: #1639764 and #1639767.

Release notes with all the bugfixes for Percona XtraBackup 2.3.6 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

Percona Server for MongoDB 3.0.14-1.9 is now available

November 28, 2016 - 10:18am

Percona announces the release of Percona Server for MongoDB 3.0.14-1.9 on November 28, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.0.14-1.9 is an enhanced, open source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.0 protocol and drivers. Based on MongoDB 3.0.14, it extends MongoDB with MongoRocks and PerconaFT storage engines, as well as features like external authentication and audit logging. Percona Server for MongoDB requires no changes to MongoDB applications or code.

NOTE: PerconaFT has been deprecated and will be removed in the future.

This release includes all changes from MongoDB 3.0.13 and MongoDB 3.0.14. We implemented no additional fixes or features.

You can find the release notes in the official documentation.

Percona Server 5.7.16-10 is now available

November 28, 2016 - 9:58am

Percona announces the GA release of Percona Server 5.7.16-10 on November 28, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.7.16, including all the bug fixes in it, Percona Server 5.7.16-10 is the current GA release in the Percona Server 5.7 series. Percona provides completely open-source and free software. Find release details in the 5.7.16-10 milestone at Launchpad.

Deprecated Features:
  • The metrics for scalability measurement feature is now deprecated. Users who have installed this plugin but are not using its capability are advised to uninstall the plugin due to known crashing bugs.
Bugs Fixed
  • When a stored routine would call an administrative command such as OPTIMIZE TABLE, ANALYZE TABLE, ALTER TABLE, CREATE/DROP INDEX, etc., the effective value of log_slow_sp_statements was overwritten by the value of log_slow_admin_statements. Bug fixed #719368.
  • The server wouldn’t start after a crash with innodb_force_recovery set to 6 if a parallel doublewrite file existed. Bug fixed #1629879.
  • Thread Pool “thread limit reached” and “failed to create thread” messages are now printed on the first occurrence as well. Bug fixed #1636500.
  • INFORMATION_SCHEMA.TABLE_STATISTICS and INFORMATION_SCHEMA.INDEX_STATISTICS tables were not correctly updated for TokuDB. Bug fixed #1629448.

Other bugs fixed: #1633061, #1633430, and #1635184.

The release notes for Percona Server 5.7.16-10 are available in the online documentation. Please report any bugs on the launchpad bug tracker .

Percona Server 5.6.34-79.1 is now available

November 28, 2016 - 9:21am

Percona announces the release of Percona Server 5.6.34-79.1 on November 23, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.6.34, including all the bug fixes in it, Percona Server 5.6.34-79.1 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release are available in the 5.6.34-79.1 milestone on Launchpad.

Deprecated features:
  • The metrics for scalability measurement feature is now deprecated. Users who have installed this plugin but are not using its capability are advised to uninstall the plugin due to known crashing bugs.
Bugs fixed:
  • When a stored routine would call an administrative command such as OPTIMIZE TABLE, ANALYZE TABLE, ALTER TABLE, CREATE/DROP INDEX, etc., the effective value of log_slow_sp_statements was overwritten by the value of log_slow_admin_statements. Bug fixed #719368.
  • Thread Pool “thread limit reached” and “failed to create thread” messages are now printed on the first occurrence as well. Bug fixed #1636500.
  • INFORMATION_SCHEMA.TABLE_STATISTICS and INFORMATION_SCHEMA.INDEX_STATISTICS tables were not correctly updated for TokuDB. Bug fixed #1629448.

Other bugs fixed: #1633061, #1633430, and #1635184.

Release notes for Percona Server 5.6.34-79.1 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

Percona Server for MySQL 5.5.53-38.5 is now available

November 23, 2016 - 10:59am

Percona announces the release of Percona Server for MySQL 5.5.53-38.5 on November 18, 2016. Based on MySQL 5.5.53, including all the bug fixes in it, Percona Server for MySQL 5.5.53-38.5 is now the current stable release in the 5.5 series.

Percona Server for MySQL is open-source and free. You can find release details in the 5.5.53-38.5 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

New Features:
  • The metrics for scalability measurement feature is now built by default but deprecated. Users who have installed this plugin but are not using its capability are advised to uninstall the plugin due to known crashing bugs.

Find the release notes for Percona Server for MySQL 5.5.53-38.5 in our online documentation. Report bugs on the launchpad bug tracker.


