
Percona Live Europe featured talk with Anthony Yeh — Launching Vitess: How to run YouTube’s MySQL sharding engine

September 22, 2016 - 1:59pm

Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Anthony Yeh, Software Engineer, Google. His talk will be on Launching Vitess: How to run YouTube’s MySQL sharding engine. Vitess is YouTube’s solution for scaling MySQL horizontally through sharding, built as a general-purpose, open-source project. Now that Vitess 2.0 has reached general availability, they’re moving beyond “getting started” guides and working with users to develop and document best practices for launching Vitess in their own production environments.

I had a chance to speak with Anthony and learn a bit more about Vitess:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Anthony: Before joining YouTube as a software engineer, I worked on photonic integrated circuits as a graduate student researcher at U.C. Berkeley. So I guess you could say I took a rather circuitous path to the database field. My co-presenter Dan and I have that in common. If you see him at the conference, I recommend asking him about his story.

I don’t actually think of myself as being in database development though; that’s probably more Sugu‘s area. I treat Vitess as just another distributed system, and my job is to make it more automated, more reliable, and easier to administer. My favorite part of this job is when open-source contributors send us new features and plug-ins, and all I have to do is review them. Keep those pull requests coming!

Percona: Your talk is going to be on “Launching Vitess: How to run YouTube’s MySQL sharding engine.” How has Vitess moved from a YouTube fix to a viable enterprise data solution?

Anthony: I joined Vitess a little over two years ago, right when they decided to expand the team’s focus to include external usability as a key goal. The idea was to transform Vitess from a piece of YouTube infrastructure that happens to be open-source, into an open-source solution that YouTube happens to use.

At first, the biggest challenge was getting people to tell us what they needed to make Vitess work well in their environments. Attending Percona Live is a great way to keep a pulse on how the industry uses MySQL, and talk with exactly the people who can give us that feedback. Progress really picked up early this year when companies like Flipkart and Pixel Federation started not only trying out Vitess on their systems, but contributing back features, plug-ins, and connectors.

My half of the talk will summarize all the things we’ve learned from these early adopters about migrating to Vitess and running it in various environments. We also convinced one of our Site Reliability Engineers to give the second half of the talk, to share firsthand what it’s like to run Vitess in production.

Percona: What new features and fixes can people look forward to in the latest release?

Anthony: The biggest new feature in Vitess 2.0 is something that was codenamed “V3” (sorry about the naming confusion). In a nutshell, this completes the transition of all sharding logic from the app into Vitess: at first you had to give us a shard name, then you just had to tell us the sharding key value. Now you just send a regular query and we do the rest.

To make this possible, Vitess has to parse and analyze the query, for which it then builds a distributed execution plan. For queries served by a single shard, the plan collapses to a simple routing decision without extra processing. But for things like cross-shard joins, Vitess will generate new queries and combine results from multiple shards for you, in much the same way your app would otherwise do it.
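As a purely illustrative sketch (not taken from the Vitess documentation), this is what "just send a regular query" looks like from the application side, assuming a vtgate endpoint reachable over the MySQL protocol; the host, port, credentials and schema below are all made-up placeholders:

# Hypothetical example: the application issues ordinary SQL against vtgate and
# Vitess decides which shard(s) to hit. The endpoint and schema are assumptions.
mysql -h vtgate.example.internal -P 15306 -u app_user -papp_pass commerce -e "
  -- Single-shard: the WHERE clause contains the sharding key, so the plan
  -- collapses to a simple routing decision.
  SELECT * FROM orders WHERE customer_id = 42;

  -- Cross-shard join: Vitess generates per-shard queries and combines the results.
  SELECT o.order_id, c.name
  FROM orders o JOIN customers c ON c.customer_id = o.customer_id;"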

Percona: Why is sharding beneficial to databases? Are there pros and cons to sharding?

Anthony: The main pro for sharding is horizontal scalability, the holy grail of distributed databases. It offers the promise of a magical knob that you simply turn up when you need more capacity. The biggest cons have usually been that it’s a lot of work to make your app handle sharding, and it multiplies the operational overhead as you add more and more database servers.

The goal of Vitess is to create a generalized solution to these problems, so we can all stop building one-off sharding layers within our apps, and replace a sea of management scripts with a holistic, self-healing distributed database.

Percona: Vitess is billed as being for web applications based in cloud and dedicated hardware infrastructures. Was it designed specifically for one or the other, and does it work better for certain environments?

Anthony: Vitess started out on dedicated YouTube hardware and later moved into Borg, which is Google’s internal precursor to Kubernetes. So we know from experience that it works in both types of environments. But like any distributed system, there are lots of benefits to running Vitess under some kind of cluster orchestration system. We provide sample configs to get you started on Kubernetes, but we would love to also have examples for other orchestration platforms like Mesos, Swarm, or Nomad, and we’d welcome contributions in this area.

Percona: What are you most looking forward to at Percona Live Data Performance Conference 2016?

Anthony: I hope to meet people who have ideas about how to make Vitess better, and I look forward to learning more about how others are solving similar problems.

You can read more about Anthony and Vitess on the Vitess blog.

Want to find out more about Anthony, Vitess, YouTube and sharding? Register for Percona Live Europe 2016, and come see his talk Launching Vitess: How to run YouTube’s MySQL sharding engine.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!

Percona XtraDB Cluster 5.5.41-25.11.1 is now available

September 22, 2016 - 10:56am

Percona announces the new release of Percona XtraDB Cluster 5.5.41-25.11.1 (rev. 855) on September 22, 2016. Binaries are available from the downloads area or our software repositories.

Bugs Fixed:
  • For security reasons, ld_preload libraries can now only be loaded from the system directories (/usr/lib64, /usr/lib) and the MySQL installation base directory. This fix also addresses an issue where limiting didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Sixth Annual Percona Live Open Source Database Conference 2017 Call for Speakers Now Open

September 22, 2016 - 9:44am

The Call for Speakers for Percona Live Open Source Database Conference 2017 is open and accepting proposals through Oct. 31, 2016.

The Percona Live Open Source Database Conference 2017 is the premier event for the diverse and active open source community, as well as businesses that develop and use open source software. Topics for the event will focus on three key areas – MySQL, MongoDB and Open Source Databases – and the conference sessions will feature a range of in-depth discussions and hands-on tutorials.

The 2017 conference will feature four formal tracks – Developer, Operations, Business/Case Studies, and Wildcard – that will explore a variety of new and trending topics, including big data, IoT, analytics, security, scalability and performance, architecture and design, operations and management and development. Speaker proposals are welcome on these topics as well as on a variety of related technologies, including MySQL, MongoDB, Amazon Web Services (AWS), OpenStack, Redis, Docker and many more. The conference will also feature sponsored talks.

Percona Live Open Source Database Conference 2017 will take place April 24-27, 2017 at The Hyatt Regency Santa Clara and Santa Clara Convention Center. Sponsorship opportunities are still available, and Super Saver Registration Discounts can be purchased through Nov. 13, 2016 at 11:30 p.m. PST.

Click here to see all the submission criteria, and to submit your talk.

Sponsorships

Sponsorship opportunities for Percona Live Open Source Database Conference 2017 are available and offer the opportunity to interact with more than 1,000 DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event.

Planning to Attend?

Super Saver Registration Discounts for Percona Live Open Source Database Conference 2017 are available through Nov. 13, 2016 at 11:30 p.m. PST.

Visit the Percona Live Open Source Database Conference 2017 website for more information about the conference. Interested community members can also register to receive email updates about Percona Live Open Source Database Conference 2017.

Percona XtraDB Cluster 5.6.30-25.16.3 is now available

September 21, 2016 - 11:22am

Percona announces the new release of Percona XtraDB Cluster 5.6 on September 21, 2016. Binaries are available from the downloads area or our software repositories.

Percona XtraDB Cluster 5.6.30-25.16.3 is now the current release, based on the following:

  • Percona Server 5.6.30-76.3
  • Galera Replication library 3.16
  • Codership wsrep API version 25
Bugs Fixed:
  • Limiting ld_preload libraries to be loaded from specific directories in mysqld_safe didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Percona Server 5.7.14-8 is now available

September 21, 2016 - 11:11am

Percona announces the GA release of Percona Server 5.7.14-8 on September 21, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.7.14, including all the bug fixes in it, Percona Server 5.7.14-8 is the current GA release in the Percona Server 5.7 series. Percona Server is completely open source and free. Find release details in the 5.7.14-8 milestone at Launchpad.

Bugs Fixed:
  • Limiting ld_preload libraries to be loaded from specific directories in mysqld_safe didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

The release notes for Percona Server 5.7.14-8 are available in the online documentation. Please report any bugs on the launchpad bug tracker .

Percona Server 5.6.32-78.1 is now available

September 21, 2016 - 11:04am

Percona announces the release of Percona Server 5.6.32-78.1 on September 21st, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Based on MySQL 5.6.32, including all the bug fixes in it, Percona Server 5.6.32-78.1 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release are available in the 5.6.32-78.1 milestone on Launchpad.

Bugs Fixed:
  • Limiting ld_preload libraries to be loaded from specific directories in mysqld_safe didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

Release notes for Percona Server 5.6.32-78.1 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

Percona Server 5.5.51-38.2 is now available

September 21, 2016 - 10:58am

Percona announces the release of Percona Server 5.5.51-38.2 on September 21, 2016. Based on MySQL 5.5.51, including all the bug fixes in it, Percona Server 5.5.51-38.2 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. You can find release details of the release in the 5.5.51-38.2 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:
  • Limiting ld_preload libraries to be loaded from specific directories in mysqld_safe didn’t work correctly for relative paths. Bug fixed #1624247.
  • Fixed possible privilege escalation that could be used when running REPAIR TABLE on a MyISAM table. Bug fixed #1624397.
  • The general query log and slow query log cannot be written to files ending in .ini and .cnf anymore. Bug fixed #1624400.
  • Implemented restrictions on symlinked files (error_log, pid_file) that can’t be used with mysqld_safe. Bug fixed #1624449.

Other bugs fixed: #1553938.

Find the release notes for Percona Server 5.5.51-38.2 in our online documentation. Report bugs on the launchpad bug tracker.

Regular Expressions Tutorial

September 21, 2016 - 6:48am

This blog post highlights a video on how to use regular expressions.

It’s been a while since I did the MySQL QA and Bash Training Series. The 13 episodes were quite enjoyable to make, and a lot of people watched the videos and provided great feedback.

In today’s new video, I’d like to briefly go over regular expressions. The session will cover the basics of regular expressions, and then some. I’ll follow up later with a more advanced regex session too.

Regular expressions are very versatile, and once you know how to use them – especially as a script developer or software coder – you will return to them again and again. Enjoy!
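As a tiny warm-up before the video, here are a few everyday one-liners of the kind it covers (illustrative only; the file names are hypothetical):

# Pull the error code out of a MySQL error message
echo "ERROR 1064 (42000): You have an error in your SQL syntax" | grep -oE 'ERROR [0-9]+'

# Print only the [mysqld] and [mysqld_safe] section headers from a config file (hypothetical path)
grep -E '^\[(mysqld|mysqld_safe)\]' /etc/my.cnf

# Collapse runs of whitespace into single spaces (hypothetical input file)
sed -E 's/[[:space:]]+/ /g' notes.txt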

Presented by Roel Van de Paar. Full-screen viewing @ 720p resolution recommended

 

Webinar Thursday September 22 – Black Friday and Cyber Monday: How to Avoid an E-Commerce Disaster

September 21, 2016 - 6:11am

Join Percona’s Sr. Technical Operations Architect, Tim Vaillancourt on Thursday, September 22, at 10 am PDT (UTC-7) for the webinar Black Friday and Cyber Monday: How to Avoid an E-Commerce Disaster. This webinar will provide some best practices to ensure the performance of your system under high-traffic conditions.

Can your retail site handle the traffic deluge on the busiest shopping day of the year?

Black Friday and Cyber Monday are mere months away. Major retailers have already begun stress-testing their e-commerce sites to make sure they can handle the load. Failure to accommodate the onslaught of post-Thanksgiving shoppers might result in both embarrassing headlines and millions of dollars in lost revenue. Our advice to retailers: September stress tests are essential to a glitch-free Black Friday.

This webinar will cover:

  • Tips to avoid bottlenecks in data-driven apps
  • Techniques to allow an app to grow and shrink for large events/launches
  • Solutions to alleviate load on an app’s database
  • Developing and testing scalable apps
  • Deployment strategies to avoid downtime
  • Creating lighter, faster user facing requests

For more ideas on how to optimize your E-commerce database, read Tim’s blog post here.

Please register here.

Timothy Vaillancourt, Senior Technical Operations Architect

Tim joined Percona in 2016 as Sr. Technical Operations Architect for MongoDB with a goal to make the operations of MongoDB as smooth as possible. With experience operating infrastructures in industries such as government, online marketing/publishing, SaaS and gaming, combined with experience tuning systems from the hard disk all the way up to the end-user, Tim has spent time in nearly every area of the modern IT stack with many lessons learned.

Tim is based in Amsterdam, NL and enjoys traveling, coding and music. Before Percona Tim was the Lead MySQL DBA of Electronic Arts’ DICE studios, helping some of the largest games in the world (“Battlefield” series, “Mirrors Edge” series, “Star Wars: Battlefront”) launch and operate smoothly while also leading the automation of MongoDB deployments for EA systems. Before the role of DBA at EA’s DICE studio, Tim served as a subject matter expert in NoSQL databases, queues and search on the Online Operations team at EA SPORTS. Before moving to the gaming industry, Tim served as a Database/Systems Admin operating a large MySQL-based SaaS infrastructure at AbeBooks/Amazon Inc.

MongoDB point-in-time backups made easy

September 20, 2016 - 4:03pm

In this blog post we’ll look at MongoDB point-in-time backups, and work with them.

Mongodump is the base logical backup tool included with MongoDB. It takes a full BSON copy of database/collections, and optionally includes a log of changes during the backup used to make it consistent to a point in time. Mongorestore is the tool used to restore logical backups created by Mongodump. I’ll use these tools in the steps in this article to restore backed-up data. This article assumes a mongodump-based backup that was taken consistently with oplog changes (by using the command flag “–oplog”), and the backup is being restored to a MongoDB instance.

In this example, a mongodump backup is gathered and restored for the base collection data, and separately the oplogs/changes necessary to restore the data to a particular point-in-time are collected and applied to this data.

Note: Percona developed a backup tool named mongodb_consistent_backup, which is a wrapper for ‘mongodump’ with added cluster-wide backup consistency. The backups created by mongodb_consistent_backup (in Dump/Mongodump mode) can be restored using the same steps as a regular “mongodump” backup.

Stages

Stage 1: Get a Mongodump Backup

Mongodump Command Flags

--host/--port (and --user/--password)

Required, even if you’re using the default host/port (localhost:27017). If authorization is enabled, add the --user/--password flags as well.

--oplog

Required for any replica set member! Causes “mongodump” to capture the oplog change log during the backup so that the backup is consistent to a single point in time.

--gzip

Optional. For mongodump >= 3.2, enables inline compression on the backup files.

Steps
  1. Get a mongodump backup via (pick one):
    • Running “mongodump” with the correct flags/options to take a backup (w/oplog) of the data:

      $ mongodump --host localhost --port 27017 --oplog --gzip
      2016-08-15T12:32:28.930+0200    writing wikipedia.pages to
      2016-08-15T12:32:31.932+0200    [#########...............]  wikipedia.pages  674/1700   (39.6%)
      2016-08-15T12:32:34.931+0200    [####################....]  wikipedia.pages  1436/1700  (84.5%)
      2016-08-15T12:32:37.509+0200    [########################]  wikipedia.pages  2119/1700  (124.6%)
      2016-08-15T12:32:37.510+0200    done dumping wikipedia.pages (2119 documents)
      2016-08-15T12:32:37.521+0200    writing captured oplog to
      2016-08-15T12:32:37.931+0200    [##......................]  .oplog  44/492   (8.9%)
      2016-08-15T12:32:39.648+0200    [########################]  .oplog  504/492  (102.4%)
      2016-08-15T12:32:39.648+0200    dumped 504 oplog entries
    • Use the latest daily automatic backup, if it exists.
Stage 2: Restore the Backup Data

Steps
  1. Locate the shard PRIMARY member.
  2. Triple check you’re restoring the right backup to the right shard/host!
  3. Restore a mongodump-based backup to the PRIMARY node using the steps in this article: Restore a Mongodump Backup.
  4. Check for errors.
  5. Check that all SECONDARY members are in sync with the PRIMARY.
Stage 3: Get Oplogs for Point-In-Time-Recovery

In this stage, we will gather the changes needed to roll the data forward from the time of backup to the time/oplog-position to which we would like to restore.

In the example below, let’s pretend someone accidentally deleted an entire collection at oplog timestamp “Timestamp(1470923942, 3)” and we want to fix it. If we decrement the increment field (the 2nd number) of “Timestamp(1470923942, 3)”, we get the last change before the accidental command, which in this case is “Timestamp(1470923942, 2)”. Using that timestamp, we can capture and replay the oplogs from when the backup occurred to just before the issue/error.

A start and end timestamp are required to get the oplog data. In all cases, this will need to be gathered manually, case-by-case.
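For example, assuming the accidental operation was a collection drop on a replica set member, a rough sketch of how you might locate its timestamp in the oplog from the shell (adjust the filter to whatever the bad operation actually was):

# Find the most recent collection-drop command in the oplog (op type "c" with a "drop" field
# in the command document) and print its BSON timestamp.
mongo --host localhost --port 27017 --quiet --eval '
    var op = db.getSiblingDB("local").oplog.rs
               .find({ op: "c", "o.drop": { $exists: true } })
               .sort({ ts: -1 })
               .limit(1)
               .next();
    printjson(op.ts);   // e.g. Timestamp(1470923942, 3)'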

Helper Script

#!/bin/bash
#
# This tool will dump out a BSON file of MongoDB oplog changes based on a range of Timestamp() objects.
# The captured oplog changes can be applied to a host using 'mongorestore --oplogReplay --dir /path/to/dump'.
set -e

TS_START=$1
TS_END=$2
MONGODUMP_EXTRA=$3

function usage_exit() {
  echo "Usage $0: [Start-BSON-Timestamp] [End-BSON-Timestamp] [Extra-Mongodump-Flags (in quotes for multiple)]"
  exit 1
}

function check_bson_timestamp() {
  local TS=$1
  echo "$TS" | grep -qP "^Timestamp\(\d+,\s\d+\)$"
  if [ $? -gt 0 ]; then
    echo "ERROR: Both timestamp fields must be in BSON Timestamp format, eg: 'Timestamp(########, #)'!"
    usage_exit
  fi
}

if [ -z "$TS_START" ] || [ -z "$TS_END" ]; then
  usage_exit
else
  check_bson_timestamp "$TS_START"
  check_bson_timestamp "$TS_END"
fi

MONGODUMP_QUERY='{ "ts" : { "$gte" : '$TS_START' }, "ts" : { "$lte" : '$TS_END' } }'
MONGODUMP_FLAGS='--db=local --collection=oplog.rs'
[ ! -z "$MONGODUMP_EXTRA" ] && MONGODUMP_FLAGS="$MONGODUMP_FLAGS $MONGODUMP_EXTRA"

if [ -d dump ]; then
  echo "'dump' subdirectory already exists! Exiting!"
  exit 1
fi

echo "# Dumping oplogs from '$TS_START' to '$TS_END'..."
mkdir dump
mongodump $MONGODUMP_FLAGS --query "$MONGODUMP_QUERY" --out - >dump/oplog.bson

if [ -f dump/oplog.bson ]; then
  echo "# Done!"
else
  echo "ERROR: Cannot find oplog.bson file! Exiting!"
  exit 1
fi

 

Script Usage:

$ ./dump_oplog_range.sh
Usage ./dump_oplog_range.sh: [Start-BSON-Timestamp] [End-BSON-Timestamp] [Extra-Mongodump-Flags (in quotes for multiple)]

 

Steps
  1. Find the PRIMARY member that contains the oplogs needed for the PITR restore.
  2. Determine the “end” Timestamp() needed to restore to. This oplog time should be before the problem occurred.
  3. Determine the “start” Timestamp() from right before the backup was taken.
    1. This timestamp doesn’t need to be exact, so something like a Timestamp() object equal-to “a few min before the backup started” is fine, but the more accurate you are, the fewer changes you’ll need to re-apply (which saves on restore time).
  4. Use the MongoToolsAndSnippets script “dump_oplog_range.sh” (shown above in “Helper Script”) to dump the oplog time range you need to restore to your chosen point-in-time. In this example I am gathering the oplog between two points in time (also passing the --username/--password flags, in quotes, as the 3rd parameter):
    1. The starting timestamp: the BSON timestamp from just before the mongodump backup in “Stage 2: Restore the Backup Data” was taken. In this example, “Timestamp(1470923918, 0)” is a time a few seconds before my mongodump was taken (it does not need to be exact).
    2. The end timestamp: the end BSON Timestamp to restore to. In this example, “Timestamp(1470923942, 2)” is the last oplog change BEFORE the problem occurred.

    Example:

    $ wget -q https://raw.githubusercontent.com/percona/MongoToolsAndSnippets/master/rdba/dump_oplog_range.sh
    $ bash ./dump_oplog_range.sh 'Timestamp(1470923918, 0)' 'Timestamp(1470923942, 2)' '--username=secret --password=secret --host=mongo01.example.com --port=27024'
    # Dumping oplogs from 'Timestamp(1470923918, 0)' to 'Timestamp(1470923942, 2)'...
    2016-08-12T13:11:17.676+0200    writing local.oplog.rs to stdout
    2016-08-12T13:11:18.120+0200    dumped 22 documents
    # Done!

    Note: all additional mongodump flags (optional 3rd field) must be in quotes!

  5. Double check that it worked by looking for the ‘oplog.bson‘ file and confirming it contains data (168MB in the example below):

    $ ls -alh dump/oplog.bson
    -rw-rw-r--. 1 tim tim 168M Aug 12 13:11 dump/oplog.bson

     

Stage 4: Apply Oplogs for Point in Time Recovery (PITR)

In this stage, we apply the time-range-based oplogs gathered in Stage 3 to the restored data set to bring it from the time of the backup to a particular point in time before a problem occurred.

Mongorestore Command Flags

--host/--port (and --user/--password)

Required, even if you’re using the default host/port (localhost:27017). If authorization is enabled, add the --user/--password flags as well.

--oplogReplay

Required. This is needed to replay the oplogs in this step.

--dir

Required. The path to the mongodump data.

Steps
  1. Copy the “dump” directory containing only the “oplog.bson” file (captured in Stage 3) to the host that needs the oplog changes applied (the restore host).
  2. Run “mongorestore” on the “dump” directory to replay the oplogs into the instance. Make sure the “dump” dir contains only “oplog.bson”!

     $ mongorestore --host localhost --port 27017 --oplogReplay --dir ./dump
     2016-08-12T13:12:28.105+0200    building a list of dbs and collections to restore from dump dir
     2016-08-12T13:12:28.106+0200    replaying oplog
     2016-08-12T13:12:31.109+0200    oplog   80.0 MB
     2016-08-12T13:12:34.109+0200    oplog   143.8 MB
     2016-08-12T13:12:35.501+0200    oplog   167.8 MB
     2016-08-12T13:12:35.501+0200    done
  3. Validate the data was restored with the customer or using any means possible (examples: .count() queries, some random .find() queries, etc.).
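As a simple illustration of step 3, a quick count against the collection that was dumped earlier (the database/collection names follow the mongodump output in Stage 1; adjust them to your own data):

# Spot check after the oplog replay: does the restored collection contain the expected document count?
mongo --host localhost --port 27017 --quiet --eval '
    print("wikipedia.pages count: " + db.getSiblingDB("wikipedia").pages.count());'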

Percona Live Europe featured talk with Marc Berhault — Inside CockroachDB’s Survivability Model

September 20, 2016 - 9:41am

Welcome to another Percona Live Europe featured talk with Percona Live Europe 2016: Amsterdam speakers! In this series of blogs, we’ll highlight some of the speakers that will be at this year’s conference. We’ll also discuss the technologies and outlooks of the speakers themselves. Make sure to read to the end to get a special Percona Live Europe registration bonus!

In this Percona Live Europe featured talk, we’ll meet Marc Berhault, Engineer at Cockroach Labs. His talk will be on Inside CockroachDB’s Survivability Model. This talk takes a deep dive into CockroachDB, a database whose “survive and thrive” model aims to bring the best aspects of Google’s next-generation database, Spanner, to the rest of the world via open source.

I had a chance to speak with Marc and learn a bit more about these questions:

Percona: Give me a brief history of yourself: how you got into database development, where you work, what you love about it.

Marc: I started out as a Site Reliability Engineer managing Google’s storage infrastructure (GFS). Back in those days, keeping a cluster up and running mostly meant worrying about the masters.

I then switched to a developer role on Google’s next-generation storage system, which replaced the single write master with sharded metadata handlers. This increased the reliability of the entire system considerably, allowing for machine and network failures. SRE concerns gradually shifted away from machine reliability towards more interesting problems, such as multi-tenancy issues (quotas, provisioning, isolation) and larger scale failures.

After leaving Google, I found myself back in a world where one had to worry about a single machine all over again – at least when running your own infrastructure. I kept hearing the same story: a midsize company starts to grow out of its single-machine database and starts trimming the edges. This means moving tables to other hosts, shrinking schemas, etc., in order to avoid the dreaded “great sharding of the monolithic table,” often accompanied by its friends: cross-shard coordination layer and production complexity.

This was when I joined Cockroach Labs, a newly created startup with the goal of bringing a large-scale, transactional, strongly consistent database to the world at large. After contributing to various aspects of the projects, I switched my focus to production: adding monitoring, working on deployment, and of course rolling out our test clusters.

Percona: Your talk is called “Inside CockroachDB’s Survivability Model.” Define “survivability model”, and why it is important to database environments.

Marc: The survivability model in CockroachDB is centered around data redundancy. By default, all data is replicated three times (this is configurable) and is only considered written if a quorum exists. When a new node holding one of the copies of the data becomes unavailable, a node is picked and given a snapshot of the data.

This redundancy model has been widely used in distributed systems, but rarely with strongly consistent databases. CockroachDB’s approach provides strong consistency as well as transactions across the distributed data. We see this as a critical component of modern databases: allowing scalability while guaranteeing consistency.

Percona: What are the workloads and database environments that are best suited for a CockroachDB deployment? Do you see an expansion of the solution to encompass other scenarios?

Marc: CockroachDB is a beta product and is still in development. We expect to be out of beta by the end of 2016. Ideal workloads are those requiring strong consistency – those applications that manage critical data. However, strong consistency comes at a cost, usually directly proportional to latency between nodes and replication factor. This means that a widely distributed CockroachDB cluster (e.g., across multiple regions) will incur high write latencies, making it unsuitable for high-throughput operations, at least in the near term.

Percona: What is changing in the way businesses use databases that keeps you awake at night? How do you think CockroachDB is addressing those concerns?

Marc: In recent years, more and more businesses have been reaching the limits of what their single-machine databases can handle. This has forced many to implement their own transactional layers on top of disjoint databases, at the cost of longer development time and correctness.

CockroachDB attempts to find a solution to this problem by allowing a strongly consistent, transactional database to scale arbitrarily.

Percona: What are you looking forward to most at Percona Live Europe this year?

Marc: This will be my first time at a Percona Live conference, so I’m looking forward to hearing from other developers and learning what challenges other architects and DBAs are facing in their own work.

You can read more about Marc’s thoughts on CockroachDB at their blog.

Want to find out more about Marc, CockroachDB and survivability? Register for Percona Live Europe 2016, and come see his talk Inside CockroachDB’s Survivability Model.

Use the code FeaturedTalk and receive €25 off the current registration price!

Percona Live Europe 2016: Amsterdam is the premier event for the diverse and active open source database community. The conferences have a technical focus with an emphasis on the core topics of MySQL, MongoDB, and other open source databases. Percona Live tackles subjects such as analytics, architecture and design, security, operations, scalability and performance. It also provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs. This conference is an opportunity to network with peers and technology professionals by bringing together accomplished DBAs, system architects and developers from around the world to share their knowledge and experience. All of these people help you learn how to tackle your open source database challenges in a whole new way.

This conference has something for everyone!

Percona Live Europe 2016: Amsterdam is October 3-5 at the Mövenpick Hotel Amsterdam City Centre.

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!

Open Source Databases at Percona Live Europe, Amsterdam

September 19, 2016 - 3:18pm

In this blog post, I’ll review some of the open source database technologies discussions at Percona Live Europe.

I’ve already written about the exciting PostgreSQL and MongoDB content at Percona Live Europe in Amsterdam, and now I’m going to highlight some of our open source database content.  

In the last five years, the open source database community has been flourishing. There has been an explosion of creativity and innovation. The community has created many niche (and not so niche) solutions for various problems.

As a software engineer or architect, the number of available database options might excite you. You might also be intimidated about how to make the right technology choice. At Percona Live Europe, we have introductory talks for the relevant technologies that we find particularly interesting. These talks will help expand your knowledge about the available solutions and lessen intimidation at the same time.

I’m looking forward to the exciting technologies and talks that we’ll cover this year, such as:

For talks and tutorials on specific uses cases, check out the following sessions:

  • RocksDB is a very cool write optimized (LSM) storage engine, one of the few that has been in more than one database. In addition to the RocksDB-based systems inside Facebook, it can be used with MongoDB as MongoRocks and MySQL as MyRocks. It is also used inside next-generation database systems such as CockroachDB and TiDB. We have a lot of talks about RocksDB and related integrations, ranging from a MyRocks Tutorial by Yoshinori Matsunobu, to talk about MongoRocks by Igor Canadi, and a performance-focused talk by Mark Callaghan.
  • Elastic is the leading technology for open source full-text search implementations (hence its previous name, Elasticsearch) — but it is much more than that. ElasticSearch, Kibana, Logstash and Beats allow you to get data from a variety of data sources and analyze and visualize it. Philip Krenn will talk about full-text search in general in his Full-Text Search Explained talk, as well as talk in more detail about ElasticSearch in ElasticSearch for SQL Users.
  • I am sure you’ve heard about Redis, the Swiss army knife of different data structures and operations. Redis covers many typical data tasks, from caching to maintaining counters and queues. Justin Starry will talk about Redis at Scale in Powering Million of live streams at Periscope, and Itamar Haber will talk about Extending Redis with Modules to make Redis an even more powerful data store.
  • Apache Spark is another technology you’ve surely heard about. Apache Spark adoption has skyrocketed in recent years due to its high-performance in-memory data analyses, replacing or supplementing Hadoop installations. We will hear about Badoo’s experience processing 11 billion events a day with Spark with Alexander Krasheninnikov, and also learn how to use Spark with MongoDB, MySQL and Redis with Tim Vaillancourt.
  • Apache Cassandra is a database focused on high availability and high performance, even when replicating among several data centers. When you think “eventual consistency,” perhaps Cassandra is the first technology that comes to mind. Cassandra allows you to do some impressive things, and Duy Hai Doan will show us some of them in his talk 7 things in Cassandra that you cannot find in RDBMS.
  • ClickHouse is a new guy on the block, but I’m very excited about this distributed column store system for high-performance analytics. Built by the Yandex team to power real-time analytics on the scale of trillions of database records, ClickHouse went open source earlier this year. Victor Tarnavsky will share more details in his talk.
  • Apache Ignite is another new but very exciting technology. Described as in-memory data fabric, it can be used for a variety of applications to supplement or replace relational databases — ranging from advanced data caching strategies to parallel in-memory processing of large quantities of data. Christos Erotocritou will talk about some of these use cases in his talk Turbocharge Your SQL Queries In-Memory with Apache Ignite.
  • RethinkDB is an interesting open-source NoSQL database built from the ground up for scalable real-time applications. The end-to-end real-time data streaming feature is really cool, and makes it much easier to build interactive real-time applications. Ilya Verbitskiy will talk about RethinkDB in his Agile web-development with RethinkDB talk.
  • CockroachDB is a distributed database focused on survivability and high performance (borrowing some ideas from Google’s innovative Spanner database). Marc Berhault will talk database rocket science in his Inside CockroachDB’s Survivability Model.
  • TiDB is another open source NewSQL database, inspired by Google Spanner and F1. It can use a variety of storage engines for data store, and it supports MySQL wire protocol to ease application migration. Max Liu explains How TiDB was built in his talk.
  • ToroDB is a very interesting piece of technology. It is protocol-compatible with MongoDB, but stores data through a relational format in PostgreSQL. This can offer substantial space reduction and performance improvements for some workloads. Álvaro Hernández from 8Kdata will discuss this technology in his ToroDB: All your MongoDB data are belong to SQL talk.

As you can see we cover a wealth of exciting open source database technologies at Percona Live Europe. Do not miss a chance to expand your database horizons and learn about new developments in the industry. There is still time to register! Use the code PZBlog for a €30 discount off your registration price!

Amsterdam eWeek

Percona Live Europe 2016 is part of Amsterdam eWeek. Amsterdam eWeek provides a platform for national and international companies that focus on online marketing, media and technology and for business managers and entrepreneurs who use them, whether it comes to retail, healthcare, finance, game industry or media. Check it out!

Percona Server for MongoDB 3.2.9-2.1 is now available

September 19, 2016 - 9:17am

Percona announces the release of Percona Server for MongoDB 3.2.9-2.1 on September 19, 2016. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB 3.2.9-2.1 is an enhanced, open-source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.2 protocol and drivers. It extends MongoDB with MongoRocks, Percona Memory Engine, and PerconaFT storage engine, as well as enterprise-grade features like external authentication and audit logging at no extra cost. Percona Server for MongoDB requires no changes to MongoDB applications or code.

Note:

We deprecated the PerconaFT storage engine. It will not be available in future releases.

This release is based on MongoDB 3.2.9. There are no additional improvements or new features on top of those upstream fixes.

The release notes are available in the official documentation.

 

Consul, ProxySQL and MySQL HA

September 16, 2016 - 8:20am

When it comes to “decision time” about which type of MySQL HA (high-availability) solution to implement, and how to architect the solution, many questions come to mind. The most important questions are:

  • “What are the best tools to provide HA and Load Balancing?”
  • “Should I be deploying this proxy tool on my application servers or on a standalone server?”.

Ultimately, the best tool really depends on the needs of your application and your environment. You might already be using specific tools such as Consul or MHA, or you might be looking to implement tools that provide richer features. The dilemma of deploying a proxy instance per application host versus a standalone proxy instance is usually a trade-off between “a less effective load balancing algorithm” and “a single point of failure.” Neither is desirable, but there are ways to implement a solution that balances all aspects.

In this article, we’ll go through a solution that is suitable for an application that has not been coded to split reads and writes over separate MySQL instances. An application like this would rely on a proxy or third-party tool to split reads/writes, preferably one that provides high availability at the proxy layer. The solution described here comprises ProxySQL, Consul and Master High Availability (MHA). Within this article, we’ll focus on the configuration required for ProxySQL and Consul, since there are many articles that cover MHA configuration (such as Miguel’s recent MHA Quick Start Guide blog post).

When deploying Consul in production, a minimum of three instances is recommended. In this example, the Consul agents run on the Application Server (appserver) as well as on the two “ProxySQL servers” mysql1 and mysql2 (which act as the HA proxy pair). This is not a hard requirement, and these instances can easily run on another host or in a Docker container. MySQL is deployed locally on mysql1 and mysql2, however this could just as well be 1..n separate standalone DB server instances.

So let’s move on to the actual configuration of this HA solution, starting with Consul.

Installation of Consul:

Firstly, we’ll need to install the required packages, download the Consul archive and perform the initial configuration. We’ll need to perform the same installation on each of the nodes (i.e., appserver, mysql1 and mysql2).

### Install pre-requisite packages:
sudo yum -y install wget unzip bind-utils dnsmasq

### Install Consul:
sudo useradd consul
sudo mkdir -p /opt/consul /etc/consul.d
sudo touch /var/log/consul.log /etc/consul.d/proxysql.json
cd /opt/consul
sudo wget https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_linux_amd64.zip
sudo unzip consul_0.6.4_linux_amd64.zip
sudo ln -s /opt/consul/consul /usr/bin/consul
sudo chown consul:consul -R /etc/consul* /opt/consul* /var/log/consul.log

Configuration of Consul on Application Server (used as ‘bootstrap’ node):

Now, that we’re done with the installation on each of the hosts, let’s continue with the configuration. In this example we’ll bootstrap the Consul cluster using “appserver”:

### Edit configuration files
$ sudo vi /etc/consul.conf
{
  "datacenter": "dc1",
  "data_dir": "/opt/consul/",
  "log_level": "INFO",
  "node_name": "agent1",
  "server": true,
  "ui": true,
  "bootstrap": true,
  "client_addr": "0.0.0.0",
  "advertise_addr": "192.168.1.119"   ## Add server IP here
}
######

$ sudo vi /etc/consul.d/proxysql.json
{"services": [
  {
    "id": "proxy1",
    "name": "proxysql",
    "address": "192.168.1.120",
    "tags": ["mysql"],
    "port": 6033,
    "check": {
      "script": "mysqladmin ping --host=192.168.1.120 --port=6033 --user=root --password=123",
      "interval": "3s"}
  },
  {
    "id": "proxy2",
    "name": "proxysql",
    "address": "192.168.1.121",
    "tags": ["mysql"],
    "port": 6033,
    "check": {
      "script": "mysqladmin ping --host=192.168.1.121 --port=6033 --user=root --password=123",
      "interval": "3s"}
  }
]
}
######

### Start Consul agent
$ sudo su - consul -c 'consul agent -config-file=/etc/consul.conf -config-dir=/etc/consul.d > /var/log/consul.log &'

### Setup DNSMASQ (as root)
echo "server=/consul/127.0.0.1#8600" > /etc/dnsmasq.d/10-consul
service dnsmasq restart

### Remember to add the localhost as a DNS server (this step can vary
### depending on how your DNS servers are managed... here I'm just
### adding the following line to resolv.conf:
sudo vi /etc/resolv.conf
#... snippet ...#
nameserver 127.0.0.1
#... snippet ...#

### Restart dnsmasq
sudo service dnsmasq restart

The service should now be started, and you can verify this in the logs in “/var/log/consul.log”.

Configuration of Consul on Proxy Servers:

The next item is to configure each of the proxy Consul agents. Note that the “agent name” and the “IP address” need to be updated for each host (values for both must be unique):

### Edit configuration files
$ sudo vi /etc/consul.conf
{
  "datacenter": "dc1",
  "data_dir": "/opt/consul/",
  "log_level": "INFO",
  "node_name": "agent2",               ### Agent node name must be unique
  "server": true,
  "ui": true,
  "bootstrap": false,                  ### Disable bootstrap on joiner nodes
  "client_addr": "0.0.0.0",
  "advertise_addr": "192.168.1.xxx",   ### Set to local instance IP
  "dns_config": { "only_passing": true }
}
######

$ sudo vi /etc/consul.d/proxysql.json
{"services": [
  {
    "id": "proxy1",
    "name": "proxysql",
    "address": "192.168.1.120",
    "tags": ["mysql"],
    "port": 6033,
    "check": {
      "script": "mysqladmin ping --host=192.168.1.120 --port=6033 --user=root --password=123",
      "interval": "3s"}
  },
  {
    "id": "proxy2",
    "name": "proxysql",
    "address": "192.168.1.121",
    "tags": ["mysql"],
    "port": 6033,
    "check": {
      "script": "mysqladmin ping --host=192.168.1.121 --port=6033 --user=root --password=123",
      "interval": "3s"}
  }
]
}
######

### Start Consul agent:
$ sudo su - consul -c 'consul agent -config-file=/etc/consul.conf -config-dir=/etc/consul.d > /var/log/consul.log &'

### Join Consul cluster specifying 1st node IP e.g.
$ consul join 192.168.1.119

### Verify logs and look out for the following messages:
$ cat /var/log/consul.log
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'agent2'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 192.168.1.120 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas:

==> Log data will now stream in as it occurs:

# ... snippet ...
2016/09/05 19:48:04 [INFO] agent: Synced service 'consul'
2016/09/05 19:48:04 [INFO] agent: Synced check 'service:proxysql1'
2016/09/05 19:48:04 [INFO] agent: Synced check 'service:proxysql2'
# ... snippet ...

At this point, we have Consul installed, configured and running on each of our hosts: appserver, mysql1 and mysql2. Now it’s time to install and configure ProxySQL on mysql1 and mysql2.

Installation & Configuration of ProxySQL:

The same procedure should be run on both mysql1 and mysql2 hosts:

### Install ProxySQL packages and initialise ProxySQL DB
sudo yum -y install https://github.com/sysown/proxysql/releases/download/v1.2.2/proxysql-1.2.2-1-centos7.x86_64.rpm
sudo service proxysql initial
sudo service proxysql stop

### Edit the ProxySQL configuration file to update username / password
vi /etc/proxysql.cnf
###
admin_variables=
{
    admin_credentials="admin:admin"
    mysql_ifaces="127.0.0.1:6032;/tmp/proxysql_admin.sock"
}
###

### Start ProxySQL
sudo service proxysql start

### Connect to ProxySQL and configure
mysql -P6032 -h127.0.0.1 -uadmin -padmin

### First we create a replication hostgroup:
mysql> INSERT INTO mysql_replication_hostgroups VALUES (10,11,'Standard Replication Groups');

### Add both nodes to the hostgroup 11 (ProxySQL will automatically put the writer node in hostgroup 10)
mysql> INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.120',11,3306,1000);
mysql> INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.121',11,3306,1000);

### Save server configuration
mysql> LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

### Add query rules for RW split
mysql> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, cache_ttl, apply) VALUES (1, '^SELECT .* FOR UPDATE', 10, NULL, 1);
mysql> INSERT INTO mysql_query_rules (active, match_pattern, destination_hostgroup, cache_ttl, apply) VALUES (1, '^SELECT .*', 11, NULL, 1);
mysql> LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;

### Finally configure ProxySQL user and save configuration
mysql> INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('root','123',1,10,'test');
mysql> LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;
mysql> EXIT;

MySQL Configuration:

We also need to perform one configuration step on the MySQL servers in order to create a user for ProxySQL to monitor the instances:

### ProxySQL's monitor user on the master MySQL server (default username and password is monitor/monitor)
mysql -h192.168.1.120 -P3306 -uroot -p123 -e "GRANT USAGE ON *.* TO monitor@'%' IDENTIFIED BY 'monitor';"

We can view the configuration of the monitor user on the ProxySQL host by checking the global variables on the admin interface:

mysql> SHOW VARIABLES LIKE 'mysql-monitor%';
+----------------------------------------+---------+
| Variable_name                          | Value   |
+----------------------------------------+---------+
| mysql-monitor_enabled                  | true    |
| mysql-monitor_connect_timeout          | 200     |
| mysql-monitor_ping_max_failures        | 3       |
| mysql-monitor_ping_timeout             | 100     |
| mysql-monitor_replication_lag_interval | 10000   |
| mysql-monitor_replication_lag_timeout  | 1000    |
| mysql-monitor_username                 | monitor |
| mysql-monitor_password                 | monitor |
| mysql-monitor_query_interval           | 60000   |
| mysql-monitor_query_timeout            | 100     |
| mysql-monitor_slave_lag_when_null      | 60      |
| mysql-monitor_writer_is_also_reader    | true    |
| mysql-monitor_history                  | 600000  |
| mysql-monitor_connect_interval         | 60000   |
| mysql-monitor_ping_interval            | 10000   |
| mysql-monitor_read_only_interval       | 1500    |
| mysql-monitor_read_only_timeout        | 500     |
+----------------------------------------+---------+

Testing Consul:

Now that Consul and ProxySQL are configured we can do some tests from the “appserver”. First, we’ll verify that the hosts we’ve added are both reporting [OK] on our DNS requests:

$ dig @127.0.0.1 -p 53 proxysql.service.consul

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @127.0.0.1 -p 53 proxysql.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9975
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;proxysql.service.consul.    IN    A

;; ANSWER SECTION:
proxysql.service.consul. 0   IN    A    192.168.1.121
proxysql.service.consul. 0   IN    A    192.168.1.120

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Sep 05 19:32:12 UTC 2016
;; MSG SIZE  rcvd: 158

As you can see from the output above, DNS is reporting both 192.168.1.120 and 192.168.1.121 as available for the ProxySQL service. As soon as a ProxySQL check fails, that node will no longer appear in the output above.

We can also view the status of our cluster and agents through the Consul Web GUI which runs on port 8500 of all the Consul servers in this configuration (e.g. http://192.168.1.120:8500/):

Testing ProxySQL:

So now that we have this configured we can also do some basic tests to see that ProxySQL is load balancing our connections:

[percona@appserver consul.d]$ mysql -hproxysql.service.consul -e "select @@hostname"
+--------------------+
| @@hostname         |
+--------------------+
| mysql1.localdomain |
+--------------------+
[percona@appserver consul.d]$ mysql -hproxysql.service.consul -e "select @@hostname"
+--------------------+
| @@hostname         |
+--------------------+
| mysql2.localdomain |
+--------------------+

Perfect! We’re ready to use the hostname “proxysql.service.consul” to connect to our MySQL instances through a round-robin load balanced and highly available proxy solution. If one of the two ProxySQL instances fails, we’ll continue communicating with the database through the other. Of course, this configuration is not limited to just two hosts, so feel free to add as many as you need. Be aware that in this example the two hosts’ replication hierarchy is managed by MHA in order to allow for master/slave promotion. When an automatic or manual failover is performed using MHA, ProxySQL automatically detects the change in replication topology and redirects writes to the newly promoted master instance.
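If you want to confirm which backend ProxySQL currently considers the writer after such a failover, a quick check against the ProxySQL admin interface (using the admin credentials configured earlier) looks like this:

# Hostgroup 10 holds the current writer, hostgroup 11 the readers (per the configuration above).
mysql -P6032 -h127.0.0.1 -uadmin -padmin -e "
    SELECT hostgroup_id, hostname, port, status
    FROM runtime_mysql_servers
    ORDER BY hostgroup_id;"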

To make this configuration more robust, it is encouraged to create a more intelligent Consul check, i.e., one that verifies more than just the availability of the MySQL service (for example, by selecting some data from a table). It is also recommended to fine-tune the interval of the check to suit the requirements of your application.
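A rough sketch of such a check is shown below. The heartbeat table, credentials and defaults are assumptions and should be adapted to your environment; Consul treats an exit code of 0 as passing and anything above 1 as critical:

#!/bin/bash
# Hypothetical Consul check script: run a real query through ProxySQL instead of a bare ping.
HOST=${1:-192.168.1.120}
PORT=${2:-6033}
ROWS=$(mysql -h"$HOST" -P"$PORT" -uroot -p123 -N -s \
       -e "SELECT COUNT(*) FROM test.heartbeat" 2>/dev/null)
if [ -z "$ROWS" ]; then
    echo "CRITICAL: query through ProxySQL on $HOST:$PORT failed"
    exit 2
fi
echo "OK: ProxySQL on $HOST:$PORT returned $ROWS row(s) from test.heartbeat"
exit 0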

ProxySQL and Percona XtraDB Cluster (Galera) Integration

September 15, 2016 - 3:37pm

In this post, we’ll discuss how an integrated ProxySQL and Percona XtraDB Cluster (Galera) helps manage node states and failovers.

ProxySQL is designed to not perform any specialized operation in relation to the servers with which it communicates. Instead, it uses an event scheduler to extend functionalities and cover any special needs.

Given that specialized products like Percona XtraDB Cluster are not managed by ProxySQL, they require the design and implementation of good/efficient extensions.

In this article, I will illustrate how Percona XtraDB Cluster/Galera can be integrated with ProxySQL to get the best from both.

Brief digression

Before discussing their integration, we need to review a couple of very important concepts in ProxySQL. ProxySQL has a very important logical component: Hostgroup(s) (HG).

A hostgroup is a relation of:

+-----------+       +------------------------+
|Host group +------>|Server (1:N)            |
+-----------+       +------------------------+

In ProxySQL, QueryRules (QR) can be directly mapped to an HG. Using QRs, you can define a specific user to ONLY go to that HG. For instance, you may want to have user app1_user go only to servers A-B-C. Simply set a QR that says app1_user has the destination hostgroup 5, where HG 5 contains the servers A-B-C:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',5,3306,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',5,3306,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',5,3306,10);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('app1_user',5,1);

Easy isn’t it?

Another important concept in ProxySQL also related to HG is ReplicationHostgroup(s) (RHG). This is a special HG that ProxySQL uses to automatically manage the nodes that are connected by replication and configured in Write/Read and Read_only mode.

What does this mean? Let’s say you have four nodes A-B-C-D, connected by standard asynchronous replication. A is the master and B-C-D are the slaves. What you want is to have your application pointing writes to server A, and reads to B-C (keeping D as a backup slave). Also, you don’t want to have any reads go to B-C if the replication delay is more than two seconds.

Using an RHG in conjunction with HGs, ProxySQL can manage all of this for you. Simply instruct the proxy to:

  1. Use RHG
  2. Define the value of the maximum latency

Using the example above:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.5',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.6',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.7',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.8',10,3306,10,2);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('app1_user',5,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES ('app1_user',6,1);
INSERT INTO mysql_replication_hostgroups VALUES (5,6);

From now on ProxySQL will split the R/W using the RHG and the nodes defined in HG 5. The flexibility introduced by using HGs is obviously not limited to what I mention here. It will play a good part in the integration of Percona XtraDB Cluster and ProxySQL, as I illustrate below.

Percona XtraDB Cluster/Galera Integration

In an XtraDB cluster, a node has many different states and conditions that affect if and how your application operates on the node.

The most common one is when a node becomes a DONOR. If you’ve ever installed Percona XtraDB Cluster (or any Galera implementation), you’ve faced this situation: when a node becomes a DONOR it changes state to DESYNC, and if the node is under heavy load, the DONOR process might affect the node itself.

But that is just one of the possible node states:

  • A node can be JOINED but not synced
  • It can have wsrep_rejectqueries, wsrep_donorrejectqueries, wsrep_ready (off)
  • It can be in a different segment
  • The number of nodes per segment is relevant.

To show what can be done and how, we will use the following setup:

  • Five nodes
  • Two segments
  • Applications requiring R/W split

And two options:

  • Single writer node
  • Multiple writer nodes

We’ll analyze how the proxy behaves under the use of a script run by the ProxySQL scheduler.

The use of a script is necessary for ProxySQL to respond correctly to Percona XtraDB Cluster state modifications. ProxySQL comes with two scripts for Galera, but both of them are too basic and don’t consider many relevant conditions. I’ve written a more complete script, galera_check.pl, available at https://github.com/Tusamarco/proxy_sql_tools

This script is a prototype and requires QA and debugging, but is still more powerful than the default ones.
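If you want to get a feel for it before wiring it into the scheduler, it should also be possible to run it once by hand against the ProxySQL admin interface and read the log it writes. The arguments below are the same ones used for the scheduler later in this post; the log path is just an example:

perl /var/lib/proxysql/galera_check.pl -u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --debug=1 --log=/tmp/galera_check_test.log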

The script is designed to manage X number of nodes that belong to a given HG. It works per HG, and as such it performs isolated actions/checks for each HG; it is not possible to have more than one check running on the same HG. Each check creates a lock file {proxysql_galera_check_${hg}.pid} that is used to prevent duplicates. galera_check connects to the ProxySQL node and retrieves all the information regarding the nodes/ProxySQL configuration. It then checks each node in parallel, retrieving its status and configuration. galera_check analyzes and manages the following node states:

  • read_only
  • wsrep_status
  • wsrep_rejectqueries
  • wsrep_donorrejectqueries
  • wsrep_connected
  • wsrep_desinccount
  • wsrep_ready
  • wsrep_provider
  • wsrep_segment
  • Number of nodes by segment
  • Retry loop

As mentioned, the number of nodes inside a segment is relevant. If a node is the only one in a segment, the check behaves accordingly. For example, if a node is the only one in the MAIN segment, it will not put the node in OFFLINE_SOFT when the node becomes a donor, to prevent the cluster from becoming unavailable for applications.

The script allows you to declare a segment as MAIN — quite useful when managing production and DR sites, as the script manages the segment acting as main in a more conservative way. The check can be configured to perform retries after a given interval, where the interval is the time defined in the ProxySQL scheduler. As such, if the check is set to have two retries for UP and three for DOWN, it will loop that many times before doing anything.

The retry loop exists because Percona XtraDB Cluster/Galera performs some actions under the hood, and in some uncommon circumstances it behaves in unexpected ways. For example, whenever a node is set to READ_ONLY=1, Galera desyncs and resyncs the node. A check that doesn’t take this into account would set the node to OFFLINE and back for no reason.

Another important differentiation for this check is that it uses special HGs for maintenance, all in the 9000 range. So if a node belongs to HG 10 and the check needs to put it in maintenance mode, the node is moved to HG 9010. Once all is normal again, the node is put back in its original HG.
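Under the hood, such a move is just an update of mysql_servers through the ProxySQL admin interface, along these lines (the IP below is only a placeholder; the same kind of statement shows up in the galera_check log later in this post), presumably followed by a LOAD MYSQL SERVERS TO RUNTIME to apply it:

UPDATE mysql_servers SET hostgroup_id=9010 WHERE hostgroup_id=10 AND hostname='192.168.1.5' AND port='3306';
LOAD MYSQL SERVERS TO RUNTIME;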

This check does NOT modify any node states. This means it will NOT modify any variables or settings in the original node. It will ONLY change node states in ProxySQL.

Multi-writer mode

The recommended way to use Galera is in multi-writer mode. You can then play with the weights to have one node act as the MAIN node and prevent/reduce certification failures and brute force aborts from Percona XtraDB Cluster. Use this configuration:

DELETE FROM mysql_replication_hostgroups WHERE writer_hostgroup=500;
DELETE FROM mysql_servers WHERE hostgroup_id IN (500,501);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',500,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',501,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',500,3306,1000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',500,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.8',500,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.8',501,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.9',500,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.9',501,3306,1);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

In this test, we will NOT use Replication HostGroup. We will do that later when testing a single writer. For now, we’ll focus on multi-writer.

Segment 1 covers HG 500 and 501, while segment 2 covers only 501. The weights for the servers in HG 500 progress from 1 up to 1 billion, in order to reduce the chance of random writes landing on the non-main nodes.

As such nodes:

  • HG 500
    • S1 192.168.1.5 – 1000000000
    • S1 192.168.1.6 – 1000000
    • S1 192.168.1.7 – 100
    • S2 192.168.1.8 – 1
    • S2 192.168.1.9 – 1
  • HG 501
    • S1 192.168.1.5 – 100
    • S1 192.168.1.6 – 1000000000
    • S1 192.168.1.7 – 1000000000
    • S2 192.168.1.8 – 1
    • S2 192.168.1.9 – 1

The following command shows what ProxySQL is doing:

watch -n 1 'mysql -h 127.0.0.1 -P 3310 -uadmin -padmin -t -e "select * from stats_mysql_connection_pool where hostgroup in (500,501,9500,9501) order by hostgroup,srv_host ;" -e " select hostgroup_id,hostname,status,weight,comment from mysql_servers where hostgroup_id in (500,501,9500,9501) order by hostgroup_id,hostname ;"'

Download the check from GitHub (https://github.com/Tusamarco/proxy_sql_tools) and activate it in ProxySQL. Be sure to set the parameters that match your installation:

DELETE FROM scheduler WHERE id=10;
INSERT INTO scheduler (id,active,interval_ms,filename,arg1) VALUES (10,0,2000,"/var/lib/proxysql/galera_check.pl","-u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --debug=0 --log=/var/lib/proxysql/galeraLog");
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;

If you want to activate it:

update scheduler set active=1 where id=10; LOAD SCHEDULER TO RUNTIME;

The following is the kind of scenario we have:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.9 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 413 | | 500 | 192.168.1.8 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 420 | | 500 | 192.168.1.7 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 227 | | 500 | 192.168.1.6 | 3306 | ONLINE | 0 | 10 | 10 | 0 | 12654 | 1016975 | 0 | 230 | | 500 | 192.168.1.5 | 3306 | ONLINE | 0 | 9 | 29 | 0 | 107358 | 8629123 | 0 | 206 | | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 4 | 6 | 0 | 12602425 | 613371057 | 34467286486 | 413 | | 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 6 | 7 | 0 | 12582617 | 612422028 | 34409606321 | 420 | | 501 | 192.168.1.7 | 3306 | ONLINE | 0 | 6 | 6 | 0 | 18580675 | 905464967 | 50824195445 | 227 | | 501 | 192.168.1.6 | 3306 | ONLINE | 0 | 6 | 14 | 0 | 18571127 | 905075154 | 50814832276 | 230 | | 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 1 | 10 | 0 | 169570 | 8255821 | 462706881 | 206 | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

To generate a load, use the following commands (or whatever you like, but use a different one for read-only and reads/writes):

Write:
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=stress_RW --mysql-password=test --mysql-db=test_galera --db-driver=mysql --oltp-tables-count=50 --oltp-table-size=50000 --max-requests=0 --max-time=9000 --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 --report-interval=10 --mysql-ignore-errors=all run

Read only:
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=stress_RW --mysql-password=test --mysql-db=test_galera --db-driver=mysql --oltp-tables-count=50 --oltp-table-size=50000 --max-requests=0 --max-time=9000 --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query --oltp-skip-trx=on --report-interval=10 --mysql-ignore-errors=all run

The most common thing that can happen to a cluster node is to become a donor. This is a planned activity for Percona XtraDB Cluster, and it is supposed to be managed in the least harmful way possible.

We’re going to simulate crashing a node and forcing it to elect our main node as DONOR (the one with the highest WEIGHT).

To do so, we need to have the parameter wsrep_sst_donor set.
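If it is not already set, it can be assigned at runtime, where the value is the wsrep node name of the preferred donor (node1 here, matching the output below):

SET GLOBAL wsrep_sst_donor='node1';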

show global variables like 'wsrep_sst_donor';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| wsrep_sst_donor | node1 | <---
+-----------------+-------+

Activate the check if not already done:

update scheduler set active=1 where id=10;

And now run traffic. Check load:

select * from stats_mysql_connection_pool where hostgroup in (500,501,9500,9501) order by hostgroup,srv_host ; +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 10 | 0 | 30 | 0 | 112662 | 9055479 | 0 | 120 | <--- our Donor | 500 | 192.168.1.6 | 3306 | ONLINE | 0 | 10 | 10 | 0 | 12654 | 1016975 | 0 | 111 | | 500 | 192.168.1.7 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 115 | | 500 | 192.168.1.8 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 316 | | 500 | 192.168.1.9 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 329 | | 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 1 | 10 | 0 | 257271 | 12533763 | 714473854 | 120 | | 501 | 192.168.1.6 | 3306 | ONLINE | 0 | 10 | 18 | 0 | 18881582 | 920200116 | 51688974309 | 111 | | 501 | 192.168.1.7 | 3306 | ONLINE | 3 | 6 | 9 | 0 | 18927077 | 922317772 | 51794504662 | 115 | | 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 1 | 8 | 0 | 12595556 | 613054573 | 34447564440 | 316 | | 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 3 | 6 | 0 | 12634435 | 614936148 | 34560620180 | 329 | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

Now on one of the nodes:

  1. Kill mysql
  2. Remove the content of the data directory
  3. Restart the node

The node will go into SST, and our galera_check script will manage it:

+--------------+-------------+--------------+------------+--------------------------------------------------+ | hostgroup_id | hostname | status | weight | comment | +--------------+-------------+--------------+------------+--------------------------------------------------+ | 500 | 192.168.1.5 | OFFLINE_SOFT | 1000000000 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <---- the donor | 500 | 192.168.1.6 | ONLINE | 1000000 | | | 500 | 192.168.1.7 | ONLINE | 100 | | | 500 | 192.168.1.8 | ONLINE | 1 | | | 500 | 192.168.1.9 | ONLINE | 1 | | | 501 | 192.168.1.5 | OFFLINE_SOFT | 100 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | | 501 | 192.168.1.6 | ONLINE | 1000000000 | | | 501 | 192.168.1.7 | ONLINE | 1000000000 | | | 501 | 192.168.1.8 | ONLINE | 1 | | | 501 | 192.168.1.9 | ONLINE | 1 | | +--------------+-------------+--------------+------------+--------------------------------------------------+

We can also check the galera_check log and see what happened:

2016/09/02 16:13:27.298:[WARN] Move node:192.168.1.5;3306;500;3010 SQL: UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/02 16:13:27.303:[WARN] Move node:192.168.1.5;3306;501;3010 SQL: UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

The node remains in OFFLINE_SOFT, with the other node (192.168.1.6, which has the second-highest WEIGHT) serving the writes, for as long as the node stays in the DONOR state.

All as expected: the node was set to the OFFLINE_SOFT state, which means existing connections were allowed to finish while the node did not accept any NEW connections.

As soon as the node stopped sending data to the Joiner, it was moved back and traffic restarted:

2016/09/02 16:14:58.239:[WARN] Move node:192.168.1.5;3306;500;1000 SQL: UPDATE mysql_servers SET status='ONLINE' WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/02 16:14:58.243:[WARN] Move node:192.168.1.5;3306;501;1000 SQL: UPDATE mysql_servers SET status='ONLINE' WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 6 | 1 | 37 | 0 | 153882 | 12368557 | 0 | 72 | <--- | 500 | 192.168.1.6 | 3306 | ONLINE | 1 | 9 | 10 | 0 | 16008 | 1286492 | 0 | 42 | | 500 | 192.168.1.7 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 1398 | 112371 | 0 | 96 | | 500 | 192.168.1.8 | 3306 | ONLINE | 0 | 0 | 24545 | 791 | 24545 | 122725 | 0 | 359 | | 500 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 15108 | 1214366 | 0 | 271 | | 501 | 192.168.1.5 | 3306 | ONLINE | 1 | 0 | 11 | 0 | 2626808 | 128001112 | 7561278884 | 72 | | 501 | 192.168.1.6 | 3306 | ONLINE | 5 | 7 | 20 | 0 | 28629516 | 1394974468 | 79289633420 | 42 | | 501 | 192.168.1.7 | 3306 | ONLINE | 2 | 8 | 10 | 0 | 29585925 | 1441400648 | 81976494740 | 96 | | 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 0 | 16779 | 954 | 12672983 | 616826002 | 34622768228 | 359 | | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 4 | 6 | 0 | 13567512 | 660472589 | 37267991677 | 271 | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

This was easy, and more or less managed by the standard scripts as well. But what would happen if my donor were set to NOT serve queries while in the DONOR state?

Wait, what?? Yes, Percona XtraDB Cluster (and Galera in general) can be set to refuse any query when the node goes into DONOR state. If not managed, this can cause issues, because the node simply rejects queries while ProxySQL still sees it as alive.
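This behavior is controlled by a single global variable, switched on for this test (it can also be set in my.cnf):

SET GLOBAL wsrep_sst_donor_rejects_queries=ON;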

Let me show you:

show global variables like 'wsrep_sst_donor_rejects_queries';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| wsrep_sst_donor_rejects_queries | ON    |
+---------------------------------+-------+

For the moment, let’s deactivate the check. Then, do the same stop and delete of the data dir, then restart the node. SST takes place.
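(Deactivating the check is simply the mirror image of the activation shown earlier:)

UPDATE scheduler SET active=0 WHERE id=10;
LOAD SCHEDULER TO RUNTIME;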

Sysbench will report:

ALERT: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'BEGIN'
FATAL: failed to execute function `event': 3
ALERT: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'BEGIN'
FATAL: failed to execute function `event': 3

But ProxySQL?

+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 101 | 0 | 186331 | 14972717 | 0 | 118 | <-- no writes in either HG | 500 | 192.168.1.6 | 3306 | ONLINE | 0 | 9 | 10 | 0 | 20514 | 1648665 | 0 | 171 | | | 500 | 192.168.1.7 | 3306 | ONLINE | 0 | 1 | 3 | 0 | 5881 | 472629 | 0 | 134 | | | 500 | 192.168.1.8 | 3306 | ONLINE | 0 | 0 | 205451 | 1264 | 205451 | 1027255 | 0 | 341 | | | 500 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 2 | 0 | 15642 | 1257277 | 0 | 459 | - | 501 | 192.168.1.5 | 3306 | ONLINE | 1 | 0 | 13949 | 0 | 4903347 | 238627310 | 14089708430 | 118 | | 501 | 192.168.1.6 | 3306 | ONLINE | 2 | 10 | 20 | 0 | 37012174 | 1803380964 | 103269634626 | 171 | | 501 | 192.168.1.7 | 3306 | ONLINE | 2 | 11 | 13 | 0 | 38782923 | 1889507208 | 108288676435 | 134 | | 501 | 192.168.1.8 | 3306 | SHUNNED | 0 | 0 | 208452 | 1506 | 12864656 | 626156995 | 34622768228 | 341 | | 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 3 | 6 | 0 | 14451462 | 703534884 | 39837663734 | 459 | +-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
mysql> select * from mysql_server_connect_log where hostname in ('192.168.1.5','192.168.1.6','192.168.1.7','192.168.1.8','192.168.1.9') order by time_start_us desc limit 10; +-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+ | hostname | port | time_start_us | connect_success_time_us | connect_error | +-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+ | 192.168.1.9 | 3306 | 1472827444621954 | 1359 | NULL | | 192.168.1.8 | 3306 | 1472827444618883 | 0 | Can't connect to MySQL server on '192.168.1.8' (107) | | 192.168.1.7 | 3306 | 1472827444615819 | 433 | NULL | | 192.168.1.6 | 3306 | 1472827444612722 | 538 | NULL | | 192.168.1.5 | 3306 | 1472827444606560 | 473 | NULL | <-- donor is seen as up | 192.168.1.9 | 3306 | 1472827384621463 | 1286 | NULL | | 192.168.1.8 | 3306 | 1472827384618442 | 0 | Lost connection to MySQL server at 'handshake: reading inital communication packet', system error: 107 | | 192.168.1.7 | 3306 | 1472827384615317 | 419 | NULL | | 192.168.1.6 | 3306 | 1472827384612241 | 415 | NULL | | 192.168.1.5 | 3306 | 1472827384606117 | 454 | NULL | <-- donor is seen as up +-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
select * from mysql_server_ping_log where hostname in ('192.168.1.5','192.168.1.6','192.168.1.7','192.168.1.8','192.168.1.9') order by time_start_us desc limit 10; +-------------+------+------------------+----------------------+------------------------------------------------------+ | hostname | port | time_start_us | ping_success_time_us | ping_error | +-------------+------+------------------+----------------------+------------------------------------------------------+ | 192.168.1.9 | 3306 | 1472827475062217 | 311 | NULL | | 192.168.1.8 | 3306 | 1472827475060617 | 0 | Can't connect to MySQL server on '192.168.1.8' (107) | | 192.168.1.7 | 3306 | 1472827475059073 | 108 | NULL | | 192.168.1.6 | 3306 | 1472827475057281 | 102 | NULL | | 192.168.1.5 | 3306 | 1472827475054188 | 74 | NULL | <-- donor is seen as up | 192.168.1.9 | 3306 | 1472827445061877 | 491 | NULL | | 192.168.1.8 | 3306 | 1472827445060254 | 0 | Can't connect to MySQL server on '192.168.1.8' (107) | | 192.168.1.7 | 3306 | 1472827445058688 | 53 | NULL | | 192.168.1.6 | 3306 | 1472827445057124 | 131 | NULL | | 192.168.1.5 | 3306 | 1472827445054015 | 98 | NULL | <-- donor is seen as up +-------------+------+------------------+----------------------+------------------------------------------------------+

As you can see, all seems OK. Let’s turn on galera_check and see what happens when we run some read and write loads.

And now let me do the stop-delete-restart-SST process again:

kill -9 <mysqld_safe_pid> <mysqld_pid>; rm -fr data/*;rm -fr logs/*;sleep 2;./start

As soon as the node goes down, ProxySQL shuns the node.

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 7 | 3 | 34 | 0 | 21570 | 1733833 | 0 | 146 | | 500 | 192.168.1.6 | 3306 | ONLINE | 1 | 8 | 12 | 0 | 9294 | 747063 | 0 | 129 | | 500 | 192.168.1.7 | 3306 | ONLINE | 1 | 0 | 4 | 0 | 3396 | 272950 | 0 | 89 | | 500 | 192.168.1.8 | 3306 | SHUNNED | 0 | 0 | 1 | 6 | 12 | 966 | 0 | 326 | <-- crashed | 500 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 2 | 0 | 246 | 19767 | 0 | 286 | | 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 1 | 2 | 0 | 772203 | 37617973 | 2315131214 | 146 | | 501 | 192.168.1.6 | 3306 | ONLINE | 9 | 3 | 12 | 0 | 3439458 | 167514166 | 10138636314 | 129 | | 501 | 192.168.1.7 | 3306 | ONLINE | 1 | 12 | 13 | 0 | 3183822 | 155064971 | 9394612877 | 89 | | 501 | 192.168.1.8 | 3306 | SHUNNED | 0 | 0 | 1 | 6 | 11429 | 560352 | 35350726 | 326 | <-- crashed | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 312253 | 15227786 | 941110520 | 286 | +-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Immediately after, galera_check identifies that the node is requesting an SST and that the DONOR is our writer. Given that the donor is NOT the only writer in the HG and it has the variable wsrep_sst_donor_rejects_queries active, it cannot simply be set to OFFLINE_SOFT; at the same time, we do not want ProxySQL to consider it OFFLINE_HARD (because it is not).

As such, the script moves it to a special HG:

2016/09/04 16:11:22.091:[WARN] Move node:192.168.1.5;3306;500;3001 SQL: UPDATE mysql_servers SET hostgroup_id=9500 WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/04 16:11:22.097:[WARN] Move node:192.168.1.5;3306;501;3001 SQL: UPDATE mysql_servers SET hostgroup_id=9501 WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+ | hostgroup_id | hostname | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment | +--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+ | 500 | 192.168.1.6 | 3306 | ONLINE | 1000000 | 0 | 1000 | 0 | 0 | 0 | | | 500 | 192.168.1.7 | 3306 | ONLINE | 100 | 0 | 1000 | 0 | 0 | 0 | | | 500 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | | 500 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | | 501 | 192.168.1.6 | 3306 | ONLINE | 1000000000 | 0 | 1000 | 0 | 0 | 0 | | | 501 | 192.168.1.7 | 3306 | ONLINE | 1000000000 | 0 | 1000 | 0 | 0 | 0 | | | 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1000 | 0 | 0 | 0 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | | 9500 | 192.168.1.5 | 3306 | ONLINE | 1000000000 | 0 | 1000 | 0 | 0 | 0 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <-- Special HG | 9501 | 192.168.1.5 | 3306 | ONLINE | 100 | 0 | 1000 | 0 | 0 | 0 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <-- Special HG +--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+

The Donor continues to serve the Joiner, but applications won’t see it.

What applications see is also very important. Applications doing WRITEs will see:

[ 10s] threads: 10, tps: 9.50, reads: 94.50, writes: 42.00, response time: 1175.77ms (95%), errors: 0.00, reconnects: 0.00
...
[ 40s] threads: 10, tps: 2.80, reads: 26.10, writes: 11.60, response time: 3491.45ms (95%), errors: 0.00, reconnects: 0.10
[ 50s] threads: 10, tps: 4.80, reads: 50.40, writes: 22.40, response time: 10062.13ms (95%), errors: 0.80, reconnects: 351.60 <--- Main writer moved to another HG
[ 60s] threads: 10, tps: 5.90, reads: 53.10, writes: 23.60, response time: 2869.82ms (95%), errors: 0.00, reconnects: 0.00
...

When the writer shifts from one node to another, the application has to manage the retry, but this lasts only a short time and causes limited impact on the production flow.

Application readers see no errors:

[ 10s] threads: 10, tps: 0.00, reads: 13007.31, writes: 0.00, response time: 9.13ms (95%), errors: 0.00, reconnects: 0.00
[ 50s] threads: 10, tps: 0.00, reads: 9613.70, writes: 0.00, response time: 10.66ms (95%), errors: 0.00, reconnects: 0.20 <-- just a glitch in reconnect
[ 60s] threads: 10, tps: 0.00, reads: 10807.90, writes: 0.00, response time: 11.07ms (95%), errors: 0.00, reconnects: 0.20
[ 70s] threads: 10, tps: 0.00, reads: 9082.61, writes: 0.00, response time: 23.62ms (95%), errors: 0.00, reconnects: 0.00
...
[ 390s] threads: 10, tps: 0.00, reads: 13050.80, writes: 0.00, response time: 8.97ms (95%), errors: 0.00, reconnects: 0.00

When the Donor finishes serving the SST, it comes back, the script detects it, and galera_check puts it back in the right HG:

2016/09/04 16:12:34.266:[WARN] Move node:192.168.1.5;3306;9500;1010 SQL: UPDATE mysql_servers SET hostgroup_id=500 WHERE hostgroup_id=9500 AND hostname='192.168.1.5' AND port='3306'
2016/09/04 16:12:34.270:[WARN] Move node:192.168.1.5;3306;9501;1010 SQL: UPDATE mysql_servers SET hostgroup_id=501 WHERE hostgroup_id=9501 AND hostname='192.168.1.5' AND port='3306'

The crashed node is restarted by the SST process, and the node comes up. But if the level of load in the cluster is mid/high, it will remain in the JOINED state for some time, becoming visible to ProxySQL again, while ProxySQL will not correctly recognize that state.

2016-09-04 16:17:15 21035 [Note] WSREP: 3.2 (node4): State transfer from 1.1 (node1) complete.
2016-09-04 16:17:15 21035 [Note] WSREP: Shifting JOINER -> JOINED (TO: 254515)

To avoid this issue, the script moves it to a special HG, allowing it to recover without interfering with the real load.

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 6 | 2 | 15 | 0 | 3000 | 241060 | 0 | 141 | | 500 | 192.168.1.6 | 3306 | ONLINE | 1 | 9 | 13 | 0 | 13128 | 1055268 | 0 | 84 | | 500 | 192.168.1.7 | 3306 | ONLINE | 1 | 0 | 4 | 0 | 3756 | 301874 | 0 | 106 | | 500 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 2 | 0 | 4080 | 327872 | 0 | 278 | | 501 | 192.168.1.5 | 3306 | ONLINE | 1 | 0 | 2 | 0 | 256753 | 12508935 | 772048259 | 141 | | 501 | 192.168.1.6 | 3306 | ONLINE | 4 | 8 | 12 | 0 | 5116844 | 249191524 | 15100617833 | 84 | | 501 | 192.168.1.7 | 3306 | ONLINE | 2 | 11 | 13 | 0 | 4739756 | 230863200 | 13997231724 | 106 | | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 496524 | 24214563 | 1496482104 | 278 | | 9500 | 192.168.1.8 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 331 |<-- Joined not Sync | 9501 | 192.168.1.8 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 331 |<-- Joined not Sync +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Once the node fully recovers, galera_check puts it back in its original HG, ready to serve requests:

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 0 | 1 | 15 | 0 | 3444 | 276758 | 0 | 130 | | 500 | 192.168.1.6 | 3306 | ONLINE | 0 | 9 | 13 | 0 | 13200 | 1061056 | 0 | 158 | | 500 | 192.168.1.7 | 3306 | ONLINE | 0 | 0 | 4 | 0 | 3828 | 307662 | 0 | 139 | | 500 | 192.168.1.8 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |<-- up again | 500 | 192.168.1.9 | 3306 | ONLINE | 0 | 0 | 2 | 0 | 4086 | 328355 | 0 | 336 | | 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 1 | 2 | 0 | 286349 | 13951366 | 861638962 | 130 | | 501 | 192.168.1.6 | 3306 | ONLINE | 0 | 12 | 12 | 0 | 5239212 | 255148806 | 15460951262 | 158 | | 501 | 192.168.1.7 | 3306 | ONLINE | 0 | 13 | 13 | 0 | 4849970 | 236234446 | 14323937975 | 139 | | 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |<-- up again | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 507910 | 24768898 | 1530841172 | 336 | +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

A summary of the logical steps is:

  1. Crash
  2. ProxySQL shuns the crashed node
  3. Does the donor have wsrep_sst_donor_rejects_queries OR wsrep_reject_queries set?
     • No: galera_check puts the donor in OFFLINE_SOFT
     • Yes: galera_check puts the donor in the special HG
  4. When the donor SST ends, galera_check puts the donor back ONLINE (or back in its original HG, if it had been moved to the special HG)
  5. When the crashed node's SST ends, the node comes back:
     • Not yet synced: galera_check keeps it in the special HG
     • Synced: galera_check puts the node back in its original HG
  6. END

As mentioned, galera_check can manage several node states.

Another case is when we can’t have the node accept ANY queries. We might need that for several reasons, including preparing the node for maintenance (or whatever).

In Percona XtraDB Cluster (and other Galera implementations) we can set the value of wsrep_reject_queries to:

  • NONE
  • ALL
  • ALL_KILL

Let’s see how it works. Run some load, then on the main writer node (192.168.1.5):

set global wsrep_reject_queries=ALL;

This blocks any new queries from being executed, while the ones already running are allowed to complete. Doing a simple select on the node returns:

(root@localhost:pm) [test]>select * from tbtest1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

ProxySQL won’t see these conditions:

+-------------+------+------------------+----------------------+------------+ | hostname | port | time_start_us | ping_success_time_us | ping_error | +-------------+------+------------------+----------------------+------------+ | 192.168.1.5 | 3306 | 1473005467628001 | 35 | NULL | <--- ping ok | 192.168.1.5 | 3306 | 1473005437628014 | 154 | NULL | +-------------+------+------------------+----------------------+------------+ +-------------+------+------------------+-------------------------+---------------+ | hostname | port | time_start_us | connect_success_time_us | connect_error | +-------------+------+------------------+-------------------------+---------------+ | 192.168.1.5 | 3306 | 1473005467369575 | 246 | NULL | <--- connect ok | 192.168.1.5 | 3306 | 1473005407369441 | 353 | NULL | +-------------+------+------------------+-------------------------+---------------+

The script galera_check will instead manage it:

+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | OFFLINE_SOFT | 0 | 0 | 8343 | 0 | 10821 | 240870 | 0 | 93 | <--- galera check put it OFFLINE | 500 | 192.168.1.6 | 3306 | ONLINE | 10 | 0 | 15 | 0 | 48012 | 3859402 | 0 | 38 | <--- writer | 500 | 192.168.1.7 | 3306 | ONLINE | 0 | 1 | 6 | 0 | 14712 | 1182364 | 0 | 54 | | 500 | 192.168.1.8 | 3306 | ONLINE | 0 | 1 | 2 | 0 | 1092 | 87758 | 0 | 602 | | 500 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 4 | 0 | 5352 | 430152 | 0 | 238 | | 501 | 192.168.1.5 | 3306 | OFFLINE_SOFT | 0 | 0 | 1410 | 0 | 197909 | 9638665 | 597013919 | 93 | | 501 | 192.168.1.6 | 3306 | ONLINE | 2 | 10 | 12 | 0 | 7822682 | 380980455 | 23208091727 | 38 | | 501 | 192.168.1.7 | 3306 | ONLINE | 0 | 13 | 13 | 0 | 7267507 | 353962618 | 21577881545 | 54 | | 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 241641 | 11779770 | 738145270 | 602 | | 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 756415 | 36880233 | 2290165636 | 238 | +-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

In this case, the script puts the node in OFFLINE_SOFT, given that set global wsrep_reject_queries=ALL means “do not accept NEW queries and complete the existing ones”, which is exactly the OFFLINE_SOFT behavior.

The script also manages the case of set global wsrep_reject_queries=ALL_KILL;. From ProxySQL’s point of view, this condition does not exist either:

+-------------+------+------------------+----------------------+------------+ | hostname | port | time_start_us | ping_success_time_us | ping_error | +-------------+------+------------------+----------------------+------------+ | 192.168.1.5 | 3306 | 1473005827629069 | 59 | NULL |<--- ping ok | 192.168.1.5 | 3306 | 1473005797628988 | 57 | NULL | +-------------+------+------------------+----------------------+------------+ +-------------+------+------------------+-------------------------+---------------+ | hostname | port | time_start_us | connect_success_time_us | connect_error | +-------------+------+------------------+-------------------------+---------------+ | 192.168.1.5 | 3306 | 1473005827370084 | 370 | NULL | <--- connect ok | 192.168.1.5 | 3306 | 1473005767369915 | 243 | NULL | +-------------+------+------------------+-------------------------+---------------+ +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 9500 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |<--- galera check put it in special HG | 9501 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

The difference here is that the script moves the node to the special HG to isolate it, instead of leaving it in the original HG.

The integration between ProxySQL and Percona XtraDB Cluster (Galera) works perfectly for multi-writer setups if you have a script like galera_check that correctly manages the different Percona XtraDB Cluster/Galera states.

ProxySQL and PXC using Replication HostGroup

Sometimes we might need to have 100% of the writes going to only one node at a time. As explained above, ProxySQL uses weights to redirect a percentage of the load to a specific node.

In most cases, it is enough to set the weight of the main writer to a very high value (like 10 billion) and the next node to one thousand to almost achieve a single writer.

But this is not 100% effective: it still allows ProxySQL to send a query to the other node(s) once every X queries. The solution that stays consistent with the ProxySQL logic is to use replication hostgroups.

Replication HGs are special HGs that the proxy sees as connected for R/W operations. ProxySQL analyzes the value of the READ_ONLY variable and assigns the nodes that have it enabled to the READ_ONLY (reader) HG.

The node with READ_ONLY=0 resides in both HGs. As such, the first thing we need to do is tell ProxySQL that HG 500 and 501 are replication HGs:

INSERT INTO mysql_replication_hostgroups VALUES (500,501,'');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

select * from mysql_replication_hostgroups ;
+------------------+------------------+---------+
| writer_hostgroup | reader_hostgroup | comment |
+------------------+------------------+---------+
| 500              | 501              |         |
+------------------+------------------+---------+

Now whenever I set the value of READ_ONLY on a node, ProxySQL will move the node accordingly. Let’s see how. The current state:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 6 | 1 | 7 | 0 | 16386 | 1317177 | 0 | 97 | | 500 | 192.168.1.6 | 3306 | ONLINE | 1 | 9 | 15 | 0 | 73764 | 5929366 | 0 | 181 | | 500 | 192.168.1.7 | 3306 | ONLINE | 1 | 0 | 6 | 0 | 18012 | 1447598 | 0 | 64 | | 500 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 2 | 0 | 1440 | 115728 | 0 | 341 | | 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 1210029 | 58927817 | 3706882671 | 97 | | 501 | 192.168.1.6 | 3306 | ONLINE | 1 | 11 | 12 | 0 | 16390790 | 798382865 | 49037691590 | 181 | | 501 | 192.168.1.7 | 3306 | ONLINE | 1 | 12 | 13 | 0 | 15357779 | 748038558 | 45950863867 | 64 | | 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 1247662 | 60752227 | 3808131279 | 341 | | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 1766309 | 86046839 | 5374169120 | 422 | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

Set global READ_ONLY=1 on the following nodes: 192.168.1.6/7/8/9.
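On each of those four nodes this is just:

SET GLOBAL read_only=1;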

After:

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 10 | 0 | 20 | 0 | 25980 | 2088346 | 0 | 93 | | 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 1787979 | 87010074 | 5473781192 | 93 | | 501 | 192.168.1.6 | 3306 | ONLINE | 4 | 8 | 12 | 0 | 18815907 | 916547402 | 56379724890 | 79 | | 501 | 192.168.1.7 | 3306 | ONLINE | 1 | 12 | 13 | 0 | 17580636 | 856336023 | 52670114510 | 131 | | 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 15324 | 746109 | 46760779 | 822 | | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 16210 | 789999 | 49940867 | 679 | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

If a reader node crashes in this scenario, the application will not suffer at all, given the redundancy.

But if the writer crashes, THEN we have an issue, because there will be NO node available to take over the writes. The solution is either to do the node election manually, or to have a script elect the node with the lowest read weight in the same segment as the new writer.

Below is what happens when a node crashes (bird’s-eye view):

  1. Crash
  2. ProxySQL shuns the crashed node
  3. Does the writer hostgroup still have another active node?
     • Yes: ProxySQL redirects the load there → END
     • No: ProxySQL stops serving writes and checks READ_ONLY on the nodes in the reader HG
        • If a node has READ_ONLY=0, ProxySQL moves it to the writer HG → END
        • If no node has READ_ONLY=0, ProxySQL keeps checking and continues NOT to serve writes

The script should act immediately after the “ProxySQL shuns the node” step, simply setting READ_ONLY=0 (replacing READ_ONLY=1) on the reader node with the lowest read weight.

ProxySQL does the rest, copying the node into the WRITER HG while keeping its low weight, so that WHEN/IF the original node comes back, the new node will not compete with it for traffic.
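As a rough sketch of that election step (this is NOT the galera_check implementation; the admin credentials and port match the ones used in this article, the MySQL credentials are placeholders, and segment filtering is omitted for brevity):

#!/bin/bash
# Pick the ONLINE reader with the lowest weight in the reader HG (501).
CANDIDATE=$(mysql -h 127.0.0.1 -P 3310 -uadmin -padmin -NB -e "SELECT hostname FROM runtime_mysql_servers WHERE hostgroup_id=501 AND status='ONLINE' ORDER BY weight ASC LIMIT 1")
# Promote it: with replication hostgroups configured, flipping read_only is enough,
# since ProxySQL copies the node into the writer HG (500) on its own.
mysql -h "$CANDIDATE" -P 3306 -uroot -p'placeholder_password' -e "SET GLOBAL read_only=0;"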

Since this special function is included in the check, it allows automatic failover. This experimental feature is active only if explicitly enabled in the parameters that the scheduler passes to the script: to activate it, add --active_failover to the scheduler arguments. My recommendation is to have two entries in the scheduler, activate the one with --active_failover for testing, and remember to deactivate the other one.
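With the two scheduler entries shown in the scheduler configuration at the end of this section (id 10 carrying --active_failover, id 20 without it), switching between them is just:

UPDATE scheduler SET active=0 WHERE id=20;
UPDATE scheduler SET active=1 WHERE id=10;
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;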

Let’s see the manual procedure first:

The process is:

  1. Generate some load
  2. Kill the writer node
  3. Manually elect a reader as the new writer
  4. Recover the crashed node

Current load:

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 500 | 192.168.1.5 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 30324 | 2437438 | 0 | 153 | | 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 1519612 | 74006447 | 4734427711 | 153 | | 501 | 192.168.1.6 | 3306 | ONLINE | 4 | 8 | 12 | 0 | 7730857 | 376505014 | 24119645457 | 156 | | 501 | 192.168.1.7 | 3306 | ONLINE | 2 | 10 | 12 | 0 | 7038332 | 342888697 | 21985442619 | 178 | | 501 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 612523 | 29835858 | 1903693835 | 337 | | 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 611021 | 29769497 | 1903180139 | 366 | +-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Kill the main node 192.168.1.5:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 501 | 192.168.1.5 | 3306 | SHUNNED | 0 | 0 | 1 | 11 | 1565987 | 76267703 | 4879938857 | 119 | | 501 | 192.168.1.6 | 3306 | ONLINE | 1 | 11 | 12 | 0 | 8023216 | 390742215 | 25033271548 | 112 | | 501 | 192.168.1.7 | 3306 | ONLINE | 1 | 11 | 12 | 0 | 7306838 | 355968373 | 22827016386 | 135 | | 501 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 638326 | 31096065 | 1984732176 | 410 | | 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 636857 | 31025014 | 1982213114 | 328 | +-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ +-------------+------+------------------+----------------------+------------------------------------------------------+ | hostname | port | time_start_us | ping_success_time_us | ping_error | +-------------+------+------------------+----------------------+------------------------------------------------------+ | 192.168.1.5 | 3306 | 1473070640798571 | 0 | Can't connect to MySQL server on '192.168.1.5' (107) | | 192.168.1.5 | 3306 | 1473070610798464 | 0 | Can't connect to MySQL server on '192.168.1.5' (107) | +-------------+------+------------------+----------------------+------------------------------------------------------+ +-------------+------+------------------+-------------------------+------------------------------------------------------+ | hostname | port | time_start_us | connect_success_time_us | connect_error | +-------------+------+------------------+-------------------------+------------------------------------------------------+ | 192.168.1.5 | 3306 | 1473070640779903 | 0 | Can't connect to MySQL server on '192.168.1.5' (107) | | 192.168.1.5 | 3306 | 1473070580779977 | 0 | Can't connect to MySQL server on '192.168.1.5' (107) | +-------------+------+------------------+-------------------------+------------------------------------------------------+

When the node is killed, ProxySQL shuns it and the checks (connect and ping) report errors. During this time frame the application will experience issues, and if it is not designed to manage retries (and eventually a queue), it will crash.

Sysbench reports the errors:

Writes

[ 10s] threads: 10, tps: 6.70, reads: 68.50, writes: 30.00, response time: 1950.53ms (95%), errors: 0.00, reconnects: 0.00
...
[1090s] threads: 10, tps: 4.10, reads: 36.90, writes: 16.40, response time: 2226.45ms (95%), errors: 0.00, reconnects: 1.00 <-+ killing the node
[1100s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00 |
[1110s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00 |
[1120s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00 |
[1130s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00 |-- Gap waiting for a node to become
[1140s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00 |   READ_ONLY=0
[1150s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00 |
[1160s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects: 0.00 |
[1170s] threads: 10, tps: 4.70, reads: 51.30, writes: 22.80, response time: 80430.18ms (95%), errors: 0.00, reconnects: 0.00 <-+
[1180s] threads: 10, tps: 8.90, reads: 80.10, writes: 35.60, response time: 2068.39ms (95%), errors: 0.00, reconnects: 0.00
...
[1750s] threads: 10, tps: 5.50, reads: 49.80, writes: 22.80, response time: 2266.80ms (95%), errors: 0.00, reconnects: 0.00 -- No additional errors

I decided to promote node 192.168.1.6, given that the reader weights were equal and as such there was no real difference in this setup.

(root@localhost:pm) [(none)]>set global read_only=0;
Query OK, 0 rows affected (0.00 sec)

Checking ProxySQL:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+ | 500 | 192.168.1.6 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 1848 | 148532 | 0 | 40 | | 501 | 192.168.1.5 | 3306 | SHUNNED | 0 | 0 | 1 | 72 | 1565987 | 76267703 | 4879938857 | 38 | | 501 | 192.168.1.6 | 3306 | ONLINE | 2 | 10 | 12 | 0 | 8843069 | 430654903 | 27597990684 | 40 | | 501 | 192.168.1.7 | 3306 | ONLINE | 1 | 11 | 12 | 0 | 8048826 | 392101994 | 25145582384 | 83 | | 501 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 725820 | 35371512 | 2259974847 | 227 | | 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 723582 | 35265066 | 2254824754 | 290 | +-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

As soon as the READ_ONLY value is modified, ProxySQL moves the node to the writer HG, and writes can take place again. At this point production activity is recovered.

Reads had just a minor glitch:

Reads

[ 10s] threads: 10, tps: 0.00, reads: 20192.15, writes: 0.00, response time: 6.96ms (95%), errors: 0.00, reconnects: 0.00
...
[ 410s] threads: 10, tps: 0.00, reads: 16489.03, writes: 0.00, response time: 9.41ms (95%), errors: 0.00, reconnects: 2.50
...
[ 710s] threads: 10, tps: 0.00, reads: 18789.40, writes: 0.00, response time: 6.61ms (95%), errors: 0.00, reconnects: 0.00

The glitch happened when node 192.168.1.6 was copied over to HG 500, but it caused no interruptions or errors. At this point, let’s put the crashed node back: it comes back and elects Node2 (192.168.1.6) as its Donor.

This was a Percona XtraDB Cluster/Galera choice, and we have to accept and manage it.

Note that the other, basic scripts would put the node in OFFLINE_SOFT, given that the node will become a DONOR. Galera_check instead recognizes that Node2 (192.168.1.6) is the only active node in the segment for that specific HG (writer), while it is not the only one present in the READER HG.

As such, it puts the node in OFFLINE_SOFT only for the READER HG, trying to reduce the load on the node, but keeps it active in the WRITER HG to prevent a service interruption.

Restart the node and ask for a donor:

2016-09-05 12:21:43 8007 [Note] WSREP: Flow-control interval: [67, 67]
2016-09-05 12:21:45 8007 [Note] WSREP: Member 1.1 (node1) requested state transfer from '*any*'. Selected 0.1 (node2)(SYNCED) as donor.
2016-09-05 12:21:46 8007 [Note] WSREP: (ef248c1f, 'tcp://192.168.1.8:4567') turning message relay requesting off
2016-09-05 12:21:52 8007 [Note] WSREP: New cluster view: global state: 234bb6ed-527d-11e6-9971-e794f632b140:324329, view# 7: Primary, number of nodes: 5, my index: 3, protocol version 3

galera_check sets 192.168.1.6 to OFFLINE_SOFT only for the READER HG, and ProxySQL uses the other nodes to serve reads.

+-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.6 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 7746 | 622557 | 0 | 86 | | 501 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 1 | 147 | 1565987 | 76267703 | 4879938857 | 38 | | 501 | 192.168.1.6 | 3306 | OFFLINE_SOFT | 0 | 0 | 12 | 0 | 9668944 | 470878452 | 30181474498 | 86 | <-- Node offline | 501 | 192.168.1.7 | 3306 | ONLINE | 9 | 3 | 12 | 0 | 10932794 | 532558667 | 34170366564 | 62 | | 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 816599 | 39804966 | 2545765089 | 229 | | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 814893 | 39724481 | 2541760230 | 248 | +-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

When the SST donor task is over, galera_check moves 192.168.1.6 back ONLINE, as expected. At the same time, it moves the recovering node to the special HG, to avoid having it included in any activity until it is ready.

2016-09-05 12:22:36 27352 [Note] WSREP: 1.1 (node1): State transfer from 0.1 (node2) complete.
2016-09-05 12:22:36 27352 [Note] WSREP: Shifting JOINER -> JOINED (TO: 325062)

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms | +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+ | 500 | 192.168.1.6 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 1554 | 124909 | 0 | 35 | | 501 | 192.168.1.6 | 3306 | ONLINE | 2 | 8 | 22 | 0 | 10341612 | 503637989 | 32286072739 | 35 | | 501 | 192.168.1.7 | 3306 | ONLINE | 3 | 9 | 12 | 0 | 12058701 | 587388598 | 37696717375 | 13 | | 501 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 890102 | 43389051 | 2776691164 | 355 | | 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 887994 | 43296865 | 2772702537 | 250 | | 9500 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 57 | <-- Special HG for recover | 9501 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 57 | <-- Special HG for recover +-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

Finally, once the node is in SYNC with the group, it is put back ONLINE in both the READER HG and the WRITER HG:

2016-09-05 12:22:36 27352 [Note] WSREP: 1.1 (node1): State transfer from 0.1 (node2) complete.
2016-09-05 12:22:36 27352 [Note] WSREP: Shifting JOINER -> JOINED (TO: 325062)

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500 | 192.168.1.5 | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | <-- Back on line
| 500 | 192.168.1.6 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 402 | 32317 | 0 | 68 |
| 501 | 192.168.1.5 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 6285 | 305823 | 19592814 | 312 | <-- Back on line
| 501 | 192.168.1.6 | 3306 | ONLINE | 4 | 6 | 22 | 0 | 10818694 | 526870710 | 33779586475 | 68 |
| 501 | 192.168.1.7 | 3306 | ONLINE | 0 | 12 | 12 | 0 | 12492316 | 608504039 | 39056093665 | 26 |
| 501 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 942023 | 45924082 | 2940228050 | 617 |
| 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 939975 | 45834039 | 2935816783 | 309 |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

+--------------+-------------+------+--------+------------+
| hostgroup_id | hostname | port | status | weight |
+--------------+-------------+------+--------+------------+
| 500 | 192.168.1.5 | 3306 | ONLINE | 100 |
| 500 | 192.168.1.6 | 3306 | ONLINE | 1000000000 |
| 501 | 192.168.1.5 | 3306 | ONLINE | 100 |
| 501 | 192.168.1.6 | 3306 | ONLINE | 1000000000 |
| 501 | 192.168.1.7 | 3306 | ONLINE | 1000000000 |
| 501 | 192.168.1.8 | 3306 | ONLINE | 1 |
| 501 | 192.168.1.9 | 3306 | ONLINE | 1 |
+--------------+-------------+------+--------+------------+

But given that it comes back with its READER weight, it will NOT compete with the previously elected WRITER.

The recovered node will stay on "hold", waiting for a DBA to act: either put it back as the writer, or set it to READ_ONLY so that it is fully removed from the WRITER HG.
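Setting the recovered node to READ_ONLY is a one-statement operation on that node, after which galera_check removes it from the WRITER HG and leaves the elected writer alone. A minimal sketch:

-- Run on the recovered node: galera_check treats read_only=1 nodes
-- as readers only and keeps them out of the writer hostgroup
SET GLOBAL read_only = 1;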

Now let's see the automatic failover procedure.

The test process is the same as in the manual exercise:

  1. Generate some load
  2. Kill the writer node
  3. Script will do auto-failover
  4. Recover crashed node

Check our scheduler config:

+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+
| id | active | interval_ms | filename | arg1 | arg2 | arg3 | arg4 | arg5 | comment |
+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+
| 10 | 1 | 2000 | /var/lib/proxysql/galera_check.pl | -u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --active_failover --debug=0 --log=/var/lib/proxysql/galeraLog | NULL | NULL | NULL | NULL | | <--- Active
| 20 | 0 | 1500 | /var/lib/proxysql/galera_check.pl | -u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --debug=0 --log=/var/lib/proxysql/galeraLog | NULL | NULL | NULL | NULL | |
+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+
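Switching between the two scheduler entries above is done from the ProxySQL admin interface by flipping their active flags. A sketch, assuming the ids 10 and 20 shown in this output:

-- Enable the entry that runs galera_check with --active_failover,
-- disable the passive one, then push the change to runtime and disk
UPDATE scheduler SET active = 1 WHERE id = 10;
UPDATE scheduler SET active = 0 WHERE id = 20;
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;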

The active entry (id 10) is the one with auto-failover enabled. Now start some load and check the current status:

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500 | 192.168.1.5 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 952 | 76461 | 0 | 0 |
| 501 | 192.168.1.5 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 53137 | 2587784 | 165811100 | 167 |
| 501 | 192.168.1.6 | 3306 | ONLINE | 5 | 5 | 11 | 0 | 283496 | 13815077 | 891230826 | 109 |
| 501 | 192.168.1.7 | 3306 | ONLINE | 3 | 7 | 10 | 0 | 503516 | 24519457 | 1576198138 | 151 |
| 501 | 192.168.1.8 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 21952 | 1068972 | 68554796 | 300 |
| 501 | 192.168.1.9 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 21314 | 1038593 | 67043935 | 289 |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

Kill the main node 192.168.1.5:

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500 | 192.168.1.6 | 3306 | ONLINE | 10 | 0 | 10 | 0 | 60 | 4826 | 0 | 0 |
| 501 | 192.168.1.5 | 3306 | SHUNNED | 0 | 0 | 1 | 11 | 177099 | 8626778 | 552221651 | 30 |
| 501 | 192.168.1.6 | 3306 | ONLINE | 3 | 7 | 11 | 0 | 956724 | 46601110 | 3002941482 | 49 |
| 501 | 192.168.1.7 | 3306 | ONLINE | 2 | 8 | 10 | 0 | 1115685 | 54342756 | 3497575125 | 42 |
| 501 | 192.168.1.8 | 3306 | ONLINE | 0 | 1 | 1 | 0 | 76289 | 3721419 | 240157393 | 308 |
| 501 | 192.168.1.9 | 3306 | ONLINE | 1 | 0 | 1 | 0 | 75803 | 3686067 | 236382784 | 231 |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

When the node is killed it gets SHUNNED, but this time the script has already promoted the new node, 192.168.1.6, to ONLINE. See the script log:

2016/09/08 14:04:02.494:[INFO] END EXECUTION Total Time:102.347850799561
2016/09/08 14:04:04.478:[INFO] This Node Try to become a WRITER set READ_ONLY to 0 192.168.1.6:3306:HG501
2016/09/08 14:04:04.479:[INFO] This Node NOW HAS READ_ONLY = 0 192.168.1.6:3306:HG501
2016/09/08 14:04:04.479:[INFO] END EXECUTION Total Time:71.8140602111816

More importantly, let’s look at the application experience:

Writes

[ 10s] threads: 10, tps: 9.40, reads: 93.60, writes: 41.60, response time: 1317.41ms (95%), errors: 0.00, reconnects: 0.00
[ 20s] threads: 10, tps: 8.30, reads: 74.70, writes: 33.20, response time: 1350.96ms (95%), errors: 0.00, reconnects: 0.00
[ 30s] threads: 10, tps: 8.30, reads: 74.70, writes: 33.20, response time: 1317.81ms (95%), errors: 0.00, reconnects: 0.00
[ 40s] threads: 10, tps: 7.80, reads: 70.20, writes: 31.20, response time: 1407.51ms (95%), errors: 0.00, reconnects: 0.00
[ 50s] threads: 10, tps: 6.70, reads: 60.30, writes: 26.80, response time: 2259.35ms (95%), errors: 0.00, reconnects: 0.00
[ 60s] threads: 10, tps: 6.60, reads: 59.40, writes: 26.40, response time: 3275.78ms (95%), errors: 0.00, reconnects: 0.00
[ 70s] threads: 10, tps: 5.70, reads: 60.30, writes: 26.80, response time: 1492.56ms (95%), errors: 0.00, reconnects: 1.00 <-- just a reconnect experience
[ 80s] threads: 10, tps: 6.70, reads: 60.30, writes: 26.80, response time: 7959.74ms (95%), errors: 0.00, reconnects: 0.00
[ 90s] threads: 10, tps: 6.60, reads: 59.40, writes: 26.40, response time: 2109.03ms (95%), errors: 0.00, reconnects: 0.00
[ 100s] threads: 10, tps: 6.40, reads: 57.60, writes: 25.60, response time: 1883.96ms (95%), errors: 0.00, reconnects: 0.00
[ 110s] threads: 10, tps: 5.60, reads: 50.40, writes: 22.40, response time: 2167.27ms (95%), errors: 0.00, reconnects: 0.00

With no errors and no significant delay, our application experienced only a glitch and simply had to reconnect.

Reads had no errors or reconnects.

ProxySQL managed the connection errors, and given that it found five in one second, it SHUNNED the node. The galera_check script was then able to promote a reader, and since this is a failover, there was no delay from a retry loop. The whole thing was done in such a brief time that the application barely saw it.

Obviously, an application with thousands of connections per second will experience a larger impact, but the time window will still be very narrow. Once the failed node is ready to come back, we can either start it with READ_ONLY=1 so that it comes back as a reader, or keep it as it is so that it comes back as the writer.

Either way, the script handles the case just as it did in the previous (manual) exercise.

Conclusions

ProxySQL and galera_check, when working together, are quite efficient in managing the cluster and its different scenarios. When using single-writer mode, removing the manual part of the failover dramatically improves production recovery time, going from a few minutes to seconds or less.

The multi-writer mode remains the preferred and most recommended way to use ProxySQL with Percona XtraDB Cluster, given that it performs failover without the need for additional scripts or extensions. It is also the preferred method when a script is required to manage the integration with ProxySQL.

In both cases, the use of a script that can identify the multiple states of Percona XtraDB Cluster, and the changing role of each node, is a crucial part of the implementation, without which ProxySQL might not behave correctly.

Percona XtraDB Cluster 5.6.30-25.16.2 is now available (CVE-2016-6662 fix)

September 15, 2016 - 6:53am

Percona announces the new release of Percona XtraDB Cluster 5.6 on September 15, 2016. Binaries are available from the downloads area or our software repositories.

Percona XtraDB Cluster 5.6.30-25.16.2 is now the current release, based on the following:

  • Percona Server 5.6.30-76.3
  • Galera Replication library 3.16
  • Codership wsrep API version 25

This release provides a fix for CVE-2016-6662. More information about this security issue can be found here.

Bug Fixed:

  • For security reasons, ld_preload libraries can now only be loaded from the system directories (/usr/lib64, /usr/lib) and the MySQL installation base directory.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

MySQL Default Configuration Changes between 5.6 and 5.7

September 14, 2016 - 3:26pm

In this blog post, we’ll discuss the MySQL default configuration changes between 5.6 and 5.7.

MySQL 5.7 has added a variety of new features that might excite you. However, there are also changes in the current variables that you might have overlooked. MySQL 5.7 updated nearly 40 of the defaults from 5.6. Some of the changes could severely impact your server performance, while others might go unnoticed. I’m going to go over each of the changes and what they mean.

The change that can have the largest impact on your server is likely sync_binlog. My colleague, Roel Van de Paar, wrote about this impact in depth in another blog post, so I won’t go in much detail. Sync_binlog controls how MySQL flushes the binlog to disk. The new value of 1 forces MySQL to write every transaction to disk prior to committing. Previously, MySQL did not force flushing the binlog, and trusted the OS to decide when to flush the binlog.

(https://www.percona.com/blog/2016/06/03/binary-logs-make-mysql-5-7-slower-than-5-6/)
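If, for a given replica, you decide the old durability trade-off is acceptable, the value can be inspected and relaxed at runtime. A minimal sketch; reverting to 0 trades durability for throughput, so do it deliberately:

-- Check the current binlog flushing behavior (1 is the 5.7 default)
SELECT @@global.sync_binlog;

-- Revert to the 5.6 default and let the OS decide when to flush the binlog
SET GLOBAL sync_binlog = 0;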

Variable      5.6.29   5.7.11
sync_binlog   0        1

The performance schema variables stand out as unusual, as many have a default of -1. MySQL uses this notation to call out variables that are automatically adjusted. The only performance schema variable change that doesn’t adjust itself is  performance_schema_max_file_classes. This is the number of file instruments used for the performance schema. It’s unlikely you will ever need to alter it.
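If you are worried that the automatically sized values are too small for your workload, the performance schema exposes "lost" counters that increment whenever it runs out of room. A quick check:

-- Non-zero values suggest the corresponding performance_schema
-- sizing variable should be raised explicitly
SHOW GLOBAL STATUS LIKE 'Performance_schema%lost';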

Variable                                    5.6.29   5.7.11
performance_schema_accounts_size           100      -1
performance_schema_hosts_size              100      -1
performance_schema_max_cond_instances      3504     -1
performance_schema_max_file_classes        50       80
performance_schema_max_file_instances      7693     -1
performance_schema_max_mutex_instances     15906    -1
performance_schema_max_rwlock_instances    9102     -1
performance_schema_max_socket_instances    322      -1
performance_schema_max_statement_classes   168      -1
performance_schema_max_table_handles       4000     -1
performance_schema_max_table_instances     12500    -1
performance_schema_max_thread_instances    402      -1
performance_schema_setup_actors_size       100      -1
performance_schema_setup_objects_size      100      -1
performance_schema_users_size              100      -1

The optimizer_switch and sql_mode variables each have a variety of flags that can be enabled, each causing a slightly different behavior. MySQL 5.7 enables additional flags for both variables, increasing their sensitivity and security. These additions make the optimizer more efficient in determining how to correctly interpret your queries.

Three flags have been added to the optimizer_switch defaults in MySQL 5.7, with the intent of increasing the optimizer's efficiency: duplicateweedout=on, condition_fanout_filter=on, and derived_merge=on. duplicateweedout is part of the optimizer's semi-join materialization strategy, condition_fanout_filter controls the use of condition filtering, and derived_merge controls the merging of derived tables and views into the outer query block.

https://dev.mysql.com/doc/refman/5.7/en/switchable-optimizations.html

http://www.chriscalender.com/tag/condition_fanout_filter/

The additions to sql_mode do not affect performance directly; however, they will improve the way you write queries (which can increase performance). One notable change is that all fields in a SELECT … GROUP BY statement must either be aggregated using a function like SUM or appear in the GROUP BY clause. MySQL will no longer assume how they should be grouped, and will raise an error if a field is missing. STRICT_TRANS_TABLES causes a different effect depending on whether it is used with a transactional table.

Statements are rolled back on transactional tables if there is an invalid or missing value in a data change statement. For tables that do not use a transactional engine, MySQL's behavior depends on the row in which the invalid data occurs. If it is the first row, the behavior matches that of a transactional engine. If not, the invalid value is converted to the closest valid value, or to the default value for the column. A warning is generated, but the data is still inserted.
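As an illustration of the ONLY_FULL_GROUP_BY change, here is a hypothetical table and a query that 5.6 silently accepted but 5.7 rejects unless every selected column is aggregated or grouped:

-- Hypothetical table, for illustration only
CREATE TABLE orders (customer_id INT, item VARCHAR(50), amount DECIMAL(10,2));

-- Fails under ONLY_FULL_GROUP_BY: item is neither aggregated nor in the GROUP BY
SELECT customer_id, item, SUM(amount) FROM orders GROUP BY customer_id;

-- Works: every non-aggregated column appears in the GROUP BY clause
SELECT customer_id, item, SUM(amount) FROM orders GROUP BY customer_id, item;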

Variable: optimizer_switch
  5.6.29: index_merge=on, index_merge_union=on, index_merge_sort_union=on,
          index_merge_intersection=on, engine_condition_pushdown=on,
          index_condition_pushdown=on, mrr=on, mrr_cost_based=on,
          block_nested_loop=on, batched_key_access=off, materialization=on,
          semijoin=on, loosescan=on, firstmatch=on,
          subquery_materialization_cost_based=on, use_index_extensions=on
  5.7.11: index_merge=on, index_merge_union=on, index_merge_sort_union=on,
          index_merge_intersection=on, engine_condition_pushdown=on,
          index_condition_pushdown=on, mrr=on, mrr_cost_based=on,
          block_nested_loop=on, batched_key_access=off, materialization=on,
          semijoin=on, loosescan=on, firstmatch=on, duplicateweedout=on,
          subquery_materialization_cost_based=on, use_index_extensions=on,
          condition_fanout_filter=on, derived_merge=on

Variable: sql_mode
  5.6.29: NO_ENGINE_SUBSTITUTION
  5.7.11: ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE,
          ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER, NO_ENGINE_SUBSTITUTION

There have been a couple of variable changes surrounding the binlog. MySQL 5.7 updated binlog_error_action so that if there is an error while writing to the binlog, the server aborts. These kinds of incidents are rare, but they have a big impact on your application and replication when they occur, as the server will not perform any further transactions until the problem is corrected.

The default binlog format was changed to ROW instead of the previously used STATEMENT format. The statement format writes less data to the logs; however, many statements cannot be replicated correctly, including "update … order by rand()". These non-deterministic statements could result in different result sets on the master and the slave. The ROW format writes more data to the binlog, but the information is more accurate and ensures correct replication.

MySQL has begun to focus on replication using GTIDs instead of the traditional binlog positions. When MySQL is started or restarted, it must generate a list of the previously used GTIDs. If binlog_gtid_simple_recovery is OFF (FALSE), the server starts with the newest binlog and iterates backwards through the binlog files searching for a previous_gtids_log_event. With it set to ON (TRUE), the server only reviews the newest and oldest binlog files and computes the used GTIDs. binlog_gtid_simple_recovery makes it much faster to identify the binlogs, especially if there are a large number of binary logs without GTID events. However, in specific cases it could cause gtid_executed and gtid_purged to be populated incorrectly. This should only happen when the newest binary log was generated by MySQL 5.7.5 or older, or if a SET GTID_PURGED statement was run on MySQL earlier than version 5.7.7.

Another replication-based variable updated in 5.7 is slave_net_timeout, which is lowered to 60 seconds. Previously, the replication thread would not consider its connection to the master broken until the problem had existed for at least an hour. This change informs you much sooner if there is a connectivity problem, and ensures replication does not fall significantly behind before you learn of an issue.
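These defaults normally live in my.cnf, but you can check the running values from SQL; a minimal sketch for verifying (and, if needed, relaxing) them on a 5.7 server follows. The value 120 is only an example:

-- Inspect the replication-related defaults discussed above
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('binlog_error_action', 'binlog_format',
                        'binlog_gtid_simple_recovery', 'slave_net_timeout');

-- slave_net_timeout can be raised at runtime if 60 seconds is too aggressive
SET GLOBAL slave_net_timeout = 120;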

Variable                      5.6.29         5.7.11
binlog_error_action           IGNORE_ERROR   ABORT_SERVER
binlog_format                 STATEMENT      ROW
binlog_gtid_simple_recovery   OFF            ON
slave_net_timeout             3600           60

InnoDB buffer pool changes impact how long starting and stopping the server takes. innodb_buffer_pool_dump_at_shutdown and innodb_buffer_pool_load_at_startup are used together to save you from having to "warm up" the server. As the names suggest, they cause a buffer pool dump at shutdown and a load at startup. Even though you might have a buffer pool of hundreds of gigabytes, you will not need to reserve the same amount of space on disk, as the data written is much smaller. The only things written to disk are the pieces of information needed to locate the actual data: the tablespace and page IDs.
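The dump and load can also be triggered manually, which is handy for testing the behavior before relying on it across a restart. A minimal sketch on a 5.7 server:

-- Dump the current buffer pool contents (only tablespace and page IDs are written)
SET GLOBAL innodb_buffer_pool_dump_now = ON;
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_dump_status';

-- Reload the dumped pages on demand instead of waiting for a restart
SET GLOBAL innodb_buffer_pool_load_now = ON;
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_load_status';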

Variable                              5.6.29   5.7.11
innodb_buffer_pool_dump_at_shutdown   OFF      ON
innodb_buffer_pool_load_at_startup    OFF      ON

MySQL 5.7 makes some of the options implemented in InnoDB during 5.6 and earlier its defaults. InnoDB's checksum algorithm was updated from innodb to crc32, allowing you to benefit from the hardware acceleration available in recent Intel CPUs.

The Barracuda file format has been available since 5.5, but had many improvements in 5.6. It is now the default in 5.7. The Barracuda format allows you to use the compressed and dynamic row formats. My colleague Alexey has written about the utilization of the compressed format and the results he saw when optimizing a server: https://www.percona.com/blog/2008/04/23/real-life-use-case-for-barracuda-innodb-file-format/

innodb_large_prefix defaults to "on", and when combined with the Barracuda file format it allows for larger index key prefixes, up to 3072 bytes. This lets larger text fields benefit from an index. If it is set to "off", or the row format is not dynamic or compressed, any index prefix larger than 767 bytes is silently truncated. MySQL also introduced larger InnoDB page sizes (32k and 64k) in 5.7.6.
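For example, an index prefix like the following needs the larger limit; the table is a throwaway used only for illustration:

-- utf8mb4 reserves 4 bytes per character, so a 600-character prefix is 2400 bytes:
-- beyond the old 767-byte limit, but fine with innodb_large_prefix=ON and DYNAMIC rows
CREATE TABLE long_text_idx (
  doc TEXT CHARACTER SET utf8mb4,
  KEY doc_prefix (doc(600))
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;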

MySQL 5.7 increased the innodb_log_buffer_size value as well. InnoDB uses the log buffer to buffer redo log writes for transactions before flushing them to the redo log files on disk. The increased size lets InnoDB flush to disk less often, reducing IO, and allows larger transactions to fit in the log buffer without having to write to disk before committing.

MySQL moved InnoDB's purge operations to a background thread back in MySQL 5.5 in order to reduce thread contention. MySQL 5.7 increases the default from one to four purge threads, and the value can be set anywhere from 1 to 32 threads.

MySQL 5.7 now enables innodb_strict_mode by default, turning some of the warnings into errors. Syntax errors in create table, alter table, create index, and optimize table statements generate errors and force the user to correct them prior to running. It also enables a record size check, ensuring that insert or update statements will not fail due to the record being too large for the selected page size.
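A quick way to see the difference is a table definition with conflicting options; with strict mode off you should get a warning and the option is ignored, while with it on (the 5.7 default) the statement is rejected. A sketch with a throwaway table:

-- KEY_BLOCK_SIZE only applies to compressed tables, so it conflicts with ROW_FORMAT=COMPACT.
-- With innodb_strict_mode=OFF this produces a warning; with ON it fails with an error.
SET SESSION innodb_strict_mode = OFF;
CREATE TABLE strict_demo (id INT) ENGINE=InnoDB ROW_FORMAT=COMPACT KEY_BLOCK_SIZE=8;
SHOW WARNINGS;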

Variable                    5.6.29     5.7.11
innodb_checksum_algorithm   innodb     crc32
innodb_file_format          Antelope   Barracuda
innodb_file_format_max      Antelope   Barracuda
innodb_large_prefix         OFF        ON
innodb_log_buffer_size      8388608    16777216
innodb_purge_threads        1          4
innodb_strict_mode          OFF        ON

MySQL has increased the number of times the optimizer dives into the index when evaluating equality ranges. If the optimizer needs to dive into the index more times than eq_range_index_dive_limit (which defaults to 200 in MySQL 5.7) allows, it falls back to the existing index statistics. You can adjust this limit from 0, eliminating index dives, up to 4294967295. This can have a significant impact on query performance, since table statistics are based on the cardinality of a random sample. Relying on them can cause the optimizer to estimate a much larger set of rows to review than it would with index dives, changing the method the optimizer chooses to execute the query.
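The limit mostly matters for queries with long IN() lists; a sketch of how you might inspect it and raise it for a session (500 is an arbitrary illustrative value, and the orders table is hypothetical):

-- A query like this, with more equality ranges than the limit,
-- falls back to index statistics instead of index dives:
--   SELECT * FROM orders WHERE customer_id IN (1, 2, 3, /* ...hundreds of values... */ 999);
SELECT @@eq_range_index_dive_limit;
SET SESSION eq_range_index_dive_limit = 500;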

MySQL 5.7 deprecated log_warnings. The new preference is to use log_error_verbosity. By default this is set to 3, which logs errors, warnings, and notes to the error log. You can lower it to 1 (log errors only) or 2 (log errors and warnings). When consulting the error log, verbosity is often a good thing; however, it increases the IO and disk space needed for the error log.

Variable                    5.6.29   5.7.11
eq_range_index_dive_limit   10       200
log_warnings                1        2

There are many changes to the defaults in 5.7, but many of these options have existed for a long time and should be familiar to users. Many people were already using these settings, and making them the defaults is a sensible way to push MySQL forward. Remember, however, that you can still edit these variables and configure them to ensure that your server works its best for your data.

pmp-check-pt-table-checksum Percona Monitoring Plugin

September 14, 2016 - 1:52pm

Recently, I worked on a customer case where the customer needed to monitor pt-table-checksum results via Nagios. The pmp-check-pt-table-checksum plugin from Percona Monitoring Plugins for MySQL achieves this goal, and I thought it was worth a blog post.

pmp-check-pt-table-checksum alerts you when the pt-table-checksum tool from Percona Toolkit finds data drift on a replication slave. pmp-check-pt-table-checksum monitors data differences on the slave, based on the information from the last checksum performed by the pt-table-checksum tool. By default, the plugin queries the percona.checksums table to fetch information about data discrepancies. You can override this behavior with the "-T" option. Check the pmp-check-pt-table-checksum documentation for details.

Let's demonstrate checksum monitoring via Nagios. My setup contains a master with two slaves connected, as follows:

  • Host 10.0.3.131 is the master.
  • Host 10.0.3.83 is slave1.
  • Host 10.0.3.36 is slave2.

I intentionally generated more data on the master so pt-table-checksum can catch the differences on the slave(s). Here’s what it looks like:

mysql-master> SELECT * FROM test.t1;
+------+
| id   |
+------+
| 1    |
| 2    |
| 3    |
| 4    |
| 5    |
| 6    |
| 7    |
| 8    |
| 9    |
| 10   |
+------+
10 rows in set (0.00 sec)

mysql-slave1> SELECT * FROM test.t1;
+------+
| id   |
+------+
| 1    |
| 2    |
| 3    |
| 4    |
| 5    |
+------+
5 rows in set (0.00 sec)

mysql-slave2> SELECT * FROM test.t1;
+------+
| id   |
+------+
| 1    |
| 2    |
| 3    |
| 4    |
| 5    |
+------+
5 rows in set (0.00 sec)

As you can see, slave1 and slave2 are different from the master: the master has ten rows while the slave(s) have five rows each (table t1).
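For reference, one way to create this kind of intentional drift in a test environment is to insert rows on the master with binary logging disabled for the session, so they never reach the slaves. A sketch using the test.t1 table above:

-- On the master only: these rows skip the binlog, so the slaves never receive them
SET SESSION sql_log_bin = 0;
INSERT INTO test.t1 (id) VALUES (6), (7), (8), (9), (10);
SET SESSION sql_log_bin = 1;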

Then, I executed pt-table-checksum from the master to check for data discrepancies:

[root@master]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
08-25T04:57:10 0 1 10 1 0 0.018 test.t1

[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases mysql h=10.0.3.131,u=checksum_user,p=checksum_password
Differences on slave1
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.t1 1 -5 1
Differences on slave2
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.t1 1 -5 1

pt-table-checksum correctly identifies the differences for the test.t1 table on slave1 and slave2. Now, you can use the pmp-check-pt-table-checksum  Percona checksum monitoring plugin. Let’s try to run it locally (via CLI) from the Nagios host.

[root@nagios]# pmp-check-pt-table-checksum -H slave1 -l checksum_user -p checksum_password -P 3306
WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1
[root@nagios]# pmp-check-pt-table-checksum -H slave2 -l checksum_user -p checksum_password -P 3306
WARN pt-table-checksum found 1 chunks differ in 1 tables, including test.t1

NOTE: The checksum_user database user needs SELECT privileges on the checksum table (percona.checksums) on the slave(s) in order for the plugin to alert on checksum differences.
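A sketch of the grant, assuming the plugin connects from the Nagios host and the default percona.checksums table; the host name and password here are placeholders:

-- Run on each slave being monitored
CREATE USER 'checksum_user'@'nagios.example.com' IDENTIFIED BY 'checksum_password';
GRANT SELECT ON percona.* TO 'checksum_user'@'nagios.example.com';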

On the Nagios monitoring server, you need to add the pmp-check-pt-table-checksum command to the commands.cfg file:

define command{
    command_name    pmp-check-pt-table-checksum
    command_line    $USER1$/pmp-check-pt-table-checksum -H $HOSTADDRESS$ -c $ARG1$
}

NOTE: I used the "-c" option for pmp-check-pt-table-checksum, which raises a critical error instead of a warning.

Then, in the existing host configuration files (i.e., slave1.cfg and slave2.cfg), you need to add the corresponding monitoring service, as below:

define service{
    use                     generic-service
    host_name               slave1
    service_description     Checksum Status
    check_command           pmp-check-pt-table-checksum!1
}

In this service definition, "1" is the argument passed to "-c $ARG1$", so pmp-check-pt-table-checksum raises a critical error when one or more chunks on the slave(s) differ from the master.

Last but not least, restart the Nagios daemon on the monitoring host to make the change.

Below is how it looks in the Nagios web monitoring interface:


I also think the “INTERVAL” option is useful:

-i INTERVAL     Interval over which to ensure pt-table-checksum was run, in days; default - not to check.

It makes sure that the chunks in the checksum table are recent. Used the other way around, it checks how old your chunks are. This option ensures the checksum cron executes within a defined number of days. Let's say you have the pt-table-checksum cron running once per week. In that case, setting INTERVAL to 14 or 21 alerts you if chunks are older than the defined number of days (i.e., the INTERVAL number).

Conclusion:

Percona Monitoring Plugins for MySQL are very useful and easy to embed in your centralized monitoring dashboard. You can schedule pt-table-checksum via a cron job, and get reports on master/slave(s) data drift (if any) from one global dashboard on the monitoring host. There are various plugins available from Percona, e.g. the processlist plugin, the replication delay plugin, etc. Along with that, Percona offers Cacti and Zabbix templates to graph various MySQL activities.

Webinar Thursday Sept. 15: Identifying and Solving Database Performance Issues with PMM

September 14, 2016 - 6:51am

Please join Roman Vynar, Lead Platform Engineer on Thursday, September 15, 2016 at 10 am PDT (UTC-7) for a webinar on Identifying and Solving Database Performance Issues with PMM.

Database performance is the key to high-performance applications. Gaining visibility into the database is the key to improving database performance. Percona’s Monitoring and Management (PMM) provides the insight you need into your database environment.

In this webinar, we will demonstrate how using PMM for query analytics, in combination with database and host performance metrics, can more efficiently drive tuning, issue management and application development. Using PMM can result in faster resolution times, more focused development and a more efficient IT team.

Register for the webinar here.

Roman Vynar, Lead Platform Engineer Roman is a Lead Platform Engineer at Percona. He joined the company to establish and develop the Remote DBA service from scratch. Over time, the growing service successfully expanded to Managed Services. Roman develops monitoring tools, automated scripts, a backup solution, and a notification and incident tracking web system, and currently leads the Percona Monitoring and Management project.

Black Friday and Cyber Monday: Best Practices for Your E-Commerce Database

September 14, 2016 - 5:06am

This blog post discusses how you can protect your e-commerce database from a high-traffic disaster.

Databases power today's e-commerce. Whether it's listing items on your site, contacting your distributor for inventory, tracking shipments, payments, or customer data, your database must be up, running, tuned and available for your business to be successful.

There is no time when this is more important than on high-volume traffic days. Specific events occur throughout the year (such as Black Friday, Cyber Monday, or Singles' Day) that you know are going to put extra strain on your database environment. These are exactly the times your database can't go down – these are the days that can make or break your year!

So what can you do to guarantee that your database environment is up to the challenge of handling high traffic events? Are there ways of preparing for this type of traffic?

Yes, there are! In this blog post, we’ll look at some of the factors that can help prepare your database environment to handle large amounts of traffic.

Synchronous versus Asynchronous Applications

Before moving to strategies, we need to discuss the difference between synchronous and asynchronous applications.

In most web-based applications, user input triggers a number of requests for resources. Once the server answers those requests, communication stops until the next input. This type of communication between a client and server is called synchronous communication.

Synchronous communication restricts application updates. Even synchronous applications designed to automatically refresh application server information at regular intervals have consistent periods of delay between data refreshes. While usually such delays aren't an issue, some applications (for example, stock-trading applications) rely on continuously updated information to provide their users optimum functionality and usability.

Web 2.0-based applications address this issue by using asynchronous communication. Asynchronous applications deliver continuously updated data to users. Asynchronous applications separate client requests from application updates, so multiple asynchronous communications between the client and server can occur simultaneously or in parallel.

The strategies you use to scale these two types of applications to meet growing user and traffic demands differ.

Scaling a Synchronous/Latency-sensitive Application

When it comes to synchronous applications, you really have only one option for scaling performance: sharding. With sharding, the tables are divided and distributed across multiple servers, which reduces the total number of rows in each table. This consequently reduces index size, and generally improves search performance.

A shard can also be located on its own hardware, with different shards added to different machines. Distributing the database over multiple machines spreads out the load, further improving performance. Sharding allows you to scale read and write performance when latency is important.

Generally speaking, it is better to avoid synchronous applications when possible – they limit your scalability options.

Scaling an Asynchronous Application

When it comes to scaling asynchronous applications, we have many more options than with synchronous applications. You should try and use asynchronous applications whenever possible:

  • Secondary/Slave hosts. Replication can be used to add more hardware for read traffic. Replication usually employs a master/slave relationship between a designated “original” server and copies of the server. The master logs and then distributes the updates to the slaves. This setup allows you to distribute the read load across more than one machine.
  • Caching. Database caching (tables, data, and models – caching summaries of data) improves scalability by distributing the query workload from expensive (overhead-wise) backend processes to multiple cheaper ones. It allows more flexibility for data processing: for example, premium user data can be cached, while regular user data isn’t.

    Caching also improves data availability by allowing applications to keep serving data even when they can't reach the backend services. It also improves data access speeds by localizing the data and avoiding round-trip queries. There are some specific caching strategies you can use:

    • Pre-Emptive Caching. Ordinarily, an object gets cached the first time it is requested (or if cached data isn’t timely enough). Preemptive caching instead generates cached versions before an application requests them. Typically this is done by a cron process.
    • Hit/Miss Caching. A cache hit occurs when an application requests data and the data is found in the closest cache layer (for a CPU, that is usually the primary cache). For each new request, the cache is searched first; if the data is not found there, it is considered a cache miss. A cache hit serves data more quickly, as the data can be retrieved by reading cache memory. The same applies to disk caches, where the requested data is stored and accessed by the first query. A cache miss slows down the overall process: after a miss, progressively slower levels (L1, L2, L3 caches and then RAM, in the CPU analogy) are searched for the data, and a new entry is created and copied into the cache before it can be accessed again.
    • Client-side Caching. Client-side caching allows server data to be copied and cached on the client computer, reducing load times by several factors.
  • Queuing Updates. Queues are used to order queries (and other database functions) in a timely fashion. Most websites use queues for asynchronously sending notifications like email and SMS, and e-commerce sites use queues for storing, processing and dispatching orders. How your database handles queues can affect your performance:
    • Batching. Batch processing can be used for efficient bulk database updates and automated transaction processing, as contrasted to interactive online transaction processing (OLTP) applications.
    • Fan-Out Updates. Fan-out is duplicating data in the database. When data is duplicated it eliminates slow joins and increases read performance.
Efficient Usage of Data at Scale

As you scale up your database workload, you need to be able to avoid silly queries or access patterns coming from your applications.

  • Moving expensive queries out of the user request path. Even if your database server uses powerful hardware, its performance can be negatively affected by a handful of expensive queries. Even a single bad query can cause serious performance issues for your database. Make sure to use monitoring tools to track down the queries that are taking up the most resources.
  • Using caching to offload database traffic. Cache data away from the database using something like memcached. This is usually done at the application layer, and is highly effective.
  • Counters and In-Memory Stores. Use memory counters to monitor performance hits: pages/sec, faults/sec, available bytes, total server, target server memory, etc. Percona’s new in-memory storage engine for MongoDB also can help.
  • Connection Pooling. A connection pool is a set of cached database connections that are kept open so they can be reused for future requests to the database. Connection pools can improve the performance of executing commands on a database.
Scaling Out (Horizontal) Tricks

Scaling horizontally means adding more nodes to a system, such as adding a new server to a distributed database environment or software application. For example, scaling out from one web server to three.

  • Pre-Sharding Data for Flexibility. Pre-sharding the database across the server instances allows you to have the entire environment resources available at the start of the event, rather than having to rebalance during peak event traffic.
  • Using “Kill Switches” to Control Traffic. The idea of a kill switch is a single point where you can stop the flow of data to a particular node. Strategically set up kill switches allow you to stop a destructive workload if it begins to impact the entire environment.
  • Limiting Graph Structures. By limiting the size or complexity of graph structures in the database, you will simplify data lookups and data size.
Scaling with Hardware (Vertical Scaling)

Another option to handle the increased traffic load is adding more hardware to your environment. More servers, more CPUs, more memory, etc. This, of course, can be expensive. One option here is to pre-configure your testing environment to become part of the production environment if necessary. Another is to pre-configure more Database-as-a-Service instances for the event (if you are a using cloud-based services).

Whichever method you choose, be sure to verify and test your extra servers and environment before your drop-dead date.

Testing Performance and Capacity

As always, in any situation where your environment is going to be stressed beyond its usual limits, testing under real-world conditions is a key factor. This includes not only testing for raw traffic levels, but also testing the actual workloads that your database will experience, with the same volume and variety of requests.

Knowing Your Application and Questions to Ask at Development Time

Finally, it's important that you understand which applications will be using and querying the database. This sort of common-sense idea is often overlooked, especially when teams (such as the development team and the database/operations team) get siloed and don't communicate.

Get to know who is developing the applications that are using the database, and how they are doing it. As an example, a while back I had the opportunity to speak with a team of developers, mostly to just understand what they were doing. In the process of whiteboarding the app with them, we discovered a simple issue that – now that we were aware of it – took little effort to fix. These sorts of interactions, early in the process, can save a great deal of headache down the line.

Conclusion

There are many strategies that can help you prepare for high traffic events that will impact your database. I’ve covered a few here briefly. For an even more thorough look at database strategies, attend my webinar “Black Friday and Cyber Monday: How to Avoid an E-Commerce Disaster” on Thursday, September 22, 2016 10:00 am Pacific Time.

Register here.


