
Faster Node Rejoins with Improved IST performance

July 21, 2017 - 9:01am

In this blog, we’ll look at how improvements to Percona XtraDB Cluster improved IST performance.

Introduction

Starting with version 5.7.17-29.20, Percona XtraDB Cluster significantly improved performance. Depending on the workload, the increase in throughput is in the range of 3-10x (more details here). These optimization fixes also helped improve IST (Incremental State Transfer) performance. This blog looks at the impact on IST.

IST

IST stands for Incremental State Transfer. When a node leaves the cluster for a short period of time and then rejoins, it needs to catch up with the cluster state. As part of this sync process, an existing node of the cluster (the DONOR) donates the missing write-sets to the rejoining node (the JOINER). In short, the flow involves applying the missing write-sets on the JOINER, just as during active workload replication.

Percona XtraDB Cluster / Galera already can apply write-sets in parallel using multiple applier threads. Unfortunately, due to commit contention, the commit action was serialized. This was fixed in the above Percona XtraDB Cluster release, allowing commits to proceed in parallel.

IST uses the same path for applying write-sets, except that it is more like a batch operation.

IST Performance

Let’s look at IST performance before and now.

Setup

  1. Two-node cluster (node-1 and node-2), with gcache configured large enough to avoid purging, since we need IST (see the configuration sketch after this list)
  2. Start workload against node-1 for 30 seconds
  3. Shutdown node-2
  4. Start workload that performs 4M requests against node-1. Workload produces ~3.5M write-sets that are cached in gcache and used later for IST
  5. Start node-2 with N-applier threads
  6. Wait until IST is done
  7. Repeat steps 3-6 with different values of N.
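For reference, here is a minimal sketch of the kind of gcache sizing used for this setup. The 4G value is an assumption for illustration only, and should be tuned to your write volume and the expected downtime of the node:

[mysqld]
# keep enough write-sets cached on the DONOR to cover the JOINER's downtime,
# so the rejoin uses IST instead of a full SST
wsrep_provider_options="gcache.size=4G"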

Observations:

  • IST is 4x faster with PXC-5.7.17 (compared to previous releases)
  • Improved performance means a quicker node rejoin, and an overall increase in cluster productivity, as the joiner node becomes available to process the workload more quickly
Conclusion

Percona XtraDB Cluster 5.7.17 significantly improved IST performance. A faster rejoin of the node effectively means better cluster productivity and more flexibility in planning maintenance windows. So what are you waiting for? Upgrade to Percona XtraDB Cluster 5.7.17 or the latest Percona XtraDB Cluster 5.7 release and experience the power!

Where Do I Put ProxySQL?

July 20, 2017 - 11:57am

In this blog post, we’ll look at how to deploy ProxySQL.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. It was created for DBAs by René Cannaò, as a means of solving complex replication topology issues. When bringing up ProxySQL with my clients, I always get questions about where it fits into the architecture. This post should clarify that.

Before continuing, you might want to know why you should use this software. The features that are of interest include:

  • MySQL firewall
  • Connection pooling
  • Shard lookup and automated routing
  • Ability to read/write split
  • Automatically switch to another master in case of active master failure
  • Query cache
  • Performance metrics
  • Other neat features!
Initial Configuration

In general, you install it on nodes that do not have a running MySQL database. You manage it via the MySQL command line on another port, usually 6032. Once it is started, the configuration in /etc is not used, and you do everything within the CLI. The backend database is actually SQLite, and the db file is stored in /var/lib/proxysql.

There are many guides out there on initializing and installing it, so I won’t cover those details here. It can be as simple as:

apt-get install proxysql
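As a minimal sketch of what the first configuration steps might look like, assuming the default admin credentials (admin/admin) and a hypothetical backend host and hostgroup ID:

# connect to the ProxySQL admin interface (default port 6032) and register a backend server
mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e "
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (0, 'db1.example.com', 3306);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;"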

ProxySQL Architecture

While most first think to install ProxySQL on a standalone node between the application and database, this has the potential to affect query performance due to the additional latency from network hops.

 

To have minimal impact on performance (and avoid the additional network hop), many recommend installing ProxySQL on the application servers. The application then connects to ProxySQL (acting as a MySQL server) on localhost, using Unix Domain Socket, and avoiding extra latency. It would then use its routing rules to reach out and talk to the actual MySQL servers with its own connection pooling. The application doesn’t have any idea what happens beyond its connection to ProxySQL.
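As a rough sketch of the routing rules mentioned above, a read/write split can be expressed as query rules in the admin interface. The hostgroup IDs and regular expressions below are assumptions for illustration, not a production recipe:

# send SELECT ... FOR UPDATE to the writer hostgroup (10), plain SELECTs to the reader hostgroup (20)
mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e "
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE', 10, 1), (2, 1, '^SELECT', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;"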

Reducing Your Network Attack Surface

Another consideration is reducing your network attack surface. This means attempting to control all of the possible vulnerabilities in your network’s hardware and software that are accessible to unauthenticated users.

Percona generally suggests that you put a ProxySQL instance on each application host, like in the second image above. This suggestion is certainly valid for reducing latency in your database environment (by limiting network jumps). But while this is good for performance, it can be bad for security.

Every instance must be able to talk to:

  • Every master
  • Every slave

As you can imagine, this is a security nightmare. With every instance, you have x many more connections spanning your network. That’s x many more connections an attacker might exploit.

Instead, it can be better to have one or more ProxySQL instances that are between your application and MySQL servers (like the first image above). This provides a reasonable DMZ-type setup that prevents opening too many connections across the network.

That said, both architectures are valid production configurations – depending on your requirements.

Blog Poll: What Operating System Do You Run Your Development Database On?

July 19, 2017 - 3:05pm

In this post, we’ll use a blog poll to find out what operating system you use to run your development database servers.

In our last blog poll, we looked at what OS you use for your production database. Now we would like to see what you use for your development database.

As databases grow to meet more challenges and expanding application demands, they must try and get the maximum amount of performance out of available resources. How they work with an operating system can affect many variables, and help or hinder performance. The operating system you use for your database can impact consumable choices (such as hardware and memory). The operating system you use can also impact your choice of database engine as well (or vice versa).

When starting new projects, new applications or services, or testing new architecture solutions, it makes sense to create a development environment in order to test and run scenarios before they hit production. Do you use the same OS in your development environment as you do in your production environment?

Please let us know what operating system you use to run your development database. For this blog poll, we’re asking which operating system you use to actually run your development database server (not the base operating system).

If you’re running virtualized Linux on Windows, please select Linux as the OS used for development. Pick up to three that apply. Add any thoughts or other options in the comments section:

What operating system do you use to run your development database?

Thanks in advance for your responses – they will help the open source community determine how database environments are being deployed.

Multi-Threaded Slave Statistics

July 19, 2017 - 10:02am

In this blog post, I’ll talk about the multi-threaded slave statistics printed in the MySQL error log file.

MySQL version 5.6 and later allows you to execute replicated events using parallel threads. This feature is called Multi-Threaded Slave (MTS), and to enable it you need to modify the slave_parallel_workers variable to a value greater than 1.
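As a minimal sketch of enabling MTS on a running replica (the worker count is an assumption; changing the variable requires the SQL thread to be stopped):

mysql -e "STOP SLAVE SQL_THREAD;"
mysql -e "SET GLOBAL slave_parallel_workers = 4;"
mysql -e "START SLAVE SQL_THREAD;"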

Recently, a few customers asked about the meaning of some new statistics printed in their error log files when they enable MTS. These error messages look similar to the example stated below:

[Note] Multi-threaded slave statistics for channel '': seconds elapsed = 123; events assigned = 57345; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 0 waited (count) when Workers occupied = 0 waited when Workers occupied = 0

The MySQL reference manual doesn’t show information about these statistics. I’ve filed a bug report asking Oracle to add information about these statistics to the MySQL documentation. I reported this bug as #85747.

Before they update the documentation, we can use the MySQL source code to get insight into the meaning of these statistics. We can also determine how often they are printed in the error log file. Looking into the rpl_slave.cc file, we find that when you enable MTS – and the log-warnings variable is greater than 1 (log-error-verbosity greater than 2 for MySQL 5.7) – the statistics are printed to the MySQL error log every 120 seconds. This interval is determined by a hard-coded constant, shown in the code below:

/* Statistics go to the error log every # of seconds when --log-warnings > 1 */
const long mts_online_stat_period= 60 * 2;

Does this mean that every 120 seconds MTS prints statistics to your MySQL error log (if enabled)? The answer is no. MTS prints statistics in the mentioned period depending on the level of activity of your slave. The following line in MySQL code verifies the level of the slave’s activity to print the statistics:

if (rli->is_parallel_exec() && rli->mts_events_assigned % 1024 == 1)

From the above code, you need MTS enabled and the result of the modulo operation between the mts_events_assigned variable and 1024 must equal 1 in order to print the statistics. The mts_events_assigned variable stores the number of events assigned to the parallel queue. If you’re replicating a low number of events, or not replicating at all, MySQL won’t print the statistics in the error log. On the other hand, if you’re replicating a high number of events all the time, mts_events_assigned will regularly reach a value whose remainder when divided by 1024 is 1, and MySQL prints MTS statistics in the error log approximately every 120 seconds.

You can find the explanation of these statistics below (collected from information in the source code):

  1. Worker queues filled over overrun level: MTS tends to load balance events between all parallel workers, and the slave_parallel_workers variable determines the number of workers. This statistic shows the level of saturation the workers are under. If a parallel worker queue is close to full, this counter is incremented and the worker replication event is delayed in order to avoid reaching the worker queue limit.
  2. Waited due to a Worker queue full: This statistic is incremented when the coordinator thread must wait because a worker queue is full.
  3. Waited due to the total size: This statistic shows the number of times that the coordinator thread slept due to reaching the limit of the memory available in the worker queue to hold unapplied events. If this statistic keeps increasing when printed in your MySQL error log, you should resize the slave_pending_jobs_size_max variable to a higher value to avoid the coordinator thread waiting time.
  4. Waited at clock conflicts: In the case of possible dependencies between transactions, this statistic shows the wait time corresponding to logical timestamp conflict detection and resolution.
  5. Waited (count) when Workers occupied: A counter of how many times the coordinator saw the Workers filled up with “enough” assignments. The definition of “enough” depends on the scheduler type (per-database or clock-based).
  6. Waited when Workers occupied: This statistic accumulates the coordinator thread’s waiting time for any Worker to become available, and applies solely to the commit-clock scheduler.
Conclusion

Multi-threaded slave is an exciting feature that allows you to replicate events faster and keep in sync with master instances. By changing the log-warnings variable to a value greater than 1, you can get information in the slave’s error log file about how the multi-threaded slave is performing.

Backups and Disaster Recovery

July 18, 2017 - 10:06am

In this post, we’ll look at strategies for backups and disaster recovery.

Note: I am giving a talk on Backups and Disaster Recovery Best Practices on July 27th.

Register Now

When discussing disaster recovery, it’s important to take your business’ continuity plan into consideration. Backup and recovery processes are a critical part of any application infrastructure.

A well-tested backup and recovery system can be the difference between a minor outage and the end of your business.

You will want to take three things into consideration when planning your disaster recovery strategy: recovery time objective, recovery point objective and risk mitigation.

Recovery time objective (RTO) is how long it takes to restore your backups. Recovery point objective (RPO) is what point in time you want to recover (in other words, how much data you can afford to lose after recovery). Finally, you need to understand what risks you are trying to mitigate. Risks to your data include (but are not limited to) bad actors, data corruption, user error, host failure and data center failure.

Recommended Backup Strategies

We recommend that you use both physical (Percona XtraBackup, RDS/LVM Snapshots, MySQL Enterprise Backup) and logical backups (mysqldump, mydumper, mysqlpump). Logical backups protect against the loss of single data points, while physical backups protect against total data loss or host failure.

The best practice is running Percona XtraBackup nightly, followed by mysqldump (or in 5.7+, mysqlpump). Percona XtraBackup enables you to quickly restore a server, and mysqldump enables you to quickly restore data points. These address recovery time objectives.
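A minimal cron sketch of such a schedule; the paths, timings and mysqldump options are assumptions to adapt to your environment:

# nightly physical backup with Percona XtraBackup at 01:00
0 1 * * * xtrabackup --backup --target-dir=/backups/xtrabackup/$(date +\%F) >> /var/log/xtrabackup.log 2>&1
# nightly logical backup with mysqldump at 03:00
0 3 * * * mysqldump --all-databases --single-transaction --routines --events | gzip > /backups/dump/$(date +\%F).sql.gz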

For point-in-time recovery, it is recommended that you download binlogs on a regular basis (once an hour, for example).

Another option is binlog streaming. You can find more information on binlog streaming in our blog: Backing up binary log files with mysqlbinlog.
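For reference, a hedged sketch of pulling binary logs with mysqlbinlog (the host, user and starting binlog file are assumptions). With --stop-never it keeps streaming continuously as a supervised process; without it, the command can run hourly from cron:

# copy binary logs from the master in raw format
mysqlbinlog --read-from-remote-server --host=master.example.com --user=backup -p --raw --stop-never mysql-bin.000001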

There is also a whitepaper that is the basis of my webinar here: MySQL Backup and Recovery Best Practices.

Delayed Slave

One way to save on operational overhead is to create a 24-hour delayed slave. This takes the place of the logical backup (mysqldump) as well as the binlog streaming. You want to ensure that you stop the delayed slave immediately following any issues. This ensures that the data does not get corrupted on the backup as well.

A delayed slave is created in 5.6 and above with:

CHANGE MASTER TO MASTER_DELAY = N;

After a disaster, you would issue:

STOP SLAVE;

Then, in order to get a point-in-time, you can use:

START SLAVE UNTIL MASTER_LOG_FILE = 'log_name', MASTER_LOG_POS = log_pos;

Restore

It is a good idea to test your backups at least once a quarter. Backups do not exist unless you know you can restore them. There are some recent high-profile cases where developers dropped tables or schemas, or data was corrupted in production, and in one case none of five different backup types could be used to restore the data.

The best case scenario is an automated restore test that runs after your backup, and gives you information on how long it takes to restore (RTO) and how much data you can restore (RPO).
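A skeleton of what such an automated restore test could look like, assuming an XtraBackup copy and a scratch data directory; a real test should also start a MySQL instance on the restored datadir and verify the data:

BACKUP_DIR=/backups/xtrabackup/$(date +%F)
SCRATCH=/var/lib/mysql-restore-test
# measure RTO: prepare the backup and copy it into a scratch datadir, timing each step
time xtrabackup --prepare --target-dir="$BACKUP_DIR"
rm -rf "$SCRATCH" && mkdir -p "$SCRATCH"
time xtrabackup --copy-back --target-dir="$BACKUP_DIR" --datadir="$SCRATCH"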

For more details on backups and disaster recovery, come to my webinar.

Register Now

Upcoming Webinar Wednesday, July 19, 2017: Learning MySQL 5.7

July 18, 2017 - 6:13am

Join Percona’s Technical Services Manager, Jervin Real, as he presents Learning MySQL 5.7 on Wednesday, July 19, 2017 at 10:00 am PDT / 1:00 pm EDT (UTC-7).

Register Now

MySQL 5.7 has a lot of new features. If you’ve dabbled with an older version of MySQL, it is worth learning what new features exist and how to use them. This webinar teaches you about new features such as multi-source replication, global transaction IDs (GTIDs), security improvements and more.

We’ll also discuss logical decoding. Logical decoding is one of the features under the BDR implementation, which allows bidirectional streams of data between Postgres instances. Also, it allows you to stream data outside Postgres into many other data systems.

Register for the webinar here.

Jervin Real, Technical Services Manager: Jervin is a member of the Technical Services team at Percona, where he partners with customers to build reliable and highly performant MySQL infrastructures. He also does other fun stuff like watching cat videos on the internet, reading bugs for novels and playing around servers for numbers. Jervin joined Percona in April of 2010.

Percona Live Europe 2017 Call for Papers Deadline Extended to July 24, 2017

July 17, 2017 - 8:21pm

We are extending the Percona Live Open Source Database Conference Europe 2017 call for papers deadline to Monday, July 24, 2017. Get your submissions in now!

Between our Conference Committee working hard to review all the outstanding talk ideas, and the many community requests for more time, we didn’t want to shortchange any of our applicants. If you procrastinated too long, didn’t complete your talk submission or just plain forgot to submit, this is your reprieve! You have one extra week to pull together a proposal for a talk on open source databases at Percona Live Europe 2017.

The theme of Percona Live Europe 2017 is “Championing Open Source Databases.” There are sessions on MySQL, MariaDB, MongoDB and other open source database technologies, including time series databases, PostgreSQL and RocksDB. Are you:

  • Working with MongoDB as a developer?
  • Creating a new MySQL-variant time series database?
  • Deploying MariaDB in a novel way?
  • Using open source database technology to solve a business issue?

Share your open source database experiences with peers and professionals in the open source community! We invite you to submit your speaking proposal for breakout, tutorial or lightning talk sessions:

  • Breakout Session. Broadly cover a technology area using specific examples. Sessions should be either 25 minutes or 50 minutes in length (including Q&A).
  • Tutorial Session. Present a technical session that aims for a level between a training class and a conference breakout session. Encourage attendees to bring and use laptops for working on detailed and hands-on presentations. Tutorials will be three or six hours in length (including Q&A).
  • Lightning Talk. Give a five-minute presentation focusing on one key point that interests the open source community: technical, lighthearted or entertaining talks on new ideas, a successful project, a cautionary story, a quick tip or demonstration.

Submit your talk ideas now.

PLEASE NOTE: We have a new call for papers system that requires everyone to register for a new account. You can save proposals as drafts and edit them later. Once a proposal is submitted, it is final. You can NOT change a proposal once submitted.

Register for Percona Live Europe 2017 now! Early Bird registration lasts until August 8.

Last year’s Percona Live Europe sold out, and we’re looking to do the same at this year’s conference. Don’t miss your chance to get your ticket at its most affordable price. Click here to register.

Percona Live 2017 sponsorship opportunities are available now

Percona Live Europe 2017 is just around the corner. Have you secured your sponsorship yet? Last year’s event sold out! Booth selection for our limited sponsorship opportunities is on a first-come, first-served basis. Click here to find out how to sponsor.

Don’t miss out on a special room rate at the Percona Live Europe 2017 venue

This year, Percona Live Europe is being held at the Radisson Blu Royal Hotel, Dublin. You can get a special room rate when you attend the conference. But hurry, rooms are going fast! To get the special room rate:

  1. Visit https://www.radissonblu.com/en/royalhotel-dublin.
  2. Click BOOK NOW at the top right.
  3. Enter your preferred check-in and check-out dates, and how many rooms.
  4. From the drop-down “Select Rate Type,” choose Promotional Code.
  5. Enter the code PERCON.

The group rate only applies if used within the Percona Live Europe group block dates (September 25-27, 2017). The deadline for booking with this rate is July 24, 2017.

Percona Server for MongoDB 3.2.14-3.4 is Now Available

July 17, 2017 - 12:47pm

Percona announces the release of Percona Server for MongoDB 3.2.14-3.4 on July 17, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open-source, fully compatible, highly scalable, zero-maintenance downtime database that supports the MongoDB v3.2 protocol and drivers. It extends MongoDB with MongoRocks, Percona Memory Engine, and PerconaFT storage engine, as well as enterprise-grade features like External Authentication, Audit Logging, Profiling Rate Limiting, and Hot Backup at no extra cost. Percona Server for MongoDB requires no changes to MongoDB applications or code.

NOTE: This release deprecates the PerconaFT storage engine. It will not be available in future releases.

This release is based on MongoDB 3.2.14 and includes the following additional changes:

New Features

Bugs Fixed

  • #PSMDB-67: Fixed mongod service status messages.

Percona Server for MongoDB 3.2.14-3.4 release notes are available in the official documentation.

Percona Monitoring and Management 1.2.0 is Now Available

July 14, 2017 - 10:51am

Percona announces the release of Percona Monitoring and Management 1.2.0 on July 14, 2017.

For installation instructions, see the Deployment Guide.

Changes in PMM Server

PMM Server 1.2.0 introduced the following changes:

Updated Components

New Features
  • PMM-737: New graphs in System Overview dashboard:
      • Memory Advanced Details
      • Saturation Metrics

  • PMM-1090: Added ESXi support for PMM Server virtual appliance.
UI Fixes
  • PMM-707: Fixed QPS metric in MySQL Overview dashboard to always show queries per second regardless of the selected interval.
  • PMM-708: Fixed tooltips for graphs that displayed incorrectly.
  • PMM-739, PMM-797: Fixed PMM Server update feature on the landing page.
  • PMM-823: Fixed arrow padding for collapsible blocks in QAN.
  • PMM-887: Disabled the Add button when no table is specified for showing query info in QAN.
  • PMM-888: Disabled the Apply button in QAN settings when nothing is changed.
  • PMM-889: Fixed the switch between UTC and local time zone in the QAN time range selector.
  • PMM-909: Added message No query example when no example for a query is available in QAN.
  • PMM-933: Fixed empty tooltips for Per Query Stats column in the query details section of QAN.
  • PMM-937: Removed the percentage of total query time in query details for the TOTAL entry in QAN (because it is 100% by definition).
  • PMM-951: Fixed the InnoDB Page Splits graph formula in the MySQL InnoDB Metrics Advanced dashboard.
  • PMM-953: Enabled stacking for graphs in MySQL Performance Schema dashboard.
  • PMM-954: Renamed Top Users by Connections graph in MySQL User Statistics dashboard to Top Users by Connections Created and added the Connections/sec label to the Y-axis.
  • PMM-957: Refined titles for Client Connections and Client Questions graphs in ProxySQL Overview dashboard to mention that they show metrics for all host groups (not only the selected one).
  • PMM-961: Fixed the formula for Client Connections graph in ProxySQL Overview dashboard.
  • PMM-964: Fixed the gaps for high zoom levels in MySQL Connections graph on the MySQL Overview dashboard.
  • PMM-976: Fixed Orchestrator handling by supervisorctl.
  • PMM-1129: Updated the MySQL Replication dashboard to support new connection_name label introduced in mysqld_exporter for multi-source replication monitoring.
  • PMM-1054: Fixed typo in the tooltip for the Settings button in QAN.
  • PMM-1055: Fixed link to Query Analytics from Metrics Monitor when running PMM Server as a virtual appliance.
  • PMM-1086: Removed HTML code that showed up in the QAN time range selector.
Bug Fixes
  • PMM-547: Added warning page to Query Analytics app when there are no PMM Clients running the QAN service.
  • PMM-799: Fixed Orchestrator to show correct version.
  • PMM-1031: Fixed initialization of Query Profile section in QAN that broke after upgrading Angular.
  • PMM-1087: Fixed QAN package building.
Other Improvements
  • PMM-348: Added daily log rotation for nginx.
  • PMM-968: Added Prometheus build information.
  • PMM-969: Updated the Prometheus memory usage settings to leverage new flag. For more information about setting memory consumption by PMM, see FAQ.
Changes in PMM Client

PMM Client 1.2.0 introduced the following changes:

New Features
  • PMM-1114: Added PMM Client packages for Debian 9 (“stretch”).
Bug Fixes
  • PMM-481, PMM-1132: Fixed fingerprinting for queries with multi-line comments.
  • PMM-623: Fixed mongodb_exporter to display correct version.
  • PMM-927: Fixed bug with empty metrics for MongoDB query analytics.
  • PMM-1126: Fixed promu build for node_exporter.
  • PMM-1201: Fixed node_exporter version.
Other Improvements
  • PMM-783: Directed mongodb_exporter log messages to stderr and excluded many generic messages from the default INFO logging level.
  • PMM-756: Merged upstream node_exporter version 0.14.0.
    PMM deprecated several collectors in this release:

    • gmond – Out of scope.
    • megacli – Requires forking, to be moved to textfile collection.
    • ntp – Out of scope.

    It also introduced the following breaking change:

    • Collector errors are now a separate metric: node_scrape_collector_success, not a label on node_exporter_scrape_duration_seconds
  • PMM-1011: Merged upstream mysqld_exporter version 0.10.0.
    This release introduced the following breaking change:

    • mysql_slave_... metrics now include an additional connection_name label to support MariaDB multi-source replication.
About Percona Monitoring and Management

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB performance. Percona developed it in collaboration with experts in the field of managed database services, support and consulting.

Percona Monitoring and Management is a free and open-source solution that you can run in your own environment for maximum security and reliability. It provides thorough time-based analysis for MySQL and MongoDB servers to ensure that your data works as efficiently as possible.

A live demo of PMM is available at pmmdemo.percona.com.

Please provide your feedback and questions on the PMM forum.

If you would like to report a bug or submit a feature request, use the PMM project in JIRA.

A Little Trick Upgrading to MySQL 5.7

July 14, 2017 - 10:20am

In this blog post, I’ll look at a trick we use at Percona when upgrading to MySQL 5.7.

I’ll be covering this subject (and others) in my webinar Learning MySQL 5.7 on Wednesday, July 19, 2017.

We’ve been doing upgrades for quite a while here at Percona, and we try to optimize, standardize and improve this process to save time. When upgrading to MySQL 5.7, more often than not you need to run REPAIR or ALTER via mysql_upgrade on a number of MySQL tables. Sometimes a few hundred, sometimes hundreds of thousands.

One way to cut some time from testing or executing mysql_upgrade is to combine it with mysqlcheck. This identifies tables that need to be rebuilt or repaired. The first step is to capture the output of this process:

revin@acme:~$ mysqlcheck --check-upgrade --all-databases > mysql-check.log

This provides a lengthy output of what needs to be done to successfully upgrade our tables. On my test data, I get error reports like the ones below. I’ll need to take the specified action against them:

ads.agency
error    : Table upgrade required. Please do "REPAIR TABLE `agency`" or dump/reload to fix it!
store.categories
error    : Table rebuild required. Please do "ALTER TABLE `categories` FORCE" or dump/reload to fix it!

Before we run through this upgrade, let’s get an idea of how long it would take for a regular mysql_upgrade to complete on this dataset:

revin@acme:~$ time mysql_upgrade
Enter password:
Checking if update is needed.
Checking server version.
Running queries to upgrade MySQL server.
Checking system database.
mysql.columns_priv                                 OK
mysql.db                                           OK
...
mysql.user                                         OK
Upgrading the sys schema.
Checking databases.
ads.account_preference_assoc                       OK
...
Repairing tables
...
ads.agency
Note     : TIME/TIMESTAMP/DATETIME columns of old format have been upgraded to the new format.
status   : OK
...
`store`.`categories`
Running  : ALTER TABLE `store`.`categories` FORCE
status   : OK
...
Upgrade process completed successfully.
Checking if update is needed.

real    25m57.482s
user    0m0.024s
sys     0m0.072s

On a cold server, my baseline above took about 25 minutes.

The second step on our time-saving process is to identify the tables that need some action (in this case, REPAIR and ALTER … FORCE). Generate the SQL statements to run them and put them into a single SQL file:

revin@acme:~$ for t in $(cat mysql-check.log |grep -B1 REPAIR | egrep -v 'REPAIR|--'); do echo "mysql -e 'REPAIR TABLE $t;'" >> upgrade.sql; done
revin@acme:~$ for t in $(cat mysql-check.log |grep -B1 ALTER | egrep -v 'ALTER|--'); do echo "mysql -e 'ALTER TABLE $t FORCE;'" >> upgrade.sql; done

My upgrade.sql file will have something like this:

mysql -e 'ALTER TABLE store.categories FORCE;'
mysql -e 'REPAIR TABLE ads.agency;'

Now we should be ready to run these commands in parallel as the third step in the process:

revin@acme:~$ time parallel -j 4 -- < upgrade.sql
...
real    17m31.448s
user    0m1.388s
sys     0m0.616s

Getting some parallelization is not bad, and the process improved by about 38%. If we are talking about multi-terabyte data sets, then it is already a big gain.

On the other hand, my dataset has a few tables that are bigger than the rest. Since mysqlcheck processes them in a specific order, one of the threads was processing most of them instead of spreading them out evenly to each thread by size. To fix this, we need to have an idea of the sizes of each table we will be processing. We can use a query from the INFORMATION_SCHEMA.TABLES for this purpose:

revin@acme:~$ for t in $(cat mysql-check.log |grep -B1 ALTER | egrep -v 'ALTER|--'); do d=$(echo $t|cut -d'.' -f1); tbl=$(echo $t|cut -d'.' -f2); s=$(mysql -BNe "select sum(index_length+data_length) from information_schema.tables where table_schema='$d' and table_name='$tbl';"); echo "$s |mysql -e 'ALTER TABLE $t FORCE;'" >> table-sizes.sql; done
revin@acme:~$ for t in $(cat mysql-check.log |grep -B1 REPAIR | egrep -v 'REPAIR|--'); do d=$(echo $t|cut -d'.' -f1); tbl=$(echo $t|cut -d'.' -f2); s=$(mysql -BNe "select sum(index_length+data_length) from information_schema.tables where table_schema='$d' and table_name='$tbl';"); echo "$s |mysql -e 'REPAIR TABLE $t;'" >> table-sizes.sql; done

Now my table-sizes.sql file will have contents like below, which I can sort and pass to the parallel command again and cut even more time!

32768 |mysql -e 'REPAIR TABLE ads.agency;'
81920 |mysql -e 'ALTER TABLE store.categories FORCE;'

revin@acme:~$ cat table-sizes.sql |sort -rn|cut -d'|' -f2 > upgrade.sql
revin@acme:~$ time parallel -j 4 -- < upgrade.sql
...
real    8m1.116s
user    0m1.260s
sys     0m0.624s

This go-around, my total execution time is 8 minutes – a good 65% improvement. To wrap it up, we will need to run mysql_upgrade one last time so that the system tables are also upgraded, the tables are checked again and then restart the MySQL server as instructed by the manual:

revin@acme:~$ time mysql_upgrade --force

The whole process should be easy to automate and script, depending on your preference. Lastly: YMMV. If you have one table that is more than half the size of your total data set, there might not be big gains.
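Putting the pieces together, a rough wrapper might look like the sketch below. The paths and the parallelism level are assumptions; adjust them to your environment:

#!/bin/bash
# 1. find tables that need REPAIR or ALTER ... FORCE
mysqlcheck --check-upgrade --all-databases > mysql-check.log
# 2. build table-sizes.sql with the two loops shown earlier, then order the statements largest-first
cat table-sizes.sql | sort -rn | cut -d'|' -f2 > upgrade.sql
# 3. run the statements in parallel
parallel -j 4 -- < upgrade.sql
# 4. finish with mysql_upgrade so the system tables are upgraded as well
mysql_upgrade --force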

If you want to learn more about upgrading to MySQL 5.7, come to my webinar on Wednesday, July 19: Learning MySQL 5.7. This process is only one of the phases in a multi-step upgrade process when moving to 5.7. I will discuss them in more detail next week. Register now from the link below, and I’ll talk to you soon!

Register Now

Setting Up Percona PAM with Active Directory for External Authentication

July 13, 2017 - 12:17pm

In this blog post, we’ll look at how to set up Percona PAM with Active Directory for external authentication.

In my previous article on Percona PAM, I demonstrated how to use Samba as a domain, and how easy it is to create domain users and groups via the samba-tool. Then we configured nss-pam-ldapd and nscd to enumerate user and group information via LDAP calls, and authenticate users from this source.

This time around, I will demonstrate two other ways of using Active Directory for external authentication by joining the domain via SSSD or Winbind. System Security Services Daemon (SSSD) allows you to configure access to several authentication hosts such as LDAP, Kerberos, Samba and Active Directory and have your system use this service for all types of lookups. Winbind, on the other hand, pulls data from Samba or Active Directory only. If you’re mulling over using SSSD or Winbind, take a look at this article on what SSSD or Winbind support.

For both methods, we’ll use realmd. That makes it easy to join a domain and enumerate users from it.

My testbed environment consists of two machines:

Samba PDC
OS: CentOS 7
IP Address: 172.16.0.10
Hostname: samba-10.example.com
Domain name: EXAMPLE.COM
DNS: 8.8.8.8(Google DNS), 8.8.4.4(Google DNS), 172.16.0.10(Samba)
Firewall: none

Note: Please follow the steps in the last article for setting up the Samba PDC environment.

Percona Server 5.7 with LDAP authentication via SSS or WinBind
OS: CentOS 7
IP Address: 172.16.0.21
Hostname: ps-ldap-21.example.com
DNS: 172.16.0.10(Samba PDC)

Installing realmd and Its Dependencies
  1. First, we need to make sure that the time is in sync (since this is a requirement for joining domains). Install NTP and make sure that it starts up at boot time:
    [root@ps-ldap-21 ~]# yum -y install ntp
    * * *
    Installed:
      ntp.x86_64 0:4.2.6p5-25.el7.centos.2
    * * *
    [root@ps-ldap-21 ~]# ntpdate 0.centos.pool.ntp.org
    3 Jul 03:48:35 ntpdate[3708]: step time server 202.90.132.242 offset 1.024550 sec
    [root@ps-ldap-21 ~]# systemctl enable ntpd.service
    Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
    [root@ps-ldap-21 ~]# systemctl start ntpd.service
  2. Install realmd and its dependencies for SSSD or Winbind.
    For SSSD:
    yum -y install realmd oddjob oddjob-mkhomedir sssd adcli samba-common-tools
    For Winbind:
    yum -y install realmd oddjob oddjob-mkhomedir samba-winbind-clients samba-winbind samba-common-tools
Joining the Domain via SSSD and Preparing It for Percona PAM
  1. Run realm discover domain for realmd to discover what type of server it’s connecting to and what packages dependencies need to be installed:
    [root@ps-ldap-21 ~]# realm discover example.com
    example.com
      type: kerberos
      realm-name: EXAMPLE.COM
      domain-name: example.com
      configured: no
      server-software: active-directory
      client-software: sssd
      required-package: oddjob
      required-package: oddjob-mkhomedir
      required-package: sssd
      required-package: adcli
      required-package: samba-common-tools
    Our Samba PDC is detected as an Active Directory Controller, and the packages required have been installed previously.
  2. The next step is to join the domain by running realm join domain. If you want to get more information, add the --verbose option. You could also add the -U user option if you want to use a different administrator account.
    [root@ps-ldap-21 ~]# realm join example.com --verbose
     * Resolving: _ldap._tcp.example.com
     * Performing LDAP DSE lookup on: 172.16.0.10
     * Successfully discovered: example.com
    Password for Administrator:
     * Required files: /usr/sbin/oddjobd, /usr/libexec/oddjob/mkhomedir, /usr/sbin/sssd, /usr/bin/net
     * LANG=C LOGNAME=root /usr/bin/net -s /var/cache/realmd/realmd-smb-conf.DM6W2Y -U Administrator ads join example.com
    Enter Administrator's password:
    Using short domain name -- EXAMPLE
    Joined 'PS-LDAP-21' to dns domain 'example.com'
     * LANG=C LOGNAME=root /usr/bin/net -s /var/cache/realmd/realmd-smb-conf.DM6W2Y -U Administrator ads keytab create
    Enter Administrator's password:
     * /usr/bin/systemctl enable sssd.service
    Created symlink from /etc/systemd/system/multi-user.target.wants/sssd.service to /usr/lib/systemd/system/sssd.service.
     * /usr/bin/systemctl restart sssd.service
     * /usr/bin/sh -c /usr/sbin/authconfig --update --enablesssd --enablesssdauth --enablemkhomedir --nostart && /usr/bin/systemctl enable oddjobd.service && /usr/bin/systemctl start oddjobd.service
     * Successfully enrolled machine in realm
    As you can see from the command above, the realm command simplifies SSSD configuration and uses existing tools such as net and authconfig to join the domain and use it as an identity provider.
  3. Let’s test if we enumerate existing accounts by using the id command:
    [root@ps-ldap-21 ~]# id jervin
    id: jervin: no such user
    [root@ps-ldap-21 ~]# id jervin@example.com
    uid=343401115(jervin@example.com) gid=343400513(domain users@example.com) groups=343400513(domain users@example.com),343401103(support@example.com)
    As you can see, the user can be queried if the domain is specified. So if you want to log in as ‘jervin@example.com’, in Percona Server for MySQL you’ll need to create the user as ‘jervin@example.com’ and not ‘jervin’. For example:
    # Creating user 'jervin@example.com'
    CREATE USER 'jervin@example.com'@'%' IDENTIFIED WITH auth_pam;
    # Logging in as 'jervin@example.com'
    mysql -u 'jervin@example.com'
    If you want to omit the domain name when logging in, you’ll need to replace “use_fully_qualified_names = True” to “use_fully_qualified_names = False” in /etc/sssd/sssd.conf, and then restart SSSD. If you do this, then the user can be found without providing the domain:
    [root@ps-ldap-21 ~]# id jervin
    uid=343401115(jervin) gid=343400513(domain users) groups=343400513(domain users),343401103(support)
    [root@ps-ldap-21 ~]# id jervin@example.com
    uid=343401115(jervin) gid=343400513(domain users) groups=343400513(domain users),343401103(support)
    When you create the MySQL user, you don’t need to include the domain anymore:
    # Creating user 'jervin'
    CREATE USER 'jervin'@'%' IDENTIFIED WITH auth_pam;
    # Logging in as 'jervin'
    mysql -u jervin
  4. Optionally, you can specify which users and groups can log in by adding these settings to SSSD:
    Domain access filter
    Under “[domain/example.com]” /etc/sssd/sssd.conf, you can add the following to specify that only users that are members of support and dba are allowed to use SSSD. For example:
    ad_access_filter = (|(memberOf=CN=dba,CN=Users,DC=example,DC=com)(memberOf=CN=support,CN=Users,DC=example,DC=com))
    Simple filters
    You can use realm permit or realm permit -g to allow particular users or groups. For example:
    realm permit jervin
    realm permit -g support
    realm permit -g dba
    You can check sssd.conf on how these ACLs are implemented:
    access_provider = simple
    simple_allow_groups = support, dba
    simple_allow_users = jervin
  5. Finally, configure Percona Server for MySQL to authenticate to SSSD by creating /etc/pam.d/mysqld with this content:
    auth    required    pam_sss.so
    account required    pam_sss.so
  6. Done. All you need to do now is to install Percona Server for MySQL, enable the auth_pam and auth_pam_compat plugins (see the sketch after this list), and add PAM users. You can then check for authentication errors at /var/log/secure for troubleshooting. You could also get verbose logs by adding debug_level=[1-9] to [nss], [pam], or [domain] and then restarting SSSD. You can view the logs in /var/log/sssd.
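A minimal sketch of the plugin step referenced above, assuming the Percona Server PAM plugin library is already installed on disk:

mysql -e "INSTALL PLUGIN auth_pam SONAME 'auth_pam.so';"
mysql -e "INSTALL PLUGIN auth_pam_compat SONAME 'auth_pam_compat.so';"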
Joining the Domain via Winbind and Preparing it for Percona PAM
  1. The realm command assumes that SSSD is used. To change the client software, use --client-software=winbind instead:
    [root@ps-ldap-21 ~]# realm --client-software=winbind discover example.com
    example.com
      type: kerberos
      realm-name: EXAMPLE.COM
      domain-name: example.com
      configured: no
      server-software: active-directory
      client-software: winbind
      required-package: oddjob-mkhomedir
      required-package: oddjob
      required-package: samba-winbind-clients
      required-package: samba-winbind
      required-package: samba-common-tools
  2. Since the required packages have already been installed, we can now attempt to join this host to the domain:
    [root@ps-ldap-21 ~]# realm --verbose --client-software=winbind join example.com
     * Resolving: _ldap._tcp.example.com
     * Performing LDAP DSE lookup on: 172.16.0.10
     * Successfully discovered: example.com
    Password for Administrator:
     * Required files: /usr/libexec/oddjob/mkhomedir, /usr/sbin/oddjobd, /usr/bin/wbinfo, /usr/sbin/winbindd, /usr/bin/net
     * LANG=C LOGNAME=root /usr/bin/net -s /var/cache/realmd/realmd-smb-conf.9YEO2Y -U Administrator ads join example.com
    Enter Administrator's password:
    Using short domain name -- EXAMPLE
    Joined 'PS-LDAP-21' to dns domain 'example.com'
     * LANG=C LOGNAME=root /usr/bin/net -s /var/cache/realmd/realmd-smb-conf.9YEO2Y -U Administrator ads keytab create
    Enter Administrator's password:
     * /usr/bin/systemctl enable winbind.service
    Created symlink from /etc/systemd/system/multi-user.target.wants/winbind.service to /usr/lib/systemd/system/winbind.service.
     * /usr/bin/systemctl restart winbind.service
     * /usr/bin/sh -c /usr/sbin/authconfig --update --enablewinbind --enablewinbindauth --enablemkhomedir --nostart && /usr/bin/systemctl enable oddjobd.service && /usr/bin/systemctl start oddjobd.service
     * Successfully enrolled machine in realm
  3. Let’s test if we enumerate existing accounts by using the id command
    [root@ps-ldap-21 ~]# id jervin
    id: jervin: no such user
    [root@ps-ldap-21 ~]# id jervin@example.com
    uid=10000(EXAMPLE\jervin) gid=10000(EXAMPLE\domain users) groups=10000(EXAMPLE\domain users),10001(EXAMPLE\support)
    Unfortunately for Winbind, users identified with their domains cannot login to Percona Server for MySQL. We need to disable this from the Samba config (performed in the next step).
  4. Edit /etc/samba/smb.conf, and change “winbind use default domain = no” to “winbind use default domain = yes”. Restart the Winbind service. For example:
    vi /etc/samba/smb.conf
    # Look for:  "winbind use default domain = no"
    # Change to: "winbind use default domain = yes"
    systemctl restart winbind.service
    Try running id again:
    [root@ps-ldap-21 ~]# id jervin
    uid=10000(jervin) gid=10000(domain users) groups=10000(domain users),10001(support)
    [root@ps-ldap-21 ~]# id jervin@example.com
    id: jervin@example.com: no such user
    When you create the MySQL user, do not include the domain name. For example:
    # Creating user 'jervin'
    CREATE USER 'jervin'@'%' IDENTIFIED WITH auth_pam;
    # Logging in as 'jervin'
    mysql -u jervin
  5. Finally, configure Percona Server for MySQL to authenticate to Winbind by creating /etc/pam.d/mysqld with this content:
    auth    required    pam_winbind.so
    account required    pam_winbind.so

You can debug authentication attempts by reviewing the logs at /var/log/secure. You may also change “auth required pam_winbind.so” to “auth required pam_winbind.so debug” in /etc/pam.d/mysqld to get verbose logging in the same file.

As for filtering who can authenticate with Winbind, you can add require_membership_of=group_name under the [global] section of /etc/security/pam_winbind.conf.
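For example, a minimal sketch of that restriction (the group name here is an assumption):

# /etc/security/pam_winbind.conf
[global]
require_membership_of = support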

You’ll need to restart the winbind daemon to apply the changes.

Conclusion

Thanks to realmd, it’s easier to set up Active Directory as an identity provider. With minimal configuration tweaks, you can use the identity provider to authenticate MySQL users.

Gh-ost benchmark against pt-online-schema-change performance

July 12, 2017 - 11:31am

In this blog post, I will run a gh-ost benchmark against the performance of pt-online-schema-change.

When gh-ost came out, I was very excited. As MySQL ROW replication became commonplace, you could use it to track changes instead of triggers. This practice is cleaner and safer compared to Percona Toolkit’s pt-online-schema-change. Since gh-ost doesn’t need triggers, I assumed it would generate lower overhead and work faster. I frequently called it “pt-online-schema-change on steroids” in my talks. Finally, I’ve found some time to check my theoretical claims with some benchmarks.

DISCLAIMER: These benchmarks correspond to one specific ALTER TABLE on the table of one specific structure and hardware configuration. I have not set up a broad set of tests. If you have other results – please comment!

Benchmark Setup Details
  • pt-online-schema-change from Percona Toolkit 3.0.3
  • gh-ost 1.0.36
  • Percona Server 5.7.18 on Ubuntu 16.04 LTS
  • Hardware: 28 CPU cores/56 threads, 128GB memory, Samsung 960 Pro 512GB
  • Sysbench 1.0.7

Prepare the table by running:

sysbench --threads=40 --rate=0 --report-interval=1 --percentile=99 --events=0 --time=0 --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest  /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 prepare

The table size is about 3GB (completely fitting to innodb_buffer_pool).

Run the benchmark in “full ACID” mode with:

  • sync_binlog=1
  • innodb_flush_log_at_trx_commit=1
  • innodb_doublewrite=1

This is important as this workload is heavily commit-bound, and extensively relies on group commit.

This is the pt-online-schema-change command to alter table:

time pt-online-schema-change --execute --alter "ADD COLUMN c1 INT" D=sbtest,t=sbtest1

This the gh-ost command to alter table:

time ./gh-ost  --user="sbtest" --password="sbtest" --host=localhost --allow-on-master --database="sbtest" --table="sbtest1"  --alter="ADD COLUMN c1 INT" --execute

Tests Details

For each test the old sysbench table was dropped and a new one prepared. I tested alter table in three different cases:

  • When nothing else was running (“Idle Load”)   
  • When the system handled about 2% of load it can handle at full capacity (“Light Background Load”)
  • When the system handled about 40% of the possible load, with sysbench injected about 25% of the transactions/sec the system could handle at full load (“Heavy Background Load”)

I measured the alter table completion times for all cases, as well as the overhead generated by the alter (in other words, how much peak throughput is reduced by running alter table through the tools).

Idle Load

For the Idle Load test, pt-online-schema-change completed nearly twice as fast as gh-ost. This was a big surprise for me. I haven’t looked into the reasons or details yet, though I can see most of the CPU usage for gh-ost is on the MySQL server side. Perhaps the differences relate to the SQL used to perform non-blocking alter tables.

Light Background Load

I generated the Light Background Load by running the sysbench command below. It corresponds to a roughly 4% load, as the system can handle some 2500 transactions/sec at this concurrency under full load. Adjust the --rate value to scale it for your system.

time sysbench --threads=40 --rate=100 --report-interval=1 --percentile=99 --events=0 --time=0 --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest  /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 run

The numbers changed (as expected), but pt-online-schema-change is still approximately twice as fast as gh-ost.

What is really interesting in this case is how a relatively light background load affects the process completion time. It took both pt-online-schema-change and gh-ost about 2.7 times longer to finish!

Heavy Background Load

I generated the Heavy Background Load running the sysbench command below. It corresponds to a roughly 40% load, as the system can handle some 2500 transactions/sec at this concurrency under full load. Adjust --rate value to scale it for your system.

time sysbench --threads=40 --rate=1000 --report-interval=1 --percentile=99 --events=0 --time=0 --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest  /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 run

What happened in this case? When the load gets higher, gh-ost can’t keep up with binary log processing, and just never finishes at all. While this may be surprising at first, it makes sense if you think more about how these tools work. pt-online-schema-change uses triggers, and while they have a lot of limitations and overhead they can execute in parallel. gh-ost, on the other hand, processes the binary log in a single thread and might not be able to keep up.   

In MySQL 5.6 we didn’t have parallel replication that can apply writes to the same table in parallel. For that version, the gh-ost limitation probably isn’t as big a deal, as such a heavy load would also cause replication lag. MySQL 5.7 has parallel replication, which makes it much easier to quickly replicate workloads that are too heavy for gh-ost to handle.

I should note that the workload being simulated in this benchmark is a rather extreme case. The table being altered by gh-ost here is at the same time handling a background load so high it can’t be replicated in a single thread.

Future versions of gh-ost could improve this issue by applying binlog events in parallel, similar to what MySQL replicas do.

An excerpt from the gh-ost log shows how it is totally backed up trying to apply the binary log:

root@rocky:/tmp# time ./gh-ost --user="sbtest" --password="sbtest" --host=localhost --allow-on-master --database="sbtest" --table="sbtest1" --alter="ADD COLUMN c1 INT" --execute
2017/06/25 19:16:05 binlogsyncer.go:75: [info] create BinlogSyncer with config &{99999 mysql localhost 3306 sbtest sbtest  false false <nil>}
2017/06/25 19:16:05 binlogsyncer.go:241: [info] begin to sync binlog from position (rocky-bin.000018, 640881773)
2017/06/25 19:16:05 binlogsyncer.go:134: [info] register slave for master server localhost:3306
2017/06/25 19:16:05 binlogsyncer.go:568: [info] rotate to (rocky-bin.000018, 640881773)
2017-06-25 19:16:05 ERROR parsing time "" as "2006-01-02T15:04:05.999999999Z07:00": cannot parse "" as "2006"
# Migrating `sbtest`.`sbtest1`; Ghost table is `sbtest`.`_sbtest1_gho`
# Migrating rocky:3306; inspecting rocky:3306; executing on rocky
# Migration started at Sun Jun 25 19:16:05 -0400 2017
# chunk-size: 1000; max-lag-millis: 1500ms; max-load: ; critical-load: ; nice-ratio: 0.000000
# throttle-additional-flag-file: /tmp/gh-ost.throttle
# Serving on unix socket: /tmp/gh-ost.sbtest.sbtest1.sock
Copy: 0/9872432 0.0%; Applied: 0; Backlog: 0/100; Time: 0s(total), 0s(copy); streamer: rocky-bin.000018:641578191; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 0; Backlog: 100/100; Time: 1s(total), 1s(copy); streamer: rocky-bin.000018:641626699; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 640; Backlog: 100/100; Time: 2s(total), 2s(copy); streamer: rocky-bin.000018:641896215; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 1310; Backlog: 100/100; Time: 3s(total), 3s(copy); streamer: rocky-bin.000018:642178659; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 1920; Backlog: 100/100; Time: 4s(total), 4s(copy); streamer: rocky-bin.000018:642436043; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 2600; Backlog: 100/100; Time: 5s(total), 5s(copy); streamer: rocky-bin.000018:642722777; State: ...
Copy: 0/9872432 0.0%; Applied: 120240; Backlog: 100/100; Time: 3m0s(total), 3m0s(copy); streamer: rocky-bin.000018:694142377; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 140330; Backlog: 100/100; Time: 3m30s(total), 3m30s(copy); streamer: rocky-bin.000018:702948219; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 160450; Backlog: 100/100; Time: 4m0s(total), 4m0s(copy); streamer: rocky-bin.000018:711775662; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 180600; Backlog: 100/100; Time: 4m30s(total), 4m30s(copy); streamer: rocky-bin.000018:720626338; State: migrating; ETA: N/A
Copy: 0/9872432 0.0%; Applied: 200770; Backlog: 100/100; Time: 5m0s(total), 5m0s(copy); streamer: rocky-bin.000018:729509960; State: migrating; ETA: N/A

Online Schema Change Performance Impact

For this test I started the alter table, waited 60 seconds and then ran sysbench at full speed for five minutes. Then I measured how much the performance was impacted by running the tool:

sysbench --threads=40 --rate=0 --report-interval=1 --percentile=99 --events=0 --time=300 --db-ps-mode=auto --mysql-user=sbtest --mysql-password=sbtest  /usr/share/sysbench/oltp_read_write.lua --table_size=10000000 run

As we can see, gh-ost has negligible overhead in this case. pt-online-schema-change, on the other hand, reduced performance by 12%. It is worth noting, though, that pt-online-schema-change still makes progress in this case (albeit slowly), while gh-ost would never complete.

If anything, I was surprised at how little impact the pt-online-schema-change run had on sysbench performance.

It’s important to note that in this case we only measured the overhead for the “copy” stage of the online schema change. Another thing you should worry about is the impact to performance during “table rotation” (which I have not measured).

Summary

While gh-ost introduces a number of design advantages, and gives better results in some situations, I wouldn’t call it always superior to the tried and true pt-online-schema-change. At least in some cases, pt-online-schema-change offers better performance than gh-ost and completes a schema change when gh-ost is unable to keep up. Consider trying out both tools and see what works best in your situation.

Thread_Statistics and High Memory Usage

July 11, 2017 - 1:15pm

In this blog post, we’ll look at how using thread_statistics can cause high memory usage.

I was recently working on a high memory usage issue for one of our clients, and made some interesting discoveries: high memory usage with no bounds. It was really tricky to diagnose.

Below, I am going to show you how to identify that having thread_statistics enabled causes high memory usage on busy systems with many threads.

Part 1: Issue Background

I had a server with 55.0G of available memory. Percona Server for MySQL version:

Version  | 5.6.35-80.0-log Percona Server (GPL), Release 80.0, Revision f113994f31
Built On | debian-linux-gnu x86_64

We have calculated approximately how much memory MySQL can use in a worst case scenario for max_connections=250:

mysql> select ((@@key_buffer_size+@@innodb_buffer_pool_size+@@innodb_log_buffer_size+@@innodb_additional_mem_pool_size+@@net_buffer_length+@@query_cache_size)/1024/1024/1024)+((@@sort_buffer_size+@@myisam_sort_buffer_size+@@read_buffer_size+@@join_buffer_size+@@read_rnd_buffer_size+@@thread_stack)/1024/1024/1024*250);
+-----------------+
| 12.445816040039 |
+-----------------+
1 row in set (0.00 sec)

So in our case, this shouldn’t be more than ~12.5G.

After MySQL Server has been restarted, it allocated about 7G. After running for a week, it reached 44G:

# Memory #####################################################
       Total | 55.0G
        Free | 8.2G
        Used | physical = 46.8G, swap allocated = 0.0, swap used = 0.0, virtual = 46.8G
      Shared | 560.0k
     Buffers | 206.5M
      Caches | 1.0G
       Dirty | 7392 kB
     UsedRSS | 45.3G
  Swappiness | 60
 DirtyPolicy | 20, 10
 DirtyStatus | 0, 0

# Top Processes ##############################################
  PID USER   PR NI    VIRT    RES  SHR S  %CPU %MEM    TIME+ COMMAND
 2074 mysql  20  0 53.091g 0.044t 8952 S 126.0 81.7 34919:49 mysqld

I checked everything that could be related to the high memory usage (for example, operating system settings such as Transparent Huge Pages (THP), which was disabled on this server), but I still didn’t find the cause. I asked my teammates if they had any ideas.

Part 2: The Team to the Rescue

After brainstorming and reviewing the status, metrics and profiles again and again, my colleague (Yves Trudeau) pointed out that User Statistics was enabled on the server.

User Statistics adds several INFORMATION_SCHEMA tables, several commands, and the userstat variable. The tables and commands can be used to better understand different server activity, and to identify the different load sources. Check out the documentation for more information.
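Both switches are plain server variables, so they can be enabled in my.cnf or, if your Percona Server version allows it, at runtime. A minimal sketch:

mysql> SET GLOBAL userstat = ON;            -- master switch for User Statistics
mysql> SET GLOBAL thread_statistics = ON;   -- also collect per-thread counters (the part that grew here)

In this case, both were already enabled: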

mysql> show global variables like 'user%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| userstat      | ON    |
+---------------+-------+

mysql> show global variables like 'thread_statistics%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| thread_statistics | ON    |
+-------------------+-------+

Since we saw many threads running, it was a good option to verify this as the cause of the issue.

Part 3: Cause Verification – Did It Really Eat Our Memory?

I decided to apply some calculations and run the following test cases to verify the cause:

  1. Looking at the THREAD_STATISTICS table in the INFORMATION_SCHEMA, we can see that for each connection there is a row like the following:
    mysql> select * from THREAD_STATISTICS limit 1\G
    *************************** 1. row ***************************
                 THREAD_ID: 3566
         TOTAL_CONNECTIONS: 1
    CONCURRENT_CONNECTIONS: 0
            CONNECTED_TIME: 30
                 BUSY_TIME: 0
                  CPU_TIME: 0
            BYTES_RECEIVED: 495
                BYTES_SENT: 0
      BINLOG_BYTES_WRITTEN: 0
              ROWS_FETCHED: 27
              ROWS_UPDATED: 0
           TABLE_ROWS_READ: 0
           SELECT_COMMANDS: 11
           UPDATE_COMMANDS: 0
            OTHER_COMMANDS: 0
       COMMIT_TRANSACTIONS: 0
     ROLLBACK_TRANSACTIONS: 0
        DENIED_CONNECTIONS: 0
          LOST_CONNECTIONS: 0
             ACCESS_DENIED: 0
             EMPTY_QUERIES: 0
     TOTAL_SSL_CONNECTIONS: 1
    1 row in set (0.00 sec)
  2. We have 22 columns, each of them a BIGINT (8 bytes), which gives us ~176 bytes per row.
  3. Let’s calculate how many rows we have in this table at this time, and check once again in an hour:
    mysql> select count(*) from information_schema.thread_statistics;
    +----------+
    | count(*) |
    +----------+
    |  7864343 |
    +----------+
    1 row in set (15.35 sec)

    In an hour:

    mysql> select count(*) from information_schema.thread_statistics;
    +----------+
    | count(*) |
    +----------+
    | 12190801 |
    +----------+
    1 row in set (24.46 sec)

  4. Now let’s check on how much memory is currently in use:

      PID USER   PR NI    VIRT    RES   SHR S  %CPU %MEM   TIME+ COMMAND
     2096 mysql  20  0 12.164g 0.010t 16036 S 173.4 18.9 2274:51 mysqld

  5. We have 12190801 rows in the THREAD_STATISTICS table, which at ~176 bytes per row works out to roughly 2G of memory.
  6. Issuing the following statement cleans up the statistics:

    mysql> flush thread_statistics;

    mysql> select count(*) from information_schema.thread_statistics;
    +----------+
    | count(*) |
    +----------+
    |        0 |
    +----------+
    1 row in set (0.00 sec)

  7. Now, let’s check again on how much memory is in use:

      PID USER   PR NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
     2096 mysql  20  0 12.164g 7.855g 16036 S 99.7 14.3 2286:24 mysqld

As we can see, memory usage dropped by approximately the 2G we had calculated earlier!
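As a quick sanity check, the size of this table at any point can be approximated directly from its row count, using the ~176 bytes per row figure from above (a rough estimate that ignores any per-row overhead):

mysql> select count(*) * 176 / 1024 / 1024 / 1024 as approx_gb
    ->   from information_schema.thread_statistics;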

That was the root cause of the high memory usage in this case.

Conclusion

User Statistics (and in particular Thread_Statistics) is a great feature that allows us to identify load sources and better understand server activity. At the same time, though, it can be dangerous (from the memory usage point of view) to leave enabled as a permanent monitoring solution, because it places no limit on how much memory the statistics can consume.

As a reminder, thread_statistics is NOT enabled by default when you enable User_Statistics. If you have enabled Thread_Statistics for monitoring purposes, don’t forget to keep an eye on its memory footprint.
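Until such a limit exists, one pragmatic workaround is to reset the statistics on a schedule, for example from cron (a sketch only; the credentials file and interval are placeholders to adapt to your environment):

# reset per-thread statistics hourly to cap memory growth
0 * * * * mysql --defaults-file=/root/.my.cnf -e "FLUSH THREAD_STATISTICS;"

Keep in mind that flushing also discards the collected counters, so grab whatever you need from INFORMATION_SCHEMA.THREAD_STATISTICS before the flush.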

As a next step, we are considering submitting a feature request to implement some default limits that can prevent Out of Memory issues on busy systems.

Webinar Wednesday July 12, 2017: MongoDB Index Types – How, When and Where Should They Be Used?

July 11, 2017 - 8:32am

Join Percona’s Senior Technical Services Engineer, Adamo Tonete as he presents MongoDB Index Types: How, When and Where Should They Be Used? on Wednesday, July 12, 2017 at 11:00 am PDT / 2:00 pm EDT (UTC-7).

Register Now

MongoDB has 12 index types. Do you know how each works, or when you should use each of them? This talk will arm you with this knowledge, and help you understand how indexes impact performance, storage and even the sharding of your data. We will also discuss some solid index operational practices, as well as some settings for things like TTL you might not know exist. The contents of this webinar will make you a Rock Star!

Register for the webinar here.

Adamo Tonete, Senior Technical Services Engineer

Adamo joined Percona in 2015, after working as a MongoDB/MySQL database administrator for three years. As the main database admin of a startup, he was responsible for suggesting the best architecture and data flows for a worldwide company in a 24/7 environment. Before that, he worked as a Microsoft SQL Server DBA for a large e-commerce company, working mainly on performance tuning and automation. Adamo has almost eight years of experience working as a DBA, and in the past three years he has moved to NoSQL technologies (without giving up relational databases).

Percona Server for MongoDB 3.4.5-1.5 is Now Available

July 10, 2017 - 11:37am

Percona announces the release of Percona Server for MongoDB 3.4.5-1.5 on July 10, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, fully compatible, highly-scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with the Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features.

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.5 and includes the following additional changes:

New Features

Bugs Fixed
  • #PSMDB-67: Fixed mongod service status messages.

Webinar Tuesday July 11, 2017: Securing Your MySQL/MariaDB Data

July 10, 2017 - 9:51am

Join Percona’s Chief Evangelist, Colin Charles as he presents Securing Your MySQL/MariaDB Data on Tuesday, July 11, 2017 at 7:00 am PDT / 10:00 am EDT (UTC-7).

Register Now

This webinar will discuss the features of MySQL/MariaDB that, when enabled and used correctly, improve the security of a default MySQL installation. Many cloud-based applications fail to:

  • Use appropriate filesystem permissions
  • Employ TLS/SSL for connections
  • Require TLS/SSL with MySQL replication
  • Use external authentication plugins (LDAP, PAM, Kerberos)
  • Encrypt all your data at rest
  • Monitor your database with the audit plugin
  • Review and reject SQL injections
  • Design application access using traditional firewall technology
  • Employ other MySQL/MariaDB security features

This webinar will demonstrate and advise on how to correctly implement the features above. We will end the presentation with some simple steps on how to hack a MySQL installation.
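As a small taste of the list above, requiring TLS for a replication account is a single statement (a sketch with hypothetical account details, using MySQL 5.6/5.7 GRANT syntax):

mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.%' IDENTIFIED BY 'choose-a-strong-password' REQUIRE SSL;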

You can register for the webinar here.

Colin Charles, Percona Chief Evangelist

Colin Charles is the Chief Evangelist at Percona. He was previously on the founding team of MariaDB Server in 2009, worked at MySQL from 2005, and has been a MySQL user since 2000. Before joining MySQL, he worked actively on the Fedora and OpenOffice.org projects. He’s well known within open source communities in APAC, and has spoken at many conferences.

MongoDB Indexing Types: How, When and Where Should They Be Used?

July 8, 2017 - 3:18pm

In this blog post, we will see how to do MongoDB indexing, as well as the different types of indexes that are available in MongoDB.

Note: We are hosting a webinar on July 12, 2017, where I will talk about MongoDB indexes and how to choose a good indexing plan.

MongoDB is a document-oriented NoSQL database. NoSQL databases share many features with relational databases, and one of them is indexes. The question is: how are documents indexed in such a database?

Remember that because MongoDB stores JSON-like documents, there is no predefined schema: the same field can be a string in one document and an integer in another.

MongoDB, like many other databases, uses B-trees for indexing. With some exceptions, the algorithm is the same as in a relational database.

The B-tree can use integers and strings together to organize data. The most important thing to know is that an index-less database must read all the documents to filter what you want, while an indexed database can – through indexes – find the documents quickly.
Imagine you are looking for a book in a disorganized library. This is how the query optimizer feels when we are looking for something that is not indexed.
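A quick way to see the difference for yourself is explain(): an unindexed query reports a COLLSCAN (collection scan) stage, while an indexed one reports IXSCAN. A minimal sketch against a hypothetical users collection:

db.users.find({ email: "user@example.com" }).explain("executionStats")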

There are several different types of indexes available: single field, compound, hashed, geo, unique, sparse, filtered, text and so on. All of them help the database in some way, although they obviously also hurt write performance if overused (a few shell sketches follow the list below).

  • Single field. Single-field indexes are simple indexes that index only one field in a collection. MongoDB can generally use only one index per query – remember that $or actually executes more than one query at a time, so $or queries can use more than one index.
  • Compound indexes. Compound indexes index more than one field, and ideally the most restrictive field should be leftmost in the index. If you want to index by sex and birth, for instance, the index should begin with birth, as it is much more restrictive than sex.
  • Hashed indexes. Hashed sharding uses hashed indexes: a hash of the field value is used to spread writes across the sharded instances.
  • GeoIndexes. GeoIndexes are a special index type that allows searches based on location, distance from a point, and more.
  • Unique indexes. Unique indexes work as in relational databases: they guarantee that a value doesn’t repeat, and raise an error when we try to insert a duplicate value. Uniqueness is not enforced across shards (unless the unique index is on the shard key).
  • Text indexes. Text indexes handle searches over string content better than a regular index on a single field. There are different options we can use, like giving weights to fields to control how results are ranked, or setting language-specific options.
  • Sparse/Filtered indexes. Sparse and filtered (partial) indexes seem very similar. However, a sparse index only indexes documents where the field exists and does not check its value, while a filtered index applies a filter (like “greater than”) to decide which documents to index. This means a filtered index doesn’t index every document that has the field, only the documents that match the index’s filter expression.
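For illustration, here is roughly how some of these index types are created in the mongo shell (the collections and fields are hypothetical):

db.people.createIndex({ birth: 1, sex: 1 })                          // compound: most restrictive field first
db.people.createIndex({ email: 1 }, { unique: true, sparse: true })  // unique, indexing only documents that have the field
db.people.createIndex({ bio: "text" }, { weights: { bio: 10 } })     // text index with a weight
db.places.createIndex({ location: "2dsphere" })                      // geo index
db.users.createIndex({ user_id: "hashed" })                          // hashed, e.g. for hashed sharding
db.orders.createIndex({ total: 1 }, { partialFilterExpression: { total: { $gt: 100 } } })  // filtered (partial) index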

We will discuss and talk more about indexes and how they work in my webinar MongoDB® Index Types: How, When and Where Should They Be Used? If you have questions, feel free to ask them in the comments below and I will try to answer all of them in the webinar (or in a complementary post).

I hope this blog post was useful. Please feel free to reach out to me on Twitter: @AdamoTonete or @percona.

Last Resort: How to Use a Backup to Start a Secondary Instance for MongoDB?

July 7, 2017 - 11:56am

In this blog post, I’ll look at how you can use a backup to start a secondary instance for MongoDB.

Although the documentation says it is not possible to use a backup to start a secondary, sometimes this is the only viable way to start a new instance. This post explains how to bypass that limitation.

The initial sync, rsync or snapshot approach works fine when the instances are in the same data center, but between data centers it can fail or be really slow, much slower than moving a compressed backup.

Not every backup can be used as a source for starting a replica set member. The backup must include the oplog: it must be taken from an existing replica set using the --oplog flag, which produces a point-in-time backup while dumping the collections. The time spent moving and restoring the backup file must be less than the oplog window.
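To judge whether you will fit inside that window, check the current oplog size and time range on an existing member before you start (a minimal sketch):

mongo
PRIMARY> rs.printReplicationInfo()    // shows the configured oplog size and the time span it currently covers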

Please follow the next steps to create a secondary from a backup:

  1. Create a backup using the --oplog option.
    mongodump --oplog -o /tmp/backup/
  2. Back up the replica set configuration collection from the local database.

    mongodump -d local -c system.replset
  3. After the backup finishes, please confirm the oplog.rs file is in the backup folder.
  4. Use bsondump to convert the oplog to JSON. We will use the last oplog entry as a starting point for the replica set.
    bsondump oplog.rs > oplog.rs.txt
  5. Start the new instance without the replica set parameter configured. At this point, the instance will act as a standalone instance.
  6. Restore the database normally, using --oplogReplay to apply all the oplog entries that were recorded while the backup was running.
    mongorestore /tmp/backup --oplogReplay
  7. Connect to this server and use the local database to create the oplog.rs collection. Please use the same oplog size as the other members (e.g., 20 GB).
    mongo
    > use local
    > db.runCommand({ create: "oplog.rs", capped: true, size: (20 * 1024 * 1024 * 1024) })
  8. From the oplog.rs.txt generated in step 4, get the last line and copy the ts and h field values into a MongoDB document.

    tail -n 1 oplog.rs.txt
  9. Insert the JSON value to the oplog.rs collection that was created before.
    mongo
    > use local
    > db.oplog.rs.insert({ ts: Timestamp(12354454, 1), h: NumberLong("-3434387384732843") })
  10. Restore the replset collection to the local database.
    mongorestore -d local -c system.replset ./backup/repliset.bson
  11. Stop the service and edit the configuration to set the replica set name to match the existing replica set.
  12. Connect to the primary and add this new host. The new host will start catching up on the oplog and should be in sync after a few minutes/hours, depending on the number of operations the replica set handles. It is important to add this new secondary as a hidden member, without votes if possible, to avoid triggering an election. Once the secondary is added to the replica set, drivers will start using this host for reads; if you don’t add the server with hidden: true, the application may read inconsistent (old) data.

    mongo
    PRIMARY> rs.add({ _id: <your id>, host: "<newhost:27017>", hidden: true, votes: 0, priority: 0 })
  13. Check the replication lag, and once the “seconds behind master” value is near zero, change the host’s parameters in the replica set: set hidden: false and restore its priority and votes.
    mongo
    PRIMARY> rs.printSlaveReplicationInfo()
  14. We are assuming a replica set with three members, where the new secondary has ID 2 in the members array. Use the following commands to unhide the secondary and make it available for reads. The priority and votes depend on your environment. Please note that you might need to change the member ID.
    mongo
    PRIMARY> cfg = rs.config()
    PRIMARY> cfg.members[2].hidden = false
    PRIMARY> cfg.members[2].votes = 1
    PRIMARY> cfg.members[2].priority = 1
    PRIMARY> rs.reconfig(cfg)

I hope this tutorial helps in an emergency situation. Please consider using initial sync, disk snapshots and hot backups before using this method.

Feel free to reach out to me on Twitter: @AdamoTonete or @percona.

ClickHouse: One Year!

July 6, 2017 - 11:17am

In this blog, we’ll look at ClickHouse on its one year anniversary.

It’s been a year already since the Yandex team released ClickHouse as open source software. I’ve had an interest in this project from the very start, as I didn’t think there was an open source analytical database that could compete with industry leaders like Vertica (for example).

This was an exciting year for ClickHouse early adopters. Let’s look at what it accomplished so far.

ClickHouse initially generated interest due to the Yandex name – the most popular search engine in Russia. It wasn’t long before jaw-dropping responses popped up: guys, this thing is crazy fast! Many early adopters who tried ClickHouse were really impressed.

Fast doesn’t mean convenient, though. That was the main community concern about ClickHouse over the past months. Developed as an internal project for an internal customer (Yandex.Metrica), ClickHouse had a lot of limitations for general community use. It took several months before Yandex could restructure the team and mindset, establish proper communication with the community, and start addressing external needs. There are still a lot of things that need to be done. The public roadmap is not easily available, for example, but the wheels are pointed in the right direction. The ClickHouse team has added a lot of the features people were screaming for, and more are in ClickHouse’s future plans.

The Yandex team is actively attending and speaking at international conferences.

They are speaking much more in Russia (no big surprise).

At Percona, we were very excited by Yandex’s ClickHouse performance claims, and could not resist running our own benchmarks.

ClickHouse did very well in these benchmarks. There are many other benchmarks by other groups as well, including a benchmark against Amazon Redshift by Altinity.

The first ClickHouse production deployments outside of Yandex started in October-November 2016. Today, Yandex reports that dozens of companies around the world are using ClickHouse in production, with the biggest installations operating with up to several petabytes of data. Hundreds of other enterprises are deploying pilot installations or actively evaluating the software.

There are also interesting reports from Cloudflare (How Cloudflare analyzes 1M DNS queries per second) and from Carto (Geospatial processing with ClickHouse).

There are also various community projects around ClickHouse worth mentioning.

Percona is also working to adapt ClickHouse to our projects. We are using ClickHouse to handle Query Analytics and as long-term storage for metrics data inside a new version (under development) of Percona Monitoring and Management.

I will also be speaking about ClickHouse at BIG DATA DAY LA 2017 on August 5th, 2017. You are welcome to attend if you are in Los Angeles that day!

ClickHouse has the potential to become one of the leading open source analytical DBMSs – much like MySQL and PostgreSQL are leaders for OLTP workloads. We will see in the next couple of years whether that happens or not. Congratulations to the Yandex team on their one-year milestone!

Differences in PREPARE Statement Error Handling with Binary and Text Protocol (Percona XtraDB Cluster / Galera)

July 5, 2017 - 11:22am

In this blog, we’ll look at the differences in how a PREPARE statement handles errors in binary and text protocols.

Introduction

Since Percona XtraDB Cluster is a multi-master solution, when an application executes conflicting workloads one of the workloads gets rolled back with a DEADLOCK error. While the same holds true even if you fire the workload through a PREPARE statement, there are differences between using the MySQL connector API (with binary protocol) and the MySQL client (with text protocol). Let’s look at these differences with the help of an example.

Base Workload
  • Say we have a two-node cluster (n1 and n2) with the following base schema and tables:
    use test;
    create table t (i int, k int, primary key pk(i)) engine=innodb;
    insert into t values (1, 10), (2, 20), (3, 30);
    select * from t;
  • Workload (w1) on node-1 (n1):
    prepare st1 from 'update t set k = k + 100 where i > 1';
    execute st1;
  • Workload (w2) on node-2 (n2):
    prepare st1 from 'update t set k = k + 100 where i > 1';
    execute st1;
  • The workloads are conflicting, and the first node to commit wins. Let’s assume n1 commits first, and the write-set is replicated to n2 while n2 is busy executing the local workload.
Different Scenarios Based on Configuration

wsrep_retry_autocommit > 0
  • n2 tries to apply the replicated workload, thereby forcefully aborting (brute-force abort) the local workload (w2).
  • The forcefully aborted workload is retried (based on wsrep_retry_autocommit value (default is 1)).
  • w2 succeeds on retry, and the new state of table t would be:
    (1, 10), (2, 220), (3, 230);
  • The user can increase wsrep_retry_autocommit to a higher value, though we don’t recommend this, as a higher value increases pressure on the system to retry failed workloads. A retry doesn’t guarantee success if additional conflicting replicated workloads are lined up in the certification queue. Also, note that a retry only applies when AUTOCOMMIT=ON; for a begin-commit transaction, retry is not applicable, and on abort the complete transaction is rolled back. The application executing the workload should expect failures due to conflicts and handle them accordingly.
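For reference, wsrep_retry_autocommit is an ordinary server variable, so the second scenario below can be configured on the node in question (a sketch; restart, or change it at runtime if your version exposes it as a dynamic variable):

# my.cnf on node-2
[mysqld]
wsrep_retry_autocommit = 0

mysql> SHOW GLOBAL VARIABLES LIKE 'wsrep_retry_autocommit';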
wsrep_retry_autocommit = 0
  • Let’s say the user sets wsrep_retry_autocommit=0 on node-2 and executes the same workload.
  • This time it doesn’t retry the local workload (w2). Instead, it sends an error to the end-user:
    mysql> execute st1;
    ERROR 1213 (40001): WSREP detected deadlock/conflict and aborted the transaction. Try restarting the transaction

    mysql> select * from t;
    +---+------+
    | i | k    |
    +---+------+
    | 1 |   10 |
    | 2 |  120 |
    | 3 |  130 |
    +---+------+
    3 rows in set (0.00 sec)
  • Only the replicated workload (w1) is applied, and local workload w2 fails with a DEADLOCK error.
Use of MySQL Connector API

Now let’s try to execute these workloads through Connector API. Connector API uses a binary protocol, which means it directly invokes the PREPARE statement and executes the statement using dedicated command codes (COM_STMT_PREPARE and COM_STMT_EXECUTE).

While the internal abort of the conflicting transaction remains the same, the COM_QUERY command code (text protocol) handling catches the “query interrupted” error and resets it to a “deadlock” error based on the wsrep_conflict_state value. It also has logic to retry autocommit-enabled statements.

This error handling and retry logic is not present when the COM_STMT_PREPARE and COM_STMT_EXECUTE command codes are used. This means the user sees a different (non-masked) error when running the same workload through an application that uses the MySQL Connector API:

Output of application from node-2:

Statement init OK!
Statement prepare OK!
Statement execution failed: Query execution was interrupted

Exact error code, to be specific: ER_QUERY_INTERRUPTED - 1317 - "Query execution was interrupted"

As you can see above, the statement/workload execution fails with a “Query execution was interrupted” error instead of a “deadlock” error (as seen with the normal MySQL client).

Conclusion

While the core execution remains the same, error handling differs between the text and binary protocols. This issue/limitation was always present, but it became more prominent once sysbench 1.0 started using PREPARE statements by default.

What does this mean for the end user/application?

  • Nothing has changed from the Percona XtraDB Cluster perspective, except that if your application plans to use the binary protocol (like the Connector API), then it should be able to handle both errors and retry the failed transaction accordingly.