
Saturation Metrics in PMM 1.2.0

August 4, 2017 - 6:23am

One of the new graphs added to Percona Monitoring and Management (PMM) is saturation metrics. This blog post explains how to use the information provided by these graphs.

You might have heard about Brendan Gregg’s USE Method  (Utilization-Saturation-Errors) as a way to analyze the performance of any system. Our goal in PMM is to support this method fully over time, and these graphs take us one step forward.

When it comes to utilization, there are many graphs available in PMM. There is the CPU Usage graph:

There is also Disk IO Utilization:

And there is Network Traffic:

If you would like to look at saturation-type metrics, there is the classic Load Average graph:

While Load Average is helpful for understanding system saturation in general, it does not really distinguish whether it is the CPU or the disk that is saturated. Load Average, as the name says, is already averaged (over at least one minute), so we can’t observe short saturation spikes with it. Finally, Load Average does not take the number of CPU cores/threads into account. Suppose I have a CPU-bound Load Average of 16, for example. That is quite a load, and it will cause high saturation and queueing if you have two CPU threads. But if you have 64 threads, then 16 becomes a trivial load with no saturation at all.

Let’s take a look at the Saturation Metrics graph:

It provides us with two metrics: one showing the CPU load and the other showing the IO load. These values roughly correspond to the “r” and “b” columns in VMSTAT output:

These are sampled every second and then averaged over the reporting interval.

We also normalize the CPU load by dividing the raw number of runnable processes by the number of threads available. “Rocky” has 56 threads, which is why the normalized CPU load is about one even though the number of runnable processes shown by VMSTAT is around 50.
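
Put as a quick formula with the numbers above: Normalized CPU Load = runnable processes / available CPU threads ≈ 50 / 56 ≈ 0.9, which is why the graph hovers around one for “Rocky”.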

We do not normalize the IO load, as systems can have multiple IO devices, and the number of requests they can handle in parallel is largely unknown. If you want to understand specific IO device performance, you should check out the Disk Performance Dashboard.

Testing Saturation Metrics in Practice

Let’s see if saturation graphs indeed show us when CPU saturation is the issue. I will use a sysbench CPU test for illustration, run as:

sysbench cpu  --cpu-max-prime=100000 --threads=1 --time=60 run

This will use the specified number of threads to execute compute jobs, each of which computes the specified number of prime numbers. If we have enough CPU resources available, with no saturation, the latency of executing such requests should be about the same. When we overload the system, so that there are not enough CPU execution units to process everything in parallel, the average latency should increase.

root@ts140i:/mnt/data# sysbench cpu  --cpu-max-prime=100000 --threads=1 --time=300 run
sysbench 1.0.7 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 1
Initializing random number generator from current time
Prime numbers limit: 100000
Initializing worker threads...
Threads started!
General statistics:
    total time:                          300.0234s
    total number of events:              12784
Latency (ms):
         min:                                 23.39
         avg:                                 23.47
         max:                                 28.07
         95th percentile:                     23.52
         sum:                             300018.06

As we can see, with one thread working, the average time it takes to handle a single request is 23ms. Obviously, there is no saturation happening in this case:

“Ts140i” has four CPU cores, and as you can see the Normalized CPU Load stays below one. You may wonder why it isn’t closer to 0.25 in this case, with one active thread and four cores available. The reason is that at the exact moment the metrics are captured, there often happen to be an additional two to three threads active to facilitate the process. They are only active for a few milliseconds at a time, so they do not produce much load, but they tend to skew the number a little bit.

Let’s now run with four threads. The number of threads matches the number of CPU cores available (and they are true cores in this case, no hyperthreading). In this case, we shouldn’t expect much of an increase in event processing time.

root@ts140i:/mnt/data# sysbench cpu  --cpu-max-prime=100000 --threads=4 --time=300 run
sysbench 1.0.7 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 4
Initializing random number generator from current time
Prime numbers limit: 100000
Initializing worker threads...
Threads started!
General statistics:
    total time:                          300.0215s
    total number of events:              48285
Latency (ms):
         min:                                 24.19
         avg:                                 24.85
         max:                                 43.61
         95th percentile:                     24.83
         sum:                            1200033.93

As you can see, the test confirms the theory: average latency increased by only about 6%, with the Normalized CPU Load in the saturation metrics mostly hovering between one and two:

Let’s now do the test with 16 threads, which is four times more than the available CPU cores. We should see the latency increase dramatically due to CPU overload (or saturation). The same will happen to your CPU-bound MySQL queries if you have more concurrency than CPUs available.

root@ts140i:/mnt/data# sysbench cpu  --cpu-max-prime=100000 --threads=16 --time=300 run
sysbench 1.0.7 (using bundled LuaJIT 2.1.0-beta2)
Running the test with following options:
Number of threads: 16
Initializing random number generator from current time
Prime numbers limit: 100000
Initializing worker threads...
Threads started!
General statistics:
    total time:                          300.0570s
    total number of events:              48269
Latency (ms):
         min:                                 27.83
         avg:                                 99.44
         max:                                189.05
         95th percentile:                    121.08
         sum:                            4799856.52

We can see it takes about four times longer to process each request due to CPU overload and queueing. Let’s see what saturation metrics tell us:

As you can see, Normalized CPU Load floats between four and five on the graph, consistent with the saturation we’re observing.

You may ask whether the CPU utilization graph helps us here. Not really. You will see 100% CPU usage for both the four-thread and the 16-thread runs, while the request latencies are completely different.

Summary

As we can see from our test, Normalized CPU Load is very helpful for understanding when the CPU is overloaded. An overloaded CPU causes response times to increase and performance to degrade. Furthermore, you can use it to (roughly) see how serious the overload is. As a rule of thumb, if you see Normalized CPU saturation over two, it indicates your CPUs are overloaded.

Percona Live Europe 2017 Sneak Peek Schedule Up Now! See Available Sessions!

August 3, 2017 - 7:42am

We are excited to announce that the sneak peek schedule for the Percona Live Open Source Database Conference Europe 2017 is up! The Percona Live Open Source Database Conference Europe 2017 is September 25 – 27, at the Radisson Blu Royal Hotel.

The theme of Percona Live Europe 2017 is Championing Open Source Databases, with sessions on MySQL, MariaDB, MongoDB and other open source database technologies, including time series databases, PostgreSQL and RocksDB. This year’s conference will feature one day of tutorials and two days of keynote talks and breakout sessions related to open source databases and software. Tackling subjects such as analytics, architecture and design, security, operations, scalability and performance, Percona Live Europe provides in-depth discussions for your high-availability, IoT, cloud, big data and other changing business needs.

Below are some of our top picks for MySQL, MongoDB and open source database sessions:

Tutorials

Breakout Talks

MySQL:

MongoDB:

Other Open Source Database Topics:

Registration Prices Increase August 9, 2017 – Get Tickets Now for the Best Price!

Just a reminder to everyone out there that the Early Bird discount rate for the Percona Live Open Source Database Conference Europe 2017 ends August 8! The price increases as of August 9, so buy now. The Early Bird rate gets you all the excellent and amazing opportunities that Percona Live Europe offers, at a very reasonable price! Get your tickets as soon as possible for the best price.

Percona Live Europe 2017 Open Source Database Conference will be held at the Radisson Blu Royal Hotel, at Golden Lane 8, Dublin, Ireland.

The Radisson Blu Royal Hotel is a prime location in the heart of Dublin. Enjoy this spacious venue with complimentary WiFi, expert on-site staff and three great restaurants offering a wide variety of meals. Staying for a couple of extra days? Take time to enjoy the different tourist attractions, like traditional beer pubs and twelfth-century castles, located minutes away.

A special hotel rate of EUR 250.00 is available for Percona Live Europe 2017 until August 14, 2017.

You can reserve a room by booking through the Radisson Blu’s reservation site.

  1. Click BOOK NOW at the top right.
  2. Enter your preferred check-in and check-out dates, and how many rooms.
  3. From the drop-down “Select Rate Type,” choose Promotional Code.
  4. Enter the code PERCON to get the discount.

This special deal includes breakfast each morning! The group rate only applies if used within the Percona Live Europe group block dates (September 25-27, 2017).

Sponsor Percona Live

Become a conference sponsor! We have sponsorship opportunities available for this annual MySQL, MongoDB and open source database event. Sponsors become a part of a dynamic and growing ecosystem and interact with hundreds of DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who attend the event.

Percona Toolkit 3.0.4 is Now Available

August 2, 2017 - 9:44am

Percona announces the release of Percona Toolkit 3.0.4 on August 2, 2017.

Percona Toolkit is a collection of advanced command-line tools that perform a variety of MySQL and MongoDB server and system tasks too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

You can download Percona Toolkit packages from the web site or install them from the official repositories.

This release includes the following changes:

New Features

  • PT-90: Added collection of information about prepared statements by pt-stalk when Performance Schema is enabled. For more information, see #1642750.
  • PT-91: Added the --preserve-triggers option for pt-online-schema-change to support AFTER triggers.
  • PT-138: Added --output-format option for pt-mongodb-summary to choose between JSON format and the default plain text.
  • PT-141: Added the --output-format=csv parameter for pt-archiver to archive rows in CSV format.
  • PT-142: Added the --only-same-schema-fks option for pt-online-schema-change to check foreign keys only on tables with the same schema as the original table. This should speed up the tool’s execution, but keep in mind that if you have foreign keys referencing tables in other schemas, they won’t be detected. For more information, see #1690122.
  • PT-153: Added the --check-unique-key-change option for pt-online-schema-change to abort if the specified statement for --alter is trying to add a unique index. This is supposed to avoid adding duplicate keys that might lead to silently losing data.
  • PT-173: Added the --truncate-replicate-table option for pt-table-checksum to ensure stale data is removed.

Bug fixes

  • PT-136: Fixed pt-table-checksum to support tables that have columns with different collations or charsets. For more information, see #1674266.
  • PT-143: Fixed primary key handling by pt-archiver. For more information, see #1691630.
  • PT-144: Limited the length of the constraint name in the new table when running pt-online-schema-change. For more information, see #1491674.
  • PT-146: Fixed the --no-check-binlog-format option for pt-table-checksum to work as expected.
  • PT-148: Fixed the use of uninitialized value in printf() for pt-online-schema-change. For more information, see #1693614.
  • PT-151: Fixed pt-table-sync to prevent the point field type from being treated as decimal.
  • PT-154: Reverted PT-116 to remove the --use-insert-ignore option from pt-online-schema-change.
  • PT-161: Fixed the --skip-check-slave-lag feature for pt-table-checksum to safely check for undefined values.
  • PT-178: Fixed regression in --check-slave-lag option for pt-online-schema-change.
  • PT-180: Fixed regression in --skip-check-slave-lag option for pt-online-schema-change.
  • PT-181: Fixed syntax error in pt-online-schema-change.

Other Improvements

  • PT-162: Updated list of tables ignored by pt-table-checksum.

You can find release details in the release notes. Report bugs in Toolkit’s launchpad bug tracker.

Percona Server for MongoDB 3.4.6-1.7 is Now Available

August 2, 2017 - 6:58am

Percona announces the release of Percona Server for MongoDB 3.4.6-1.7 on August 2, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, fully compatible, highly-scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features:

Percona Server for MongoDB requires no changes to MongoDB applications or code.

NOTE: Red Hat Enterprise Linux 5 (including CentOS 5 and other derivatives), Ubuntu 12.04 and older versions are no longer supported by Percona software.

This release is based on MongoDB 3.4.6 and includes the following additional bug fix:

  • #PSMDB-155: Fixed mongod startup on NUMA systems.

Percona Server for MySQL 5.6.36-82.1 is Now Available

August 1, 2017 - 5:41pm

Percona is glad to announce the release of Percona Server for MySQL 5.6.36-82.1 on August 1, 2017 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.6.36, including all the bug fixes in it, Percona Server for MySQL 5.6.36-82.1 is the current GA release in the Percona Server for MySQL 5.6 series. All of Percona’s software is open-source and free. All the details of the release can be found in the 5.6.36-82.1 milestone at Launchpad.

Please note that RHEL 5, CentOS 5 and Ubuntu versions 12.04 and older will not be supported in future releases of Percona Server for MySQL, and no further packages will be added for these distributions.

New Features

  • Percona Server for MySQL can now be built with support of OpenSSL 1.1.
  • Percona Server for MySQL is now available on Debian 9 (stretch). The support only covers the amd64 architecture.
  • TokuDB now enables killing a query that is awaiting an FT locktree lock.
Bugs Fixed
  • Row counts in TokuDB could be lost intermittently after restarts. Bug fixed #2.
  • In TokuDB, two races in the fractal tree lock manager could significantly affect transactional throughput for some applications that used a small number of concurrent transactions. These races manifested as transactions unnecessarily waiting for an available lock. Bug fixed #3.
  • TokuDB could assert when opening a dictionary, without writing any useful information to the error log. Bug fixed #23.
  • TokuDB could assert for various reasons when deserializing nodes, with no useful error output. Bug fixed #24.
  • Percona Server could crash when running a query over a partitioned table that uses an index to read a range of rows if this range was not covered by any existing partition. Bug fixed #1657941 (upstream #76418).
  • With two client connections to a server (debug server build), the server could crash after one of the clients set the global option userstat and flushed the client statistics (FLUSH CLIENT_STATISTICS) and then both clients were closed. Bug fixed #1661488.
  • TokuDB did not pass cmake flags on to snappy cmake. Bug fixed #41.
  • The progress status for partitioned TokuDB table ALTERs was misleading. Bug fixed #42.
  • When a client application connects to the Aurora cluster endpoint using SSL (--ssl-verify-server-cert or --ssl-mode=VERIFY_IDENTITY option), wildcard and SAN enabled SSL certificates were ignored. See also the Compatibility Matrix. Note that the --ssl-verify-server-cert option is deprecated in Percona Server 5.7. Bug fixed #1673656 (upstream #68052).
  • Killing a stored procedure execution could result in an assert failure on a debug server build. Bug fixed #1689736 (upstream #86260).
  • It was not possible to build Percona Server on Debian 9 (stretch) due to issues with OpenSSL 1.1. Bug fixed #1702903 (upstream #83814).
  • The SET STATEMENT .. FOR statement changed the global instead of the session value of a variable if the statement occurred immediately after the SET GLOBAL or SHOW GLOBAL STATUS command. Bug fixed #1385352.
  • The synchronization between the LRU manager and page cleaner threads was not done at shutdown. Bug fixed #1689552.

Other bugs fixed: #6, #44, #65, #1160986, #1676740, #1689989, #1689998, #1690012, #1699788, and #1684601 (upstream #86016).

Compatibility Matrix

Feature                   YaSSL   OpenSSL < 1.0.2   OpenSSL >= 1.0.2
‘commonName’ validation   Yes     Yes               Yes
SAN validation            No      Yes               Yes
Wildcards support         No      No                Yes

Group Replication: the Sweet and the Sour

August 1, 2017 - 8:54am

In this blog, we’ll look at group replication and how it deals with flow control (FC) and replication lag. 

Overview

In the last few months, we had two main actors in the MySQL ecosystem: ProxySQL and Group-Replication (with the evolution to InnoDB Cluster). 

While I have extensively covered the first, my last serious work on Group Replication dates back to a lab version from some years past.

Given that Oracle has declared it GA, and that Percona has decided to provide some level of Group Replication support, I decided it was time for me to take a look at it again.

We’ve already seen a lot of coverage of many Group Replication topics. There are articles about Group Replication and performance, Group Replication and basic functionalities (or the lack of them, like automatic node provisioning), Group Replication and ProxySQL, and so on.

But one question kept coming up over and over in my mind. If Group Replication and InnoDB Cluster have to work as an alternative to other (virtually) synchronous replication mechanisms, what changes do our customers need to consider if they want to move from one to the other?

Solutions using Galera (like Percona XtraDB Cluster) must take into account a central concept: clusters are data-centric. What matters is the data and the data state. Both must be the same on each node at any given time (commit/apply). To guarantee this, Percona XtraDB Cluster (and other solutions) use a set of data validation and Flow Control processes that work to ensure a consistent cluster data set on each node.

The upshot of this principle is that an application can query ANY node in a Percona XtraDB Cluster and get the same data, or write to ANY node and know that the data is visible everywhere in the cluster at (virtually) the same time.

Last but not least, inconsistent nodes should be excluded and either rebuilt or fixed before rejoining the cluster.

If you think about it, this is very useful. Guaranteeing consistency across nodes allows you to transparently split write/read operations, failover from one node to another with very few issues, and more.

When I conceived of this blog on Group Replication (or InnoDB Cluster), I put myself in the customer’s shoes. I asked myself: “Aside from all the other things we know (see above), what is the real impact of moving from Percona XtraDB Cluster to Group Replication/InnoDB Cluster for my application? Since Group Replication still (basically) uses replication with binlogs and relay logs, is there also a Flow Control mechanism?” An alarm bell started to ring in my mind.

My answer is: “Let’s do a proof of concept (PoC), and see what is really going on.”

The POC

I set up a simple set of servers using Group Replication, with a very basic application performing writes on a single writer node and (eventually) reads on the other nodes.

You can find the schema definition here. Mainly I used the four tables from my windmills test suite — nothing special or specifically designed for Group Replication. I’ve used this test a lot for Percona XtraDB Cluster in the past, so it was a perfect fit.

Test Definition

The application will do very simple work, and I wanted to test four main cases:

  1. One thread performing one insert at each transaction
  2. One thread performing 50 batched inserts at each transaction
  3. Eight threads performing one insert at each transaction
  4. Eight threads performing 50 batched inserts at each transaction

As you can see, a pretty simple set of operations. Then I decided to test it using the following four conditions on the servers:

  1. One slave worker FC as default
  2. One slave worker FC set to 25
  3. Eight slave workers FC as default
  4. Eight slave workers FC set to 25

Again nothing weird or strange from my point of view. I used four nodes:

  1. Gr1 Writer
  2. Gr2 Reader
  3. Gr3 Reader minimal latency (~10ms)
  4. Gr4 Reader minimal latency (~10ms)

Finally, I had to be sure I measured the lag in a way that allowed me to reference it consistently on all nodes. 

I think we can safely say that the incoming GTID (Received_transaction_set from replication_connection_status) is definitely the last change applied on the master that the slave node knows about. More recent changes could have occurred, but network delay can prevent them from being “received.” The other point of reference is GTID_EXECUTED, which refers to the latest GTID processed on the node itself.

The closest query that can track the distance will be:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX( @@global.GTID_EXECUTED,':',-2),':',1),'-',-1) last_executed;
select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX( Received_transaction_set,':',-2),':',1),'-',-1) last_received FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier';
select (@last_rec - @last_exec) as real_lag;

Or in the case of a single worker:

select @last_exec:=SUBSTRING_INDEX(SUBSTRING_INDEX( @@global.GTID_EXECUTED,':',-1),'-',-1) last_executed;
select @last_rec:=SUBSTRING_INDEX(SUBSTRING_INDEX(Received_transaction_set,':',-1),'-',-1) last_received FROM performance_schema.replication_connection_status WHERE Channel_name = 'group_replication_applier';
select (@last_rec - @last_exec) as real_lag;

The result will be something like this:

+---------------+
| last_executed |
+---------------+
| 23607         |
+---------------+
+---------------+
| last_received |
+---------------+
| 23607         |
+---------------+
+----------+
| real_lag |
+----------+
|        0 |
+----------+

The whole set of tests can be found here, with all the commands you need to run the application (you can find it here) and replicate the tests. I will focus on the results (otherwise this blog post would be far too long), but I invite you to see the details.

The Results

Efficiency on Writer by Execution Time and Rows/Sec

Using the raw data from the tests (Excel spreadsheet available here), I was interested in identifying if and how the Writer is affected by the use of Group Replication and flow control.

Reviewing the graph, we can see that the Writer has a linear increase in the execution time (when using default flow control) that matches the increase in the load. Nothing there is concerning, and all-in-all we see what is expected if the load is light. The volume of rows at the end justifies the execution time.

It’s a different scenario if we use flow control. The execution time increases significantly in both cases (single worker/multiple workers). In the worst case (eight threads, 50 inserts batch) it becomes four times higher than the same load without flow control.

What happens to the inserted rows? In the application, I traced the rows inserted/sec. It is easy to see what is going on there:

We can see that the Writer with flow control activated inserts less than a third of the rows it processes without flow control. 

We can definitely say that flow control has a significant impact on the Writer performance. To clarify, let’s look at this graph:

Without flow control, the Writer processes a high volume of rows in a limited amount of time (results from the test of eight workers, eight threads, 50 insert batch). With flow control, the situation changes drastically. The Writer takes a long time processing a significantly smaller number of rows/sec. In short, performance drops significantly.

But hey, I’m OK with that if it means having a consistent data set across all nodes. In the end, Percona XtraDB Cluster and similar solutions pay a significant performance price to match the data-centric principle.

Let’s see what happens on the other nodes.

Entries Lag

Well, this scenario is not so good:

When NOT using flow control, the nodes lag behind the writer significantly. Remember that by default, flow control in Group Replication kicks in at 25000 entries (I mean 25K entries!).
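
For reference, those thresholds are exposed as regular global variables in MySQL 5.7 Group Replication. A minimal sketch of inspecting them and lowering them to the value used later in this PoC (25):

select @@global.group_replication_flow_control_applier_threshold, @@global.group_replication_flow_control_certifier_threshold;
set global group_replication_flow_control_applier_threshold = 25;
set global group_replication_flow_control_certifier_threshold = 25;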

What happens is that as soon as I put some salt (see load) on the Writer, the slave nodes start to lag. When using the default single worker, this has a significant impact. While using multiple workers, we see that the lag happens mainly on the node(s) with minimal (10ms) network latency. The sad thing is that the lag does not really go down compared to the single-threaded worker, indicating that a minimal latency of just 10ms is enough to affect replication.

Time to activate the flow control and have no lag:

Unfortunately, this is not the case. As we can see, the lag of the single worker remains high for Gr2 (154 entries). While using multiple workers, the Gr3/4 nodes perform much better, with significantly less lag (but still high at ~1K entries).

It is important to remember that at this time the Writer is processing one-third or less of the rows it normally can. It is also important to note that I set the flow control entry limit to 25, and the Gr3 (and Gr4) nodes are still lagging more than 1K entries behind.

To clarify, let’s check the two graphs below:

Using the Writer (Master) as a baseline in entry #N, without flow control, the nodes (slaves) using Group Replication start to significantly lag behind the writer (even with a light load).

The distance in this PoC ranged from very minimal (with 58 entries), up to much higher loads (3849 entries):

Using flow control, the Writer (Master) diverges less, as expected. But even though it takes a significant drop in performance (to one-third or less), the nodes still lag. The worst case is up to 1363 entries.

I need to underline here that we have no further way (that I am aware of, anyway) to tune the lag and prevent it from happening.

This means an application cannot transparently split writes/reads and expect consistency. The gap is too high.

A Graph That Tells Us a Story

I used Percona Monitoring and Management (PMM) to keep an eye on the nodes while doing the tests. One of the graphs really showed me that Group Replication still has some “limits” as the replication mechanism for a cluster:

This graph shows the MySQL queries executed on all four nodes, in the test using eight threads with 50-insert batches and flow control.

As you can see, Gr1 (the Writer) is the first one to take off, followed by Gr2. Nodes Gr3 and Gr4 need a bit more time, given the binlog transmission (and the 10ms delay). Once the data is there, they match (inconsistently) the Gr2 node. This is an effect of flow control asking the Master to slow down. But as previously seen, the nodes never match the Writer. When the load test is over, the nodes continue to process the queue for an additional ~130 seconds. Considering that the whole load takes 420 seconds on the Writer, this means that one-third of the total time on the Writer is spent syncing the slaves AFTERWARDS.

The above graph shows the same test without flow control. It is interesting to see how the Writer moved above 300 queries/sec, while Gr2 stayed around 200 and Gr3/4 far below. The Writer was able to process the whole load in ~120 seconds instead of 420, while Gr3/4 continued to process the load for an additional ~360 seconds.

This means that without flow control set, the nodes lag around 360 seconds behind the Master. With flow control set to 25, they lag 130 seconds.

This is a significant gap.

Conclusions

Going back to the reason I started this PoC, it looks like my application(s) are not a good fit for Group Replication, given that I use Percona XtraDB Cluster to scale out reads and efficiently move my writer to another node when I need to.

Group Replication is still based on asynchronous replication (as my colleague Kenny said). It makes sense in many other cases, but it doesn’t compare to solutions based on virtually synchronous replication. It still requires a lot of refinement.

On the other hand, for applications that can afford to have a significant gap between writers and readers it is probably fine. But … doesn’t standard replication already cover that? 

Reviewing the Oracle documentation (https://dev.mysql.com/doc/refman/5.7/en/group-replication-background.html), I can see why Group Replication as part of InnoDB Cluster could help improve high availability when compared to standard replication.

But I also think it is important to understand that Group Replication (and derived solutions like InnoDB Cluster) is not comparable to, or a replacement for, data-centric solutions such as Percona XtraDB Cluster. At least not up to now.

Good MySQL to everyone.

Platform End of Life (EOL) Announcement for RHEL 5 and Ubuntu 12.04 LTS

July 31, 2017 - 5:28pm

Upstream platform vendors have announced the general end of life (EOL) for Red Hat Enterprise Linux 5 (RHEL 5) and its derivatives, as well as Ubuntu 12.04 LTS. With this announcement comes some implications to support for Percona software running on these operating systems.

RHEL 5 was EOL as of March 31st, 2017 and Ubuntu 12.04 LTS was end of life as of April 28th, 2017. Pursuant to our end of life policies, we are announcing that these EOLs will go into effect for Percona software on August 1st, 2017. As of this date, we will no longer be producing new packages, binary builds, hotfixes, or bug fixes for Percona software on these platforms.

We generally align our platform end of life dates with those of the upstream platform vendor. The platform end of life dates are published in advance on our website under the page Supported Linux Platforms and Versions.

Per our policies, Percona will continue to provide operational support for your databases on EOLed platforms. However, we will be unable to provide any bug fixes, builds or OS-level assistance if you encounter an issue outside the database itself.

Each platform vendor has a supported migration or upgrade path to their next major release.  Please reach out to us if you need assistance in migrating your database to your vendor’s supported platform – Percona will be happy to assist you.

Webinar Wednesday August 2, 2017: MySQL Disk Encryption with LUKS

July 31, 2017 - 3:12pm

Join Percona’s Senior Architect Matthew Boehm as he presents MySQL Disk Encryption with LUKS on Wednesday, August 2, 2017, at 1:00 pm PDT / 4:00 pm EDT (UTC-7).

Register Now

Clients require strong security measures for PCI, HIPAA or PHI. You must encrypt MySQL “at rest” to satisfy the data managed under these standards. InnoDB’s built-in encryption features work, but there are some caveats to that solution.

In this talk, you’ll see how to encrypt your entire disk to protect everything: data, redo logs and binary logs.

Register for the webinar here.

Matthew Boehm, Architect

Matthew joined Percona in the fall of 2012 as a MySQL consultant. His areas of knowledge include the traditional Linux/Apache/MySQL/PHP stack, memcached, MySQL Cluster, massive sharding topologies, PHP development and a bit of MySQL-C-API development. Previously, Matthew DBAed for the 5th largest MySQL installation at eBay/PayPal, and also hails from managed hosting environments. During his off-hours, Matthew is a nationally-ranked competitive West Coast Swing dancer, and travels to competitions around the US. He enjoys working out, camping, biking and playing MMOs with his son.

Percona Server for MySQL 5.7.18-16 Is Now Available

July 28, 2017 - 11:49am

Percona is glad to announce the GA release of Percona Server for MySQL 5.7.18-16 on July 28, 2017 (Downloads are available here and from the Percona Software Repositories).

Based on MySQL 5.7.18, including all the bug fixes in it, Percona Server for MySQL 5.7.18-16 is the current GA release in the Percona Server for MySQL 5.7 series. All of Percona’s software is open-source and free, and you can find all the release details in the 5.7.18-16 milestone at Launchpad.

Please note that RHEL 5, CentOS 5 and Ubuntu versions 12.04 and older will not be supported in future releases of Percona Server, and no further packages will be added for these distributions.

New Features:

  • Percona Server for MySQL is now available on Debian 9 (stretch). The support only covers the amd64 architecture.
  • Percona Server for MySQL can now be built with the support of OpenSSL 1.1.
  • MyRocks storage engine has been merged into Percona Server.
  • TokuDB now enables killing a query that is awaiting an FT locktree lock.
  • TokuDB now enables using the MySQL DEBUG_SYNC facility within Percona FT.

Bugs Fixed:

  • Row counts in TokuDB could be lost intermittently after restarts. Bug fixed #2.
  • In TokuDB, two races in the fractal tree lock manager could significantly affect transactional throughput for some applications that used a small number of concurrent transactions. These races manifested as transactions unnecessarily waiting for an available lock. Bug fixed #3.
  • Percona FT could assert when opening a dictionary, without writing any useful information to the error log. Bug fixed #23.
  • Percona FT could assert for various reasons when deserializing nodes, with no useful error output. Bug fixed #24.
  • It was not possible to build Percona Server on Debian 9 (stretch) due to issues with OpenSSL 1.1. Bug fixed #1702903 (upstream #83814).
  • Packaging was using the dpkg --verify command which is not available on wheezy/precise. Bug fixed #1694907.
  • Enabling and disabling the slow query log rotation spuriously added the version suffix to the next slow query log file name. Bug fixed #1704056.
  • With two client connections to a server (debug server build), the server could crash after one of the clients set the global option userstat and flushed the client statistics (FLUSH CLIENT_STATISTICS) and then both clients were closed. Bug fixed #1661488.
  • Percona FT did not pass cmake flags on to snappy cmake. Bug fixed #41.
  • The progress status for partitioned TokuDB table ALTERs was misleading. Bug fixed #42.
  • When a client application connects to the Aurora cluster endpoint using SSL (--ssl-verify-server-cert or --ssl-mode=VERIFY_IDENTITY option), wildcard and SAN enabled SSL certificates were ignored. Note that the --ssl-verify-server-cert option is deprecated in Percona Server 5.7. Bug fixed #1673656 (upstream #68052).
  • Killing a stored procedure execution could result in an assert failure on a debug server build. Bug fixed #1689736 (upstream #86260).
  • The SET STATEMENT .. FOR statement changed the global instead of the session value of a variable if the statement occurred immediately after the SET GLOBAL or SHOW GLOBAL STATUS command. Bug fixed #1385352.
  • When running SHOW ENGINE INNODB STATUS, the Buffer pool size, bytes entry contained 0. Bug fixed #1586262.
  • The synchronization between the LRU manager and page cleaner threads was not done at shutdown. Bug fixed #1689552.
  • Spurious lock_wait_timeout_thread wakeup in lock_wait_suspend_thread() could occur. Bug fixed #1704267 (upstream #72123).

Other bugs fixed: #1686603, #6, #44, #65, #1160986, #1686934, #1688319, #1689989, #1690012, #1691682, #1697700, #1699788, #1121072, and #1684601 (upstream #86016).

The release notes for Percona Server for MySQL 5.7.18-16 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

Note

Due to a new package dependency, Ubuntu/Debian users should use apt-get dist-upgrade or apt-get install percona-server-server-5.7 to upgrade.

Percona Server for MongoDB 3.4.6-1.6 is Now Available

July 28, 2017 - 10:37am

Percona announces the release of Percona Server for MongoDB 3.4.6-1.6 on July 27, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, fully compatible, highly-scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features:

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.6 and does not include any additional changes.

What is MySQL Partitioning?

July 27, 2017 - 8:39am

In this blog, we’ll quickly look at MySQL partitioning.

Partitioning is a way in which a database (MySQL in this case) splits its actual data into separate tables, while the SQL layer still treats them as a single table.

When partitioning, it’s a good idea to find a natural partition key. You want to ensure that table lookups go to the correct partition or group of partitions. This means that all SELECT, UPDATE and DELETE statements should include that column in the WHERE clause. Otherwise, the storage engine does a scatter-gather and queries ALL partitions in a UNION that is not concurrent.

Generally, you must add the partition key into the primary key along with the auto increment, i.e., PRIMARY KEY (part_id,id). If you don’t have well-designed and small columns for this composite primary key, it could enlarge all of your secondary indexes.
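
As a minimal sketch (the table and column names here are hypothetical, and the KEY (id) index is included because InnoDB requires the auto-increment column to lead some index), such a definition might look like this:

CREATE TABLE orders (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  part_id INT NOT NULL,          -- natural partition key
  created DATETIME NOT NULL,
  PRIMARY KEY (part_id, id),     -- partition key is part of the primary key
  KEY (id)
) PARTITION BY RANGE (part_id) (
  PARTITION p0 VALUES LESS THAN (100),
  PARTITION p1 VALUES LESS THAN (200),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);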

You can partition by range or hash. Range is great because you have groups of known IDs in each table, and it helps when querying across partition IDs. This still can create hotspots in the newest partition, as all new inserts go there. Partitioning by hash “load balances” the table, and allows you to write to partitions more concurrently. This makes range queries on the partition key a bad idea.
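
To illustrate both points on the hypothetical table above, a hash-partitioned variant and a quick pruning check might look like this (in MySQL 5.7, EXPLAIN includes a partitions column; only one partition should be listed when the partition key appears in the WHERE clause):

ALTER TABLE orders PARTITION BY HASH (part_id) PARTITIONS 8;  -- spreads inserts across partitions

EXPLAIN SELECT * FROM orders WHERE part_id = 42 AND created > '2017-07-01';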

In MySQL 5.7, partitioning became native to the storage engine, deprecating the old method where MySQL itself had to handle the partitions. This means InnoDB partitions (and a larger number of partitions) are a better choice than in the past.

As with all features and recommendations, this only makes sense if it helps your data and workload!

Percona Server for MongoDB 3.2.15-3.5 is Now Available

July 26, 2017 - 9:42am

Percona announces the release of Percona Server for MongoDB 3.2.15-3.5 on July 26, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open-source, fully compatible, highly scalable, zero-maintenance downtime database that supports the MongoDB v3.2 protocol and drivers. It extends MongoDB with MongoRocks, Percona Memory Engine, and PerconaFT storage engine, as well as enterprise-grade features like External Authentication, Audit Logging, Profiling Rate Limiting, and Hot Backup at no extra cost. The software requires no changes to MongoDB applications or code.

NOTE: We deprecated the PerconaFT storage engine. It will not be available in future releases.

This release is based on MongoDB 3.2.15 and does not include any additional changes.

Percona Server for MongoDB 3.2.15-3.5 release notes are available in the official documentation.

What is innodb_autoinc_lock_mode and why should I care?

July 26, 2017 - 8:15am

In this blog post, we’ll look at what innodb_autoinc_lock_mode is and how it works.

I was recently discussing innodb_autoinc_lock_mode with some colleagues to address issues at a company I was working with. This variable defines the lock mode to use for generating auto-increment values. The permissible values are 0, 1 or 2 (for “traditional”, “consecutive” or “interleaved” lock mode, respectively). In most cases, this variable is set to the default of 1.

We recommend setting it to 2 when BINLOG_FORMAT=ROW. With interleaved mode, INSERT statements don’t use the table-level AUTO-INC lock, and multiple statements can execute at the same time. Setting it to 0 or 1 can cause a huge hit in concurrency for certain workloads.
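
As a minimal sketch of how you might verify the current settings (note that innodb_autoinc_lock_mode is not dynamic, so it has to be set in my.cnf and requires a restart; the my.cnf snippet below is only an assumption about how you manage configuration):

-- in my.cnf:
--   [mysqld]
--   innodb_autoinc_lock_mode = 2
--   binlog_format = ROW

SELECT @@global.innodb_autoinc_lock_mode, @@global.binlog_format;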

Interleaved (or 2) is the fastest and most scalable lock mode, but it is not safe if you use STATEMENT-based replication, or in recovery scenarios when SQL statements are replayed from the binary log. Another consideration – which you shouldn’t rely on anyway – is that IDs might not be consecutive with a lock mode of 2. That means you could do three inserts and expect IDs 100, 101 and 102, but end up with 100, 102 and 104. For most people, this isn’t a huge deal.

If you are only doing simple inserts, this might not help you. I did a sysbench test on MySQL 5.7 in Amazon RDS with 100 threads and found no difference in performance or throughput between lock modes 1 and 2. It helps the most when the number of rows can’t be determined in advance, such as with INSERT INTO…SELECT statements.
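
As a quick illustration (with hypothetical table names), this is the kind of statement where the row count is unknown ahead of time, so interleaved mode avoids holding the table-level AUTO-INC lock for the whole statement:

INSERT INTO orders_archive (part_id, created)
SELECT part_id, created FROM orders WHERE created < '2017-01-01';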

You can find a longer, more detailed article in the manual, but I highly recommend setting this value to 2 if you are not using STATEMENT-based replication.

Webinar Thursday July 27, 2017: Database Backup and Recovery Best Practices (with a Focus on MySQL)

July 25, 2017 - 12:46pm

Join Percona’s Architect Manjot Singh as he presents Database Backup and Recovery Best Practices (with a Focus on MySQL) on Thursday, July 27, 2017, at 11:00 am PDT / 2:00 pm EDT (UTC-7).

Register Now

In the case of a failure, do you know how long it will take to restore your database? Do you know how old the backup will be? In this presentation, we will cover the basics of best practices for backup, restoration and business continuity. Don’t put your company on the line due to bad data retention and backup policies.

Register for the webinar here.

Manjot Singh, Architect

Manjot Singh is an Architect with Percona in California. He loves to learn about new technologies and apply them to real-world problems. Manjot is a veteran of startup and Fortune 500 enterprise companies alike, with a few years spent in government, education and hospital IT. Now he consults for Percona with companies around the world on many interesting problems.

Percona XtraBackup 2.4.8 is Now Available

July 24, 2017 - 10:59am

Percona announces the GA release of Percona XtraBackup 2.4.8 on July 24, 2017. You can download it from our download site and apt and yum repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

New Features:

Bugs Fixed:
  • xtrabackup would hang with Waiting for master thread to be suspended message when backup was being prepared. Bug fixed #1671437.
  • xtrabackup would fail to prepare the backup with 6th page is not initialized message in case server didn’t properly initialize the page. Bug fixed #1671722.
  • xbstream could run out of file descriptors while extracting the backup which contains many tables. Bug fixed #1690823.
  • When a table was created with the DATA DIRECTORY option, xtrabackup would back up the .frm and .isl files, but not the .ibd file. Due to the missing .ibd files, the backup could not be restored. Bug fixed #1701736.
  • Percona XtraBackup incorrectly determined the use of master_auto_position on a slave, and thus generated an invalid xtrabackup_slave_info file. Bug fixed #1705193.
  • Percona XtraBackup will now print a warning if it encounters unsupported storage engine. Bug fixed #1394493.
  • Percona XtraBackup would crash while backing up MariaDB 10.2.x with --ftwrl-* options. Bug fixed #1704636.
  • xtrabackup --slave-info didn’t write the correct information into xtrabackup_slave_info file when multi-source replication was used. Bug fixed #1551634.
  • Along with the xtrabackup_checkpoints file, xtrabackup now copies the xtrabackup_info file into the directory specified by the --extra-lsndir option. Bug fixed #1600656.
  • GTID position was not recorded when --binlog-info option was set to AUTO. Bug fixed #1651505.

Release notes with all the bugfixes for Percona XtraBackup 2.4.8 are available in our online documentation. Please report any bugs to the launchpad bug tracker.

Percona XtraBackup 2.3.9 is Now Available

July 24, 2017 - 10:35am

Percona announces the release of Percona XtraBackup 2.3.9 on July 24, 2017. Downloads are available from our download site or Percona Software Repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

This release is the current GA (Generally Available) stable release in the 2.3 series.

New Features

Bugs Fixed:
  • Percona XtraBackup would crash when being prepared if the index compaction was enabled. Bug fixed #1192834.
  • Fixed build failure on Debian Stretch by adding support for building with OpenSSL 1.1. Bug fixed #1678947.
  • xbstream could run out of file descriptors while extracting the backup which contains many tables. Bug fixed #1690823.
  • Percona XtraBackup incorrectly determined the use of master_auto_position on a slave, and thus generated an invalid xtrabackup_slave_info file. Bug fixed #1705193.
  • Percona XtraBackup would crash while backing up MariaDB 10.2.x with --ftwrl-* options. Bug fixed #1704636.
  • Along with the xtrabackup_checkpoints file, xtrabackup now copies the xtrabackup_info file into the directory specified by the --extra-lsndir option. Bug fixed #1600656.
  • GTID position was not recorded when --binlog-info option was set to AUTO. Bug fixed #1651505.

Release notes with all the bugfixes for Percona XtraBackup 2.3.9 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

Faster Node Rejoins with Improved IST performance

July 21, 2017 - 9:01am

In this blog, we’ll look at how improvements to Percona XtraDB Cluster improved IST performance.

Introduction

Starting with version 5.7.17-29.20, Percona XtraDB Cluster performance improved significantly. Depending on the workload, the increase in throughput is in the range of 3-10x (more details here). These optimization fixes also helped improve IST (Incremental State Transfer) performance. This blog is aimed at studying the IST impact.

IST

IST stands for Incremental State Transfer. When a node of the cluster leaves the cluster for a short period of time and then rejoins, it needs to catch up with the cluster state. As part of this sync process, an existing node of the cluster (the DONOR) donates the missing write-sets to the rejoining node (the JOINER). In short, the flow involves applying the missing write-sets on the JOINER, just as during active workload replication.

Percona XtraDB Cluster / Galera can already apply write-sets in parallel using multiple applier threads. Unfortunately, due to commit contention, the commit action was serialized. This was fixed in the above Percona XtraDB Cluster release, allowing commits to proceed in parallel.
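
For reference, the number of applier threads is controlled by wsrep_slave_threads, which is the N varied in the test below. A minimal sketch (the value 8 is just an example) of adjusting it and checking the node state during IST:

set global wsrep_slave_threads = 8;
show global status like 'wsrep_local_state_comment';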

IST uses the same path for applying write-sets, except that it is more like a batch operation.

IST Performance

Let’s look at IST performance before and now.

Setup

  1. A two-node cluster (node-1 and node-2), with gcache configured large enough to avoid purging, as we need IST
  2. Start a workload against node-1 for 30 seconds
  3. Shut down node-2
  4. Start a workload that performs 4M requests against node-1. The workload produces ~3.5M write-sets that are cached in gcache and used later for IST
  5. Start node-2 with N applier threads
  6. Wait until IST is done
  7. Repeat steps 3-6 with different values of N

Observations:

  • IST is 4x faster with PXC 5.7.17 (compared to previous releases)
  • Improved performance means a quicker node rejoin, and an overall increase in cluster productivity, as the joiner node is available to process the workload more quickly
Conclusion

Percona XtraDB Cluster 5.7.17 significantly improved IST performance. A faster rejoin of a node effectively means better cluster productivity and more flexibility in planning maintenance windows. So what are you waiting for? Upgrade to Percona XtraDB Cluster 5.7.17 or the latest Percona XtraDB Cluster 5.7 release and experience the power!

Where Do I Put ProxySQL?

July 20, 2017 - 11:57am

In this blog post, we’ll look at how to deploy ProxySQL.

ProxySQL is a high-performance proxy, currently for MySQL and its forks (like Percona Server for MySQL and MariaDB). It acts as an intermediary for client requests seeking resources from the database. It was created for DBAs by René Cannaò, as a means of solving complex replication topology issues. When bringing up ProxySQL with my clients, I always get questions about where it fits into the architecture. This post should clarify that.

Before continuing, you might want to know why you should use this software. The features that are of interest include:

  • MySQL firewall
  • Connection pooling
  • Shard lookup and automated routing
  • Ability to read/write split
  • Automatically switch to another master in case of active master failure
  • Query cache
  • Performance metrics
  • Other neat features!
Initial Configuration

In general, you install it on nodes that do not have a running MySQL database. You manage it via the MySQL command line on another port, usually 6032. Once it is started, the configuration in /etc is not used, and you do everything within the CLI. The backend database is actually SQLite, and the db file is stored in /var/lib/proxysql.

There are many guides out there on initializing and installing it, so I won’t cover those details here. It can be as simple as:

apt-get install proxysql

ProxySQL Architecture

While most first think to install ProxySQL on a standalone node between the application and database, this has the potential to affect query performance due to the additional latency from network hops.

 

To have minimal impact on performance (and avoid the additional network hop), many recommend installing ProxySQL on the application servers. The application then connects to ProxySQL (acting as a MySQL server) on localhost, using a Unix domain socket and avoiding extra latency. It then uses its routing rules to reach out and talk to the actual MySQL servers with its own connection pooling. The application doesn’t have any idea what happens beyond its connection to ProxySQL.
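
As a rough sketch of what a read/write split configuration looks like on the ProxySQL admin interface (the hostnames and hostgroup numbers here are only examples):

-- connect to the admin interface, e.g. mysql -u admin -padmin -h 127.0.0.1 -P 6032
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, 'db-master.example.com', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, 'db-replica1.example.com', 3306);

-- route SELECTs to the reader hostgroup; everything else defaults to the writer hostgroup
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT', 20, 1);

LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;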

Reducing Your Network Attack Surface

Another consideration is reducing your network attack surface. This means attempting to control all of the possible vulnerabilities in your network’s hardware and software that are accessible to unauthenticated users.

Percona generally suggests that you put a ProxySQL instance on each application host, like in the second image above. This suggestion is certainly valid for reducing latency in your database environment (by limiting network jumps). But while this is good for performance, it can be bad for security.

Every instance must be able to talk to:

  • Every master
  • Every slave

As you can imagine, this is a security nightmare. With every instance, you have x many more connections spanning your network. That’s x many more connections an attacker might exploit.

Instead, it can be better to have one or more ProxySQL instances that are between your application and MySQL servers (like the first image above). This provides a reasonable DMZ-type setup that prevents opening too many connections across the network.

That said, both architectures are valid production configurations – depending on your requirements.
