
How to Deal with XA Transactions Recovery


For most people (including me until recently), database XA transactions are a fuzzy concept. In over eight years with Percona, I have never had to deal with XA transactions. Then, a few weeks ago, two customers came to me with XA transaction issues. That deserves a post.

XA 101

What are XA transactions? XA transactions are useful when you need to coordinate a transaction between different systems. The simplest example is coordinating two storage engines within MySQL. Basically, it follows this sequence (a minimal worked example follows the list):

  1. XA START
  2. Some SQL statements
  3. XA END
  4. XA PREPARE
  5. XA COMMIT or ROLLBACK
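
For illustration, here is a minimal worked example of that sequence against a single MySQL instance. It is only a sketch: the xid 'demo-1' and the table test.t1 are arbitrary placeholders.

mysql> XA START 'demo-1';
mysql> INSERT INTO test.t1 (id) VALUES (1);
mysql> XA END 'demo-1';
mysql> XA PREPARE 'demo-1';
mysql> XA COMMIT 'demo-1';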

Once prepared, the XA transaction survives a MySQL crash. Upon restart, you’ll see something like this in the MySQL error log:

2017-08-23T14:53:54.189068Z 0 [Note] Starting crash recovery...
2017-08-23T14:53:54.189204Z 0 [Note] InnoDB: Starting recovery for XA transactions...
2017-08-23T14:53:54.189225Z 0 [Note] InnoDB: Transaction 45093 in prepared state after recovery
2017-08-23T14:53:54.189244Z 0 [Note] InnoDB: Transaction contains changes to 2 rows
2017-08-23T14:53:54.189257Z 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2017-08-23T14:53:54.189267Z 0 [Note] Found 1 prepared transaction(s) in InnoDB
2017-08-23T14:53:54.189312Z 0 [Warning] Found 1 prepared XA transactions
2017-08-23T14:53:54.189329Z 0 [Note] Crash recovery finished.
2017-08-23T14:53:54.189472Z 0 [Note] InnoDB: Starting recovery for XA transactions...
2017-08-23T14:53:54.189489Z 0 [Note] InnoDB: Transaction 45093 in prepared state after recovery
2017-08-23T14:53:54.189501Z 0 [Note] InnoDB: Transaction contains changes to 2 rows
2017-08-23T14:53:54.189520Z 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2017-08-23T14:53:54.189529Z 0 [Note] Found 1 prepared transaction(s) in InnoDB
2017-08-23T14:53:54.189539Z 0 [Warning] Found 1 prepared XA transactions

The xa recover command shows you output like this:

mysql> xa recover;
+----------+--------------+--------------+-----------+
| formatID | gtrid_length | bqual_length | data      |
+----------+--------------+--------------+-----------+
|     1234 |            4 |            5 |     bqual |
+----------+--------------+--------------+-----------+
1 row in set (0.00 sec)

There is some binary data in the output that can't be shown in HTML. The XA Xid is made of three fields: gtrid (global trx id), bqual (branch qualifier) and formatID. Java applications use all three fields. For my example above, I used "X'01020304','bqual',1234". You can trust Java application servers to be creative with Xid values. With MySQL 5.7, you can output the data part in hex with convert xid:

mysql> xa recover convert xid;
+----------+--------------+--------------+----------------------+
| formatID | gtrid_length | bqual_length | data                 |
+----------+--------------+--------------+----------------------+
|     1234 |            4 |            5 | 0x01020304627175616C |
+----------+--------------+--------------+----------------------+
1 row in set (0.01 sec)

The Problem

If you do nothing, the prepared transaction stays there forever and holds locks and a read view open. As a consequence, the history list grows without bound along with your ibdata1 file, where the undo entries are kept. If you have slaves, they all have the prepared transaction too (at least with 5.7). No fun.

As a consequence, if you are using XA transactions, you MUST check for pending prepared transactions after the server or mysqld restarts. If you find such transactions, you need to commit or roll them back, depending on what is involved.

But how do you commit these XA transactions? The problem here is the output of xa recover. As it is, the output is unusable if there is a bqual field or non-default formatID field:

mysql> xa commit 0x01020304627175616C;
ERROR 1397 (XAE04): XAER_NOTA: Unknown XID

The Fix

Looking back at the xa recover convert xid output above, the gtrid_length and bqual_length are provided. Using these values, you can split the data field into its parts, which gives us:

  • gtrid = 0x01020304
  • bqual = 0x627175616C

And, of course, the formatID is 1234. Altogether, we have:

mysql> xa commit 0x01020304,0x627175616C,1234;
Query OK, 0 rows affected (0.15 sec)

That finally works! On 5.6, the convert xid option is not available, so you have to be a bit more creative:

root@master57:/var/lib/mysql# mysql -r -e 'xa recover\G' | hexdump -C
00000000  2a 2a 2a 2a 2a 2a 2a 2a  2a 2a 2a 2a 2a 2a 2a 2a  |****************|
00000010  2a 2a 2a 2a 2a 2a 2a 2a  2a 2a 2a 20 31 2e 20 72  |*********** 1. r|
00000020  6f 77 20 2a 2a 2a 2a 2a  2a 2a 2a 2a 2a 2a 2a 2a  |ow *************|
00000030  2a 2a 2a 2a 2a 2a 2a 2a  2a 2a 2a 2a 2a 2a 0a 20  |**************. |
00000040  20 20 20 66 6f 72 6d 61  74 49 44 3a 20 31 32 33  |   formatID: 123|
00000050  34 0a 67 74 72 69 64 5f  6c 65 6e 67 74 68 3a 20  |4.gtrid_length: |
00000060  34 0a 62 71 75 61 6c 5f  6c 65 6e 67 74 68 3a 20  |4.bqual_length: |
00000070  35 0a 20 20 20 20 20 20  20 20 64 61 74 61 3a 20  |5.        data: |
00000080  01 02 03 04 62 71 75 61  6c 0a                    |....bqual.|
0000008a

But there is a limitation in 5.6: you can only XA commit/rollback transactions that belong to your session. That means after a crash you are out of luck. To get rid of these you need to promote a slave or perform a logical dump and restore. The best plan is to avoid the use of XA transactions with 5.6.
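
On 5.7, where the hex-based commit syntax shown earlier works, you can also script the cleanup. Below is a rough, untested sketch that turns the xa recover convert xid output into XA COMMIT statements; it assumes the tab-separated batch output of the mysql client and non-empty gtrid and bqual values, so treat it as a starting point rather than a finished tool.

# build XA COMMIT statements from the recovered xids (sketch, review before use)
mysql -N -B -e "XA RECOVER CONVERT XID" | \
  awk '{ gtrid = substr($4, 3, $2 * 2);          # skip the leading "0x"
         bqual = substr($4, 3 + $2 * 2, $3 * 2);
         printf "XA COMMIT 0x%s,0x%s,%s;\n", gtrid, bqual, $1 }'

Only pipe the result back into mysql after verifying that committing (rather than rolling back) is the right decision for each transaction.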

I submitted this bug against Percona Server for MySQL in order to get usable output from xa recover convert xid. If you think this is important, vote for it!

This Week in Data with Colin Charles #7: Percona Live Europe and Open Source Summit North America


Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

Percona Live Europe Dublin

Are you affected by the Ryanair flight cancellations? Have you made alternative arrangements? Have you registered for the community dinner? Even speakers have to register, so this is a separate ticket cost! There will be fun lightning talks in addition to food and drink.

You are, of course, already registered for Percona Live Europe Dublin, right? See you there! Don’t forget to pack a brolly, or a rain jacket (if this week’s weather is anything to go by).

Open Source Summit North America

Last week, a lot of open source folk were in Los Angeles, California for the annual Open Source Summit North America (formerly known as LinuxCon). I’ve been to many as a speaker, and have always loved going to the event (so save the date, in 2018 it is August 29-31 in Vancouver, British Columbia, Canada).

What were the major themes this year? Containerization. Everyone (large and small) seems to be moving workloads into containers. Containers and stateful applications make things all the more interesting, as do questions about performance. This is a big deal for us in the MySQL/MongoDB/other open source database space. Technologies to watch include Docker/Moby, Kubernetes and Mesos. People are frankly already deploying on these technologies, and it looks like the on-ramp is coming. Videos to watch:

The cloud is still a big deal. Yes, people are all customers of Amazon Web Services. Sure, they are looking at Microsoft Azure. Google Cloud Platform is – from my informal survey – the third most popular. In many instances, I had conversations about Oracle Cloud, and it looks like there is a huge push behind it (but not too many users that I've seen yet), so it's still a bet on the future as it continues to be developed by engineers. Rackspace Cloud (which offers all the MySQL variants in the cloud) deserves a mention, but many large-scale shops haven't thought about it.

There were also some “fun” keynotes:

I wish more events had this kind of diversity in their keynotes.

From a speaker standpoint, I enjoyed the speaker/sponsor dinner party (a great time to catch up with friends and meet new ones), as well as the t-shirt and speaker gift (wooden board). I had a great time at the attendee expo hall reception and the party at Paramount Studios (lots of fun catered things, like In-N-Out burgers!).

Releases
  • ProxySQL 1.4.3. Hot on the heels of 1.4.2 comes 1.4.3, nicknamed “The ClickHouse release.” Clients can connect to ProxySQL, and it will query a ClickHouse backend. Should be exciting for ClickHouse users. Don’t forget the SQLite support, too!
  • Percona XtraDB Cluster 5.6.37-26.21
  • MariaDB ColumnStore 1.1.0 Beta. Do you use ColumnStore? Or do you use ClickHouse? There’s a new beta that might be worth trying.
  • MySQL 8.0.3 Release Candidate. Download this on your way to Percona Live Europe Dublin! Try it. There are many talks about it, including a keynote. You'll find things like histograms, more improvements around the optimizer, JSON and GIS improvements, security improvements, resource groups (which seem very interesting), data dictionary changes and a whole lot more!
Link List
  • CallidusCloud Acquires OrientDB, the Leading Multi-Model Database Technology
  • Database provider MongoDB has filed to go public. Bound to happen, and some highlights according to TechCrunch: “The company brought in $101.4 million in revenue in the most recent year ending January 31, and around $68 million in the first six months ending July 31 this year. In that same period, MongoDB burned through $86.7 million in the year ending January 31 and $45.8 million in the first six months ending July 31. MongoDB’s revenue is growing, and while its losses seem to be stable, they aren’t shrinking either. There have been over 30 million downloads of MongoDB community, and the link also has a nice cap table pie chart.”
Upcoming appearances

Percona’s website keeps track of community events, so check that out and see where to listen to a Perconian speak. My upcoming appearances are:

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Percona Support with Amazon RDS

September 21, 2017 - 2:35pm

This blog post will give a brief overview of Amazon RDS capabilities and limitations, and how Percona Support can help you succeed in your Amazon RDS deployments.

One of the common questions that we get from customers and prospective customers is about Percona Support with Amazon RDS. As many companies have shifted to the cloud, or are considering how to do so, it’s natural to try to understand the limitations inherent in different deployment strategies.

Why Use Amazon RDS?

As more companies move to using the cloud, we’ve seen a shift towards work models in technical teams that require software developers to take on more operational duties than they have traditionally. This makes it essential to abstract infrastructure so it can be interacted with as code, whether through automation or APIs. Amazon RDS presents a compelling DBaaS product with significant flexibility while maintaining ease of deployment.

Use Cases Where RDS Isn’t a Fit

There are a number of use cases where the inherent limitations of RDS make it not a good fit. With RDS, you are trading off the flexibility to deploy complex environment topologies for the ease of deploying with the push of a button, or a simple API call. RDS eliminates most of the operational overhead of running a database in your environment by abstracting away the physical or virtual hardware and the operating system, networking and replication configuration. This, however, means that you can’t get too fancy with replication, networking or the underlying operating system or hardware.

When Using RDS, Which Engine is Right For Me?

Amazon’s RDS has numerous database engines available, each suited to a specific use case. The three RDS database engines we’ll be discussing briefly here are MySQL, MariaDB and Aurora.

Use MySQL when you have an application tuned for MySQL, you need to use MySQL plug-ins or you wish to maintain compatibility to support external replicas in EC2. MySQL with RDS has support for Memcached, including plug-in support and 5.7 compatible query optimizer improvements. Unfortunately, thread pooling and similar features that are available in Percona Server for MySQL are not currently available in the MySQL engine on RDS.

Use MariaDB when you have an application that requires features available in this engine but not in the others. Currently, MariaDB engines in RDS support thread pooling, table elimination, user roles and virtual columns; MySQL and Aurora don't support these. MariaDB engines in RDS support global transaction IDs (GTIDs), but they are based on the MariaDB implementation and are not compatible with MySQL GTIDs. This can affect replication or migrations in the future.

Use Aurora when you want a simple-to-setup solution with strong availability guarantees and minimal configuration. This RDS database engine is cloud-native, built with elasticity and the vagaries of running in a distributed infrastructure in mind. While it does limit your configuration and optimization capabilities more than other RDS database engines, it handles a lot of things for you – including ensuring availability. Aurora automatically detects database crashes and restarts without the need for crash recovery or to rebuild the database cache. If the entire instance fails, Aurora automatically fails over to one of up to 15 read replicas.

So If RDS Handles Operations, Why Do I Need Support?

Generally speaking, properly using a database implies four quadrants of tasks. RDS only covers one of these four quadrants: the operational piece. Your existing staff (or another provider such as Percona) must cover each of the remaining quadrants.


The areas where people run into trouble are slow queries, database performance not meeting expectations or other such issues. In these cases, they often contact Amazon's support line. The AWS Support Engineers are trained and focused on addressing issues specific to the AWS environment, however. They're not DBAs and do not have the database expertise necessary to fully troubleshoot your database issues in depth. Often, when an RDS user encounters a performance issue, the first instinct is to increase the size of their AWS deployment because it's a simple solution. A better path would be investigating performance tuning. More hardware is not necessarily the best solution: by ignoring unoptimized configurations and queries, you often end up spending far more on your monthly cloud hosting bill than necessary.

As noted above, when using MariaDB or MySQL RDS database engines you can make use of plug-ins and inject additional configuration options that aren’t available in Aurora. This includes the ability to replicate to external instances, such as in an EC2 environment. This provides more configuration flexibility for performance optimization – but does require expertise to make use of it.

Outside support vendors (like Percona) can still help you even when you eliminate the operational elements by lending the expertise to your technical teams and educating them on tuning and optimization strategies.

Percona Live Europe Featured Talks: Modern sysbench – Teaching an Old Dog New Tricks with Alexey Kopytov

September 21, 2017 - 9:40am

Welcome to another post in our series of interview blogs for the upcoming Percona Live Europe 2017 in Dublin. This series highlights a number of talks that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post is with Alexey Kopytov, software developer and maintainer of sysbench. His talk is Modern sysbench: Teaching an Old Dog New Tricks. His presentation covers new features provided by recent releases and explains how they can be used to create complex benchmark scenarios and collect performance metrics with a simple Lua API. It will also include a live demo of some of the new sysbench features.

In our conversation, we discussed benchmarking your database environment:

Percona: How did you get into database technology? What do you love about it?

Alexey: It was 2003, and I was working as a software developer for a boring company providing hosted VoIP solutions. I was a big fan of the free and open source software philosophy, which was way less popular back then than it is today. I contributed to a number of open source projects in my free time, but I also had a dream of developing open source software as part of my paid job. This looked completely unrealistic at the time, until I came across a job posting on a Russian IT forum about a Swedish company called MySQL AB looking for software developers to work remotely on MySQL! That sounded like my dream job, so I applied.

I knew very little about database internals at the time, so looking back I was giving terrible answers during my job interviews. Nevertheless, I joined the High Performance Group at MySQL AB after a few months, and that has defined my professional life for many years.

I love database technology because it presents the toughest challenges in software development. Most problems and solutions related to ever-evolving hardware, scalability and data processing requirements are discovered first by people from the database world.

Percona: Your talk is called “Modern sysbench: Teaching an Old Dog New Tricks”. What is sysbench used for generally, why is it important and how have you used it in your career? 

Alexey: sysbench was an internal project that I took over as soon as I joined MySQL AB. We used it to troubleshoot customer issues, find performance bottlenecks in MySQL and evaluate new features. Of course it was an open source project, so over the years we’ve got many people from the MySQL community using sysbench for all kinds of performance research like testing new hardware, identifying performance-related issues and comparing MySQL configurations, versions and forks.

Percona: What are some of the important new developments in the latest release?

Alexey: This year sysbench got a major upgrade in terms of features and performance to meet the modern world of many-core CPUs, powerful storage devices and distributed database systems capable of processing millions of transactions per second. Some feature highlights from the latest release include a simplified command-line interface, a revamped API that allows creating more complex benchmark scenarios with less code, new performance metrics, customizable reports and more!

Percona: What do you want attendees to take away from your session? Why should they attend?

Alexey: sysbench is quite popular, but most people rarely use it more than a few bundled OLTP-style benchmarks. I’d like to explain its full potential, especially the possibilities provided by the new features. I want people to use it to create their own benchmarks, not necessarily related to MySQL, and hopefully find sysbench useful in areas that I have not even envisioned myself.

Percona: What are you most looking forward to at Percona Live Europe 2017?

Alexey: For me Percona Live conferences have always been the place where I can feel the pulse of the technology and learn from the smartest people in the industry. This is especially true now that Percona Live provides talks on diverse topics from communities and database management technologies other than MySQL. Which makes it an even greater event to share ideas, solutions and expertise.

Want to find out more about Alexey, sysbench and database benchmarking? Register for Percona Live Europe 2017, and see his talk Modern sysbench: Teaching an Old Dog New Tricks. Register now to get the best price! Use discount code SeeMeSpeakPLE17 to get 10% off your registration.

Percona Live Open Source Database Conference Europe 2017 in Dublin is the premier European open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, MariaDB, MongoDB, time series database, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference Europe will be September 25-27, 2017 at the Radisson Blu Royal Hotel, Dublin.

sysbench Histograms: A Helpful Feature Often Overlooked

September 20, 2017 - 1:11pm

In this blog post, I will demonstrate how to run and use sysbench histograms.

One of the features of sysbench that I often see overlooked (and rarely used) is its ability to produce detailed query response time histograms in addition to computing percentile numbers. Looking at histograms together with throughput or latency over time provides many additional insights into query performance.

Here is how you get detailed sysbench histograms and performance over time:

sysbench --rand-type=uniform --report-interval=1 --percentile=99 --time=300 --histogram --mysql-password=sbtest oltp_point_select --table_size=400000000 run

There are a few command line options to consider (a fuller, self-contained example follows the list):

  • report-interval=1 – prints out the current performance measurements every second, which helps see if performance is uniform, if you have stalls or otherwise high variance
  • percentile=99 – computes 99 percentile response time, rather than 95 percentile (the default); I like looking at 99 percentile stats as it is a better measure of performance
  • histogram=on – produces a histogram at the end of the run (as shown below)
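
If you want to reproduce this locally, a much smaller table is enough to see the shape of the distribution. The following is only a sketch: the credentials, database name and table size are placeholders to adapt to your environment.

# create the test table first
sysbench oltp_point_select --mysql-user=sbtest --mysql-password=sbtest \
  --mysql-db=sbtest --table_size=1000000 prepare

# run with per-second reporting, 99th percentile and a histogram at the end
sysbench oltp_point_select --mysql-user=sbtest --mysql-password=sbtest \
  --mysql-db=sbtest --table_size=1000000 --rand-type=uniform \
  --report-interval=1 --percentile=99 --time=60 --histogram run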

The first thing to note about this histogram is that it is exponential. This means the width of the buckets changes with higher values. It starts with 0.001 ms (one microsecond) and gradually grows. This design is used so that sysbench can deal with workloads with requests that take small fractions of milliseconds, as well as accommodate requests that take many seconds (or minutes).

Next, we learn some very interesting things about the typical request response time distribution for databases. You might think that this distribution would be close to some "academic" distribution, such as the normal distribution. In reality, what we often observe is something of a "camelback" distribution (not a real term) – and our "camel" can have more than two humps (especially for simple requests such as the single primary key lookup shown here).

Why do request response times tend to have this distribution? It is because requests can take multiple paths inside the database. For example, certain requests might get responses from the MySQL Query Cache (which results in the first hump). A second hump might come from resolving lookups using the InnoDB Adaptive Hash Index. A third hump might come from finding all the data in memory (rather than through the Adaptive Hash Index). Finally, another hump might coalesce around the time (or times) it takes to execute requests that require disk IO.

You also will likely see some long-tail data that highlights the fact that MySQL and Linux are not hard, real-time systems. As an example, this very simple run with a single thread (and thus no contention) has an outlier at around 18ms. Most of the requests are served within 0.2ms or less.

As you add contention, row-level locking, group commit and other issues, you are likely to see even more complicated diagrams – which can often show you something unexpected:

Latency histogram (values are in milliseconds) – a multi-modal distribution: the bulk of requests land between roughly 0.05 ms and 0.35 ms, with the densest buckets around 0.17 ms (over 100,000 requests each), a smaller hump of a few hundred to roughly 1,500 requests per bucket appears around 0.7–1.0 ms, and a sparse long tail of single requests stretches out to 17.954 ms.

I hope you give sysbench histograms a try, and see what you can discover!

Percona XtraDB Cluster 5.6.37-26.21 is Now Available

September 20, 2017 - 11:03am

Percona announces the release of Percona XtraDB Cluster 5.6.37-26.21 on September 20, 2017. Binaries are available from the downloads section or our software repositories.

Percona XtraDB Cluster 5.6.37-26.21 is now the current release, based on the following:

All Percona software is open-source and free.

Improvements
  • PXC-851: Added version compatibility check during SST with XtraBackup:
    • If donor is 5.6 and joiner is 5.7: A warning is printed to perform mysql_upgrade.
    • If donor is 5.7 and joiner is 5.6: An error is printed and SST is rejected.
Fixed Bugs
  • PXC-825: Fixed script for SST with XtraBackup (wsrep_sst_xtrabackup-v2) to include the --defaults-group-suffix when logging to syslog. For more information, see #1559498.
  • PXC-827: Fixed handling of different binlog names between donor and joiner nodes when GTID is enabled. For more information, see #1690398.
  • PXC-830: Rejected the RESET MASTER operation when wsrep provider is enabled and gtid_mode is set to ON. For more information, see #1249284.
  • PXC-833: Fixed connection failure handling during SST by making the donor retry connection to joiner every second for a maximum of 30 retries. For more information, see #1696273.
  • PXC-841: Added check to avoid replication of DDL if sql_log_bin is disabled. For more information, see #1706820.
  • PXC-853: Fixed cluster recovery by enabling wsrep_ready whenever nodes become PRIMARY.
  • PXC-862: Fixed script for SST with XtraBackup (wsrep_sst_xtrabackup-v2) to use the ssl-dhparams value from the configuration file.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Percona Live Europe Featured Talks: Automatic Database Management System Tuning Through Large-Scale Machine Learning with Dana Van Aken

September 19, 2017 - 4:09pm

Welcome to another post in our series of interview blogs for the upcoming Percona Live Europe 2017 in Dublin. This series highlights a number of talks that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post is with Dana Van Aken, a Ph.D. student in Computer Science at Carnegie Mellon University. Her talk is titled Automatic Database Management System Tuning Through Large-Scale Machine Learning. DBMSs are difficult to manage because they have hundreds of configuration “knobs” that control factors such as the amount of memory to use for caches and how often to write data to storage. Organizations often hire experts to help with tuning activities, but experts are prohibitively expensive for many. In this talk, Dana will present OtterTune, a new tool that can automatically find good settings for a DBMS’s configuration knobs. In our conversation, we discussed how machine learning helps DBAs manage DBMSs:

Percona: How did you get into database technology? What do you love about it?

Dana: I got involved with research as an undergrad and ended up working on a systems project with a few Ph.D. students. It turned out to be a fantastic experience and is what convinced me to go for my Ph.D. I visited potential universities and chatted with many faculty members. I met with my current advisor at Carnegie Mellon University, Andy Pavlo, for a half hour and left his office excited about databases and the research problems he was interested in. Three years later, I’m even more excited about databases and the progress we’ve made in developing smarter auto-tuning techniques.

Percona: You’re presenting a session called “Automatic Database Management System Tuning Through Large-Scale Machine Learning”. How does automation make DBAs life easier in a DBMS production environment?

Dana: The role of the DBA is becoming more challenging due to the advent of new technologies and increasing scalability requirements of data-intensive applications. Many DBAs are constantly having to adjust their responsibilities to manage more database servers or support new platforms to meet an organization’s needs as they change over time. Automation is critical for reducing the DBA’s workload to a manageable size so that they can focus on higher-value tasks. Many organizations now automate at least some of the repetitive tasks that were once DBA responsibilities: several have adopted public/private cloud-based services whereas others have built their own automated solutions internally.

The problem is that the tasks that have now become the biggest time sinks for DBAs are much harder to automate. For example, DBMSs have dozens of configuration options. Tuning them is an essential but tedious task for DBAs, because it’s a trial and error approach even for experts. What makes this task even more time-consuming is that the best configuration for one DBMS may not be the best for another. It depends on the application’s workload and the server’s hardware. Given this, successfully automating DBMS tuning is a big win for DBAs since it would streamline common configuration tasks and give DBAs more time to deal with other issues. This is why we’re working hard to develop smarter tuning techniques that are mature and practical enough to be used in a production environment.

Percona: What do you want attendees to take away from your session? Why should they attend?

Dana: I'll be presenting OtterTune, a new tool that we're developing at Carnegie Mellon University that can automatically find good settings for a DBMS's configuration knobs. I'll first discuss the practical aspects and limitations of the tool. Then I'll move on to our machine learning (ML) pipeline. All of the ML algorithms that we use are popular techniques that have both practical and theoretical work backing their effectiveness. I'll discuss each algorithm in our pipeline using concrete examples from MySQL to give better intuition about what we are doing. I will also go over the outputs from each stage (e.g., the configuration parameters that the algorithm finds to be the most impactful on performance). I will then talk about lessons I learned along the way, and finally wrap up with some exciting performance results that show how OtterTune's configurations compare to those created by top-notch DBAs!

My talk will be accessible to a general audience. You do not need a machine learning background to understand our research.

Percona: What are you most looking forward to at Percona Live Europe 2017?

Dana: This is my first Percona Live conference, and I’m excited about attending. I’m looking forward to talking with other developers and DBAs about the projects they’re working on and the challenges they’re facing and getting feedback on OtterTune and our ideas.

Want to find out more about Dana and machine learning for DBMS management? Register for Percona Live Europe 2017, and see her talk Automatic Database Management System Tuning Through Large-Scale Machine Learning. Register now to get the best price! Use discount code SeeMeSpeakPLE17 to get 10% off your registration.

Percona Live Open Source Database Conference Europe 2017 in Dublin is the premier European open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, MariaDB, MongoDB, time series database, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference Europe will be September 25-27, 2017 at the Radisson Blu Royal Hotel, Dublin.

ProxySQL Improves MySQL SSL Connections

September 19, 2017 - 11:47am

In this blog post, we’ll look at how ProxySQL improves MySQL SSL connection performance.

When deploying MySQL with SSL, the main concern is that the initial handshake causes significant overhead if you are not using connection pools (i.e., mysqlnd-mux with PHP, mysql.connector.pooling in Python, etc.). Closing and making new connections over and over can greatly impact your total query response time. A customer and colleague recently educated me that although you can improve SSL encryption/decryption performance with the AES-NI hardware extension on modern Intel processors, the actual overhead when creating SSL connections comes from the handshake, where multiple roundtrips between the server and client are needed.

With ProxySQL’s support for SSL on its backend connections and connection pooling, we can have it sit in front of any application, on the same server (illustrated below):

With this setup, ProxySQL is running on the same server as the application and is connected to MySQL through a local socket. MySQL data does not need to go over the TCP stream unsecured.
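
Enabling SSL for ProxySQL's backend connections is done through its admin interface. Here is a minimal sketch, assuming the default hostgroup 0 and a placeholder backend address; adjust both to your topology.

-- in the ProxySQL admin interface (port 6032 by default)
INSERT INTO mysql_servers (hostgroup_id, hostname, port, use_ssl)
VALUES (0, '192.168.56.110', 3306, 1);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;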

To quickly verify how this performs, I used a PHP script that simply creates 10k connections in a single thread, as fast as it can:

<?php
$i = 10000;
$user = 'percona';
$pass = 'percona';
while ($i >= 0) {
    $mysqli = mysqli_init();
    // Use SSL
    //$link = mysqli_real_connect($mysqli, "192.168.56.110", $user, $pass, "", 3306, "", MYSQL_CLIENT_SSL)
    // No SSL
    //$link = mysqli_real_connect($mysqli, "192.168.56.110", $user, $pass, "", 3306)
    // OpenVPN
    //$link = mysqli_real_connect($mysqli, "10.8.99.1", $user, $pass, "", 3306)
    // ProxySQL
    $link = mysqli_real_connect($mysqli, "localhost", $user, $pass, "", 6033, "/tmp/proxysql.sock")
        or die(mysqli_connect_error());
    $info = mysqli_get_host_info($mysqli);
    $i--;
    mysqli_close($mysqli);
    unset($mysqli);
}
?>

Direct connection to MySQL, no SSL:

[root@ad ~]# time php php-test.php

real    0m20.417s
user    0m0.201s
sys     0m3.396s

Direct connection to MySQL with SSL:

[root@ad ~]# time php php-test.php

real    1m19.922s
user    0m29.933s
sys     0m9.550s

Direct connection to MySQL, no SSL, with OpenVPN tunnel:

[root@ad ~]# time php php-test.php

real    0m15.161s
user    0m0.493s
sys     0m0.803s

Now, using ProxySQL via the local socket file:

[root@ad ~]# time php php-test.php

real    0m2.791s
user    0m0.402s
sys     0m0.436s

Below is a graph of these numbers:

As you can see, the performance overhead of SSL compared to no SSL is about 400% – pretty bad for some workloads.

Connections through OpenVPN are also faster than direct MySQL connections without SSL. While this is interesting, the OpenVPN server needs to be deployed on another server, separate from the MySQL server and application. This approach allows the application servers and MySQL servers (including replica/cluster nodes) to communicate on the same secured network, but it creates a single point of failure. Alternatively, you can deploy OpenVPN on the MySQL server itself, but if you have an additional high availability layer in place, things get quite complicated when a new master is promoted. In short, OpenVPN adds many additional moving parts.

The beauty with ProxySQL is that you can just run it from all application servers and it works fine if you simply point it to a VIP that directs it to the correct MySQL server (master), or use the replication group feature to identify the authoritative master.

Lastly, it is important to note that these tests were done on CentOS 7.3 with OpenSSL 1.0.1e, Percona Server for MySQL 5.7.19, ProxySQL 1.4.1, PHP 5.4 and OpenVPN 2.4.3.

Happy ProxySQLing!

Webinar Tuesday, September 19, 2017: A Percona Support Engineer Walkthrough for pt-stalk

September 18, 2017 - 12:02pm

Join Percona's Principal Support Engineer, Marcos Albe, as he presents A Percona Support Engineer Walkthrough for pt-stalk on Tuesday, September 19, 2017, at 10:00 am PDT / 1:00 pm EDT (UTC-7).

Register Now

As a support engineer, I get dozens of pt-stalk captures from our customers containing samples of iostat, vmstat, top, ps, SHOW ENGINE INNODB STATUS, SHOW PROCESSLIST and a multitude of other diagnostics outputs.

These are the tools of the trade for performance and troubleshooting, and we must learn to digest these outputs in an effective and systematic way. This allows us to provide high-quality service to a large volume of customers.

In this presentation, I will share the knowledge we've gained working with this data, and how to apply it to your database environment. We will learn how to set up pt-stalk, capture data, write plugins to trigger collection and capture custom data, look at our systematic approach, and learn what data to read first and how to unwind the tangled threads of pt-stalk.

By the end of this presentation, you will have expert knowledge of how to capture diagnostic metrics at the right time, and a generic approach for digesting the captured data. This allows you to diagnose and solve many of the problems common to MySQL setups.
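
To give a flavour of what the webinar covers, a typical pt-stalk invocation looks roughly like the one below. The threshold, cycle count, destination directory and credentials are placeholders to adapt to your environment, not recommendations.

# watch Threads_running and collect diagnostics when it stays above the threshold
pt-stalk --function status --variable Threads_running --threshold 25 --cycles 5 \
         --dest /var/lib/pt-stalk --daemonize --user=percona --password=secret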

Register for the webinar here.

Marcos Albe, Principal Technical Services Engineer

Marcos Albe has been doing web development for over ten years, providing solutions for various media and technology companies of different sizes. He is now a member of the Percona Support Team. Born and raised in the city of Montevideo, Uruguay, he became passionate about computers at the age of 11, when he got a 25Mhz i386-SX. Ten years later, he became one of the pioneers in telecommuting in Uruguay while leading the IT efforts for the second largest newspaper in the country.

Percona Live Europe Featured Talks: Debugging with Logs (and Other Events) Featuring Charity Majors

September 18, 2017 - 11:25am

Welcome to another post in our series of interview blogs for the upcoming Percona Live Europe 2017 in Dublin. This series highlights a number of talks that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post is with Charity Majors, CEO/Cofounder of Honeycomb. Her talk is Debugging with Logs (and Other Events). Her presentation covers some of the lessons every engineer should know (and often learns the hard way): why good logging solutions are so expensive, why treating your logs as strings can be costly and dangerous, how logs can impact code efficiency and add/fix/change race conditions in your code. In our conversation, we discussed debugging your database environment:

Percona: How did you get into database technology? What do you love about it?

Charity: Oh dear, I don’t. I hate databases. Data is the scariest, hardest part of computing. The stakes are highest and the mistakes the most permanent. Data is where you can kill any company with the smallest number of errors. That’s why I always end up in charge of the databases – I just don’t trust anybody else with the power. (Also, I’m an adrenaline junkie who gets off on high stakes. I could gamble or I could do databases, and I know too much math to gamble.) Literally, nobody loves databases. If they tell you anything different, they are either lying to you or they’re nowhere near production.

I got into databases from operations. I’ve been on call since I was 17, over half my life. I am really stubborn, have an inflated sense of my own importance and like solving problems, so operations was a natural fit. I started diving on the databases grenades when I worked at Linden Lab and MySQL was repeatedly killing us. It seemed impossible, so I volunteered to own it. I’ve been doing that ever since.

Percona: You’re presenting a session called “Debugging with Logs (and Other Events)”. What is the importance of logs for databases and DBAs?

Charity: I mean, it’s not really about logs. I might change my title. It’s about understanding WTF is going on. Logs are one way of extracting events in a format that humans can understand. My startup is all about “what’s happening right now; what’s just happened?” Which is something we are pretty terrible at as an industry. Databases are just another big complex piece of software, and the only reason we have DBAs is because the tooling has historically been so bad that you had to specialize in this piece of software as your entire career.

The tooling is getting better. With the right tools, you don’t have to skulk around like a detective trying to model and predict what might be happening, as though it were a living organism. You can simply sum up the lock time being held, and show what actor is holding it. It’s extremely important that we move away from random samples and pre-aggregated metrics, toward dynamic sampling and rich events. That’s the only way you will ever truly understand what is happening under the hood in your database. That’s part of what my company was built to do.

Percona: How can logging be used in debugging to track down database issues? Can logging affect performance?

Charity: Of course logging can affect performance. For any high traffic website, you should really capture your logs (events) by streaming tcpdump over the wire. Most people know how to do only one thing with db logs: look for slow queries. But those slow queries can be actively misleading! A classic example is when somebody says “this query is getting slow” and they look at source control and the query hasn’t been modified in years. The query is getting slower either because the data volume is growing (or data shape is changing), or because reads can yield but writes can’t, and the write volume has grown to the point where reads are spending all their time waiting on the lock.

Yep, most db logs are terrible.

Percona: What do you want attendees to take away from your session? Why should they attend?

Charity: Lots of cynicism. Everything in computers is terrible, but especially so with data. Everything is a tradeoff, all you can hope to do is be aware of the tradeoffs you are making, and what costs you are incurring whenever you solve a given problem. Also, I hope people come away trembling at the thought of adding any more strings of logs to production. Structure your logs, people! Grep is not the answer to every single question! It’s 2017, nearly 2018, and unstructured logs do not belong anywhere near production.

Percona: What are you most looking forward to at Percona Live Europe 2017?

Charity: My coauthor Laine and I are going to be signing copies of our book Database Reliability Engineering and giving a short keynote on the changes in our field. I love the db community, miss seeing Mark Callaghan and all my friends from the MongoDB and MySQL world, and cannot wait to laugh at them while they cry into their whiskey about locks or concurrency or other similar nonsense. Yay!

Want to find out more about Charity and database debugging? Register for Percona Live Europe 2017, and see her talk Debugging with Logs (and Other Events). Register now to get the best price! Use discount code SeeMeSpeakPLE17 to get 10% off your registration.

Percona Live Open Source Database Conference Europe 2017 in Dublin is the premier European open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, MariaDB, MongoDB, time series database, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference Europe will be September 25-27, 2017 at the Radisson Blu Royal Hotel, Dublin.

Percona Blog Poll Results: What Database Engine Are You Using to Store Time Series Data?

September 15, 2017 - 4:55pm

In this blog post, we talk about the results of Percona’s time series database poll “What Database Engine Are You Using to Store Time Series Data?”

Time series data is some of the most actionable data available when it comes to analyzing trends and making predictions. Simply put, time series data is data that is indexed not just by value, but by time as well – allowing you to view value changes over time as they occur. Obvious uses include the stock market, web traffic, user behavior, etc.

With the increasing number of smart devices in the Internet of Things (IoT), being able to track data over time is more and more important. With time series data, you can measure and make predictions on things like energy consumption, pH values, water consumption, data from environment-aware machines like smart cars, etc. The sensors used in IoT devices and systems generate huge amounts of time-series data.

A couple of months back, we ran a poll on what time series databases were being used by the community. We wanted to quickly report on the results from that poll.

First the results:


Here are some thoughts:

  • The fact that this blog started as a place exclusively for MySQL information probably explains why we skewed high with MySQL respondents – still that doesn’t mean it doesn’t reflect reality.
  • Elastic seems the most common after that, possibly to tie in with MySQL use.
  • InfluxDB is next in popularity. This suggests that Paul Dix's chosen business model is "AOK" so to speak. It is unclear whether people use the open source version, or outgrow it and switch to the commercial offering.
  • We lumped together “general purpose NoSQL engine”, but in some cases examples like Cassandra are targeted at time series. Notice that KairosDB, which is built on top of Cassandra itself, is not as popular in our survey.
  • Prometheus is the canonical “not a time series database”, but still used as one. I have a feeling alongside Graphite, this is monitoring related.
  • ClickHouse time series is a new time series database and it is surprising that it gets such high rankings. It was also relatively unknown outside of its home country Russia, but now we are seeing uses at places like CloudFlare and more.

Thanks for participating in the poll. We’re still running a poll on operating systems, so don’t forget to register your responses. We’ll report on that poll soon, with a new one on the way shortly.

The MySQL High Availability Landscape in 2017 (the Babies)

September 15, 2017 - 4:44pm

This post is the third of a series focusing on the MySQL high availability solutions available in 2017.

The first post looked at the elders, the technologies that have been around for more than ten years. The second post talked about the adults, the more recent and mature technologies. In this post, we will look at the emerging MySQL high availability solutions. The “baby” MySQL high availability solutions I chose for the blog are group replication, proxies and distributed storage.

Group replication

Group replication is Oracle’s response to Galera. The term “InnoDB Cluster” refers to a cluster using group replication. The goal is to offer similar functionality, especially the virtually synchronous replication.

At first glance, the group replication implementation appears to be rather elegant. The basis is the GTID replication mode. The nodes of an InnoDB cluster share a single UUID sequence. To control the replication lag, Oracle added a flow control layer. While Galera requires unanimity, group replication only requires a majority. The majority protocol in use is derived from Paxos. A majority protocol makes the cluster more resilient to a slow node.
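
As a quick illustration (a generic sketch, not part of the original text), the membership and state of such a group can be inspected from SQL through performance_schema:

-- each row is a cluster member; MEMBER_STATE should be ONLINE on a healthy node
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
FROM performance_schema.replication_group_members;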

Like Galera, once you add flow control you need queues. Group replication has two queues: one for the certification process and one for the appliers. What is interesting in the Oracle approach is the presence of a throttling mechanism. When a node requests flow control, instead of halting the processing of new transactions like Galera does, group replication throttles the transaction rate. That can help to meet strict timing SLAs.
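
For reference, the flow control behavior is exposed through regular server variables; a minimal way to inspect the mode and the per-node queue thresholds is:

-- shows group_replication_flow_control_mode and the certifier/applier thresholds
SHOW GLOBAL VARIABLES LIKE 'group_replication_flow_control%';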

Because the group replication logic is fairly similar to Galera’s, it suffers from the same limitations: large transactions, latency and hot rows. Group replication is recent: the first GA version is 5.7.17, from December 2016. It is natural, then, that it has a number of sharp edges. I won’t go into too much detail here, but if you are interested, read here and here. I am confident that group replication will get more polished over time. Some automation, like the Galera SST process, would also be welcome.

Given how recent the technology is, I know of no Percona customer using group replication in production.

Proxies

Intelligent proxies can be viewed as another type of upcoming MySQL high availability solution. It is not strictly MySQL. In fact, this solution is more of a mix of other solutions.

The principle is simple: you connect to a proxy, and the proxy directs you to a valid MySQL server. The proxy has to monitor the state of the back-end servers, and maybe even perform actions on them. Of course, the proxy layer must not become a single point of failure. There should be more than one proxy host for basic HA. If more than one proxy is used at the same time, they’ll have to agree on the state of the back-end servers. For example, on a cluster using MySQL async replication, if the proxies are not sending the write traffic to the same host, things will quickly become messy.

There are a few ways of achieving this. The simplest solution is an active-passive setup where only one proxy is active at a given time. You’ll need some kind of logic to determine if the proxy host is available or not. Typical choices will use tools like keepalived or Pacemaker.

A second option is to have the proxies agree to a deterministic way of identifying a writer node. For example, with a Galera-based cluster, the sane back-end node with the lowest wsrep_local_index could be the writer node.
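
A minimal sketch of that deterministic rule, assuming a Galera-based cluster (the election logic itself would live in the proxy configuration or a helper script):

-- each node reports its position in the current cluster view;
-- the healthy node with the lowest value is elected as the writer
SHOW GLOBAL STATUS LIKE 'wsrep_local_index';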

Finally, the proxies could talk to each other and coordinate. Such an approach is promising. It could allow a single proxy to perform the monitoring and inform its peers of the results. It would also allow coordinated actions on the cluster when a failure is detected.

Currently, there are a few options in terms of proxies:

  • ProxySQL: An open-source proxy that understands the MySQL protocol and can do R/W splitting, query caching, sharding, SQL firewalling, etc. A new alpha-level feature, mirroring, targets the inter-proxy communication need.
  • MaxScale: No longer fully open-source (BSL), but understands the MySQL protocol. Can do R/W splitting, sharding, binlog serving, SQL firewalling, etc.
  • MySQL Router: MySQL Router is an open-source proxy developed by Oracle for InnoDB Cluster (Group replication). It understands the MySQL protocol and also supports the new X protocol. It can do R/W splitting.
  • HAProxy: HAProxy is a popular open-source TCP-level proxy. It doesn’t understand the MySQL protocol. It needs helper scripts, responding to HTTP-type requests, to figure out a node’s health (see the sketch just after this list).
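
As a minimal sketch of what such a health-check helper might run against each back-end (illustrative only; real check scripts vary by proxy and topology):

-- is this node accepting writes, and is it the server we expect?
SELECT @@global.read_only, @@global.server_id;
-- on a Galera node, readiness to accept queries would also be verified
SHOW GLOBAL STATUS LIKE 'wsrep_ready';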

In addition to these open source proxies, there are two well-known commercial proxy-like solutions, Tungsten and ScaleArc. Both of these technologies are mature and are not “babies” in terms of age and traction. On top of these, there are also numerous hardware-based load balancer solutions.

The importance of proxies in MySQL high availability has led Percona to include ProxySQL in the latest releases of Percona XtraDB Cluster. In collaboration with the ProxySQL maintainer, René Cannaò, features have been added to make ProxySQL aware of the Percona XtraDB Cluster state.

Proxies are already often deployed in MySQL high availability solutions, though often they only do load balancing work. We are starting to see deployments that use proxies for more advanced tasks, like read/write splitting and sharding.

Distributed storage

(Figure: replication setup using distributed storage)

This MySQL high availability solution is a project I am interested in. It is fair to say it is more a “fetus” than a real “baby,” since I know nobody using it in production. You can see this solution as a shared storage approach on steroids.

The simplest solution requires a three-node Ceph cluster. The nodes also run MySQL and the datadir is a Ceph RBD block device. Data in Ceph is automatically replicated to multiple hosts. This built-in data replication is an important component of the solution. Also, Ceph RBD supports snapshots and clones. A clone is a copy of the whole data set that consumes only the data that changed (delta) in terms of storage. Our three MySQL servers will thus not use three full copies of the dataset, but only one full copy and two deltas. As time passes, the deltas grow. When they are too large, we can simply generate new snapshots and clones and be back to day one. The generation of a new snapshot and clone takes a few seconds, and doesn’t require stopping MySQL.

The obvious use case for the distributed storage approach is a read-intensive workload on a very large dataset. The setup can handle a lot of writes. The higher the write load, the more frequently there will be a snapshot refresh. Keep in mind that refreshing a snapshot of a 10 TB data set takes barely more time than for a 1 GB data set.

For that purpose, I wrote an SST script for Percona XtraDB Cluster that works with Ceph. I blogged about it here. I also wrote a Ceph snapshot/clone backup script that can provision a slave from a master snapshot. I’ll blog about how to use this Ceph backup script in the near future.

Going further with distributed storage, multiple MySQL instances could use the same data pages. Ceph would be used as a distributed object store for InnoDB pages. This would make it possible to build an open-source, Aurora-like database. Coupled with Galera or group replication, you could have a highly-available MySQL cluster sharing a single copy of the dataset.

I started to modify MySQL, actually Percona Server for MySQL 5.7, to add support for Ceph/Rados. Rados is the object store protocol of Ceph. There is still a lot of effort needed to make it work. My primary job is not development, so progress is slow. My work can be found (here). The source compiles well but MySQL doesn’t fully start. I need to debug where things are going wrong.

Adding a feature to MySQL like that is an awesome way to learn the internals of MySQL. I would really appreciate any help if you are interested in this project.

Conclusion

Over the three articles in this series, we have covered the 2017 landscape of MySQL high availability solutions. The first focused on the old timers, “the elders”: replication, shared storage and NDB. The second article dealt with the solutions that are more recent and have good traction: Galera and RDS Aurora. This final article concluded the series by looking at what may be coming in terms of MySQL high availability solutions.

The main goal of this series is to help you plan the deployment of MySQL in a highly available way. I hope it provides hints and pointers toward better and more efficient solutions.

This Week in Data with Colin Charles #6: Open Source Summit and Percona Live Europe

September 15, 2017 - 10:25am

Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community.

What a long, packed week! Spent most of it at Open Source Summit North America, while still enjoying the myriad phone calls and meetings you have as a Perconian. In addition to two talks, I also gave a webinar this week on the differences between MySQL and MariaDB (I’ll post a blog Q&A in the near future).

Percona Live Europe Dublin

Have you registered for Percona Live Europe Dublin? If not, what’s keeping you from doing so?

In addition, I think it’s definitely worth registering for the community dinner. You can hang out with other like-minded folks, and see the lightning talks (we may announce more as time gets closer).

See what the MySQL Team will speak about at Percona Live Dublin. You’ll notice that a few of the releases I mention below have Percona Live Europe talks associated with them.

Feedback

On a somber note, former Perconian and all-round great community member Jaakko Pesonen passed away. Shlomi Noach commented online: Remembering Jaakko Pesonen.

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.

Percona Live Europe Featured Talks: Monitoring Open Source Databases with Icinga with Bernd Erk

September 14, 2017 - 3:31pm

Welcome to another post in our series of interview blogs for the upcoming Percona Live Europe 2017 in Dublin. This series highlights a number of talks that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post is with Bernd Erk, CEO of Icinga. His talk is titled Monitoring Open Source Databases with Icinga. Icinga is a popular open source successor of Nagios that checks hosts and services, and notifies you of their statuses. But you also need metrics for performance and growth to deal with your scaling needs. Adding conditional behaviors and configuration in Icinga is not just intuitive, but also intelligently adaptive at runtime. In our conversation, we discussed how to intelligently monitor open source databases:

Percona: How did you get into database technology? What do you love about it?

Bernd: I started a position as a junior systems engineer in a large German mail order company. They were totally committed to Oracle databases and the tool stack around it. As Linux gained more and more attention, we became aware of MySQL very early and were fascinated by the simplicity of installation and administration. There were of course so many things Oracle had in those days that MySQL didn’t have, but most of our uses also didn’t require those extra (and of course expensive) features.

Percona: You’re presenting a session called “Monitoring Open Source Databases with Icinga”. Why is monitoring databases important, and what sort of things need to be monitored?

Bernd: Usually databases are a very important part of an IT infrastructure, and need to be online 24/7. I also had the personal experience of database downtime putting a lot of pressure on both the organization in general and the team in charge. Since most open source databases provide very good interfaces, it is not so hard to figure out if they are up and running. As in many monitoring areas, knowing what to monitor is the important part.

In addition to the basic local and remote availability checks, monitoring database replication is very important. We often see environments where the standby slave is outdated by years, or not able to keep up with the incoming load. From there you can go into database and application metrics to learn more about performance and I/O behavior.
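
As a generic illustration (not from the interview), the most basic replication check on a classic asynchronous replica looks like the following; Seconds_Behind_Master is only a rough lag indicator:

-- key fields: Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master
SHOW SLAVE STATUS\G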

Percona: Why are you using Icinga specifically? What value does it provide above other monitoring solutions?

Bernd: I’ve been involved with Icinga from the beginning, so it is my number one choice in open source monitoring. In my opinion, the great advance of Icinga 2 is that it keeps the simplicity of legacy systems like Nagios (or Icinga 1) while also supporting complex environments (such as application-based clustering). There is also live configuration of the Icinga 2 monitoring core through our REST API. With all the supported tools for metrics, logs and management around it, Icinga 2 is for me the best match for open source monitoring.

Percona: What do you want attendees to take away from your session? Why should they attend?

Bernd: Attendees will get a short overview on Icinga 2, and why it is different to Nagios (Icinga 1). I will also guide them through practical monitoring examples and show implemented checks in a live demo. After my talk, they should be able to adapt and extend on-premise or cloud monitoring with Icinga 2 using the default open source plugins.

Percona: What are you most looking forward to at Percona Live Europe 2017?

Bernd: Getting together with the great database community in all aspects, and going to Dublin (to be honest). I have never been there, and so it is my first time.

Want to find out more about Bernd and database monitoring? Register for Percona Live Europe 2017, and see his talk Monitoring Open Source Databases with Icinga. Register now to get the best price! Use discount code SeeMeSpeakPLE17 to get 10% off your registration.

Percona Live Open Source Database Conference Europe 2017 in Dublin is the premier European open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, MariaDB, MongoDB, time series database, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference Europe will be September 25-27, 2017 at the Radisson Blu Royal Hotel, Dublin.

Percona Server for MongoDB 3.4.7-1.8 is Now Available

September 14, 2017 - 11:08am

Percona announces the release of Percona Server for MongoDB 3.4.7-1.8 on September 14, 2017. Download the latest version from the Percona web site or the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, fully compatible, highly-scalable, zero-maintenance downtime database supporting the MongoDB v3.4 protocol and drivers. It extends MongoDB with the Percona Memory Engine and MongoRocks storage engine, as well as several enterprise-grade features.

Percona Server for MongoDB requires no changes to MongoDB applications or code.

This release is based on MongoDB 3.4.7 and includes the following additional change:

  • Added packages for Debian 9 (“stretch”)

Percona Live Europe Featured Talks: Visualize Your Data with Grafana Featuring Daniel Lee

September 13, 2017 - 9:47am

Welcome to another post in our series of interview blogs for the upcoming Percona Live Europe 2017 in Dublin. This series highlights a number of talks that will be at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post is with Daniel Lee, a software developer at Grafana. His tutorial is Visualize Your Data With Grafana. This presentation teaches you how to create dashboards and graphs in Grafana and how to use them to gain insight into the behavior of your systems. In our conversation, we discussed how data visualization could benefit your database environment:

Percona: How did you get into database technology? What do you love about it?

Daniel: I’m a developer and my first job was working on a transport logistics system, which was mostly composed of Stored Procedures in SQL Server 2000. Today, I would not build a system with all the logic in Stored Procedures – but that database knowledge is the foundation that I built everything else on. Databases and their data flows will always be the core of most interesting systems. More recently, I have switched from Windows to working with MariaDB on Linux. Grafana Labs uses Percona Server for MySQL for most of our internal applications (worldPing and Hosted Grafana). Working with Grafana also means working with time series databases like Graphite, which is also very interesting.

I enjoy working with data as it is one of the ways to learn how users use a system. Design decisions are theories until you have data to either back them up or disprove them.

Percona: You’re presenting a session called “Visualize Your Data With Grafana”. How does monitoring make a DBA’s life easier, and how do graphs make this information easier for DBAs to apply?

Daniel: Good monitoring provides top-level metrics (throughput, number of errors, performance) for alerting, and other lower-level metrics to allow you to dig into the details and quickly diagnose and resolve an outage. Monitoring also helps you find any constraints (for example, finding bottlenecks for query performance: CPU, row locks, disk, buffer pool size, etc.). Performance monitoring allows you to see trends and lets you know when it is time to scale out or purchase more hardware.

Monitoring can also be used to communicate with business people. It is a way of translating lots of different system metrics into a measurable user experience. Visualizing your data with graphs is a very good way to communicate that information, both within your team and with your business stakeholders. Building dashboards with the metrics that are important to you rather than just the standard checklists (CPU, disk, network etc.) allows you to measure the user experience for your application and to see long-term trends.

Percona: Why Grafana? What does Grafana do better than other monitoring solutions?

Daniel: Grafana is the de facto standard in open source for visualizing time series data. It comes with tons of different ways to visualize your data (graphs, heat maps, gauges). Each data source comes with its own custom query editor that simplifies writing complex queries, and it is easy to create dynamic dashboards that look great on a TV.

Being open source, it can be connected to any data source/database, which makes it easy to unify different data sources in the same dashboard (for example, Prometheus or Graphite data combined with MySQL data). This also means your data is not subject to vendor lock-in like it is in other solutions. Grafana has a large and very active community that creates plugins and dashboards that extend Grafana into lots of niches, as well as providing ways to quickly get started with whatever you want to monitor.

Percona: What do you want attendees to take away from your session? Why should they attend?

Daniel: I want them to know that you can make the invisible visible, and with that knowledge start to make better decisions based on data. I hope that my session helps someone take the first step to being more proactive in their monitoring by showing them what can be done with Grafana and other tools in the monitoring space.

In my session, I will give an overview of monitoring and metrics, followed by an intro to Grafana. I plan to show how to monitor MySQL and finish off with a quick look at the new MySQL data source for Grafana.

Percona: What are you most looking forward to at Percona Live Europe 2017?

Daniel: Firstly, it is always great to have an excuse to visit Ireland (I’m an Irishman living in Sweden). I’m also looking forward to getting feedback from the community on Grafana’s new MySQL data source plugin, as well as just talking to people and hearing about their experiences with database monitoring.

Want to find out more about Daniel and data visualization? Register for Percona Live Europe 2017, and see his talk Visualize Your Data With Grafana. Register now to get the best price! Use discount code SeeMeSpeakPLE17 to get 10% off your registration.

Percona Live Open Source Database Conference Europe 2017 in Dublin is the premier European open source event for the data performance ecosystem. It is the place to be for the open source community as well as businesses that thrive in the MySQL, MariaDB, MongoDB, time series database, cloud, big data and Internet of Things (IoT) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference Europe will be September 25-27, 2017 at the Radisson Blu Royal Hotel, Dublin.

Massive Parallel Log Processing with ClickHouse

September 13, 2017 - 1:17am

In this blog, I’ll look at how to use ClickHouse for parallel log processing.

Percona is known primarily for our expertise in MySQL and MongoDB (at this time), but neither is well suited to heavy analytical workloads. There is a need to analyze data sets, and a very popular task is crunching log files. Below I’ll show how ClickHouse can be used to perform this task efficiently. ClickHouse is attractive because it has multi-core parallel query processing, and it can even execute a single query using multiple CPUs in the background.

I am going to check how ClickHouse utilizes multiple CPU cores and threads. I will use a server with two sockets, equipped with “Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz” in each. That gives a total of 28 CPU cores / 56 CPU threads.

To analyze workload, I’ll use an Apache log file from one of Percona’s servers. The log has 1.56 billion rows, and uncompressed it takes 274G. When inserted into ClickHouse, the table on disk takes 9G.

How do we insert the data into ClickHouse? There are a lot of scripts that transform the Apache log format to CSV, which ClickHouse can accept. As a base, I used this one:

https://gist.github.com/sepehr/fff4d777509fa7834531

and you can find my modification here:

https://github.com/vadimtk/clickhouse-misc/blob/master/apachelog-to-csv.pl

The ClickHouse table definition:

CREATE TABLE default.apachelog (
  remote_host String,
  user String,
  access_date Date,
  access_time DateTime,
  timezone String,
  request_method String,
  request_uri String,
  status UInt32,
  bytes UInt32,
  referer String,
  user_agent String
) ENGINE = MergeTree(access_date, remote_host, 8192)

To test how ClickHouse scales on multiple CPU cores/threads, I will execute the same query by allocating from 1 to 56 CPU threads for ClickHouse processes. This can be done as:

ps -eLo cmd,tid | grep clickhouse-server | perl -pe 's/.* (\d+)$/\1/' | xargs -n 1 taskset -cp 0-$i

where $i is (N CPUs-1).

We must also take into account that not all queries are equal. Some are easier to execute in parallel than others. So I will test three different queries. In the end, we can’t get around Amdahl’s Law!

The first query should be easy to execute in parallel:

select extract(request_uri,'(\w+)$') p,sum(bytes) sm,count(*) c from apachelog group by p order by c desc limit 100

Speedup:

CPUs   Time, sec   Speedup to 1 CPU
 1     823.646      1
 2     413.832      1.990291
 3     274.548      3.000007
 4     205.961      3.999039
 5     164.997      4.991885
 6     137.455      5.992114
 7     118.079      6.975381
 8     103.015      7.995399
 9      92.01       8.951701
10      82.853      9.941052
11      75.334     10.93326
12      69.23      11.89724
13      63.848     12.90011
14      59.388     13.8689
15      55.433     14.85841
16      52.158     15.79136
17      49.054     16.7906
18      46.331     17.77743
19      43.985     18.72561
20      41.795     19.70681
21      39.763     20.71388
22      38.031     21.65723
23      36.347     22.66063
24      34.917     23.58868
25      33.626     24.49432
26      32.42      25.40549
27      31.21      26.39045
28      30.135     27.33187
29      29.947     27.50346
30      29.709     27.72379
31      29.283     28.1271
32      28.979     28.42217
33      28.807     28.59187
34      28.477     28.9232
35      28.146     29.26334
36      27.921     29.49916
37      27.613     29.8282
38      27.366     30.09742
39      27.06      30.43777
40      26.817     30.71358
41      26.644     30.913
42      26.394     31.2058
43      26.215     31.41888
44      25.994     31.686
45      25.762     31.97135
46      25.554     32.23159
47      25.243     32.62869
48      25.102     32.81197
49      24.946     33.01716
50      24.668     33.38925
51      24.537     33.56751
52      24.278     33.92561
53      24.035     34.26861
54      23.839     34.55036
55      23.734     34.70321
56      23.587     34.91949

 

It’s much more interesting to chart these results:

From the chart, we can see that the query scales linearly up to 28 cores. After that, it continues to scale up to 56 threads (but with a lesser slope). I think this is related to the CPU architecture (remember we have 28 physical cores and 56 CPU “threads”). Let’s look at the results again. With one available CPU, the query took 823.6 sec to execute. With all available CPUs, it took 23.6 sec. So the total speedup is 34.9 times.

But let’s consider a query that allows a lesser degree of parallelism. For example, this one:

select access_date c2, count(distinct request_uri) cnt from apachelog group by c2 order by c2 limit 300

This query uses aggregation that counts unique URIs, which I am sure limits the counting process to a single shared structure. So some part of the execution is limited to a single process. I won’t show the full results for all 1 to 56 CPUs, but for one CPU the execution time is 177.715 sec, and for 56 CPUs the execution time is 11.564 sec. The total speedup is 15.4 times.

The speedup chart looks like this:

As we suspected, this query allows less parallelism. What about even heavier queries? Let’s consider this one:

SELECT y, request_uri, cnt FROM (SELECT access_date y, request_uri, count(*) AS cnt FROM apachelog GROUP BY y, request_uri ORDER BY y ASC ) ORDER BY y,cnt DESC LIMIT 1 BY y

In that query, we build a derived table (to resolve the subquery) and I expect it will limit the parallelism even further. And it does: with one CPU the query takes 183.063 sec to execute. With 56 CPUs it takes 28.572 sec. So the speedup is only 6.4 times.

The chart is:

Conclusions

ClickHouse can capably utilize multiple CPU cores available on the server, and query execution is not limited by a single CPU (like in MySQL). The degree of parallelism is defined by the complexity of the query, and in the best case scenario, we see linear scalability with the number of CPU cores. For scaling across multiple servers, see my previous post:

https://www.percona.com/blog/2017/06/22/clickhouse-general-analytical-workload-based-star-schema-benchmark/

However, if query execution is serial, it limits the speedup (as described in Amdahl’s Law).

With our example of a 1.5 billion record Apache log, we saw that ClickHouse can execute complex analytical queries within tens of seconds.

cscope: Searching Code Efficiently

September 12, 2017 - 10:58am

In this post, we will discuss how to search code with the help of cscope. Let’s begin by checking its description and capabilities (quoting directly from http://cscope.sourceforge.net/):

Cscope is a developer’s tool for browsing source code.

  • Allows searching code for:
    • all references to a symbol
    • global definitions
    • functions called by a function
    • functions calling a function
    • text string
    • regular expression pattern
    • a file
    • files including a file
  • Curses based (text screen)
  • An information database is generated for faster searches and later reference
  • The fuzzy parser supports C, but is flexible enough to be useful for C++ and Java, and for use as a generalized ‘grep database’ (use it to browse large text documents!)

Of course, developers aren’t the only ones browsing the code (as implied by the tool’s description). In the Support team, we find ourselves having to check code many times. This tool is a great aid in doing so. As you can imagine, it can replace find and grep -R "<keyword(s)>" *, and adds even more functionality. On top of that, searches run faster (since they are indexed).

The main focus of this post is to explore cscope’s searching capabilities regarding code, but note that you can also use it for text searches that aren’t linked to function names or symbols (supporting regular expressions) and for file searches. This also means that even if the tool doesn’t recognize a function name, you can still use the text search as a fallback.

There is an online manual page, for quick reference:

http://cscope.sourceforge.net/cscope_man_page.html

To install it under RHEL/CentOS, simply issue:

shell> yum install cscope

You can use cscope with MySQL, Percona Server for MySQL or MariaDB code alike. In my case, I had a VM with Percona Server for MySQL 5.7.18 already available, so I’ve used that for demonstration purposes.

We should first get the source code for the exact version we are working with, and build the cscope database (used by the tool to perform searches):

shell> wget https://www.percona.com/downloads/Percona-Server-LATEST/Percona-Server-5.7.18-15/source/tarball/percona-server-5.7.18-15.tar.gz
shell> tar xzf percona-server-5.7.18-15.tar.gz
shell> cd percona-server-5.7.18-15
shell> cscope -bR

-b will build the database only, without entering the CLI; -R will recursively build the symbol database from the directory where it’s executed, down. We can also add -q for fast symbol lookup, at the expense of a larger database (we’ll check how much larger below).

Now that we have built the cscope database, we will see a new file created: cscope.out. If we used -q, we will also see: cscope.in.out and cscope.po.out. Their sizes depend on the size of the codebase in question. Here are the sizes before and after building the cscope database (with -q):

shell> du -d 1 -h ..
615M ../percona-server-5.7.18-15
shell> cscope -bqR
shell> du -h cscope.*
8.2M cscope.in.out
69M  cscope.out
103M cscope.po.out
shell> du -d 1 -h ..
794M ../percona-server-5.7.18-15

This gives around 30% increase in size while using -q, and around 10% increase without it. Your mileage may vary: be aware of this if you are using it on a test server with many different versions, or if the project size is considerably larger. It shouldn’t be much of a problem, but it’s something to take into account.

Ok, enough preamble already, let’s see it in action! To access the CLI, we can use cscope -d.

A picture is worth a thousand words. The following output corresponds to searching for the MAX_MAX_ALLOWED_PACKET symbol:

If there are multiple potential matches, the tool lists them for our review. If there is only one match, it will automatically open the file, with the cursor at the appropriate position. To check a match, either select it with the arrow keys and hit enter, or use the number/letter listed. When you are done and need to get back to cscope to continue checking other matches, simply exit the text editor (which can be defined by using CSCOPE_EDITOR). To get back to the main menu to modify the search, press CTRL-f. To exit the tool press CTRL-d. Lastly, CTRL-c toggles case insensitive mode on and off.

To show how the tool displays searches with many hits, let’s search for functions that call printf:

We can now see that letters are also used to list options, and that we can hit space to page down for more matches (from a total of 4508).

Lastly, as mentioned before, if everything else fails and you are not able to find the function or symbol you need (due to limitations or bugs), you can use the “Find this text string” and “Find this egrep pattern” functionality.

I hope this brief tour of cscope has been useful, and helps you get started using it. Note that you can use it for other projects, and it can be handy if you need to dive into the Linux kernel too.

Addendum

For even more power, you can read this vim tutorial (http://cscope.sourceforge.net/cscope_vim_tutorial.html), or set up ctags (http://ctags.sourceforge.net/) along with cscope.

Upcoming Webinar September 14, 2017: Supercharge Your Analytics with ClickHouse

September 12, 2017 - 9:03am

Join Percona’s CTO Vadim Tkachenko @VadimTk and Altinity’s Co-Founder, Alexander Zaitsev as they present Supercharge Your Analytics with ClickHouse on Thursday, September 14, 2017, at 10:00 am PDT / 1:00 pm EDT (UTC-7).

Reserve Your Spot

 

ClickHouse is a real-time analytical database system. Even though it is only celebrating one year as open source software, it has already proved itself ready for serious workloads.

We will talk about ClickHouse in general, some of its internals and why it is so fast. ClickHouse works in conjunction with MySQL – traditionally weak for analytical workloads – and this presentation demonstrates how to make the two systems work together.

There will also be an in-person presentation on How to Build Analytics for 100bn Logs a Month with ClickHouse at the meetup Wednesday, September 13, 2017. RSVP here.

Alexander Zaitsev will also be speaking at Percona Live Europe 2017 on Building Multi-Petabyte Data Warehouses with ClickHouse on Wednesday, September 27 at 11:30 am. Use the promo code “SeeMeSpeakPLE17” for 15% off.

Alexander Zaitsev
Altinity’s Co-Founder
Alexander is a co-founder of Altinity. He has 20 years of engineering and engineering management experience in several international companies. Alexander is an expert in the design and implementation of high-scale analytics systems. He has designed and deployed petabyte-scale data warehouses, including one of the earliest ClickHouse deployments outside of Yandex.

Vadim Tkachenko
CTO
Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. Vadim leads Percona Labs, which focuses on technology research and performance evaluations of Percona’s and third-party products. Percona Labs designs no-gimmick tests of hardware, filesystems, storage engines, and databases that surpass the standard performance and functionality scenario benchmarks. Vadim’s expertise in LAMP performance and multi-threaded programming helps optimize MySQL and InnoDB internals to take full advantage of modern hardware. Oracle Corporation and its predecessors have incorporated Vadim’s source code patches into the mainstream MySQL and InnoDB products.

He also co-authored the book High Performance MySQL: Optimization, Backups, and Replication 3rd Edition. Previously, he founded a web development company in his native Ukraine and spent two years in the High Performance Group within the official MySQL support team. Vadim received a BS in Economics and an MS in computer science from the National Technical University of Ukraine. He now lives in California with his wife and two children.

Updating InnoDB Table Statistics Manually

September 11, 2017 - 12:00pm

In this post, we will discuss how to fix cardinality for InnoDB tables manually.

As a support engineer, I often see situations when the cardinality of a table is not correct. When InnoDB calculates the cardinality of an index, it does not scan the full table by default. Instead it looks at random pages, as determined by options innodb_stats_sample_pages, innodb_stats_transient_sample_pages and innodb_stats_persistent_sample_pages, or by the CREATE TABLE option STATS_SAMPLE_PAGES. The default value for persistent statistics is 20. This approach works fine when the number of unique values in your secondary key grows in step with the size of the table. But what if you have a column that has a comparatively small number of unique values? This could be a common service, many-to-many relationship table, for example, or just a table containing a list of sell orders that belong to one of a dozen shops owned by the company. Such tables could grow up to billions of rows with a small (less than 100) number of unique shop IDs.
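
For reference, a quick way to see the current sampling settings on a running server (a minimal sketch; the values shown will be whatever your server is configured with):

-- lists innodb_stats_sample_pages, innodb_stats_transient_sample_pages
-- and innodb_stats_persistent_sample_pages with their current values
SHOW GLOBAL VARIABLES LIKE 'innodb_stats%sample_pages';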

At some point, InnoDB will report the wrong values for such indexes. Really! If 20 pages have 100 unique shop IDs, how many unique shop IDs would 20000 pages have? 100 times 1000? This seems logical, and after a certain number of rows such indexes will have extraordinarily large cardinality values.

ANALYZE TABLE will not help, because it uses the same algorithm. Increasing the number of “stats” sample pages would help, but it has its own downside: the more pages you have to examine, the slower ANALYZE TABLE runs. While this command is not blocking, it still creates side effects as described in this blog post. And the longer it runs, the less control you have.

Another issue with InnoDB statistics: even if they are persistent and STATS_AUTO_RECALC is set to 0, InnoDB still adds values for secondary indexes, as shown in lp:1538765. Eventually, after you insert millions of rows, your statistics get corrupted. ANALYZE TABLE can fix this only if you specify a very large number of “stats” sample pages.

Can we do anything about it?

InnoDB stores statistics in the “mysql” database, in the tables innodb_table_stats and innodb_index_stats. Since they are regular MySQL tables, privileged users can access them. We can update them and modify statistics as we like. And these statistics are used by the Optimizer!

I created a small example showing how to do this trick. I used Percona Server for MySQL version 5.7.19, but the trick will work on any supported MySQL and Percona Server for MySQL version.

First, let’s create test tables. The first table has shops, with a few shop profiles with the shop ID and name:

create table shops(
  shop_id int not null auto_increment primary key,
  name varchar(32)
) engine=innodb;

The second table refers to the “shops” table:

create table goods(
  id int not null auto_increment primary key,
  shop_id int not null,
  name varchar(32),
  create_date datetime DEFAULT NULL,
  key (shop_id, create_date)
) engine=innodb;

Let’s check how many unique shops we have:

mysql> select count(distinct shop_id) from shops;
+-------------------------+
| count(distinct shop_id) |
+-------------------------+
|                     100 |
+-------------------------+
1 row in set (0.02 sec)

With 100 distinct shops, and a key on (shop_id, create_date), we expect cardinality in table goods to be not much different than this query result:

mysql> select count(distinct id) as `Cardinality for PRIMARY`,
    -> count(distinct shop_id) as `Cardinality for shop_id column in index shop_id`,
    -> count(distinct shop_id, create_date) as `Cardinality for create_date column in index shop_id`
    -> from goods
*************************** 1. row ***************************
Cardinality for PRIMARY: 8000000
Cardinality for shop_id column in index shop_id: 100
Cardinality for create_date column in index shop_id: 169861
1 row in set (2 min 8.74 sec)

However, SHOW INDEX returns dramatically different values for the column shop_id:

mysql> show index from goods; +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | goods | 0 | PRIMARY | 1 | id | A | 7289724 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 1 | shop_id | A | 13587 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 2 | create_date | A | 178787 | NULL | NULL | YES | BTREE | | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 3 rows in set (0.09 sec)

ANALYZE TABLE does not help:

mysql> analyze table goods; +------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +------------+---------+----------+----------+ | test.goods | analyze | status | OK | +------------+---------+----------+----------+ 1 row in set (0.88 sec) mysql> show index from goods; +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | goods | 0 | PRIMARY | 1 | id | A | 7765796 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 1 | shop_id | A | 14523 | NULL | NULL | | BTREE | | | | goods | 1 | shop_id | 2 | create_date | A | 168168 | NULL | NULL | YES | BTREE | | | +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 3 rows in set (0.00 sec)

As a result, if we join the two tables, the Optimizer chooses the wrong JOIN order and query execution plan:

mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); +----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+ | 1 | SIMPLE | shops | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 100 | 100.00 | Using where; Using index | | 1 | SIMPLE | goods | NULL | ref | shop_id | shop_id | 4 | test.shops.shop_id | 534 | 11.11 | Using index condition | +----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+ 2 rows in set, 1 warning (0.13 sec) mysql> P md5sum PAGER set to 'md5sum' mysql> select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); 4a94dabc4bfbfb7dd225bcb50278055b - 31896 rows in set (43.32 sec)

If compared to STRAIGHT_JOIN order:

mysql> explain select goods.* from goods straight_join shops on(goods.shop_id = shops.shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+ | 1 | SIMPLE | goods | NULL | range | shop_id | shop_id | 10 | NULL | 31997 | 100.00 | Using index condition | | 1 | SIMPLE | shops | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.goods.shop_id | 1 | 100.00 | Using index | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+ 2 rows in set, 1 warning (0.14 sec) mysql> P md5sum PAGER set to 'md5sum' mysql> select goods.* from goods straight_join shops on(goods.shop_id = shops.shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); 4a94dabc4bfbfb7dd225bcb50278055b - 31896 rows in set (7.94 sec)

The time difference for a small 8M row table is around six times! For a big table with many columns, it would be even larger.

Is STRAIGHT_JOIN the only solution for this case?

No! It’s also not a great solution because if the query is complicated and involves more than two tables, it may be affected by bug fixes and improvements in the Optimizer code. Then the query order might not be optimal for new versions and updates. Therefore, you’ll need to test such queries at each upgrade, including minor ones.

So why does ANALYZE TABLE not work? Because the default number of pages it uses to calculate statistics is too small to capture the difference. You can increase the table option STATS_SAMPLE_PAGES until you find a proper value. The drawback is that the greater you set STATS_SAMPLE_PAGES, the longer it takes for ANALYZE TABLE to finish. Also, if you update a large portion of the table, you are often affected by lp:1538765. At some point, the statistics will again be inaccurate.
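
A minimal sketch of that approach (the value 512 is an arbitrary example, not a recommendation):

-- raise the per-table sampling depth, then recalculate the statistics;
-- the larger the value, the longer ANALYZE TABLE takes
ALTER TABLE goods STATS_SAMPLE_PAGES=512;
ANALYZE TABLE goods;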

Now let’s try our manual statistics update trick

InnoDB stores its persistent statistics in the tables mysql.innodb_table_stats  and mysql.innodb_index_stats:

mysql> alter table goods stats_persistent=1, stats_auto_recalc=0; Query OK, 0 rows affected (0.11 sec) Records: 0 Duplicates: 0 Warnings: 0 +---------------+------------+---------------------+---------+----------------------+--------------------------+ | database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes | +---------------+------------+---------------------+---------+----------------------+--------------------------+ | test | goods | 2017-09-05 00:21:12 | 7765796 | 34624 | 17600 | +---------------+------------+---------------------+---------+----------------------+--------------------------+ 1 row in set (0.00 sec) mysql> select * from mysql.innodb_index_stats where table_name='goods'; +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | test | goods | PRIMARY | 2017-09-05 00:21:12 | n_diff_pfx01 | 7765796 | 20 | id | | test | goods | PRIMARY | 2017-09-05 00:21:12 | n_leaf_pages | 34484 | NULL | Number of leaf pages in the index | | test | goods | PRIMARY | 2017-09-05 00:21:12 | size | 34624 | NULL | Number of pages in the index | | test | goods | shop_id | 2017-09-05 00:21:12 | n_diff_pfx01 | 14523 | 20 | shop_id | | test | goods | shop_id | 2017-09-05 00:21:12 | n_diff_pfx02 | 168168 | 20 | shop_id,create_date | | test | goods | shop_id | 2017-09-05 00:21:12 | n_diff_pfx03 | 8045310 | 20 | shop_id,create_date,id | | test | goods | shop_id | 2017-09-05 00:21:12 | n_leaf_pages | 15288 | NULL | Number of leaf pages in the index | | test | goods | shop_id | 2017-09-05 00:21:12 | size | 17600 | NULL | Number of pages in the index | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ 8 rows in set (0.00 sec)

And we can update these tables directly:

mysql> update mysql.innodb_table_stats set n_rows=8000000 where table_name='goods';
Query OK, 1 row affected (0.18 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=8000000 where stat_description in('id', 'shop_id,create_date,id') and table_name='goods';
Query OK, 2 rows affected (0.08 sec)
Rows matched: 2  Changed: 2  Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=100 where stat_description in('shop_id') and table_name='goods';
Query OK, 1 row affected (0.09 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> update mysql.innodb_index_stats set stat_value=169861 where stat_description in('shop_id,create_date') and table_name='goods';
Query OK, 1 row affected (0.08 sec)
Rows matched: 1  Changed: 1  Warnings: 0

I took index values from earlier, as calculated by this query:

select count(distinct id) as `Cardinality for PRIMARY`, count(distinct shop_id) as `Cardinality for shop_id column in index shop_id`, count(distinct shop_id, create_date) as `Cardinality for create_date column in index shop_id` from goods;

mysql> select * from mysql.innodb_table_stats where table_name='goods'; +---------------+------------+---------------------+---------+----------------------+--------------------------+ | database_name | table_name | last_update | n_rows | clustered_index_size | sum_of_other_index_sizes | +---------------+------------+---------------------+---------+----------------------+--------------------------+ | test | goods | 2017-09-05 00:47:45 | 8000000 | 34624 | 17600 | +---------------+------------+---------------------+---------+----------------------+--------------------------+ 1 row in set (0.00 sec) mysql> select * from mysql.innodb_index_stats where table_name='goods'; +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | database_name | table_name | index_name | last_update | stat_name | stat_value | sample_size | stat_description | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ | test | goods | PRIMARY | 2017-09-05 00:48:32 | n_diff_pfx01 | 8000000 | 20 | id | | test | goods | PRIMARY | 2017-09-05 00:21:12 | n_leaf_pages | 34484 | NULL | Number of leaf pages in the index | | test | goods | PRIMARY | 2017-09-05 00:21:12 | size | 34624 | NULL | Number of pages in the index | | test | goods | shop_id | 2017-09-05 00:49:13 | n_diff_pfx01 | 100 | 20 | shop_id | | test | goods | shop_id | 2017-09-05 00:49:26 | n_diff_pfx02 | 169861 | 20 | shop_id,create_date | | test | goods | shop_id | 2017-09-05 00:48:32 | n_diff_pfx03 | 8000000 | 20 | shop_id,create_date,id | | test | goods | shop_id | 2017-09-05 00:21:12 | n_leaf_pages | 15288 | NULL | Number of leaf pages in the index | | test | goods | shop_id | 2017-09-05 00:21:12 | size | 17600 | NULL | Number of pages in the index | +---------------+------------+------------+---------------------+--------------+------------+-------------+-----------------------------------+ 8 rows in set (0.00 sec)

Now the statistics are up to date, but not used:

mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); +----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+ | 1 | SIMPLE | shops | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 100 | 100.00 | Using where; Using index | | 1 | SIMPLE | goods | NULL | ref | shop_id | shop_id | 4 | test.shops.shop_id | 534 | 11.11 | Using index condition | +----+-------------+-------+------------+-------+---------------+---------+---------+--------------------+------+----------+--------------------------+ 2 rows in set, 1 warning (0.04 sec)

To finalize the changes, we need to run FLUSH TABLE goods:

mysql> FLUSH TABLE goods; Query OK, 0 rows affected (0.00 sec) mysql> explain select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+ | 1 | SIMPLE | goods | NULL | range | shop_id | shop_id | 10 | NULL | 31997 | 100.00 | Using index condition | | 1 | SIMPLE | shops | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.goods.shop_id | 1 | 100.00 | Using index | +----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+-------+----------+-----------------------+ 2 rows in set, 1 warning (0.28 sec) mysql> P md5sum PAGER set to 'md5sum' mysql> select goods.* from goods join shops using(shop_id) where create_date BETWEEN CONVERT_TZ('2015-11-01 00:00:00', 'MET','GMT') AND CONVERT_TZ('2015-11-07 23:59:59', 'MET','GMT') and goods.shop_id in(4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486); 4a94dabc4bfbfb7dd225bcb50278055b - 31896 rows in set (7.79 sec)

Now everything is good.

But FLUSH TABLE is a blocking operation, right? Won't it block queries and create a scenario even worse than the one described for ANALYZE TABLE in that post?

At first glance this is true. But we can use the same trick Percona Toolkit uses: set lock_wait_timeout to 1 and call FLUSH in a loop (the exact loop is shown below). To demonstrate how it works, I use a scenario similar to the one described in the ANALYZE TABLE blog post.

First, let’s reset the statistics to ensure our FLUSH works as expected:

mysql> analyze table goods;
+------------+---------+----------+----------+
| Table      | Op      | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.goods | analyze | status   | OK       |
+------------+---------+----------+----------+
1 row in set (0.38 sec)

mysql> show indexes from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 | PRIMARY  |            1 | id          | A         |     7765796 |     NULL | NULL   |      | BTREE      |         |               |
| goods |          1 | shop_id  |            1 | shop_id     | A         |       14523 |     NULL | NULL   |      | BTREE      |         |               |
| goods |          1 | shop_id  |            2 | create_date | A         |      168168 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)
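Before editing the persistent statistics tables, it helps to look at what InnoDB has stored for our table; the stat_description values used in the UPDATE statements below come straight from these rows. A minimal sketch (these tables exist in MySQL 5.6+ when persistent statistics are enabled):

-- What InnoDB currently has persisted for the goods table
SELECT * FROM mysql.innodb_table_stats WHERE table_name = 'goods';

-- Per-index statistics; for the n_diff values, stat_description lists the covered columns
SELECT index_name, stat_name, stat_value, stat_description
  FROM mysql.innodb_index_stats
 WHERE table_name = 'goods';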

Next, we update the mysql.innodb_*_stats tables manually and check that the Optimizer still sees the outdated statistics:

mysql> update mysql.innodb_table_stats set n_rows=8000000 where table_name='goods';
Query OK, 1 row affected (0.09 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> update mysql.innodb_index_stats set stat_value=8000000 where stat_description in('id', 'shop_id,create_date,id') and table_name='goods';
Query OK, 2 rows affected (0.09 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> update mysql.innodb_index_stats set stat_value=100 where stat_description in('shop_id') and table_name='goods';
Query OK, 1 row affected (0.11 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> update mysql.innodb_index_stats set stat_value=169861 where stat_description in('shop_id,create_date') and table_name='goods';
Query OK, 1 row affected (0.10 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> show indexes from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 | PRIMARY  |            1 | id          | A         |     7765796 |     NULL | NULL   |      | BTREE      |         |               |
| goods |          1 | shop_id  |            1 | shop_id     | A         |       14523 |     NULL | NULL   |      | BTREE      |         |               |
| goods |          1 | shop_id  |            2 | create_date | A         |      168168 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

Now let's start a long-running query in one session that will block our FLUSH TABLE command:

mysql> select sleep(1) from goods limit 1000, 300;

And let’s run FLUSH TABLE in a loop:

sveta@Thinkie:~/build/ps-5.7/mysql-test$ until (`mysqlmtr -P13001 -e "set lock_wait_timeout=1; flush table goods;" test`); do sleep 1; done
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
ERROR 1205 (HY000) at line 1: Lock wait timeout exceeded; try restarting transaction
...

Now let's check whether we can still access the table:

mysql> select * from goods order by id limit 10;
^C

We cannot! We cannot even connect to the database where the table is stored:

sveta@Thinkie:~/build/ps-5.7/mysql-test$ mysqlmtr -P13001 test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
^C

The reason is that although each FLUSH TABLE attempt is killed by the metadata lock wait timeout, it has already requested a table lock for flushing, and that pending request blocks other incoming queries.
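If you want to see exactly who holds and who waits for the metadata lock in a situation like this, performance_schema can show it. This is only a sketch: it assumes MySQL 5.7+ with the wait/lock/metadata/sql/mdl instrument enabled (it is off by default):

-- Granted and pending metadata locks on the goods table
SELECT object_schema, object_name, lock_type, lock_status, owner_thread_id
  FROM performance_schema.metadata_locks
 WHERE object_schema = 'test' AND object_name = 'goods';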

But we can wrap FLUSH TABLE in LOCK TABLE ... WRITE; ... UNLOCK TABLES; statements. In this case, LOCK TABLE waits until all running queries release their metadata locks on the table, then locks it exclusively; FLUSH TABLE runs, and the script unlocks the table immediately afterwards. Since closing the session releases the lock implicitly, I used a PHP one-liner to keep everything in a single session:

$ php -r '
> $link = new mysqli("127.0.0.1", "root", "", "test", 13001);
> $link->query("set lock_wait_timeout=1");
> while(!$link->query("lock table goods write")) {sleep(1);}
> $link->query("flush table goods");
> $link->query("unlock tables");'

We can confirm that a parallel session can still access the table:

mysql> select * from goods order by id limit 10;
+----+---------+----------------------------------+---------------------+
| id | shop_id | name                             | create_date         |
+----+---------+----------------------------------+---------------------+
|  1 |      58 | 5K0z2sHTgjWKKdryTaniQdZmjGjA9wls | 2015-09-19 00:00:00 |
|  2 |      17 | xNll02kgUTWAFURj6j5lL1zXAubG0THG | 2013-10-19 00:00:00 |
|  3 |      30 | clHX7uQopKmoTtEFH5LYBgQncsxRtTIB | 2017-08-01 00:00:00 |
|  4 |      93 | bAzoQTN98AmFjPOZs7PGfbiGfaf9Ye4b | 2013-02-24 00:00:00 |
|  5 |      20 | rQuTO5GHjP60kDbN6WoPpE2S8TtMbrVL | 2017-08-05 00:00:00 |
|  6 |      37 | WxqxA5tBHxikaKbuvbIF84H9QuaCnqQ3 | 2013-10-18 00:00:00 |
|  7 |      13 | DoYnFpQZSVV8UswBsWklgGBUc8zW9mVW | 2017-02-06 00:00:00 |
|  8 |      81 | dkNxMQyZNZuTrONEX4gxRLa0DOedatIs | 2015-07-05 00:00:00 |
|  9 |      12 | Z0t2uQ9itexpPf01KUpa7qBWlT5fBmXR | 2014-06-25 00:00:00 |
| 10 |      90 | 6urABBQyaUVVyxljvd11D3kUxbdDRPRV | 2013-10-23 00:00:00 |
+----+---------+----------------------------------+---------------------+
10 rows in set (0.00 sec)

mysql> update goods set name='test' where id=100;
Query OK, 1 row affected (0.08 sec)
Rows matched: 1  Changed: 1  Warnings: 0

After the PHP script finishes its job, statistics are corrected:

mysql> show index from goods;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| goods |          0 | PRIMARY  |            1 | id          | A         |     8000000 |     NULL | NULL   |      | BTREE      |         |               |
| goods |          1 | shop_id  |            1 | shop_id     | A         |         100 |     NULL | NULL   |      | BTREE      |         |               |
| goods |          1 | shop_id  |            2 | create_date | A         |      169861 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)
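If you would rather not depend on PHP, the same sequence can be typed into any single client session. This is only a sketch of what the one-liner does: the retry after a lock timeout has to be repeated by hand, and the lock is released implicitly if the connection closes:

-- Fail fast instead of queueing behind long-running queries
SET lock_wait_timeout = 1;
-- If this times out with ERROR 1205, simply run it again
LOCK TABLE goods WRITE;
-- Forces the table to be reopened so InnoDB reloads the edited statistics
FLUSH TABLE goods;
-- Release the lock immediately
UNLOCK TABLES;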

Conclusion

We can manually update persistent InnoDB statistics to fix Optimizer plans for our queries, with almost no impact on a live server.
