
Benchmark MongoDB with sysbench

May 13, 2016 - 9:17am

In this blog post, we’ll discuss how to benchmark MongoDB with sysbench.

In an earlier post, I mentioned our use of sysbench-mongodb (via this fork) to run benchmarks of MongoDB servers. I now want to share our work extending sysbench to make it work with MongoDB.

If you’re not familiar with sysbench, it’s a great project developed by Alexey Kopytov that lets you run different types of benchmarks (referred to as “tests” by the tool), including database benchmarks. The database tests are implemented in Lua scripts, which means you can customize them as needed (or even write new ones from scratch) – something useful for simulating specific workloads.

All of the database tests in sysbench assume an SQL-based database, so instead of trying to shoehorn MongoDB tests into this framework I modified the connect/disconnect functions to handle MongoDB, and then implemented new functions specific for this database.

You can find the work (which is still in progress but usable, and in fact currently used by us in benchmarks) on the dev-mongodb-support-1.0 branch of our sysbench fork.

To use it, you just need to specify the --mongo-url argument (others too, as needed, but this is the one that must be present for sysbench to detect that a MongoDB test is requested), and then provide the path to the Lua script you want to run. The following is an example:

sysbench --mongo-write-concern=1 --mongo-url="mongodb://localhost" --mongo-database-name=sbtest \
  --test=sysbench/sysbench/tests/mongodb/oltp.lua \
  --oltp_table_size=60000000 --oltp_tables_count=16 \
  --num-threads=512 --rand-type=pareto --report-interval=10 \
  --max-requests=0 --max-time=600 \
  --oltp-point-selects=10 --oltp-simple-ranges=1 --oltp-sum-ranges=1 \
  --oltp-order-ranges=1 --oltp-distinct-ranges=1 --oltp-index-updates=1 \
  --oltp-non-index-updates=1 --oltp-inserts=1 run

To build this branch, you’ll first need to build and install (or otherwise obtain) the mongo-c-driver project, as that is what we use to connect to MongoDB. Once that’s done, building is just a matter of running the following commands from the repo’s root:

./autogen.sh
./configure
make
sudo make install # optional
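For reference, obtaining and building mongo-c-driver itself might look like the following (a sketch: the GitHub URL is the official driver repository, but the autotools steps are an assumption based on driver releases of that era – check the driver's README for your version):

git clone https://github.com/mongodb/mongo-c-driver.git   # official C driver repo
cd mongo-c-driver
./autogen.sh && ./configure   # assumed autotools-based build
make && sudo make install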

The changes should not affect the other database tests in sysbench, though I have only verified that the MySQL ones continue to work.

Right now, the workload from sysbench-mongodb is implemented in Lua scripts (oltp.lua), and work is in progress to allow freeform operations to be created with new Lua scripts (by providing functions that take JSON as the argument). As an alternative, you may want to check out this much-less-tested (and currently unstable) branch based on luamongo. It already supports the creation of arbitrary workloads in Lua. In this case, you also need to build luamongo, which is included.

With either branch, you can add new tests by implementing new Lua scripts (though the dev-mongodb-support-1.0 branch still needs a few functions implemented on the C side to support arbitrary operations from the Lua side).

We think there are still some types of operations needed to improve sysbench’s usefulness for MongoDB, such as queries involving arrays, unions, the $in operator, geospatial operators, and in-place updates.

We hope you find this useful, and we welcome suggestions and bug reports to improve it.

Happy benchmarking!

ProxySQL versus MaxScale for OLTP RO workloads

May 12, 2016 - 10:52am

In this blog post, we’ll discuss ProxySQL versus MaxScale for OLTP RO workloads.

Continuing my series of READ-ONLY benchmarks (you can find the other posts here: https://www.percona.com/blog/2016/04/07/mysql-5-7-sysbench-oltp-read-results-really-faster/ and https://www.percona.com/blog/2016/03/28/mysql-5-7-primary-key-lookup-results-is-it-really-faster), in this post I want to see how much overhead a proxy adds.

In my opinion, there are only two solid proxy software options for MySQL at the moment: ProxySQL and MaxScale. In the past, there was also MySQL Proxy, but it is pretty much dead for now. Its replacement, MySQL Router, is still in the very early stages and seriously lacks any features that would compete with ProxySQL and MaxScale. This will most likely change in the future – when MySQL Router adds more features, I will reevaluate it then!

To test the proxies, I will start with a very simple setup to gauge basic performance characteristics. I will use a sysbench client and proxy running on the same box. Sysbench connects to the proxy via local socket (for minimal network and TCP overhead), and the proxy is connected to a remote MySQL via a 10Gb network. This way, the proxy and sysbench share the same server resources.
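For illustration, a sysbench run against a proxy listening on a local socket might look like this (a sketch only: the socket path and credentials are assumptions, not the exact values from these runs – see the configs linked below):

sysbench --test=tests/db/oltp.lua --oltp-read-only=on \
  --mysql-socket=/tmp/proxysql.sock --mysql-user=sbtest --mysql-password=sbtest \
  --oltp-tables-count=10 --oltp-table-size=10000000 \
  --num-threads=64 --max-requests=0 --max-time=300 --report-interval=10 run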

Other parameters:

  • CPU: 56 logical CPU threads – Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
  • sysbench: ten tables x 10 mln rows each, Pareto distribution
  • OS: Ubuntu 15.10 (Wily Werewolf)
  • MySQL 5.7
  • MaxScale version 1.4.1
  • ProxySQL version 1.2.0b

You can find more details about benchmarks, scripts and configs here: https://github.com/Percona-Lab/benchmark-results/tree/201603-mysql55-56-57-RO/remote-OLTP-proxy-may.

An important parameter to consider is how much of the CPU resources you allocate for a proxy. Both ProxySQL and MaxScale allow you to configure how many threads they can use to process user requests and to route queries. I’ve found that 16 threads for ProxySQL and 8 threads for MaxScale are optimal (I will also show 16 threads for MaxScale in this post). Both proxies also allow you to set up simple load-balancing configurations, or to work in read-write splitting mode. In this case, I will use simple load balancing, since there are no read-write splitting requirements in a read-only workload.
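For reference, here is roughly where those thread counts live (a sketch; file paths and admin credentials are assumptions – see the configs linked above for the exact settings used):

# MaxScale: the threads setting goes into the [maxscale] section of maxscale.cnf
# [maxscale]
# threads=8

# ProxySQL: mysql-threads is set through the admin interface
# (note: this variable only takes effect after a proxy restart)
mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "
UPDATE global_variables SET variable_value='16' WHERE variable_name='mysql-threads';
SAVE MYSQL VARIABLES TO DISK;"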

ProxySQL

First result: How does ProxySQL perform compared to vanilla MySQL 5.7?

As we can see, there is a noticeable drop in performance with ProxySQL. This is expected, as ProxySQL does extra work to process queries. What is good though is that ProxySQL scales with increasing user connections.

One of the tricks that ProxySQL has is a “fast-forward” mode, which minimizes overhead from processing (but as a drawback, you can’t use many of the other features). Out of curiosity, let’s see how the “fast-forward” mode performs:
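Fast-forward is enabled per user through ProxySQL’s admin interface. A minimal sketch (the admin credentials and the sbtest username are assumptions):

mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "
UPDATE mysql_users SET fast_forward=1 WHERE username='sbtest';
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;"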

MaxScale

Now let’s see what happens with MaxScale. Before showing the next chart, let me note that it contains “error bars,” which are presented as vertical bars. Basically, an “error bar” shows a standard deviation: the longer the bar, the more variation was observed during the experiment. We want to see less variance, as it implies more stable performance.

Here are results for MaxScale versus ProxySQL:

We can see that with lower numbers of threads the two proxies perform similarly, but MaxScale has a harder time scaling past 100 threads. On average, MaxScale’s throughput is worse, and there is a lot of variation. In general, we can see that MaxScale demands more CPU resources and uses more of the CPU per request (compared to ProxySQL). This holds true if we run MaxScale with 16 threads (instead of 8):

MaxScale with 16 threads does not handle the workload well, and there is a lot of variation along with some visible scalability issues.

To summarize, here is a chart with relative performance (vanilla MySQL 5.7 is shown as 1):

While this chart does show that MaxScale has less overhead from 1-6 threads, it doesn’t scale as user load increases.

Quick start MySQL testing using Docker (on a Mac!)

May 11, 2016 - 9:38am

In this post, we’ll discuss how you can quick start MySQL testing using Docker, specifically in a Mac environment.

Like a lot of people, I’m hearing a lot about Docker and it’s got me curious. The Docker ecosystem seems to be moving quickly, however, and simple “getting started” or “how-to” type articles that are easy to find for well-established technologies seem to be out-of-date or non-existent for Docker. I’ve been playing with Docker on Mac for a bit, but it is definitely a second-class citizen in the Docker world. However, I saw Giuseppe’s blog on the new Docker beta for Mac and decided to try it for myself. These steps work for the beta version on a Mac (and probably Windows), but they should work with Linux as well (using the GA release, currently Docker 1.11.1).

The new Docker beta for Mac requires that you register for the beta program, and receive a download code from Docker. I got mine in about a day, but I would assume it won’t be long before the full version is released.

Once installed, I needed to setup some Docker containers for common MySQL versions so that I can easily have some sandboxes. The method I used is below:

jayj@~ [510]$ docker network create test
90005b3ffa9fef1f817ee4965e794a567404c9a8d5bf07320514e7d848d59ff9
jayj@~ [511]$ docker run --name=mysql57 --net=test -e MYSQL_ALLOW_EMPTY_PASSWORD=yes -d mysql/mysql-server:5.7
6c80fa89610dbd5418ba474ad7d5451cd061f80a8a72ff2e718341827a08144b
jayj@~ [512]$ docker run -it --rm --net=test -e MYSQL_HOST=mysql57 mysql/shell init
Creating a Classic Session to root@mysql57:3306
Enter password:
No default schema selected.
enableXProtocol: Installing plugin mysqlx...
enableXProtocol: done

A quick summary of what I did above:

  1. I created a network called “test” for my containers to share; essentially, this is a dedicated private network between containers. I like this because multiple containers can listen on the same port and I don’t have to fight with ports on my host OS.
  2. I started a MySQL 5.7 image from Oracle’s official MySQL Docker container bound to that test network.
  3. I used the MySQL/shell image (also from Oracle) to initialize the mysqlx plugin on my 5.7 server. Notice I didn’t enter a password because I created the server without one (insecure, but it’s a sandbox).

The shell init uses a temporary container that is removed (--rm) after the run, so you don’t pollute your docker ps -a output.

So, now I want to be able to use the standard MySQL command line and/or the new MySQL shell to access this container. To make this really clean, I added some bash aliases:

alias mysqlsh='docker run -it --rm --net=test mysql/shell'
alias mysql='docker run -it --rm -e MYSQL_ALLOW_EMPTY_PASSWORD=yes --net=test --entrypoint="mysql" mysql/mysql-server:5.7'

With these in effect, I can call them directly and pass normal command line options to connect to my mysql57 image just as if I was using a native MySQL CLI binary.

Using the MySQL CLI from the 5.7 image:

jayj@~ [524]$ mysql -h mysql57
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.12 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show schemas;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.01 sec)

Using the MySQL shell:

jayj@~ [527]$ mysqlsh -h mysql57 -u root --session-type=node
Creating a Node Session to root@mysql57:33060
Enter password:
No default schema selected.
Welcome to MySQL Shell 1.0.3 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help', '\h' or '\?' for help.

Currently in JavaScript mode. Use \sql to switch to SQL mode and execute queries.
mysql-js> \sql
Switching to SQL mode... Commands end with ;
mysql-sql> show schemas;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.00 sec)
mysql-sql>

Now if I want to check something on MySQL 5.5, I can just do this:

jayj@~ [530]$ docker run --name=mysql55 --net=test -e MYSQL_ALLOW_EMPTY_PASSWORD=yes -d mysql/mysql-server:5.5
Unable to find image 'mysql/mysql-server:5.5' locally
5.5: Pulling from mysql/mysql-server
a3ed95caeb02: Already exists
ffe36b360c6d: Already exists
646f220a8b5d: Pull complete
ed65e4fea7ed: Pull complete
d34b408b18dd: Pull complete
Digest: sha256:12f0b7025d1dc0e7b40fc6c2172106cdf73b8832f2f910ad36d65228d9e4c433
Status: Downloaded newer image for mysql/mysql-server:5.5
6691dd9d42c73f53baf2968bcca92b7f4d26f54bb01d967be475193305affd4f
jayj@~ [531]$ mysql -h mysql55
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.49 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show schemas;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
3 rows in set (0.00 sec)

or, Percona Server:

jayj@~ [534]$ docker run --name=ps57 --net=test -e MYSQL_ALLOW_EMPTY_PASSWORD=yes -d percona/percona-server:5.7
Unable to find image 'percona/percona-server:5.7' locally
5.7: Pulling from percona/percona-server
a3ed95caeb02: Pull complete
a07226856d92: Pull complete
eee62d87a612: Pull complete
4c6755120a98: Pull complete
10eab0da5972: Pull complete
d5159a6502a4: Pull complete
e595a1a01d00: Pull complete
Digest: sha256:d57f0ce736f5403b1714ff8d1d6b91d5a7ee7271f30222c2bc2c5cad4b4e6950
Status: Downloaded newer image for percona/percona-server:5.7
9db503852747bc1603ab59455124663e8cedf708ac6d992cff9b43e2fbebd167
jayj@~ [537]$ mysql -h ps57
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.10-3 Percona Server (GPL), Release 3, Revision 63dafaf

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

So all this is nice – once the images are cached locally, spinning new containers up and down is painless and fast. All this sandbox work is cleanly separated from my workstation OS. There are probably other things I’d want to be able to do with this setup that I haven’t figured out yet (e.g., loading data files, running code to connect to these containers, etc.) – but I’ll figure those out in the future.
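When a sandbox is no longer needed, tearing it down is just as quick (the names match the containers and network created above):

docker stop mysql57 mysql55 ps57   # stop the running sandboxes
docker rm mysql57 mysql55 ps57     # remove the stopped containers
docker network rm test             # remove the private network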

Query Rewrite plugin can harm performance

May 10, 2016 - 10:53am

In this blog post, we’ll discuss how the Query Rewrite plugin can harm performance.

MySQL 5.7 comes with the Query Rewrite plugin, which allows you to modify queries coming to the server. (You can view the details here: https://dev.mysql.com/doc/refman/5.7/en/rewriter-query-rewrite-plugin.html.)
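As a reminder, the plugin is loaded by sourcing the install script that ships with the server (the exact path depends on your install layout – commonly under the MySQL share directory):

mysql -u root -p < /usr/share/mysql/install_rewriter.sql   # path is an assumption; check your distribution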

It is based on the audit plugin API, and unfortunately it suffers from serious scalability issues (which seems to be the case for all API-based audit plugins).

I want to share the results for sysbench OLTP RO with and without the query rewrite plugin — but with one very simple rewrite rule, which doesn’t affect any queries. This is the rule from the documentation:

INSERT INTO query_rewrite.rewrite_rules (pattern, replacement)
VALUES ('SELECT ?', 'SELECT ? + 1');
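Note that, per the documentation, a newly inserted rule only takes effect once the rules table is loaded into the plugin’s in-memory cache:

mysql -e "CALL query_rewrite.flush_rewrite_rules();"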

Here are the results for both cases:

As you can see, the server with the Query Rewrite plugin can’t scale after 100 threads.

When we look at the PMP profile, it shows the following:

170 __lll_lock_wait,__GI___pthread_mutex_lock,native_mutex_lock,my_mutex_lock,inline_mysql_mutex_lock,plugin_unlock_list,mysql_audit_release,handle_connection,pfs_spawn_thread,start_thread,clone
164 __lll_lock_wait,__GI___pthread_mutex_lock,native_mutex_lock,my_mutex_lock,inline_mysql_mutex_lock,plugin_foreach_with_mask,mysql_audit_acquire_plugins,mysql_audit_notify,invoke_pre_parse_rewrite_plugins,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
77 __lll_lock_wait,__GI___pthread_mutex_lock,native_mutex_lock,my_mutex_lock,inline_mysql_mutex_lock,plugin_lock,acquire_plugins,plugin_foreach_with_mask,mysql_audit_acquire_plugins,mysql_audit_notify,invoke_pre_parse_rewrite_plugins,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
12 __lll_unlock_wake,__pthread_mutex_unlock_usercnt,__GI___pthread_mutex_unlock,native_mutex_unlock,my_mutex_unlock,inline_mysql_mutex_unlock,plugin_unlock_list,mysql_audit_release,handle_connection,pfs_spawn_thread,start_thread,clone
10 __lll_unlock_wake,__pthread_mutex_unlock_usercnt,__GI___pthread_mutex_unlock,native_mutex_unlock,my_mutex_unlock,inline_mysql_mutex_unlock,plugin_lock,acquire_plugins,plugin_foreach_with_mask,mysql_audit_acquire_plugins,mysql_audit_notify,invoke_pre_parse_rewrite_plugins,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
10 __lll_unlock_wake,__pthread_mutex_unlock_usercnt,__GI___pthread_mutex_unlock,native_mutex_unlock,my_mutex_unlock,inline_mysql_mutex_unlock,plugin_foreach_with_mask,mysql_audit_acquire_plugins,mysql_audit_notify,invoke_pre_parse_rewrite_plugins,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
7 __lll_lock_wait,__GI___pthread_mutex_lock,native_mutex_lock,my_mutex_lock,inline_mysql_mutex_lock,Table_cache::lock,open_table,open_and_process_table,open_tables,open_tables_for_query,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone
6 __GI___pthread_mutex_lock,native_mutex_lock,my_mutex_lock,inline_mysql_mutex_lock,plugin_unlock_list,mysql_audit_release,handle_connection,pfs_spawn_thread,start_thread,clone
6 __GI___pthread_mutex_lock,native_mutex_lock,my_mutex_lock,inline_mysql_mutex_lock,plugin_foreach_with_mask,mysql_audit_acquire_plugins,mysql_audit_notify,invoke_pre_parse_rewrite_plugins,mysql_parse,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone

So clearly it’s related to a mutex acquired in the audit plugin API code. I filed a bug (https://bugs.mysql.com/bug.php?id=81298), but it’s discouraging to see that while the InnoDB code is constantly being improved for better scaling, other parts of the server can still suffer from global mutexes.
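If you want to collect a similar profile yourself, PMP (the “poor man’s profiler”) aggregates gdb stack traces across threads; Percona Toolkit ships it as pt-pmp. A typical capture might look like this (a sketch; the option values are illustrative):

sudo pt-pmp --pid "$(pidof mysqld)" --iterations 10 > pmp-profile.txt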

Percona Server 5.7 parallel doublewrite

May 9, 2016 - 1:35pm

In this blog post, we’ll discuss the ins and outs of Percona Server 5.7 parallel doublewrite.

After implementing parallel LRU flushing as described in the previous post, we went back to benchmarking. At first, we tested with the doublewrite buffer turned off. We wanted to isolate the effect of the parallel LRU flusher, and the results validated the design. Then we turned the doublewrite buffer back on and saw very little, if any, gain from the parallel LRU flusher. What happened? Let’s take a look at the data:

We see that the doublewrite buffer mutex is gone as expected and that the top waiters are the rseg mutexes and the index lock (shouldn’t this be fixed in 5.7?). Then we checked PMP:

2678 nanosleep(libpthread.so.0),...,buf_LRU_get_free_block(buf0lru.cc:1435),...
867 pthread_cond_wait,...,log_write_up_to(log0log.cc:1293),...
396 pthread_cond_wait,...,mtr_t::s_lock(sync0rw.ic:433),btr_cur_search_to_nth_level(btr0cur.cc:1022),...
337 libaio::??(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:2325),...
240 poll(libc.so.6),...,Protocol_classic::read_packet(protocol_classic.cc:810),...

Again we see that PFS is not telling the whole story, this time due to a missing annotation in XtraDB. Whereas the PFS results might lead us to leave the flushing analysis and focus on the rseg/undo/purge or check the index lock, PMP clearly shows that a lack of free pages is the biggest source of waits. Turning on the doublewrite buffer makes LRU flushing inadequate again. This data, however, doesn’t tell us why that is.

To see how enabling the doublewrite buffer makes LRU flushing perform worse, we collect PFS and PMP data only for the server flusher (cleaner coordinator, cleaner worker, and LRU flusher) threads and I/O completion threads:

If we zoom in from the whole server to the flushers only, the doublewrite mutex is back. Since we removed its contention for the single page flushes, it must be the batch doublewrite buffer usage by the flusher threads that causes it to reappear. The doublewrite buffer has a single area for 120 pages that is shared and filled by flusher threads. Adding a page to the batch is protected by the doublewrite mutex, which serialises the adds and results in the following picture:

By now we should be wary of reviewing PFS data without checking its results against PMP. Here it is:

139 libaio::??(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:2448),LinuxAIOHandler::poll(os0file.cc:2594),...
56 pthread_cond_wait,...,os_event_wait_low(os0event.cc:534),buf_dblwr_add_to_batch(buf0dblwr.cc:1111),...,buf_flush_LRU_list_batch(buf0flu.cc:1555),...,buf_lru_manager(buf0flu.cc:2334),...
25 pthread_cond_wait,...,os_event_wait_low(os0event.cc:534),buf_flush_page_cleaner_worker(buf0flu.cc:3482),...
21 pthread_cond_wait,...,PolicyMutex<TTASEventMutex<GenericPolicy>(ut0mutex.ic:89),buf_page_io_complete(buf0buf.cc:5966),fil_aio_wait(fil0fil.cc:5754),io_handler_thread(srv0start.cc:330),...
8 pthread_cond_timedwait,...,buf_flush_page_cleaner_coordinator(buf0flu.cc:2726),...

As with the single-page flush doublewrite contention and the wait to get a free page in the previous posts, here we have an unannotated-for-Performance Schema doublewrite OS event wait (same bug 80979):

if (buf_dblwr->batch_running) {
        /* This not nearly as bad as it looks. There is only
        page_cleaner thread which does background flushing in batches
        therefore it is unlikely to be a contention point. The only
        exception is when a user thread is forced to do a flush batch
        because of a sync checkpoint. */
        int64_t sig_count = os_event_reset(buf_dblwr->b_event);
        mutex_exit(&buf_dblwr->mutex);
        os_event_wait_low(buf_dblwr->b_event, sig_count);
        goto try_again;
}

This is as bad as it looks (the comment is outdated). A running doublewrite flush blocks any doublewrite page add attempts from all the other flusher threads for the duration of the flush (up to 120 data pages written twice to storage):

The issue also occurs with MySQL 5.7 multi-threaded flusher but becomes more acute with the PS 5.7 multi-threaded LRU flusher. There is no inherent reason why all the parallel flusher threads must share the single doublewrite buffer. Each thread can have its own private buffer, and doing so allows us to add to the buffers and flush them independently. This means a lot of synchronisation simply disappears. Adding pages to parallel buffers is fully asynchronous:

And so is flushing them:

This behavior is what we shipped in the 5.7.11-4 release, and the performance results were shown in a previous post. To see how the private doublewrite buffer affects flusher threads, let’s look at isolated data for those threads again.

Performance Schema:

It shows the redo log mutex as the current top contention source from the PFS point of view, which is not caused directly by flushing.

PMP data looks better too:

112 libaio::??(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:2455),...,io_handler_thread(srv0start.cc:330),...
54 pthread_cond_wait,...,buf_dblwr_flush_buffered_writes(buf0dblwr.cc:1287),...,buf_flush_LRU_list(buf0flu.cc:2341),buf_lru_manager(buf0flu.cc:2341),...
35 pthread_cond_wait,...,PolicyMutex<TTASEventMutex<GenericPolicy>(ut0mutex.ic:89),buf_page_io_complete(buf0buf.cc:5986),...,io_handler_thread(srv0start.cc:330),...
27 pthread_cond_wait,...,buf_flush_page_cleaner_worker(buf0flu.cc:3489),...
10 pthread_cond_wait,...,enter(ib0mutex.h:845),buf_LRU_block_free_non_file_page(ib0mutex.h:845),buf_LRU_block_free_hashed_page(buf0lru.cc:2567),...,buf_page_io_complete(buf0buf.cc:6070),...,io_handler_thread(srv0start.cc:330),...

The buf_dblwr_flush_buffered_writes now waits for its own thread I/O to complete and doesn’t block other threads from proceeding. The other top mutex waits belong to the LRU list mutex, which is again not caused directly by flushing.

This concludes the description of the current flushing implementation in Percona Server. To sum up, in this post series we took you along the road to the current XtraDB 5.7 flushing implementation:

  • Under high-concurrency I/O-bound workloads, the server has a high demand for free buffer pages. This demand can be satisfied by either LRU batch flushing or single page flushing.
  • Single page flushes cause a lot of doublewrite buffer contention and are bad even without the doublewrite.
  • Just as in XtraDB 5.6, we removed single page flushing altogether.
  • Existing cleaner LRU flushing could not satisfy free page demand.
  • Multi-threaded LRU flushing design addresses this issue – if the doublewrite buffer is disabled.
  • If the doublewrite buffer is enabled, MT LRU flushing contends on it, negating its improvements.
  • Parallel doublewrite buffers address this bottleneck.

CPU governor performance

May 6, 2016 - 3:53pm

In this blog, we’ll examine how CPU governor performance affects MySQL.

It’s been a while since we looked into CPU governors and with the new Intel CPUs and new Linux distros, I wanted to check how CPU governors affect MySQL performance.

Before jumping to results, let’s review what drivers manage CPU frequency. Traditionally, the default driver was “acpi-cpufreq”, but for the recent Intel CPUs and new Linux kernel it was changed to “intel_pstate”.

To check which driver is being used, run the command cpupower frequency-info.

cpupower frequency-info
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 10.0 us.
  hardware limits: 1.20 GHz - 2.00 GHz
  available frequency steps: 2.00 GHz, 2.00 GHz, 1.90 GHz, 1.80 GHz, 1.70 GHz, 1.60 GHz, 1.50 GHz, 1.40 GHz, 1.30 GHz, 1.20 GHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, performance
  current policy: frequency should be within 1.20 GHz and 2.00 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1.20 GHz (asserted by call to hardware).
  cpufreq stats: 2.00 GHz:29.48%, 2.00 GHz:0.00%, 1.90 GHz:0.00%, 1.80 GHz:0.00%, 1.70 GHz:0.00%, 1.60 GHz:0.00%, 1.50 GHz:0.00%, 1.40 GHz:0.00%, 1.30 GHz:0.37%, 1.20 GHz:70.15% (7)
  boost state support:
    Supported: yes
    Active: yes

In this case, we can see that the driver is “acpi-cpufreq”, and the governor is “ondemand”.

On my server (running Ubuntu 16.04 with “Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz” CPUs), I get the following output with default settings:

analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 0.97 ms.
  hardware limits: 1.20 GHz - 3.00 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 1.20 GHz and 3.00 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency is 1.50 GHz (asserted by call to hardware).
  boost state support:
    Supported: yes
    Active: yes

So, it’s interesting to see that “intel_pstate” with the “performance” governor is chosen by default, and the CPU frequency range is 1.20GHz to 3.00GHz (even though the CPU specification says 2.00GHz). If we check the CPU specification page, it says that 2.00GHz is the “base frequency” and 3.00GHz is the “Max Turbo” frequency.

In contrast to “intel_pstate”, “acpi-cpufreq” says “frequency should be within 1.20 GHz and 2.00 GHz.”

Also, “intel_pstate” only supports “performance” and “powersave” governors, while “acpi-cpufreq” has a wider range. For this blog, I only tested “ondemand” and “performance”.
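Switching between governors (within a driver) is easy and doesn’t need a reboot. For example, with the cpupower tool used above:

sudo cpupower frequency-set -g performance   # set the governor on all CPUs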

Switching between CPU drivers is not easy, as it requires a server reboot — you need to pass a parameter to the kernel startup line. In Ubuntu, you can do this in /etc/default/grub by changing GRUB_CMDLINE_LINUX_DEFAULT to GRUB_CMDLINE_LINUX_DEFAULT="intel_pstate=disable", which will disable intel_pstate and will load acpi-cpufreq.
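On Ubuntu, after editing /etc/default/grub you still need to regenerate the grub config and reboot for the change to take effect:

sudo update-grub   # rebuilds /boot/grub/grub.cfg from /etc/default/grub
sudo reboot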

Is there a real difference in performance between different CPU drivers and CPU governors? To check, I ran a sysbench OLTP read-only workload over a 10Gb network, where the data fits into memory (so it is a CPU-bound workload).

The results are as follows. This is a chart for absolute throughput:

And to better understand relative performance, here is a chart on how other governors perform compared to “intel-pstate” with the performance governor. In this case, I showed relative performance to “PSTATE performance”, which equals “1”. In the chart, the orange bar is “PSTATE powersave” and shows the relative difference between “PSTATE powersave” and “PSTATE performance” (=1):

Here are the takeaways I see:

  • The combination of CPU driver and CPU governor still affects performance
  • ACPI ondemand might not be the best choice if you want to achieve the best throughput
  • Intel_pstate “powersave” is slower at lower thread counts (I guess the Linux scheduler assigns execution to “sleeping” CPU cores)
  • Both the ACPI and Intel_pstate “performance” governors show the best (and practically identical) performance
  • My Ubuntu 16.04 starts with the “intel_pstate” driver and the “performance” governor by default, but you still may want to check what the settings are in your case (and change to “performance” if it is not set)

Percona Server 5.7: multi-threaded LRU flushing

May 5, 2016 - 6:34am

In this blog post, we’ll discuss how to use multi-threaded LRU flushing to prevent bottlenecks in MySQL.

In the previous post, we saw that InnoDB 5.7 performs a lot of single-page LRU flushes, which in turn are serialized by the shared doublewrite buffer. Based on our 5.6 experience we have decided to attack the single-page flush issue first.

Let’s start with describing a single-page flush. If the working set of a database instance is bigger than the available buffer pool, existing data pages will have to be evicted or flushed (and then evicted) to make room for queries reading in new pages. InnoDB tries to anticipate this by maintaining a list of free pages per buffer pool instance; these are the pages that can be immediately used for placing the newly-read data pages. The target length of the free page list is governed by the innodb_lru_scan_depth parameter, and the cleaner threads are tasked with refilling this list by performing LRU batch flushing. If for some reason the free page demand exceeds the cleaner thread flushing capability, the server might find itself with an empty free list. In an attempt to not stall the query thread asking for a free page, it will then execute a single-page LRU flush (buf_LRU_get_free_block calling buf_flush_single_page_from_LRU in the source code), which is performed in the context of the query thread itself.

The problem with this flushing mode is that it will iterate over the LRU list of a buffer pool instance, while holding the buffer pool mutex in InnoDB (or the finer-grained LRU list mutex in XtraDB). Thus, a server whose cleaner threads are not able to keep up with the LRU flushing demand will have further increased mutex pressure – which can further contribute to the cleaner thread troubles. Finally, once the single-page flusher finds a page to flush it might have trouble in getting a free doublewrite buffer slot (as shown previously). That suggested to us that single-page LRU flushes are never a good idea.  The flame graph below demonstrates this:

Note how a big part of the server run time is attributed to a flame rooted at JOIN::optimize, whose run time in turn is almost fully taken by buf_dblwr_write_single_page in two branches.

The easiest way to avoid a single-page flush is, well, simply not to do it! Wait until a cleaner thread finally provides some free pages for the query thread to use. This is what we did in XtraDB 5.6 with the innodb_empty_free_list_algorithm server option (which has a “backoff” default). This is also present in XtraDB 5.7, and resolves the issues of increased contention for the buffer pool (LRU list) mutex and doublewrite buffer single-page flush slots. This approach handles the empty free page list better.
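For reference, on Percona Server/XtraDB the option can be inspected and changed at runtime (a sketch; 'backoff' is already the default, as noted above):

mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_empty_free_list_algorithm';"
mysql -e "SET GLOBAL innodb_empty_free_list_algorithm = 'backoff';"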

Even with this strategy it’s still a bad situation to be in, as it causes query stalls when page cleaner threads aren’t able to keep up with the free page demand. To understand why this happens, let’s look into a simplified scheme of InnoDB 5.7 multi-threaded LRU flushing:

The key takeaway from the picture is that LRU batch flushing does not necessarily happen when it’s needed the most. All buffer pool instances have their LRU lists flushed first (for free pages), and their flush lists flushed second (for checkpoint age and buffer pool dirty page percentage targets). If a flush list flush is in progress, LRU flushing will have to wait until the next iteration. Further, all flushing is synchronized once per second-long iteration by the coordinator thread waiting for everything to complete. This one-second mark may well become a thirty-or-more-second mark if one of the workers is stalled (with the telltale sign in the server error log: “InnoDB: page_cleaner: 1000ms intended loop took 49812ms”). So if we have a very hot buffer pool instance, everything else will have to wait for it. And it’s long been known that buffer pool instances are not used uniformly (some are hotter and some are colder).

A fix should:

  • Decouple the “LRU list flushing” from “flush list flushing” so that the two can happen in parallel if needed.
  • Recognize that different buffer pool instances require different amounts of flushing, and remove the synchronization between the instances.

We developed a design based on the above criteria, where each buffer pool instance has its own private LRU flusher thread. That thread monitors the free list length of its instance, flushes, and sleeps until the next free list length check. The sleep time is adjusted depending on the free list length: thus a hot instance LRU flusher may not sleep at all in order to keep up with the demand, while a cold instance flusher might only wake up once per second for a short check.

The LRU flushing scheme now looks as follows:

This has been implemented in the Percona Server 5.7.10-3 RC release, and this design simplified the code as well. LRU flushing heuristics are simple, and all LRU flushing is now removed from the legacy cleaner coordinator/worker threads – enabling more efficient flush list flushing as well. LRU flusher threads are the only threads that can flush a given buffer pool instance, enabling further simplification: for example, InnoDB recovery writer threads simply disappear.

Are we done then? No. With the single-page flushes and single-page flush doublewrite bottleneck gone, we hit the doublewrite buffer again. We’ll cover that in the next post.

MySQL High Availability: The Road Ahead for Percona and XtraDB Cluster

May 4, 2016 - 7:07am

This blog post discusses what is going on in the MySQL high availability market, and what Percona’s plans are for helping customers with high availability solutions.

One thing I like to tell people is that you shouldn’t view Percona as a “software” company, but as a “solution” company. Our goal has always been to provide the best solution that meets each customer’s situation, rather than push our own software, regardless of whether it is the best fit or not. As a result, we have customers running all kinds of MySQL “flavors”: MySQL, MariaDB, Percona Server, Amazon RDS and Google Cloud SQL. We’re happy to help customers be successful with the technology of their choice, and advise them on alternatives when we see a better fit.

One area where I have been increasingly uneasy is our advanced high availability support with Percona XtraDB Cluster and other Galera-based technologies. In 2011, when we started working on Percona XtraDB Cluster together with Codership, we needed to find a way to arrange investment into the development of Galera technology to bring it to market. So we made a deal, which, while providing needed development resources, also required us to price Percona XtraDB Cluster support as a very expensive add-on option. While this made sense at the time, it also meant few companies could afford XtraDB Cluster support from Percona, especially at large scale.

As a few years passed, Galera technology became the mainstream high-end high availability option. In addition to being available in Percona XtraDB Cluster, it has been included in MariaDB, as well as Galera Cluster for MySQL. Additionally, the alternative technology to solve the same problem – MySQL Group Replication – started to be developed by the MySQL Team at Oracle. With all these changes, it was impossible for us to provide affordable support for Percona XtraDB Cluster due to our previous commercial agreement with Codership, which reflected a very different market situation than the one we now face.

As a result, over a year ago we exited our support partnership agreement with Codership and moved the support and development function in-house. These changes have proven to be positive for our customers, allowing us to better focus on their priorities and provide better response time for issues, as these no longer require partner escalation.

Today we’re taking the next natural step – we will no longer require customers to purchase Percona XtraDB Cluster as a separate add-on. Percona will include support for XtraDB Cluster and other Galera-based replication technologies in our Enterprise and Premier support levels, as well as our Percona Care and Managed Services subscriptions. Furthermore, we are going to support Oracle’s MySQL Group Replication technology at no additional cost too, once it becomes generally available, so our customers have access to the best high availability technology for their deployment.

As part of this change, you will also see us focusing on hardening XtraDB Cluster and Galera technology, making it better suited for demanding business workloads, as well as more secure and easier to use. All of our changes will be available as 100% open source solutions and will also be contributed back to the Galera development team to incorporate into their code base if they wish.

I believe making the Galera code better is the most appropriate action for us at this point!

MySQL 5.7: initial flushing analysis and why Performance Schema data is incomplete

May 3, 2016 - 12:22pm

In this post, we’ll examine why in an initial flushing analysis we find that Performance Schema data is incomplete.

Having shown the performance impact of Percona Server 5.7 patches, we can now discuss their technical reasoning and details. Let’s revisit the MySQL 5.7.11 performance schema synch wait graph from the previous post, for the case of unlimited InnoDB concurrency:

First of all, this graph is a little “nicer” than reality, which limits its diagnostic value. There are two reasons for this. The first one is that page cleaner worker threads are invisible to Performance Schema (see bug 79894). This alone limits PFS value in 5.7 if, for example, one tries to select only the events in the page cleaner threads, or monitors low concurrency where the cleaner thread count is a non-negligible part of the total thread count.

To understand the second reason, let’s look into PMP for the same setting. Note that selected intermediate stack frames were removed for clarity, especially in the InnoDB mutex implementation.

660 pthread_cond_wait,enter(ib0mutex.h:850),buf_dblwr_write_single_page(ib0mutex.h:850),buf_flush_write_block_low(buf0flu.cc:1096),buf_flush_page(buf0flu.cc:1096),buf_flush_single_page_from_LRU(buf0flu.cc:2217),buf_LRU_get_free_block(buf0lru.cc:1401),...
631 pthread_cond_wait,buf_dblwr_write_single_page(buf0dblwr.cc:1213),buf_flush_write_block_low(buf0flu.cc:1096),buf_flush_page(buf0flu.cc:1096),buf_flush_single_page_from_LRU(buf0flu.cc:2217),buf_LRU_get_free_block(buf0lru.cc:1401),...
337 pthread_cond_wait,PolicyMutex<TTASEventMutex<GenericPolicy>(ut0mutex.ic:89),get_next_redo_rseg(trx0trx.cc:1185),trx_assign_rseg_low(trx0trx.cc:1278),trx_set_rw_mode(trx0trx.cc:1278),lock_table(lock0lock.cc:4076),...
324 libaio::??(libaio.so.1),LinuxAIOHandler::collect(os0file.cc:2448),LinuxAIOHandler::poll(os0file.cc:2594),...
241 pthread_cond_wait,PolicyMutex<TTASEventMutex<GenericPolicy>(ut0mutex.ic:89),trx_write_serialisation_history(trx0trx.cc:1578),trx_commit_low(trx0trx.cc:2135),...
147 pthread_cond_wait,enter(ib0mutex.h:850),trx_undo_assign_undo(ib0mutex.h:850),trx_undo_report_row_operation(trx0rec.cc:1918),...
112 pthread_cond_wait,mtr_t::s_lock(sync0rw.ic:433),btr_cur_search_to_nth_level(btr0cur.cc:1008),...
83 poll(libc.so.6),Protocol_classic::get_command(protocol_classic.cc:965),do_command(sql_parse.cc:935),handle_connection(connection_handler_per_thread.cc:301),...
64 pthread_cond_wait,Per_thread_connection_handler::block_until_new_connection(thr_cond.h:136),...

The top wait in both PMP and the graph is the 660 samples of the mutex enter in buf_dblwr_write_single_page, which is the doublewrite mutex. Now try to find the nearly-as-hot 631 samples of the event wait in buf_dblwr_write_single_page in the PFS output. You won’t find them, because InnoDB OS event waits are not annotated in Performance Schema. In most cases this is correct, as OS event waits tend to be used when there is no work to do. The thread waits for work to appear, or for time to pass. But in the report above, the waiting thread is blocked from proceeding with useful work (see bug 80979).

Now that we’ve shown the two reasons why PFS data is not telling the whole server story, let’s take PMP data instead and consider how to proceed. Those top two PMP waits suggest 1) the server is performing a lot of single page flushes, and 2) those single page flushes have their concurrency limited by the eight doublewrite single-page flush slots available, and that the wait for a free slot to appear is significant.

Two options become apparent at this point: either make the single-page flush doublewrite more parallel or reduce the single-page flushing in the first place. We’re big fans of the latter option since version 5.6 performance work, where we configured Percona Server to not perform single-page flushes at all by introducing the innodb_empty_free_list_algorithm option, with the “backoff” default.

The next post in the series will describe how we removed single-page flushing in 5.7.

Best Practices for Configuring Optimal MySQL Memory Usage

May 3, 2016 - 7:26am

In this blog post, we’ll discuss some of the best practices for configuring optimal MySQL memory usage.

Correctly configuring the use of available memory resources is one of the most important things you have to get right with MySQL for optimal performance and stability. As of MySQL 5.7, the default configuration uses a very limited amount of memory – leaving defaults is one of the worst things you can do. But configuring it incorrectly can result in even worse performance (or even crashes).

The first rule of configuring MySQL memory usage is you never want your MySQL to cause the operating system to swap. Even minor swapping activity can dramatically reduce MySQL performance. Note the keyword “activity” here. It is fine to have some used space in your swap file, as there are probably parts of the operating system that are unused when MySQL is running, and it’s a good idea to swap them out. What you don’t want is constant swapping going on during the operation, which is easily seen in the “si” and “so” columns in the vmstat output.

Example: No Significant Swapping

Example: Heavy Swapping Going On

If you’re running Percona Monitoring and Management, you can also look into the Swap Activity graph in System Overview Dashboard.

If you have spikes of more than 1MB/sec, or constant swap activity, you might need to revisit your memory configuration.

MySQL memory allocation is complicated. There are global buffers, per-connection buffers (which depend on the workload), and some uncontrolled memory allocations (e.g., inside stored procedures), all contributing to difficulties in computing how much memory MySQL will really use for your workload. It is better to check it by looking at the virtual memory size (VSZ) that MySQL uses. You can get it from “top”, or by running ps aux | grep mysqld.

mysql     3939 30.3 53.4 11635184 8748364 ?    Sl   Apr08 9106:41 /usr/sbin/mysqld

The 5th column here shows VSZ usage (about 11GB).

Note that the VSZ is likely to change over time. It is often a good idea to plot it in your monitoring system and set an alert to ping you when it hits a specified threshold. Don’t allow the mysqld process VSZ to exceed 90% of the system memory (and less if you’re running more than just MySQL on the system).
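A minimal sketch of such a check, suitable for a cron job (the 90% threshold follows the guideline above; it assumes a single mysqld process):

VSZ_KB=$(ps -o vsz= -p "$(pidof mysqld)")
MEM_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
if [ "$VSZ_KB" -gt $((MEM_KB * 90 / 100)) ]; then
    echo "mysqld VSZ ${VSZ_KB}KB exceeds 90% of ${MEM_KB}KB RAM"
fi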

It’s a good idea to start on the safe side by conservatively setting your global and per connections buffers, and then increase them as you go. Many can be set online, including innodb_buffer_pool_size in MySQL 5.7.

So how do you decide how much memory to allocate to MySQL versus everything else? In most cases you shouldn’t commit more than 90% of your physical memory to MySQL, as you need to have some reserved for the operating system and things like caching binary log files, temporary sort files, etc.

There are cases when MySQL should use significantly less than 90% of memory:

  • If there are other important processes running on the same server, either all the time or periodically. If you have heavy batch jobs run from cron, which require a lot of memory, you’ll need to account for that.
  • If you want to use OS caching for some storage engines. With InnoDB, we recommend innodb_flush_method=O_DIRECT  in most cases, which won’t use Operating System File Cache. However, there have been cases when using buffered IO with InnoDB made sense. If you’re still running MyISAM, you will need OS cache for the “data” part of your tables. With TokuDB, using OS cache is also a good idea for some workloads.
  • If your workload places significant demands on the operating system cache – MyISAM on-disk temporary tables, sort files, and some other temporary files that MySQL creates need to be well-cached for optimal performance.

Once you know how much memory you want the MySQL process to have as a whole, you’ll need to think about for what purpose the memory should be used inside MySQL. The first part of memory usage in MySQL is workload related – if you have many connections active at the same time that run heavy selects using a lot of memory for sorting or temporary tables, you might need a lot of memory (especially if Performance Schema is enabled). In other cases this amount of memory is minimal. You’ll generally need somewhere between 1 and 10GB for this purpose.

Another thing you need to account for is memory fragmentation. Depending on the memory allocation library you’re using (glibc, TCMalloc, jemalloc, etc.), the operating system settings such as Transparent Huge Pages (THP) and workload may show memory usage to grow over time (until it reaches some steady state). Memory fragmentation can also account for 10% or more of additional memory usage.

Finally, let’s think about various global buffers and caching. In typical cases, you mainly have innodb_buffer_pool_size to worry about. But you might also need to consider key_buffer_size, tokudb_cache_size and query_cache_size, as well as table_cache and table_open_cache. These are also responsible for global memory allocation, even though they are not counted in bytes. Performance Schema may also take a lot of memory, especially if you have a large number of connections or tables in the system.

When you specify the size of the buffers and caches, you should determine what you’re specifying. For innodb_buffer_pool_size, remember there is another 5-10% of memory that is allocated for additional data structures – and this number is larger if you’re using compression or set innodb_page_size smaller than 16K. For tokudb_cache_size, it’s important to remember that the setting specified is a guide, not a “hard” limit: the cache size can actually grow slightly larger than the specified limit.

For systems with large amounts of memory, the database cache is going to be by far the largest memory consumer, and you’re going to allocate most of your memory to it. When you add extra memory to the system, it is typically to increase the database cache size.

Let’s do some math for a specific example. Assume you have a system (physical or virtual) with 16GB of memory. We are only running MySQL on this system, with an InnoDB storage engine and innodb_flush_method=O_DIRECT, so we can allocate 90% (or 14.4GB) of memory to MySQL. For our workload, we assume connection handling and other MySQL connection-based overhead will take up 1GB (leaving 13.4GB). 0.4GB is likely to be consumed by various other global buffers (innodb_log_buffer_size, table caches, other miscellaneous needs, etc.), which now leaves 13GB. Considering the 5-7% overhead that the InnoDB buffer pool has, a sensible setting is innodb_buffer_pool_size=12G – which is what we very commonly see working well for systems with 16GB of memory.
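The same arithmetic as a quick shell sanity check (the numbers mirror the example above):

TOTAL_MB=16384                        # 16GB of physical memory
MYSQL_MB=$((TOTAL_MB * 90 / 100))     # ~14.4GB budgeted for MySQL overall
REMAIN_MB=$((MYSQL_MB - 1024 - 410))  # minus ~1GB connections, ~0.4GB other global buffers
echo $((REMAIN_MB * 100 / 107))       # minus ~5-7% buffer pool overhead: ~12000 MB, i.e. 12G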

Now that we have configured MySQL memory usage, we also should look at the OS configuration. The first question to ask is: if we don’t want MySQL to swap, should we even have the swap file enabled? In most cases, the answer is yes – you want to have the swap file enabled (strive for 4GB minimum, and no less than 25% of memory installed) for two reasons:

  • The operating system is quite likely to have some portions that are unused when it is running as a database server. It is better to let it swap those out instead of forcing it to keep them in memory.
  • If you’ve made a mistake in the MySQL configuration, or you have some rogue process taking much more memory than expected, it is usually much better to lose some performance to swapping than to have MySQL killed with an out of memory (OOM) error – potentially causing downtime.

As we only want the swap file used in emergencies, such as when there is no memory available or to swap out idle processes, we want to reduce the operating system’s tendency to swap (echo 1 > /proc/sys/vm/swappiness). Without this configuration setting you might find the OS swapping out portions of MySQL just because it feels it needs to increase the amount of available file cache (which is almost always a wrong choice for MySQL).
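To make the setting persistent across reboots, drop it into the sysctl configuration (the file name here is an arbitrary choice):

echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/60-mysql.conf
sudo sysctl --system   # reload all sysctl configuration files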

The next thing when it comes to OS configuration is the Out Of Memory (OOM) killer. You may have seen a message like this in your kernel log file:

Apr 24 02:43:18 db01 kernel: Out of memory: Kill process 22211 (mysqld) score 986 or sacrifice child

When MySQL itself is at fault, it’s a pretty rational thing to do. However, it’s also possible that the real problem was some of the batch activities you’re running: scripts, backups, etc. In this case, you probably want those processes to be terminated if the system does not have enough memory, rather than MySQL.

To make MySQL a less likely candidate to be killed by the OOM killer, you can adjust the behavior to make MySQL less preferable with the following:

echo '-800' > /proc/$(pidof mysqld)/oom_score_adj

This will make the Linux kernel prefer killing other heavy memory consumers first.
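Keep in mind that oom_score_adj set this way does not survive a mysqld restart. On systemd-based systems, a drop-in can make it permanent (the unit name varies by packaging – mysql, mysqld or similar):

sudo systemctl edit mysql
# then add the following to the drop-in and restart the service:
# [Service]
# OOMScoreAdjust=-800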

Finally on a system with more than one CPU socket, you should care about NUMA when it comes to MySQL memory allocation. In newer MySQL versions, you want to enable innodb_numa_interleave=1. In older versions you can either manually run numactl --interleave=all  before you start MySQL server, or use the numa_interleave configuration option in Percona Server.

 

Upgrading to MySQL 5.7, focusing on temporal types

April 27, 2016 - 1:30pm

In this post, we’ll discuss how MySQL 5.7 handles the old temporal types during an upgrade.

MySQL changed the temporal types in MySQL 5.6.4, and it introduced a new feature: microsecond resolution in the TIME, TIMESTAMP and DATETIME types. These types can now store values down to microsecond granularity. Obviously, this means format changes, but why is this important?

Are they converted automatically to the new format?

If we had tables in MySQL 5.5 that used TIME, TIMESTAMP or DATETIME, are these fields going to be converted to the new format when upgrading to 5.6? The answer is “NO.” Even if we run mysql_upgrade, it does not warn us about the old format. If we check the MySQL error log, we cannot find anything regarding this. But newly created tables are going to use the new format, so we will have two different types of temporal fields.

How can we find these tables?

The following query gives us a summary on the different table formats:

SELECT CASE isc.mtype WHEN '6' THEN 'OLD' WHEN '3' THEN 'NEW' END FORMAT,
       count(*) TOTAL
FROM information_schema.tables AS t
INNER JOIN information_schema.columns AS c
  ON c.table_schema = t.table_schema AND c.table_name = t.table_name
LEFT OUTER JOIN information_schema.innodb_sys_tables AS ist
  ON ist.name = concat(t.table_schema,'/',t.table_name)
LEFT OUTER JOIN information_schema.innodb_sys_columns AS isc
  ON isc.table_id = ist.table_id AND isc.name = c.column_name
WHERE c.column_type IN ('time','timestamp','datetime')
  AND t.table_schema NOT IN ('mysql','information_schema','performance_schema')
  AND t.table_type = 'base table'
  AND (t.engine = 'innodb')
GROUP BY isc.mtype;

+--------+-------+
| FORMAT | TOTAL |
+--------+-------+
| NEW    |     1 |
| OLD    |     9 |
+--------+-------+

Or we can use show_old_temporals, which will highlight the old formats during a show create table.
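show_old_temporals is a Percona Server variable and is OFF by default; enabling it might look like this (a sketch – check the documentation for your version):

mysql -e "SET GLOBAL show_old_temporals = ON;"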

CREATE TABLE `mytbl` (
  `ts` timestamp /* 5.5 binary format */ NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `dt` datetime /* 5.5 binary format */ DEFAULT NULL,
  `t` time /* 5.5 binary format */ DEFAULT NULL
) DEFAULT CHARSET=latin1

MySQL can handle both types, but with the old format you cannot use microseconds, and the old DATETIME format takes more space on disk.

Can I upgrade to MySQL 5.7?

Of course you can! But when mysql_upgrade is running, it is going to convert the old fields into the new format by default. This basically means an ALTER TABLE on every single table that contains one of the three types.

Depending on the number of tables, or the size of the tables, this could take hours – so you may need to do some planning.

....
test.t1
error    : Table rebuild required. Please do "ALTER TABLE `t1` FORCE" or dump/reload to fix it!
test.t2
error    : Table rebuild required. Please do "ALTER TABLE `t2` FORCE" or dump/reload to fix it!
test.t3
error    : Table rebuild required. Please do "ALTER TABLE `t3` FORCE" or dump/reload to fix it!

Repairing tables
mysql.proxies_priv                                 OK
`test`.`t1`
Running  : ALTER TABLE `test`.`t1` FORCE
status   : OK
`test`.`t2`
Running  : ALTER TABLE `test`.`t2` FORCE
status   : OK
`test`.`t3`
Running  : ALTER TABLE `test`.`t3` FORCE
status   : OK
Upgrade process completed successfully.
Checking if update is needed.

Can we avoid this at upgrade?

We can run the ALTER TABLEs or use pt-online-schema-change (to avoid locking) before the upgrade, but even without these preparations we can still avoid incompatibility issues.
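For example, a null-rebuild with pt-online-schema-change converts the temporal columns as a side effect (a sketch; the schema and table names are placeholders):

pt-online-schema-change --alter "ENGINE=InnoDB" D=test,t=t1 --execute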

My colleague Daniel Guzman Burgos pointed out that mysql_upgrade has an option called upgrade-system-tables. This will only upgrade the system tables, and nothing else.
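So an upgrade that deliberately skips the user table rebuilds would be run as:

mysql_upgrade --upgrade-system-tables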

Can we still write these fields?

The following query returns the schema and the table names that still use the old formats.

SELECT CASE isc.mtype WHEN '6' THEN 'OLD' WHEN '3' THEN 'NEW' END FORMAT,
       t.table_schema,
       t.table_name
FROM information_schema.tables AS t
INNER JOIN information_schema.columns AS c
  ON c.table_schema = t.table_schema AND c.table_name = t.table_name
LEFT OUTER JOIN information_schema.innodb_sys_tables AS ist
  ON ist.name = concat(t.table_schema,'/',t.table_name)
LEFT OUTER JOIN information_schema.innodb_sys_columns AS isc
  ON isc.table_id = ist.table_id AND isc.name = c.column_name
WHERE c.column_type IN ('time','timestamp','datetime')
  AND t.table_schema NOT IN ('mysql','information_schema','performance_schema')
  AND t.table_type = 'base table'
  AND (t.engine = 'innodb');

+--------+--------------+------------+
| FORMAT | table_schema | table_name |
+--------+--------------+------------+
| OLD    | test         | t          |
| OLD    | test         | t          |
| OLD    | test         | t          |
| NEW    | sys          | sys_config |
+--------+--------------+------------+
4 rows in set (0.03 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.11-4  |
+-----------+
1 row in set (0.00 sec)

As we can see, we’re using 5.7 and table “test.t” still has the old format.

The schema:

CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `t1` time DEFAULT NULL,
  `t2` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `t3` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1

mysql> select * from t;
+----+----------+---------------------+---------------------+
| id | t1       | t2                  | t3                  |
+----+----------+---------------------+---------------------+
|  1 | 20:28:00 | 2016-04-09 01:41:58 | 2016-04-23 22:22:01 |
|  2 | 20:28:00 | 2016-04-09 01:41:59 | 2016-04-23 22:22:02 |
|  3 | 20:28:00 | 2016-04-09 01:42:01 | 2016-04-23 22:22:03 |
|  4 | 20:28:00 | 2016-04-09 01:42:03 | 2016-04-23 22:22:04 |
|  5 | 20:28:00 | 2016-04-09 01:42:08 | 2016-04-23 22:22:05 |
+----+----------+---------------------+---------------------+

Let’s try to insert a new row:

mysql> insert into `t` (t1,t3) values ('20:28','2016:04:23 22:22:06');
Query OK, 1 row affected (0.01 sec)

mysql> select * from t;
+----+----------+---------------------+---------------------+
| id | t1       | t2                  | t3                  |
+----+----------+---------------------+---------------------+
|  1 | 20:28:00 | 2016-04-09 01:41:58 | 2016-04-23 22:22:01 |
|  2 | 20:28:00 | 2016-04-09 01:41:59 | 2016-04-23 22:22:02 |
|  3 | 20:28:00 | 2016-04-09 01:42:01 | 2016-04-23 22:22:03 |
|  4 | 20:28:00 | 2016-04-09 01:42:03 | 2016-04-23 22:22:04 |
|  5 | 20:28:00 | 2016-04-09 01:42:08 | 2016-04-23 22:22:05 |
|  6 | 20:28:00 | 2016-04-09 01:56:38 | 2016-04-23 22:22:06 |
+----+----------+---------------------+---------------------+
6 rows in set (0.00 sec)

It was inserted without a problem, and we can’t see any related info/warnings in the error log.

Does replication work?

In many scenarios, when you upgrade a replication setup, the slaves are upgraded first. But will replication work? The short answer is “yes.” I configured row-based replication between MySQL 5.6 and 5.7: 5.6 was the master, and it had all the temporal types in the old format; on 5.7, I had both new and old formats.

I replicated from old format to old format, and from old format to new format, and both are working.

Conclusion

Before upgrading to MySQL 5.7, tables should be altered to use the new format. If that isn't done, however, the upgrade is still possible without altering all the tables – the drawbacks are that you cannot use microseconds and the old columns take more space on disk. And if you have already upgraded to 5.7, you can still convert the format later using ALTER TABLE or pt-online-schema-change.

 

How We Made MySQL Great Again, or Upgrading MySQL with Orchestrator

April 26, 2016 - 12:56pm

In this blog post, we’ll discuss upgrading MySQL with Orchestrator.

I recently had a client, Life360, that wanted to upgrade from Percona Server 5.5 to Percona Server 5.6, and implement GTID in their high transaction environment. They had co-masters and multiple read slaves.

Orchestrator made this job much easier for us. My colleague, Tibi, recently posted about Orchestrator here and here.

Daniel from Life360 saw Orchestrator and was very interested. So here is how he set up Orchestrator, in his own words:

I did some initial testing with the vagrant boxes provided in the Orchestrator repo, to see how to configure the agents and get the Orchestrator server to do what we want.

I then moved on to installing the Orchestrator server, with the Orchestrator backend on RDS, and deploying the clients on the slaves and masters among our MySQL instances in the Amazon VPC.

Once the server setup was done, the clients were auto-detected through CNAME discovery of the masters, and the agents talked to the server (it took a while, as CNAMEs weren't working as expected, but that's fixed in the new server version).

We were pretty amazed at the number of actions you can do through Orchestrator itself, such as moving slaves to a different master through drag and drop, enabling GTID on a node with the push of a button, setting up GTID-based failover, taking LVM snapshots using the Orchestrator Agent, etc.

We went ahead and tested the master change on drag and drop, and after a few successful attempts, we even brought it back to where it was initially. After those tests, we were pretty confident that we could leverage Orchestrator as one of our main tools to assist in the coming upgrade.

Here is a screenshot of the initial setup:

Manjot: Once Daniel had Orchestrator set up, he wanted to leverage it to help with the MySQL upgrade. We set out to create a plan that worked within his constraints and still kept best practices in mind.

First, we installed Percona Server 5.6 fresh on our dedicated backup slave. That first 5.6 slave was created with MyDumper to achieve forward compatibility and not have any legacy tablespaces. Since MyDumper was already installed with the Percona Backup Service that Life360 has, this was fairly easy to accomplish.

The MyDumper slave rebuild works in the following way:

To take a mydumper backup:

  1. Go to your desired backups directory
  2. Install mydumper (sudo apt-get install mydumper)
  3. mydumper -t 8 -L mydumper.log --compress

To restore:

  1. Make sure MyDumper is installed: sudo apt-get install mydumper
  2. Copy the MyDumper backups over to a backups dir
  3. Export your BACKUP_DIR as env var
  4. Run this to restore with MyLoader (from https://gist.github.com/Dnile/4658b338d4a101cbe2eeb5080ebddf8e):
#!/usr/bin/env sh
cd $BACKUP_DIR
export DESTHOST=127.0.0.1
export BACKUP_DIR=/vol_mysql/backups
mysqld --skip-grant-tables &
for i in `ls -1 *-schema.dump.gz | cut -d'-' -f1`; do
  mysql -h $DESTHOST -e "CREATE DATABASE IF NOT EXISTS $i"
  zcat $i-schema.dump.gz | mysql -h $DESTHOST $i
  zcat $i-schema-post.dump.gz | mysql -h $DESTHOST $i
done
/usr/bin/myloader --host=$DESTHOST --directory=$BACKUP_DIR --enable-binlog --threads=10 --queries-per-transaction=20 -v 3
chown -R mysql:mysql /var/lib/mysql/

Once the first 5.6 slave had caught up, we used Xtrabackup to back up 5.6 and then restored it to each slave, cycling them out of the read slave pool one at a time.
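A rough sketch of that cycle with innobackupex (paths and the timestamp placeholder are illustrative, not the exact commands used for this client):

innobackupex /backups/                          # take the backup on the upgraded 5.6 slave
innobackupex --apply-log /backups/<TIMESTAMP>/  # prepare it
# copy the prepared directory to the target slave, then restore it:
innobackupex --copy-back /backups/<TIMESTAMP>/
chown -R mysql:mysql /var/lib/mysql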

Once all the slaves were upgraded, we created a new 5.6 master and had it replicate off our primary 5.5 master.

Then we moved all of the slaves to replicate off the new 5.6 master.

Life360 had long cron jobs that ran on the second 5.5 master. We moved the cron applications to write to the primary 5.5 master, and locked all tables. We then stopped replication on the second co-master. Daniel stopped MySQL and decommissioned it.

We then moved all application writes to the new 5.6 master. While Orchestrator can use external scripts to move IPs, we used a manual process here to change application DSNs and HAProxy configuration.

On the 5.5 master that remained, we used Orchestrator to set it to read-only.
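For reference, the Orchestrator button amounts to toggling the global flag (a sketch; note that this does not terminate connections that are already open):

SET GLOBAL read_only = ON;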

Daniel says this didn’t do a whole lot to get rid of connections that were still open on this server.

On the new master, we used the Stop Slave and Reset Slave buttons in the Orchestrator panel so that it would no longer replicate from the old master.

Once some of the thousands of connections had moved to the new master, we stopped MySQL on the 5.5 master, which took care of the rest and the application “gracefully” reconnected to the new 5.6 master.

There was some write downtime, as some connections did not drop off until they were forced to because php-fpm refused to let go. There is also always a high volume of transactions in this environment.

At this point our topology looks like this (ignore the globe icons for now):

But as always Daniel wanted MOAR. It was time for GTID. While we could have done this during the upgrade, Life360 wanted to manage risk and not make too many production changes at one time.

We followed Percona’s guide, Online GTID Deployment, but used Orchestrator to shuffle the old and new masters and toggle read_only on and off. This made our job a lot easier and faster, and saved us from any downtime.

The globes in the topology screenshot above show that the slaves are now using GTID replication.

Orchestrator makes upgrades and changes much easier than before; just use caution and understand what it is doing in the background.

 

Percona Live 2016: Closing Comments and Prize Giveaway

April 21, 2016 - 5:04pm

Well, that is it for Percona Live 2016! This year was bigger and better than last year and included more sponsors, speakers, technologies, and talks than ever before. Once again we’d like to thank everybody who participated, and the entire open source community in general: without your dedication and spirit, none of this would be possible.

At the prize ceremony, many of the exhibitors gave away prizes to people who filled out and turned in a completed conference passport, as well as gifts for those who completed surveys for the talks. Prizes included $50 gift certificates, signed copies of Bill Nye's book Unstoppable, an Amazon Echo, GoPro cameras, an Oculus Rift, a PlayStation, and more. All the winners left happy (except maybe the guy who got the Mr. Spock socks, although I would have been pleased).

Thanks for coming, and we’ll see you in Amsterdam (and next year)!

Below is a video of the ceremony in full (it’s about 15 minutes long).

 

Percona Live 2016: MongoDB Security

April 21, 2016 - 3:18pm

It’s almost time for the closing remarks and passport prize give away at Percona Live 2016, but still the talks keep coming. I was able to quickly pop into a lecture on MongoDB Security with Stephane Combaudon, MySQL Consultant at Slice Technologies.

Stephane went over some of the reported security issues with MongoDB and explained that MongoDB has good security features. Data security is a concern for most people, and recent reports in the news show that significant amounts of user details stored in MongoDB are easily accessible. This doesn’t mean that MongoDB can’t secure your data. As Stephane explained, MongoDB can be correctly secured – but some features are not enabled by default. In this session, we learned the main items that need to be checked to get a robust MongoDB deployment. Those items include:

  • Network security: stopping people from connecting to your MongoDB instances
  • Operating system security: stopping people from connecting to MongoDB and taking control of your servers
  • User security: making sure that users can only interact with specific portions of the data

I had a chance to quickly speak with Stephane after his talk:

See the rest of the Percona Live 2016 schedule here.

Percona Live 2016: Sometimes “Less is More” for dash-boarding and graphing

April 21, 2016 - 2:12pm

We’re starting to wind down here at Percona Live 2016, but there are still talks to see even this late in the conference! One of the talks I saw was Sometimes “Less is More” for dash-boarding and graphing with Michael Blue, Senior DBA at Big Fish Games.

In this talk, Michael discussed how monitoring specific metrics can be more helpful than monitoring everything. In a monitor-and-measure-everything environment, sometimes “Less is More” when you need the big picture. There are good monitoring tools available for DBAs, with tons of metrics to measure, but at larger scale it can seem impossible to get a big-picture view of your environment without spending much of your time going over all the metrics you're graphing, sifting through emails of false-positive alerts, and reading tons of cron job output. Michael explained the approaches he took at Big Fish to create better dashboards for all audiences without visual overload, which helped the DBAs find potential issues that were not caught by conventional monitoring.

This session included:

  • The basics of visualization
  • Picking metrics to measure at scale
  • How they leverage custom dashboards, Graphite, and the MySQL Enterprise Monitor

I had a chance to speak with Michael after his talk:

 Check out more of the Percona Live 2016 schedule here.

Percona Live 2016: Percona Live Game Night!

April 21, 2016 - 1:09pm

Wednesday night at Percona Live 2016 was reserved for fun, fun, fun! Once again, the Percona Live Game Night proved to be a popular and amazing event. There were more games this year than last, as well as food, drinks, and lots of friendly competition!

This year, besides the ever-crowd-pleasing Meltdown Challenge, there were Segway Races, pool, foosball, shuffleboard, Wii Boxing, Pac-Man Attack, a shootout gallery, darts, as well as virtual reality stations and a death-defying trampoline.

You can see Percona’s CEO Peter Zaitsev demonstrating how you use it, pro-level:

Below are some more photos of this outstanding night:

Coed boxing: guys, you need to improve your skills!

Some very intense foosball action!

This Pac-Man Attack reminds me of 1983!

Keep an eye on your wallets, gentlemen, I detect a hustle.

More trampoline.

For those who like less effort with their trampoline, virtual reality.

A little social lubrication.

Happy attendees (must have stopped at the previous picture).

Hmm, that guy looks a bit confused. Must be too much tech talk for one day!

I'd stay away from this table. Just saying.

More happy.

Thanks to everybody who came out and participated in an awesome night! We’ll see you all next year!

 

 

Percona Live 2016: MySQL Community Awards Ceremony

April 21, 2016 - 12:23pm

We wrapped up Thursday morning's Percona Live 2016 keynotes (April 21) with the MySQL Community Awards Ceremony. The MySQL Community Awards is a community-based initiative whose goal is to publicly recognize contributors to the MySQL ecosystem. The entire process of discussing, voting, and awarding is controlled by an independent group of community members, typically past winners or their representatives, as well as known contributors. It is a self-appointed, self-declared, self-making-up-the-rules-as-it-goes committee. It is also very aware of the importance of the community: a no-nonsense, non-political, tradition-adhering, self-criticizing committee.

The categories can be found here.

We celebrated the members of the open source community, and some big open source contributors from the past year! The winners this year were:

Community Contributors of the Year:

  • Bill Karwin, Senior Database Architect and Software Engineer at SchoolMessenger
  • Domas Mituzas, Small Data Engineer at Facebook
  • Yoshinori Matsunobu, Data Engineer at Facebook

Application of the Year:

Corporate Contributor of the Year:

 

Congratulations to all the winners! You can watch the whole ceremony below (it's about 15 minutes long).

Percona Live 2016: Day Three Keynote Talks

April 21, 2016 - 11:52am

We’re heading into the final day here at Percona Live 2016! People are looking a little tired, but still excited for some excellent talks today. Once again the day started off with two great keynote lectures. Read to the end for an important announcement!

Peter Zaitsev, CEO of Percona
Winning with Open Source Databases

Peter discussed what makes the open source community so vibrant and powerful, and why it is essential to preserve that spirit. Data is critical to the success of your business. You rely on your database and its supporting systems to power the applications that drive your business. These systems must be reliable, scalable, efficient – and increasingly, open source. With the complexity of today’s web applications and the databases, systems and hardware that support them, organizations must use the right open source tools and technology for each job – without getting locked into a proprietary solution. With Percona, customers are assured a choice in technology options that are completely open source (and include enterprise features). We help our customers find the right technology for their specific needs, rather than sell a one-size-fits-all product. Percona is a true open source partner that helps you optimize your database performance to better run your business.

Patrick McFadin, Chief Evangelist at DataStax
Take back the power in your cloud applications with Apache Cassandra

Patrick discussed how cloud applications can help you develop the applications your business needs, but also outlined why the cloud isn't a panacea for every business issue. Database engineers have had to support the crazy dreams of application developers since the beginning of the internet. Patrick says it's time to take back the power! He believes that Apache Cassandra is the tool that can help you eliminate downtime and span your data around the world with ease. Deploying to the cloud isn't always easy, but Cassandra might give your application developers the best chance they can get – and let them sleep easy at night.

 

Post MongoDB World New York Conference, June 30th, Hilton Mid-Town Manhattan

Peter also made an important announcement: Percona and ObjectRocket/Rackspace will be sponsoring a free post-MongoDB World Community Event! The event will take place on Thursday, June 30th at the Hilton Mid-Town Manhattan.

All are welcome.

Don’t miss out on this amazing opportunity to share ideas and get insights after MongoDB World.

Check out the rest of today’s Percona Live 2016 schedule here.

Percona Live 2016: What’s New in MySQL

April 20, 2016 - 10:14pm

The second day of Percona Live 2016 is wrapping up, but I still wanted to get to one last lecture. Fortunately, there was a great one available: What’s New in MySQL with Geir Høydalsvik, Senior Software Development Director at Oracle, and Simon Mudd, DBA at booking.com.

MySQL 5.7 is an extremely exciting new version of the world’s most popular open source database that is 2x faster than MySQL 5.6, while also improving usability, manageability, and security. MySQL 5.7, now in GA, is the result of almost three years of development and brings improvements to all areas of the product. Some key enhancements include:

  • They've improved InnoDB scalability and temporary table performance, enabling faster online and bulk load operations, and more.
  • They've added native JSON support (see the short sketch after this list).
  • For increased availability and performance, they've included multi-source replication, multi-threaded slave enhancements, online GTIDs, and enhanced semi-sync replication.
  • They've added numerous new monitoring capabilities to the Performance Schema for better insights, reduced its footprint and overhead, and significantly improved ease of use with the new SYS Schema.
  • They are fulfilling “secure by default” requirements, and many new MySQL 5.7 features will help users keep their databases secure.
  • They have rewritten large parts of the parser, optimizer, and cost model. This has improved maintainability, extendability, and performance.
  • GIS is completely new in MySQL 5.7, as are InnoDB spatial indexes, the use of Boost.Geometry, and increased completeness and standard compliance.
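A minimal sketch of the native JSON support mentioned above (table and column names are ours, purely illustrative):

CREATE TABLE events (doc JSON);
INSERT INTO events VALUES ('{"type": "click", "page": "/home"}');
SELECT JSON_EXTRACT(doc, '$.type') FROM events;  -- returns "click"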

 I had a chance to talk with Geir and Simon after their lecture:

Check out the Percona Live 2016 schedule for more talks!

Percona Live 2016: MySQL and Docker Strategies

April 20, 2016 - 6:19pm

The afternoon sessions at Percona Live 2016 are just as intriguing as the morning sessions. I’m trying to hit as many as I can, especially talks on interesting and upcoming technologies that are improving deployments and manageability. One of the talks I saw in this vein was MySQL and Docker Strategies, given by Patrick Galbraith, Senior Systems Engineer at Hewlett Packard, and Giuseppe Maxia, Quality Assurance Director at VMware.

Docker is a great new project for managing containers. Containers provide the operating system resource isolation and allocation benefits of virtual machines, yet are more lightweight and give you a lot of flexibility in how you design your application and database architecture. This talk covered many of the useful things one can do with Docker and MySQL, such as:

  • Build ready-to-launch containers that are “batteries included” MySQL servers (a quick sketch follows this list)
  • Back up and restore MySQL databases using a container strategy that allows both containers and host systems access to the same data
  • Network containers across hosts
  • Implement container clustering solutions such as CoreOS, Kubernetes, and Mesos, and see how each solves a similar requirement
  • Automate containerized MySQL using Ansible
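For instance, a minimal sketch of a “batteries included” MySQL container using the official image (container name, tag, and password are illustrative):

docker run --name mysql56 -e MYSQL_ROOT_PASSWORD=secret -d mysql:5.6
docker exec mysql56 mysql -uroot -psecret -e "SELECT VERSION();"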

Patrick and Giuseppe also provided summary details and a demonstration of their recent contribution to Kubernetes (a Galera cluster application), which showed how one can have a quickly deployable MySQL synchronous replication cluster in minutes and use it with other Kubernetes applications.

I had a chance to speak with Giuseppe after the talk:

 

Check out the Percona Live 2016 schedule for more talks!

