
How to validate a backup

Latest Forum Posts - August 9, 2016 - 5:46am
Dear all,

During a scheduled backup procedure, how can I be sure that a backup is correct?
I used to work with Oracle databases. With Oracle, we would check the size of the backup files to determine whether the backup was valid: if they were smaller than the actual size, we knew we had to run the backup procedure again.
As far as I know, the best solution for this is restoring the whole backup and running table checks, but restoring and checking tables takes a lot of time. Is there any other way to solve this problem without restoring?
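Short of a full restore, a couple of cheap consistency signals can be checked automatically. The sketch below is a hypothetical helper (not a Percona tool): it assumes an uncompressed xtrabackup/innobackupex backup directory plus a captured backup log. It relies on two real signals: xtrabackup writes an xtrabackup_checkpoints file at the end of the copy phase, and innobackupex prints "completed OK!" as the last line of its log on success.

```shell
# Minimal post-backup sanity check (illustrative helper, not a Percona tool).
# $1 = backup directory, $2 = captured innobackupex log file.
check_backup() {
  local dir=$1 log=$2
  # xtrabackup_checkpoints is written at the end of the copy phase;
  # its absence means the copy never finished.
  [ -f "$dir/xtrabackup_checkpoints" ] || { echo "FAIL: no checkpoints file"; return 1; }
  # A readable backup_type line means the checkpoints file isn't truncated.
  grep -q '^backup_type' "$dir/xtrabackup_checkpoints" || { echo "FAIL: checkpoints truncated"; return 1; }
  # innobackupex prints "completed OK!" as its final log line on success.
  tail -1 "$log" | grep -q 'completed OK!' || { echo "FAIL: log does not end with completed OK!"; return 1; }
  echo "basic checks passed"
}
```

These checks only prove the backup process finished cleanly; a periodic test restore remains the only complete validation.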

XtraBackup: are --compact and --stream=tar compatible with each other?

Latest Forum Posts - August 9, 2016 - 1:07am
Hi everybody,

I'm using innobackupex and I want to perform a backup with the --compact and --stream=tar options, but when I restore this backup it doesn't work, and I don't know why.
It seems as if the secondary indexes weren't created correctly before the restoration, even though I use the '--rebuild-indexes' option.

Backup command:

Code:
innobackupex --user=$db_user --password=$db_pass --compact --stream=tar ./ --no-timestamp | gzip - > $WORKING_DIR/$BCK_FILE

Restore command:

Code:
tar -zxif $src -C $TMP_DIR/database
innobackupex --apply-log --rebuild-indexes $TMP_DIR/database
innobackupex --copy-back $TMP_DIR/database

The return code of the --apply-log step is always different from zero...

Does anybody know if these two options are compatible with each other?
Could somebody give an example of all the steps needed to perform this kind of backup?

Thank you so much.

Best regards,


Docker Images for MySQL Group Replication 5.7.14

Latest MySQL Performance Blog posts - August 8, 2016 - 9:06am

In this post, I will point you to Docker images for MySQL Group Replication testing.

There is a new release of the MySQL Group Replication plugin for MySQL 5.7.14. It’s a “beta” plugin, and it is probably the last (or at least one of the final) pre-release packages before Group Replication goes GA (during Oracle OpenWorld 2016, in our best guess).

Since it is close to GA, it would be great to get a better understanding of this new technology. Unfortunately, MySQL Group Replication installation process isn’t very user-friendly.

Or, to put it another way, totally un-user-friendly! It consists of a mere “50 easy steps” – by which I think they mean “easy” to mess up.

Matt Lord, in his post, acknowledges: “getting a working MySQL service consisting of 3 Group Replication members is not an easy “point and click” or automated single command style operation.”

I’m not providing a review of MySQL Group Replication 5.7.14 yet – I need to play around with it a lot more. To make this process easier for myself, and hopefully more helpful to you, I’ve prepared Docker images for the testing of MySQL Group Replication.

Docker Images

To start the first node, run:

docker run -d --net=cluster1 --name=node1 perconalab/mysql-group-replication --group_replication_bootstrap_group=ON

To join all following nodes:

docker run -d --net=cluster1 --name=node2 perconalab/mysql-group-replication --group_replication_group_seeds='node1:6606'

Of course, you first need to create the Docker network:

docker network create cluster1
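Putting the three commands together, a small wrapper (my own sketch, not part of the image) can bring up a three-node test cluster in one go. The run prefix defaults to echo, so calling it with no argument just prints the commands; pass "" to actually execute them:

```shell
# Convenience wrapper around the commands above.
# start_cluster        -> dry-run (prints the docker commands)
# start_cluster ""     -> actually executes them
start_cluster() {
  run=${1-echo}
  $run docker network create cluster1
  # Bootstrap the first node
  $run docker run -d --net=cluster1 --name=node1 \
      perconalab/mysql-group-replication --group_replication_bootstrap_group=ON
  # Join the remaining nodes, seeding from node1
  for n in 2 3; do
    $run docker run -d --net=cluster1 --name=node$n \
        perconalab/mysql-group-replication --group_replication_group_seeds='node1:6606'
  done
}
```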

I hope this will make the testing process easier!

2 Node cluster locks db on 1 end, fails to reconnect automatically

Latest Forum Posts - August 8, 2016 - 5:15am

I have a 2 node Percona cluster (percona-xtradb-cluster-56, 5.6.26-25.12-1.wheezy).

I had an issue where it seems db2 become unavailable due to some network issue.

While this was happening, db1 did not crash but completely locked down the database that freeradius was using. I guess this is normal behaviour.

During this time the database on db2 was accessible.

I have a feeling that there was not a long network outage between the nodes; rather, the auto-reconnect mechanism was failing, because after I restarted db1 (just 4 minutes after the last auto-reconnect attempt) the cluster resynced and the databases on db1 became accessible again.

My questions are:

1. How can I keep the database at least in read-only mode on both ends when the cluster splits? In my case it would be useful for the radius server to still be able to do authentication without updating info in the database.

2. Can this be caused in any way by my setup using wsrep_sst_method=rsync instead of wsrep_sst_method=xtrabackup-v2?
I had no problem with this before.

3. How can I increase the reconnect retry value to something very high?
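On question 3: there is no single "reconnect retry" knob, but the timeouts that decide when a node is evicted from the cluster are Galera's evs.* provider options (defaults are PT5S for suspect and PT15S for inactive; values are ISO 8601 durations). A hedged example of relaxing them to better tolerate short network glitches:

```
wsrep_provider_options="evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M; evs.keepalive_period=PT3S"
```

Raising these delays failure detection as well, so a genuinely dead node will take longer to be evicted.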


MHA for Mysql error masterha_check_repl

Latest Forum Posts - August 7, 2016 - 11:19am
I am configuring MHA for MySQL. I have tested masterha_check_ssh /etc/app1.cnf and my SSH connections are OK, but when I try masterha_check_repl /etc/app1.cnf I get errors: Redundant argument in sprintf at .../ ln427, 190, Error happened on monitoring servers (Got exit code 1 (No master dead)).
I get this error even though I confirmed that MySQL replication is working. masterha_check_repl still fails, so what can I review? Any advice is welcome.


Percona XtraDB Cluster on Ceph

Latest MySQL Performance Blog posts - August 4, 2016 - 3:31pm

This post discusses how XtraDB Cluster and Ceph are a good match, and how their combination allows for faster SST and a smaller disk footprint.

My last post was an introduction to Red Hat’s Ceph. As interesting and useful as it was, it wasn’t a practical example. Like most of the readers, I learn about and see the possibilities of technologies by burning my fingers on them. This post dives into a real and novel Ceph use case: handling of the Percona XtraDB Cluster SST operation using Ceph snapshots.

If you are familiar with Percona XtraDB Cluster, you know that a full state snapshot transfer (SST) is required to provision a new cluster node. Similarly, SST can also be triggered when a cluster node happens to have a corrupted dataset. Those SST operations consist essentially of a full copy of the dataset sent over the network. The most common SST methods are Xtrabackup and rsync. Both of these methods imply a significant impact and load on the donor while the SST operation is in progress.

For example, the whole dataset will need to be read from the storage and sent over the network, an operation that requires a lot of IO operations and CPU time. Furthermore, with the rsync SST method, the donor is under a read lock for the whole duration of the SST. Consequently, it can take no write operations. Such constraints on SST operations are often the main motivations behind the reluctance to use Percona XtraDB Cluster with large datasets.

So, what could we do to speed up SST? In this post, I will describe a method of performing SST operations when the data is not local to the nodes. You could easily modify the solution I am proposing for any non-local data source technology that supports snapshots/clones, and has an accessible management API. Off the top of my head (other than Ceph) I see AWS EBS and many SAN-based storage solutions as good fits.

The challenges of clone-based SST

If we could use snapshots and clones, what would be the logical steps for an SST? Let’s have a look at the following list:

  1. New node starts (joiner) and unmounts its current MySQL datadir
  2. The joiner asks for an SST
  3. The donor creates a consistent snapshot of its MySQL datadir with the Galera position
  4. The donor sends to the joiner the name of the snapshot to use
  5. The joiner creates a clone of the snapshot name provided by the donor
  6. The joiner mounts the snapshot clone as the MySQL datadir and adjusts ownership
  7. The joiner initializes MySQL on the mounted clone

As we can see, all these steps are fairly simple, but they hide some challenges for an SST method based on cloning. The first challenge is the need to mount the snapshot clone. Mounting a block device requires root privileges – and SST scripts normally run under the MySQL user. The second challenge I encountered wasn't expected: MySQL opens the datadir and some files in it before the SST happens. Consequently, those files are kept open in the underlying mount point, a situation that is far from ideal. Fortunately, there are solutions to both of these challenges, as we will see below.
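To make the flow concrete, here is a rough joiner-side sketch of steps 5 and 6 (all names and paths are illustrative; the real logic lives in the wsrep_sst_ceph script). The run prefix defaults to echo so the sketch can be dry-run:

```shell
# Illustrative only: clone the donor's snapshot and mount it as the datadir.
# $1 = snapshot name received from the donor; $2 = "" to actually execute.
# The mount/chown steps are the part that needs root privileges.
attach_clone() {
  snap=$1 run=${2-echo}
  clone="$(hostname)-$(date +%s)"                              # unique clone name
  $run rbd clone "mysqlpool/PXC@${snap}" "mysqlpool/${clone}"  # step 5: clone the snapshot
  $run rbd map "mysqlpool/${clone}"                            # expose the clone as a block device
  $run mount -o rw,noatime,nouuid /dev/rbd1 /var/lib/mysql     # step 6: mount as datadir
  $run chown mysql.mysql /var/lib/mysql                        # adjust ownership
}
```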

SST script

So, let’s start with the SST script. The script is available in my Github at:

You should install the script in the /usr/bin directory, along with the other user scripts. Once installed, I recommend:

chown root.root /usr/bin/wsrep_sst_ceph
chmod 755 /usr/bin/wsrep_sst_ceph

The script has a few parameters that can be defined in the [sst] section of the my.cnf file.

cephlocalpool: The Ceph pool where this node should create the clone. It can be a different pool from the one of the original dataset. For example, it could have a replication factor of 1 (no replication) for a read-scaling node. The default value is: mysqlpool
cephmountpoint: What mount point to use. It defaults to the MySQL datadir as provided to the SST script.
cephmountoptions: The options used to mount the filesystem. The default value is: rw,noatime
cephkeyring: The Ceph keyring file used to authenticate against the Ceph cluster with cephx. The user under which MySQL is running must be able to read the file. The default value is: /etc/ceph/ceph.client.admin.keyring
cephcleanup: Whether or not the script should clean up the snapshots and clones that are no longer in use. Enable = 1, Disable = 0. The default value is: 0
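For example, an [sst] section exercising these parameters could look like this (the parameter names match the ones used in the full my.cnf later in this post; values other than the defaults are just illustrative):

```
[sst]
cephlocalpool=mysqlpool
cephmountoptions=rw,noatime,nodiratime,nouuid
cephkeyring=/etc/ceph/ceph.client.admin.keyring
cephcleanup=1
```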
Root privileges

In order to allow the SST script to perform privileged operations, I added an extra SST role: “mount”. The SST script on the joiner will call itself back with sudo and will pass “mount” as the role parameter. To allow the elevation of privileges, the following line must be added to the /etc/sudoers file:

mysql ALL=NOPASSWD: /usr/bin/wsrep_sst_ceph

Files opened by MySQL before the SST

Upon startup, MySQL opens files in two places in the code before the SST completes. The first one is in the function mysqld_main, which sets the current working directory to the datadir (an empty directory at that point). After the SST, a block device is mounted on the datadir. The issue is that MySQL then tries to find the files in the empty mount point directory. I wrote a simple patch, presented below, and issued a pull request:

diff --git a/sql/ b/sql/
index 90760ba..bd9fa38 100644
--- a/sql/
+++ b/sql/
@@ -5362,6 +5362,13 @@
 a file name for --log-bin-index option", opt_binlog_index_name);
       }
     }
   }
+
+  /*
+   * Forcing a new setwd in case the SST mounted the datadir
+   */
+  if (my_setwd(mysql_real_data_home,MYF(MY_WME)) && !opt_help)
+    unireg_abort(1);        /* purecov: inspected */
+
   if (opt_bin_log)
   {
     /*

With this patch, I added a new my_setwd call right after the SST completed. The Percona engineering team approved the patch, and it should be added to the upcoming release of Percona XtraDB Cluster.

The Galera library is the other source of files opened before the SST. Here, the fix is just in the configuration: you must define the base_dir Galera provider option outside of the datadir. For example, if you use /var/lib/mysql as the datadir and cephmountpoint, then you should use:
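A minimal example, consistent with the my.cnf shown later in this post:

```
wsrep_provider_options="base_dir=/var/lib/galera"
```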


Of course, if you have other provider options, don’t forget to add them there.


So, what are the steps required to use Ceph with Percona XtraDB Cluster? (I assume that you have a working Ceph cluster.)

1. Join the Ceph cluster

The first thing you need is a working Ceph cluster with the needed CephX credentials. While the setup of a Ceph cluster is beyond the scope of this post, we will address it in a subsequent post. For now, we’ll focus on the client side.

You need to install the Ceph client packages on each node. On my test servers using Ubuntu 14.04, I did:

wget -q -O- '' | sudo apt-key add -
sudo apt-add-repository 'deb trusty main'
apt-get update
apt-get install ceph

These commands also installed all the dependencies. Next, I copied the Ceph cluster configuration file /etc/ceph/ceph.conf:

[global]
fsid = 87671417-61e4-442b-8511-12659278700f
mon_initial_members = odroid1, odroid2
mon_host =,,
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_journal = /var/lib/ceph/osd/journal
osd_journal_size = 128
osd_pool_default_size = 2

and the authentication file /etc/ceph/ceph.client.admin.keyring from another node. I made sure these files were readable by all. You can define more refined privileges for a production system with CephX, the security layer of Ceph.

Once everything is in place, you can test if it is working with this command:

root@PXC3:~# ceph -s
    cluster 87671417-61e4-442b-8511-12659278700f
     health HEALTH_OK
     monmap e2: 3 mons at {odroid1=,odroid2=,serveur-famille=}
            election epoch 474, quorum 0,1,2 odroid1,odroid2,serveur-famille
     mdsmap e204: 1/1/1 up {0=odroid3=up:active}
     osdmap e995: 4 osds: 4 up, 4 in
      pgmap v275501: 1352 pgs, 5 pools, 321 GB data, 165 kobjects
            643 GB used, 6318 GB / 7334 GB avail
                1352 active+clean
  client io 16491 B/s rd, 2425 B/s wr, 1 op/s

This shows the current state of the Ceph cluster.

2. Create the Ceph pool

Before we can use Ceph, we need to create a first RBD image, put a filesystem on it and mount it for MySQL on the bootstrap node. We need at least one Ceph pool since the RBD images are stored in a Ceph pool.  We create a Ceph pool with the command:

ceph osd pool create mysqlpool 512 512 replicated

Here, we have defined the pool mysqlpool with 512 placement groups. On a larger Ceph cluster, you might need to use more placement groups (again, a topic beyond the scope of this post). The pool we just created is replicated. Each object in the pool will have two copies as defined by the osd_pool_default_size parameter in the ceph.conf file. If needed, you can modify the size of a pool and its replication factor at any moment after the pool is created.

3. Create the first RBD image

Now that we have a pool, we can create a first RBD image:

root@PXC1:~# rbd -p mysqlpool create PXC --size 10240 --image-format 2

and “map” the RBD image to a host block device:

root@PXC1:~# rbd -p mysqlpool map PXC
/dev/rbd1

The map command returns the local RBD block device that corresponds to the RBD image.

The rest of the steps are not specific to RBD images. We need to create a filesystem and prepare the mount points:

mkfs.xfs /dev/rbd1
mount /dev/rbd1 /var/lib/mysql -o rw,noatime,nouuid
chown mysql.mysql /var/lib/mysql
mysql_install_db --datadir=/var/lib/mysql --user=mysql
mkdir /var/lib/galera
chown mysql.mysql /var/lib/galera

You need to mount the RBD device and run the mysql_install_db tool only on the bootstrap node. You need to create the directories /var/lib/mysql and /var/lib/galera on the other nodes and adjust the permissions similarly.

4. Modify the my.cnf files

You will need to set or adjust the specific wsrep_sst_ceph settings in the my.cnf file of all the servers. Here are the relevant lines from the my.cnf file of one of my cluster nodes:

[mysqld]
wsrep_provider=/usr/lib/
wsrep_provider_options="base_dir=/var/lib/galera"
wsrep_cluster_address=gcomm://,,
wsrep_node_address=
wsrep_sst_method=ceph
wsrep_cluster_name=ceph_cluster

[sst]
cephlocalpool=mysqlpool
cephmountoptions=rw,noatime,nodiratime,nouuid
cephkeyring=/etc/ceph/ceph.client.admin.keyring
cephcleanup=1

At this point, we can bootstrap the cluster on the node where we mounted the initial RBD image:

/etc/init.d/mysql bootstrap-pxc

5. Start the other XtraDB Cluster nodes

The first node does not perform an SST, so nothing exciting so far. With the patched version of MySQL (the above patch), starting MySQL on a second node triggers a Ceph SST operation. In my test environment, the SST takes about five seconds to complete on low-powered VMs. Interestingly, the duration is not directly related to the dataset size. Because of this, a much larger dataset, on a quiet database, should take about the same time. A very busy database may need more time, since an SST requires a “flush tables with read lock” at some point.

So, after their respective Ceph SST, the other two nodes have:

root@PXC2:~# mount | grep mysql
/dev/rbd1 on /var/lib/mysql type xfs (rw,noatime,nodiratime,nouuid)
root@PXC2:~# rbd showmapped
id pool      image           snap device
1  mysqlpool PXC2-1463776424 -    /dev/rbd1
root@PXC3:~# mount | grep mysql
/dev/rbd1 on /var/lib/mysql type xfs (rw,noatime,nodiratime,nouuid)
root@PXC3:~# rbd showmapped
id pool      image           snap device
1  mysqlpool PXC3-1464118729 -    /dev/rbd1

The original RBD image now has two snapshots that are mapped to the clones mounted by the other two nodes:

root@PXC3:~# rbd -p mysqlpool ls
PXC
PXC2-1463776424
PXC3-1464118729
root@PXC3:~# rbd -p mysqlpool info PXC2-1463776424
rbd image 'PXC2-1463776424':
        size 10240 MB in 2560 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.108b4246146651
        format: 2
        features: layering
        flags:
        parent: mysqlpool/PXC@1463776423
        overlap: 10240 MB


Apart from allowing faster SST, what other benefits do we get from using Ceph with Percona XtraDB Cluster?

The first benefit is that the inherent data replication over the network removes the need for local data replication. Thus, instead of using RAID-10 or RAID-5 with an array of disks, we could use a simple RAID-0 stripe set, since the data is already replicated to more than one server.

The second benefit is a bit less obvious: you don’t need as much storage. Why? A Ceph clone only stores the delta from its original snapshot. So, for large, read-intensive datasets, the disk space savings can be very significant. Of course, over time, the clone will drift away from its parent snapshot and will use more and more space. When we determine that a Ceph clone uses too much disk space, we can simply refresh the clone by restarting MySQL and forcing a full SST. The SST script will automatically drop the old clone and snapshot when the cephcleanup option is set, and it will create a fresh new clone. You can easily evaluate how much space is consumed by the clone using the following commands:

root@PXC2:~# rbd -p mysqlpool du PXC2-1463776424
warning: fast-diff map is not enabled for PXC2-1463776424. operation may be slow.
NAME            PROVISIONED USED
PXC2-1463776424      10240M 164M

Also, nothing prevents you from using a different configuration of Ceph pools in the same XtraDB Cluster: a Ceph clone can use a different pool than its parent snapshot. That’s the whole purpose of the cephlocalpool parameter. Strictly speaking, you only need one node to use a replicated pool, as the other nodes could run on clones stored in a non-replicated pool (saving a lot of storage space). Furthermore, we can define the OSD affinity of the non-replicated pool in a way that it stores data on the host where it is used, reducing the cross-node network latency.

Using Ceph for the XtraDB Cluster SST operation demonstrates one of the many possibilities Ceph offers to MySQL. We continue to work with the Red Hat team and Red Hat Ceph Storage architects to find new and useful ways of addressing database issues in the Ceph environment. There are many more posts to come, so stay tuned!

DISCLAIMER: The wsrep_sst_ceph script isn’t officially supported by Percona.

pt-online-schema-change causing transactions wait for AUTO-INC lock and crash server

Latest Forum Posts - August 4, 2016 - 12:19pm
1. ~2 Million rows, 37G table
2. inserts only table
3. inserts 500~1000 rows per min

We are trying to use the pt-online-schema-change tool to add a column to an existing index, but it is causing lots of transactions to wait for the AUTO-INC lock, eventually overloading the server and shutting it down.

We tried making chunk-time smaller, but it didn't help. The parameters we use are: --no-check-replication-filters --chunk-time 0.05 --sleep 1


TRANSACTION 18459639428, ACTIVE 19 sec setting auto-inc lock
mysql tables in use 2, locked 2
3 lock struct(s), heap size 360, 0 row lock(s), undo log entries 2


TABLE LOCK table `my_db`.`_the_table_new` trx id 18459639428 lock mode AUTO-INC waiting

1. From the processlist, we found there is a bulk insert ("INSERT LOW_PRIORITY IGNORE INTO ") inserting about 800 rows that takes 173 secs. Not sure why it takes that long. Any explanations?

2. It seems there is no need to hold the AUTO-INC lock for `_the_table_new`, because the rows copied from `_the_table_old` will always have a value for the auto-increment column, right? Is there any way to prevent using the AUTO-INC lock for `_the_table_new`?

3. any suggestions for solving the issue?
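On question 2: the behaviour comes from InnoDB rather than from pt-online-schema-change. With the default innodb_autoinc_lock_mode=1 (consecutive), an INSERT ... SELECT whose row count is not known in advance holds the table-level AUTO-INC lock until the statement ends. Mode 2 (interleaved) avoids the table-level lock, but it is only replication-safe with row-based binary logging; a hedged example:

```
[mysqld]
# Interleaved auto-increment: no table-level AUTO-INC lock for bulk inserts.
# Only replication-safe with binlog_format=ROW.
innodb_autoinc_lock_mode = 2
binlog_format = ROW
```

Note that innodb_autoinc_lock_mode is not dynamic, so changing it requires a server restart.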


Very slow database response. I suspect InnoDB.

Latest Forum Posts - August 4, 2016 - 11:03am

I have two database servers with almost the same hardware and software configurations. The load on the servers is almost equal. The problem is that the first server works at a snail's pace compared with the second server.

For example If I run this query on the problematic server:

Code:
mysql> SELECT table_schema "Database Name",
       SUM(data_length+index_length)/1024/1024 "Database Size (MB)"
       FROM information_schema.TABLES GROUP BY table_schema;

Before I see a result like this:

Code:
+----------------------------+--------------------+
| Database Name              | Database Size (MB) |
+----------------------------+--------------------+
| activtra_b2fx              |       260.60385895 |
| activtra_coper             |         0.06334019 |
................................................................................
| ziwitrad_wp99              |         4.09765244 |
+----------------------------+--------------------+
127 rows in set (34.59 sec)
I need to wait between 5 and 35 seconds. I don't know why the server sometimes returns the response in 5 seconds and why at other times I need to wait more than 30 seconds.

The same query executed on the second server returns a response with 93 rows in less than 1 second. Usually the responses are returned in between 0.1 and 0.3 seconds.

As you can see, the difference is colossal and I don't know why.

Before executing the above query on the problematic server, I checked the load and everything looked good. Below you can see the check results:

Code:
mysql> show processlist;
+--------+-----------+-----------+-----------+---------+------+-------+------------------+-----------+---------------+-----------+
| Id     | User      | Host      | db        | Command | Time | State | Info             | Rows_sent | Rows_examined | Rows_read |
+--------+-----------+-----------+-----------+---------+------+-------+------------------+-----------+---------------+-----------+
| 103183 | eximstats | localhost | eximstats | Sleep   |   38 |       | NULL             |         0 |             0 |         0 |
| 103327 | root      | localhost | NULL      | Query   |    0 | NULL  | show processlist |         0 |             0 |         0 |
+--------+-----------+-----------+-----------+---------+------+-------+------------------+-----------+---------------+-----------+
2 rows in set (0.00 sec)

Code:
# top
top - 20:54:13 up 259 days, 9:16, 1 user, load average: 0.50, 2.05, 2.16
Tasks: 309 total, 1 running, 307 sleeping, 0 stopped, 1 zombie
Cpu(s): 2.6%us, 0.6%sy, 0.0%ni, 96.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 12191156k total, 10825468k used, 1365688k free, 1277832k buffers
Swap: 5242872k total, 607648k used, 4635224k free, 5509472k cached

Code:
# ./
-- MYSQL PERFORMANCE TUNING PRIMER --
- By: Matthew Montgomery -

MySQL Version 5.5.45-37.4-log x86_64

Uptime = 2 days 2 hrs 56 min 29 sec
Avg. qps = 11
Total Questions = 2042286
Threads Connected = 2

Server has been running for over 48hrs.
It should be safe to follow these recommendations

To find out more information on how each of these
runtime variables effects performance visit:
Visit for info about MySQL's Enterprise Monitoring and Advisory Service

SLOW QUERIES
The slow query log is enabled.
Current long_query_time = 10.000000 sec.
You have 352 out of 2042307 that take longer than 10.000000 sec. to complete
Your long_query_time seems to be fine

BINARY UPDATE LOG
The binary update log is NOT enabled.
You will not be able to do point in time recovery
See

WORKER THREADS
Current thread_cache_size = 8
Current threads_cached = 6
Current threads_per_sec = 0
Historic threads_per_sec = 0
Your thread_cache_size is fine

MAX CONNECTIONS
Current max_connections = 151
Current threads_connected = 2
Historic max_used_connections = 24
The number of used connections is 15% of the configured maximum.
Your max_connections variable seems to be fine.

INNODB STATUS

So far I see the above output in a split second. Then I wait an additional 20-30 seconds before the command continues with the output below:

Code:
Current InnoDB index space = 324 M
Current InnoDB data space = 999 M
Current InnoDB buffer pool free = 0 %
Current innodb_buffer_pool_size = 512 M
Depending on how much space your innodb indexes take up it may be safe
to increase this value to up to 2 / 3 of total system memory

MEMORY USAGE
Max Memory Ever Allocated : 3.26 G
Configured Max Per-thread Buffers : 3.90 G
Configured Max Global Buffers : 2.64 G
Configured Max Memory Limit : 6.54 G
Physical Memory : 11.62 G
Max memory limit seem to be within acceptable norms

KEY BUFFER
Current MyISAM index space = 203 M
Current key_buffer_size = 2.00 G
Key cache miss rate is 1 : 137
Key buffer free ratio = 78 %
Your key_buffer_size seems to be fine

QUERY CACHE
Query cache is enabled
Current query_cache_size = 128 M
Current query_cache_used = 125 M
Current query_cache_limit = 16 M
Current Query cache Memory fill ratio = 98.16 %
Current query_cache_min_res_unit = 4 K
However, 2150 queries have been removed from the query cache due to lack of memory
Perhaps you should raise query_cache_size
MySQL won't cache query results that are larger than query_cache_limit in size

SORT OPERATIONS
Current sort_buffer_size = 2 M
Current read_rnd_buffer_size = 256 K
Sort buffer seems to be fine

JOINS
Current join_buffer_size = 16.00 M
You have had 4748 queries where a join could not use an index properly
join_buffer_size >= 4 M
This is not advised
You should enable "log-queries-not-using-indexes"
Then look for non indexed joins in the slow query log.

OPEN FILES LIMIT
Current open_files_limit = 32930 files
The open_files_limit should typically be set to at least 2x-3x
that of table_cache if you have heavy MyISAM usage.
Your open_files_limit value seems to be fine

TABLE CACHE
Current table_open_cache = 16384 tables
Current table_definition_cache = 20480 tables
You have a total of 6569 tables
You have 12470 open tables.
The table_cache value seems to be fine

TEMP TABLES
Current max_heap_table_size = 768 M
Current tmp_table_size = 768 M
Of 206436 temp tables, 30% were created on disk
Perhaps you should increase your tmp_table_size and/or max_heap_table_size
to reduce the number of disk-based temporary tables
Note! BLOB and TEXT columns are not allow in memory tables.
If you are using these columns raising these values might not impact your
ratio of on disk temp tables.

TABLE SCANS
Current read_buffer_size = 8 M
Current table scan ratio = 1249 : 1
read_buffer_size seems to be fine

TABLE LOCKING
Current Lock Wait ratio = 1 : 16787
Your table locking seems to be fine

On the good working server the command needs only 2-3 seconds to complete all of the output, without any delays in the InnoDB part.

This is my /etc/my.cnf on problematic server:

Code:
[mysqld]
socket=/tmp/mysql.sock
#set-variable = max_connections=500
#safe-show-database
# Skip reverse DNS lookup of clients

Do you have any ideas on what could be the reason for this delay?

IST receiver unable to bind address to public IP

Latest Forum Posts - August 4, 2016 - 3:34am
In AWS EC2, applications are only allowed to listen on the private IP rather than the public IP. The public IP is not visible on the host; anything sent to the public IP is forwarded to the corresponding private IP. The IST receiver tries to bind the address specified by wsrep_node_address. For Galera clusters spanning data centers, we must set wsrep_node_address to the node's public IP address.
Here's the problem:
No NIC has public IP.
Setting wsrep_node_address to the public IP address makes the IST receiver try to bind that IP and fail:

2016-08-04 06:06:16 19852 [Warning] WSREP: Failed to prepare for incremental state transfer: Failed to open IST listener at tcp://', asio error 'Cannot assign requested address': 99 (Cannot assign requested address) at galera/src/ist.cpp:prepare():326. IST will be unavailable.

Setting wsrep_node_address to the private IP address causes the donor node to try to reach the joiner node via the private IP address, which is not possible.

We need to separate the listening address (private) from the access address (public). Setting wsrep_node_incoming_address does not help in this case: the donor still tries to access the joiner via wsrep_node_address (private).
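One possible workaround, assuming a Galera version that supports the ist.recv_bind provider option (added for exactly this kind of NAT setup), is to advertise the public address while binding locally to the private one; the placeholders below are deliberate:

```
wsrep_node_address=<public-ip>
wsrep_provider_options="ist.recv_addr=<public-ip>; ist.recv_bind=<private-ip>"
```

If your Galera library predates ist.recv_bind, this split is not available and an upgrade would be needed.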

Difference between query_time and response_time

Latest Forum Posts - August 3, 2016 - 10:33pm
Version | 5.5.45-37.4-log Percona Server (GPL), Release 37.4, Revision 042e02b


I am a bit confused about the difference between query_time as reported in the MySQL slow logs and the response time as reported by the Response Time Distribution (SHOW QUERY_RESPONSE_TIME).

The confusion stems from the observation that the MySQL slow logs show the maximum query_time as 347s (when checked across all logs).

While the distribution has entries
|           time |    count |         total |
| 0.000001 | 914 | 0.000000 |
| 0.000010 | 11716670 | 25.632509 |
| 0.000100 | 183210221 | 6030.333252 |
| 0.001000 | 30203398 | 8305.915448 |
| 0.010000 | 55028382 | 202486.117777 |
| 0.100000 | 51674821 | 1644526.564266 |
| 1.000000 | 7102115 | 2539651.042982 |
| 10.000000 | 278069 | 572907.555653 |
| 100.000000 | 6846 | 111699.020993 |
| 1000.000000 | 31 | 9921.227099 |
| 10000.000000 | 6 | 50372.869107 |
| 100000.000000 | 47 | 520054.762721 |

| 1000000.000000 | 0 | 0.000000 |

As seen from the highlighted rows, there are 6 + 47 = 53 queries that are above the maximum query_time reported in the slow logs.

The above would imply either that response_time and query_time are different measurements. If so, what is the difference between them?
Or, if they are the same, why did the MySQL slow log not report/log those 6 + 47 = 53 queries?
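For reference, Percona Server's response-time-distribution counters can be inspected and reset as follows; flushing them and comparing against a fresh slow-log window can help isolate whether the two measurements really disagree:

```sql
-- Inspect the distribution (the same counters behind SHOW QUERY_RESPONSE_TIME)
SELECT * FROM INFORMATION_SCHEMA.QUERY_RESPONSE_TIME;

-- Reset the counters
SET GLOBAL query_response_time_flush = 1;
```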

Testing Docker multi-host network performance

Latest MySQL Performance Blog posts - August 3, 2016 - 12:26pm

In this post, I’ll review Docker multi-host network performance.

In a past post, I tested Docker network performance. The MySQL Server team provided their own results, which are in line with my observations.

For this set of tests, I wanted to focus more on Docker networking using multiple hosts. Mostly because when we set up a high availability (HA) environment (using Percona XtraDB Cluster, for example) the expectation is that instances are running on different hosts.

Another reason for this test is that Docker recently announced the 1.12 release, which supports Swarm Mode. Swarm Mode is quite interesting by itself — with this release, Docker targets going deeper on Orchestration deployments in order to compete with Kubernetes and Apache Mesos. I would say Swarm Mode is still rough around the edges (expected for a first release), but I am sure Docker will polish this feature in the next few releases.

Swarm Mode also expects that you run services on different physical hosts, and services are communicated over Docker network. I wanted to see how much of a performance hit we get when we run over Docker network on multiple hosts.

Network performance is especially important for clustering setups like Percona XtraDB Cluster and  MySQL Group Replication (which just put out another Lab release).

For my setup, I used two physical servers connected over a 10GB network. Both servers use Intel CPUs with 56 cores in total.

Sysbench setup: the data fits into memory, and I will only use primary key lookups. Testing over the network gives the worst-case scenario for network round trips, but it also gives good visibility into the performance impact.
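For reference, the run described above would look roughly like this. This is a sketch using sysbench 1.0 syntax; the host, credentials, table counts, and durations are placeholders, not the exact values used in the benchmark:

```shell
# Hypothetical sysbench invocation: in-memory data set,
# primary-key point lookups only, client and server on different hosts.
sysbench oltp_point_select \
  --mysql-host=10.10.0.2 --mysql-user=sbtest --mysql-password=sbtest \
  --tables=8 --table-size=1000000 \
  --threads=56 --time=300 --report-interval=10 \
  run
```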

The following are options for Docker network:

  • No Docker containers (marked as “direct” in the following results)
  • Docker container uses “host” network (marked as “host”)
  • Docker container uses “bridge” network, where service port exposed via port forwarding (marked as “bridge”)
  • Docker container uses “overlay” network, both client and server are started in containers connected via overlay network (marked as “overlay” in the results). For the “overlay” network it is possible to use third-party plugins with different network implementations; the most widely known are Calico and Weave.

For a multi-host networking setup, only “overlay” (and its plugin implementations) is feasible. I used “direct”, “host” and “bridge” only for reference and as a comparison to measure the overhead of overlay implementations.
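The containerized configurations map roughly to the Docker invocations below (“direct” needs no container). This is a sketch: the image name is illustrative, and the overlay example assumes an initialized Docker 1.12 swarm:

```shell
# "host": the container shares the host network stack, no NAT
docker run -d --net=host --name=mysql-host percona/percona-server

# "bridge": the default Docker network, service port exposed via forwarding
docker run -d --net=bridge -p 3306:3306 --name=mysql-bridge percona/percona-server

# "overlay": multi-host network; requires an initialized swarm (docker swarm init)
docker network create -d overlay app-net
docker service create --name=mysql-overlay --network=app-net percona/percona-server
```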

The results I observed are:

Client         | Server         | Throughput, tps | Ratio to “direct-direct”
---------------|----------------|-----------------|-------------------------
Direct         | Direct         | 282780          | 1.0
Direct         | Host           | 280622          | 0.99
Direct         | Bridge         | 250104          | 0.88
Bridge         | Bridge         | 235052          | 0.83
overlay        | overlay        | 120503          | 0.43
Calico overlay | Calico overlay | 246202          | 0.87
Weave overlay  | Weave overlay  | 11554           | 0.044
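The ratio column is simply each configuration's throughput divided by the 282780 tps “direct-direct” baseline; for example, for the native overlay run:

```shell
# Native overlay throughput relative to the direct-direct baseline
awk 'BEGIN { printf "%.2f\n", 120503 / 282780 }'
# → 0.43
```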


  • “Bridge” network added overhead, about 12%, which is in line with my previous benchmark. I wonder, however, if this is Docker overhead or just the Linux implementation of bridge networks. Docker should be using the setup that I described in Running Percona XtraDB Cluster nodes with Linux Network namespaces on the same host, and I suspect that the Linux network namespaces and bridges add overhead. I need to do more testing to verify.
  • Native “overlay” Docker network suffered from performance problems. I observed issues with ksoftirqd using 100% of one CPU core, and I have seen similar reports. It seems that network interrupts in the Docker “overlay” network are not distributed properly across multiple CPUs. This is not the case with the “direct” and “bridge” configurations. I believe this is a problem with the Docker “overlay” network (hopefully, it will eventually be fixed).
  • The Weave network showed absolutely terrible results. I see a lot of CPU allocated to “weave” containers, so I think there are serious scalability issues in their implementation.
  • The Calico plugin showed the best result for multi-host containers, even better than the “bridge-bridge” network setup.

If you need to use Docker “overlay” network — which is a requirement if you are looking to deploy a multi-host environment or use Docker Swarm mode — I recommend you consider using the Calico network plugin for Docker. Native Docker “overlay” network can be used for prototype or quick testing cases, but at this moment it shows performance problems on high-end hardware.


Error when preparing a full backup

Latest Forum Posts - August 3, 2016 - 8:29am

A while ago my provider had a power outage which caused a MySQL corruption. Here is what I did to fix it:
- I was able to start MySQL using innodb_force_recovery = 2 or 3 (can't really remember which)
- I moved my table files somewhere else and deleted all content inside the mysql folder
- I started the MySQL service and it created the ibdata1 file etc.
- I then copied the table files back and was able to start MySQL with innodb_force_recovery = 0, and everything seemed to be normal.

Today I tried to take a backup of my database using innobackupex, and it seemed to work, as it returned the following:
innobackupex: Connection to database server closed
innobackupex: completed OK!

However, when I tried to prepare the backup that I just took, I got the following error (using the command innobackupex --apply-log /data/backname/):

160803 11:24:28 innobackupex: Starting ibbackup with command: xtrabackup_56 --defaults-file="/data/2016-08-03_11-22-40/backup-my.cnf" --defaults-group="mysqld" --prepare --target-dir=/data/2016-08-03_11-22-40 --tmpdir=/tmp

xtrabackup_56 version 2.1.9 for MySQL server 5.6.17 Linux (i686) (revision id: 746)
xtrabackup: cd to /data/2016-08-03_11-22-40
xtrabackup: This target seems to be not prepared yet.
xtrabackup: xtrabackup_logfile detected: size=2097152, start_lsn=(1369520463)
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup: innodb_data_home_dir = ./
xtrabackup: innodb_data_file_path = ibdata1:10M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 1
xtrabackup: innodb_log_file_size = 2097152
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup: innodb_data_home_dir = ./
xtrabackup: innodb_data_file_path = ibdata1:10M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 1
xtrabackup: innodb_log_file_size = 2097152
xtrabackup: Starting InnoDB instance for recovery.
xtrabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
InnoDB: Using atomics to ref count buffer pool pages
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: Compressed tables use zlib 1.2.3
InnoDB: Not using CPU crc32 instructions
InnoDB: Initializing buffer pool, size = 100.0M
InnoDB: Completed initialization of buffer pool
InnoDB: Highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 1369520463
InnoDB: Database was not shutdown normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages
InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 1369523291 (0%)
InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 2016-08-03 11:24:28 ad7dfb70 InnoDB: Assertion failure in thread 2910714736 in file line 1271
InnoDB: Failing assertion: !page || (ibool)!!page_is_comp(page) == dict_table_is_comp(index->table)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: about forcing recovery.
innobackupex: Error:
innobackupex: ibbackup failed at /usr/bin/innobackupex line 2560.

Any idea how can I fix this issue?

Thank you

How to prepare Streaming and Compressing Backups and Log Sequence Error

Latest Forum Posts - August 3, 2016 - 12:00am
Hi everybody,

I'm using Streaming and Compressing Backups to perform my backup on MariaDB 10.19:

Code:
$ innobackupex --user=$db_user --password=$db_pass --compact --stream=tar ./ --no-timestamp | gzip - > $WORKING_DIR/$BCK_FILE

Everything seems OK, even when I restore the backup; the innobackupex output shows:

Code:
...
innobackupex: Finished copying back files.
160801 13:52:31 innobackupex: completed OK!

but when I start mysqld I find the following message in its log:

Code:
2016-08-01 15:52:33 7f08d8ed87c0 InnoDB: Error: page 371 log sequence number 2340146
InnoDB: is in the future! Current system log sequence number 2337804.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB:
InnoDB: for more information.
...

repeated a lot of times! In some cases I can't even start MySQL.

On page 23 of the manual 'PerconaXtraBackup-2.1.9.pdf' you can read:
'Note that the streamed backup will need to be prepared before restoration. Streaming mode does not prepare the backup.'

I have no idea how to prepare the backup before restoration in this situation.
The manual explains how to prepare it using the --apply-log argument, but only for backups that weren't performed in stream-and-compress mode.

Does anybody know how to prepare it correctly?
How can I restore it?
Should I use --apply-log to prepare it? How?
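For what it's worth, the usual sequence for a tar-streamed, gzipped backup is extract, prepare, copy back. A sketch with placeholder paths (the --rebuild-indexes option is needed here because the backup was taken with --compact):

```shell
# 1. Extract the gzipped tar stream; the -i flag is required for
#    archives produced by --stream=tar.
mkdir -p /tmp/restore
gunzip -c "$WORKING_DIR/$BCK_FILE" | tar -xif - -C /tmp/restore

# 2. Prepare: apply the log and rebuild the secondary indexes
#    that --compact dropped from the backup.
innobackupex --apply-log --rebuild-indexes /tmp/restore

# 3. Copy back into a stopped, empty datadir, then fix ownership.
innobackupex --copy-back /tmp/restore
chown -R mysql:mysql /var/lib/mysql
```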

Thanks for your time.

Best regards,

Take Percona’s One-Click Database Security Downtime Poll

Latest MySQL Performance Blog posts - August 2, 2016 - 3:12pm

Take Percona’s database security downtime poll.

As Peter Zaitsev mentioned recently in his blog post on database support, data breach costs can hit both your business reputation and your bottom line. Costs vary depending on company size and market, but recent studies estimate direct costs averaging from $1.6M to $7.01M. Everyone agrees that leaving rising security risks and costs unchecked is a recipe for disaster.

Reducing security-based outages doesn't have a simple answer, but takes a combination of internal and external monitoring, support contracts, enhanced security systems, and a better understanding of security configuration settings.

Please take a few seconds and answer the following poll. It will help the community get an idea of how security breaches can impact their critical database environments.

If you’ve faced specific issues, feel free to comment below. We’ll post a follow-up blog with the results!

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

You can see the results of our last blog poll on high availability here.

High Availability Poll Results

Latest MySQL Performance Blog posts - August 2, 2016 - 3:11pm

This blog reports the results of Percona’s high availability poll.

High availability (HA) is always a hot topic. The reality is that if your data is not available, your customers cannot do business with you. In fact, estimates show the average cost of downtime is about $5K per minute. With an average outage taking 40 minutes to correct, you could be looking at a potential cost of $200K if your MySQL instance goes down. Whether your database is on premise, or in public or private clouds, it is critical that your database deployment does not have a potentially devastating single point of failure.
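The $200K estimate quoted above is just the two averages multiplied together:

```shell
# Average downtime cost per minute ($5K) times average outage length (40 min)
awk 'BEGIN { printf "$%dK\n", 5 * 40 }'
# → $200K
```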

The results from Percona’s high availability poll responses are in:

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

With over 700 participants, MySQL replication was the clear frontrunner when it comes to high availability solutions.

Percona has HA solutions available, come find out more at our website.

If you’re using other solutions or have specific issues, feel free to comment below.

Check out the latest Percona one-click poll on database security here.

Not able to add query analytics using pmm-admin tool

Latest Forum Posts - August 2, 2016 - 6:35am
I installed the PMM server and client on two different machines, following the instructions given here >>
Both pmm-server and client got installed properly but I was not able to see the Queries in the Query Analytics window. The pmm server GUI gave errors (PFA).

Also, on the client side, giving the command "sudo pmm-admin add queries --user abc --password xyz" produced the following error:

"Error adding queries: "service" failed: exit status 1, Starting pmm-queries-exporter-42001
Unable to start, see /var/log/pmm-queries-exporter-42001.log"

The content of the file pmm-queries-exporter-42001.log:

flag needs an argument: -pid-file
Usage of /usr/local/percona/qan-agent/bin/percona-qan-agent:
-basedir string
Agent basedir (default "/usr/local/percona/qan-agent")
-listen string
Agent interface address (default "")
-pid-file string
PID file (default "")
Ping API
Print version

Also, on one other client I got the following error:

[MySQL] 2016/08/02 00:25:59 packets.go:118: write unix @->/var/run/mysqld/mysqld.sock: write: broken pipe
Cannot connect to MySQL: Error 1045: Access denied for user 'root'@'localhost' (using password: NO)

Please tell what I am doing wrong. Any help will be highly appreciated.


Introduction into storage engine troubleshooting: Q & A

Latest MySQL Performance Blog posts - August 1, 2016 - 3:43pm

In this blog, I will provide answers to the Q & A for the “Introduction into storage engine troubleshooting” webinar.

First, I want to thank everybody for attending the July 14 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: At which isolation level do pt-online-schema-change and pt-archiver copy data from a table?

A: Neither tool changes the server’s default transaction isolation level, so they use either REPEATABLE READ (the server default) or whatever you set in my.cnf.
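A minimal sketch of pinning the server-wide default that the tools then inherit (option and variable names as in MySQL 5.6/5.7):

```shell
# my.cnf fragment:
#   [mysqld]
#   transaction-isolation = REPEATABLE-READ

# Verify the value the server is actually running with:
mysql -e "SELECT @@global.tx_isolation;"
```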

Q: Can I create an index to optimize a query which has group by A and order by B, both from different tables and A column is from the first table in the two table join?

A: Do you mean a query like SELECT ... FROM a, b GROUP BY a.A ORDER BY b.B ? Yes, this is possible:

mysql> explain select A, B, count(*) from a join b on( WHERE b.B < 4 GROUP BY a.A, b.B ORDER BY b.B ASC;
+----+-------------+-------+-------+---------------+------+---------+-----------+------+-----------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref       | rows | Extra                                                     |
+----+-------------+-------+-------+---------------+------+---------+-----------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | b     | range | PRIMARY,B     | B    | 5       | NULL      |   15 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | a     | ref   | A             | A    | 5       |           |    1 | Using index                                               |
+----+-------------+-------+-------+---------------+------+---------+-----------+------+-----------------------------------------------------------+
2 rows in set (0.00 sec)

Q: Where can I find recommendations on what kind of engine to use for different application types or use cases?

A: Storage engines are always being actively developed, therefore I suggest that you don’t search for generic recommendations. These can be outdated just a few weeks after they are written. Study engines instead. For example, just a few years ago MyISAM was the only engine (among those officially supported) that could work with FULLTEXT indexes and SPATIAL columns. Now InnoDB supports both: FULLTEXT indexes since version 5.6 and GIS features in 5.7. Today I can recommend InnoDB as a general-purpose engine for all installations, and TokuDB for write-heavy workloads when you cannot use high-speed disks.

Alternative storage engines can help to realize specific business needs. For example, CONNECT brings data to your server from many sources, SphinxSE talks to the Sphinx daemon, etc.

Other alternative storage engines increase the speed of certain workloads. Memory, for example, can be a good fit for temporary tables.

Q: Can you please explain how we find the full text of the query when we query the view ‘statements_with_full_table_scans’?

A: Do you mean a view in the sys schema? Sys schema views take information from the summary_* and digest tables in Performance Schema, and therefore do not contain full queries (only digests). The full text of the query can be found in the events_statements_* tables in Performance Schema. Note that even the events_statements_history_long table can be overwritten very quickly, and you may want to save data from it periodically.
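As an illustration (not from the webinar itself), full statement text can be pulled from Performance Schema like this; the NO_INDEX_USED filter is just one way to approximate the full-table-scan condition of the sys view:

```shell
# Full SQL text is kept in the events_statements_* tables,
# while sys schema views expose only normalized digests.
mysql -e "SELECT EVENT_ID, SQL_TEXT
  FROM performance_schema.events_statements_history_long
  WHERE NO_INDEX_USED = 1
  LIMIT 10;"
```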

Q: Hi, is TokuDB ready for the new document protocol?

A: As Alex Rubin showed in his detailed blog post, the new document protocol just converts NoSQL queries into SQL, and is thus not limited to any storage engine. However, to use documents and collections, a storage engine must support generated columns (which TokuDB currently does not). So X Protocol support for TokuDB is limited to relational table access.

Q: Please comment on “read committed” versus “repeatable read.”
Q: Repeatable read holds the cursor on the result set for the client versus read committed where the cursor is updated after a transaction.

A: READ COMMITTED and REPEATABLE READ are transaction isolation levels, whose details are explained here.
I would not correlate the locks set on table rows under different transaction isolation modes with the result set. A transaction with isolation level REPEATABLE READ instead creates a snapshot of the rows that are accessed by the transaction. Let’s consider a table:

mysql> create table ti(id int not null primary key, f1 int) engine=innodb;
Query OK, 0 rows affected (0.56 sec)

mysql> insert into ti values(1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9);
Query OK, 9 rows affected (0.03 sec)
Records: 9  Duplicates: 0  Warnings: 0

Then start the transaction and select a few rows from this table:

mysql1> begin;
Query OK, 0 rows affected (0.00 sec)

mysql1> select * from ti where id < 5;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.04 sec)

Now let’s update another set of rows in another transaction:

mysql2> update ti set f1 = id*2 where id > 5;
Query OK, 4 rows affected (0.06 sec)
Rows matched: 4  Changed: 4  Warnings: 0

mysql2> select * from ti;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
|  6 |   12 |
|  7 |   14 |
|  8 |   16 |
|  9 |   18 |
+----+------+
9 rows in set (0.00 sec)

You see that the first four rows, which we accessed in the first transaction, were not modified, and the last four were. If InnoDB only saved the cursor (as someone answered above), we would expect to see the same result if we ran a SELECT * ... query in our old transaction, but it actually shows the whole table content as it was before the modification:

mysql1> select * from ti;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
|  6 |    6 |
|  7 |    7 |
|  8 |    8 |
|  9 |    9 |
+----+------+
9 rows in set (0.00 sec)

So “snapshot” is a better word than “cursor” for the result set. In the case of READ COMMITTED, the first transaction would see the modified rows:

mysql1> drop table ti;
Query OK, 0 rows affected (0.11 sec)

mysql1> create table ti(id int not null primary key, f1 int) engine=innodb;
Query OK, 0 rows affected (0.38 sec)

mysql1> insert into ti values(1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9);
Query OK, 9 rows affected (0.04 sec)
Records: 9  Duplicates: 0  Warnings: 0

mysql1> set transaction isolation level read committed;
Query OK, 0 rows affected (0.00 sec)

mysql1> begin;
Query OK, 0 rows affected (0.00 sec)

mysql1> select * from ti where id < 5;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.00 sec)

Let’s update all rows in the table this time:

mysql2> update ti set f1 = id*2;
Query OK, 9 rows affected (0.04 sec)
Rows matched: 9  Changed: 9  Warnings: 0

Now the first transaction sees not only the modified rows with id >= 5 (which were not in the initial result set), but also the modified rows with id < 5 (which existed in the initial result set):

mysql1> select * from ti;
+----+------+
| id | f1   |
+----+------+
|  1 |    2 |
|  2 |    4 |
|  3 |    6 |
|  4 |    8 |
|  5 |   10 |
|  6 |   12 |
|  7 |   14 |
|  8 |   16 |
|  9 |   18 |
+----+------+
9 rows in set (0.00 sec)
