
Percona XtraDB Cluster on Ceph

Latest MySQL Performance Blog posts - August 4, 2016 - 3:31pm

This post discusses how XtraDB Cluster and Ceph are a good match, and how their combination allows for faster SST and a smaller disk footprint.

My last post was an introduction to Red Hat’s Ceph. As interesting and useful as it was, it wasn’t a practical example. Like most of the readers, I learn about and see the possibilities of technologies by burning my fingers on them. This post dives into a real and novel Ceph use case: handling of the Percona XtraDB Cluster SST operation using Ceph snapshots.

If you are familiar with Percona XtraDB Cluster, you know that a full state snapshot transfer (SST) is required to provision a new cluster node. Similarly, SST can also be triggered when a cluster node happens to have a corrupted dataset. Those SST operations consist essentially of a full copy of the dataset sent over the network. The most common SST methods are Xtrabackup and rsync. Both of these methods imply a significant impact and load on the donor while the SST operation is in progress.

For example, the whole dataset will need to be read from the storage and sent over the network, an operation that requires a lot of IO operations and CPU time. Furthermore, with the rsync SST method, the donor is under a read lock for the whole duration of the SST. Consequently, it can take no write operations. Such constraints on SST operations are often the main motivation behind the reluctance to use Percona XtraDB Cluster with large datasets.

So, what could we do to speed up SST? In this post, I will describe a method of performing SST operations when the data is not local to the nodes. You could easily modify the solution I am proposing for any non-local data source technology that supports snapshots/clones, and has an accessible management API. Off the top of my head (other than Ceph) I see AWS EBS and many SAN-based storage solutions as good fits.

The challenges of clone-based SST

If we could use snapshots and clones, what would be the logical steps for an SST? Let’s have a look at the following list (a command-level sketch follows it):

  1. New node starts (joiner) and unmounts its current MySQL datadir
  2. The joiner asks for an SST
  3. The donor creates a consistent snapshot of its MySQL datadir with the Galera position
  4. The donor sends to the joiner the name of the snapshot to use
  5. The joiner creates a clone of the snapshot name provided by the donor
  6. The joiner mounts the snapshot clone as the MySQL datadir and adjusts ownership
  7. The joiner initializes MySQL on the mounted clone
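
In Ceph terms, steps 3 to 6 boil down to a handful of RBD operations. Here is a minimal sketch of what they look like; the image, snapshot and clone names below are only illustrative (the actual SST script derives them from the node name and a timestamp):

# On the donor: create and protect a snapshot of the image backing the datadir
rbd snap create mysqlpool/PXC@1463776423
rbd snap protect mysqlpool/PXC@1463776423

# On the joiner: clone the snapshot, map it and mount it as the MySQL datadir
rbd clone mysqlpool/PXC@1463776423 mysqlpool/PXC2-1463776424
rbd map mysqlpool/PXC2-1463776424
mount -o rw,noatime /dev/rbd1 /var/lib/mysql
chown mysql.mysql /var/lib/mysql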

As we can see, all these steps are fairly simple, but they hide some challenges for an SST method based on cloning. The first challenge is the need to mount the snapshot clone. Mounting a block device requires root privileges – and SST scripts normally run under the MySQL user. The second challenge I encountered wasn’t expected. MySQL opens the datadir and some files in it before the SST happens. Consequently, those files are then kept opened in the underlying mount point, a situation that is far from ideal. Fortunately, there are solutions to both of these challenges as we will see below.

SST script

So, let’s start with the SST script. The script is available on my GitHub at:

https://github.com/y-trudeau/ceph-related-tools/raw/master/wsrep-sst/wsrep_sst_ceph

You should install the script in the /usr/bin directory, along with the other user scripts. Once installed, I recommend:

chown root.root /usr/bin/wsrep_sst_ceph
chmod 755 /usr/bin/wsrep_sst_ceph

The script has a few parameters that can be defined in the [sst] section of the my.cnf file.

cephlocalpool
The Ceph pool where this node should create the clone. It can be a different pool from the one holding the original dataset. For example, it could have a replication factor of 1 (no replication) for a read scaling node. The default value is: mysqlpool
cephmountpoint
What mount point to use. It defaults to the MySQL datadir as provided to the SST script.
cephmountoptions
The options used to mount the filesystem. The default value is: rw,noatime
cephkeyring
The Ceph keyring file to authenticate against the Ceph cluster with cephx. The user under which MySQL is running must be able to read the file. The default value is: /etc/ceph/ceph.client.admin.keyring
cephcleanup
Whether or not the script should clean up the snapshots and clones that are no longer used. Enable = 1, Disable = 0. The default value is: 0
Root privileges

In order to allow the SST script to perform privileged operations, I added an extra SST role: “mount”. The SST script on the joiner will call itself back with sudo and will pass “mount” for the role parameter. To allow the elevation of privileges, the following line must be added to the /etc/sudoers file:

mysql ALL=NOPASSWD: /usr/bin/wsrep_sst_ceph
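
For illustration, the privilege-elevation pattern looks roughly like the following sketch. This is not the actual script code; the variable names and arguments are simplified:

# Simplified sketch of the sudo callback, not the real wsrep_sst_ceph code
ROLE="$1"
if [ "$ROLE" != "mount" ]; then
    # Running as the mysql user: call ourselves back through sudo for the mount step
    sudo /usr/bin/wsrep_sst_ceph mount "$CLONE_DEV" "$MOUNT_POINT"
else
    # Running as root via sudo: mount the clone and hand it back to the mysql user
    mount -o rw,noatime "$2" "$3"
    chown mysql.mysql "$3"
fi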

Files opened by MySQL before the SST

Upon startup, MySQL opens files at two places in the code before the SST completes. The first one is in the function mysqld_main , which sets the current working directory to the datadir (an empty directory at that point).  After the SST, a block device is mounted on the datadir. The issue is that MySQL tries to find the files in the empty mount point directory. I wrote a simple patch, presented below, and issued a pull request:

diff --git a/sql/mysqld.cc b/sql/mysqld.cc
index 90760ba..bd9fa38 100644
--- a/sql/mysqld.cc
+++ b/sql/mysqld.cc
@@ -5362,6 +5362,13 @@
       a file name for --log-bin-index option", opt_binlog_index_name);
       }
     }
   }
+
+  /*
+   * Forcing a new setwd in case the SST mounted the datadir
+   */
+  if (my_setwd(mysql_real_data_home,MYF(MY_WME)) && !opt_help)
+    unireg_abort(1); /* purecov: inspected */
+
   if (opt_bin_log)
   {
     /*

With this patch, I added a new my_setwd call right after the SST completed. The Percona engineering team approved the patch, and it should be added to the upcoming release of Percona XtraDB Cluster.

The Galera library is the other source of opened files before the SST. Here, the fix is just in the configuration. You must define the base_dir Galera provider option outside of the datadir. For example, if you use /var/lib/mysql as datadir and cephmountpoint, then you should use:

wsrep_provider_options="base_dir=/var/lib/galera"

Of course, if you have other provider options, don’t forget to add them there.
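
For example, combining base_dir with the gcache.size option (shown only as an illustration) gives the following; Galera provider options are separated by semicolons:

wsrep_provider_options="base_dir=/var/lib/galera;gcache.size=1G"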

Walkthrough

So, what are the steps required to use Ceph with Percona XtraDB Cluster? (I assume that you have a working Ceph cluster.)

1. Join the Ceph cluster

The first thing you need is a working Ceph cluster with the needed CephX credentials. While the setup of a Ceph cluster is beyond the scope of this post, we will address it in a subsequent post. For now, we’ll focus on the client side.

You need to install the Ceph client packages on each node. On my test servers using Ubuntu 14.04, I did:

wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
sudo apt-add-repository 'deb http://download.ceph.com/debian-infernalis/ trusty main'
apt-get update
apt-get install ceph

These commands also installed all the dependencies. Next, I copied the Ceph cluster configuration file /etc/ceph/ceph.conf:

[global]
fsid = 87671417-61e4-442b-8511-12659278700f
mon_initial_members = odroid1, odroid2
mon_host = 10.2.2.100, 10.2.2.20, 10.2.2.21
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_journal = /var/lib/ceph/osd/journal
osd_journal_size = 128
osd_pool_default_size = 2

and the authentication file /etc/ceph/ceph.client.admin.keyring from another node. I made sure these files were readable by all. You can define more refined privileges for a production system with CephX, the security layer of Ceph.
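
If you prefer not to hand out the admin keyring, a dedicated CephX user restricted to the MySQL pool can be created along these lines (a sketch; adjust the capabilities to your own security policy):

ceph auth get-or-create client.mysql mon 'allow r' osd 'allow rwx pool=mysqlpool' -o /etc/ceph/ceph.client.mysql.keyring
chmod 644 /etc/ceph/ceph.client.mysql.keyring

The cephkeyring option described earlier would then point to the new keyring file.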

Once everything is in place, you can test if it is working with this command:

root@PXC3:~# ceph -s
    cluster 87671417-61e4-442b-8511-12659278700f
     health HEALTH_OK
     monmap e2: 3 mons at {odroid1=10.2.2.20:6789/0,odroid2=10.2.2.21:6789/0,serveur-famille=10.2.2.100:6789/0}
            election epoch 474, quorum 0,1,2 odroid1,odroid2,serveur-famille
     mdsmap e204: 1/1/1 up {0=odroid3=up:active}
     osdmap e995: 4 osds: 4 up, 4 in
      pgmap v275501: 1352 pgs, 5 pools, 321 GB data, 165 kobjects
            643 GB used, 6318 GB / 7334 GB avail
                1352 active+clean
  client io 16491 B/s rd, 2425 B/s wr, 1 op/s

This command gives the current state of the Ceph cluster.

2. Create the Ceph pool

Before we can use Ceph, we need to create a first RBD image, put a filesystem on it and mount it for MySQL on the bootstrap node. We need at least one Ceph pool since the RBD images are stored in a Ceph pool.  We create a Ceph pool with the command:

ceph osd pool create mysqlpool 512 512 replicated

Here, we have defined the pool mysqlpool with 512 placement groups. On a larger Ceph cluster, you might need to use more placement groups (again, a topic beyond the scope of this post). The pool we just created is replicated. Each object in the pool will have two copies as defined by the osd_pool_default_size parameter in the ceph.conf file. If needed, you can modify the size of a pool and its replication factor at any moment after the pool is created.
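
For example, if you later decide the pool should keep three copies of each object instead of two, you can change it on the fly:

ceph osd pool set mysqlpool size 3
ceph osd pool set mysqlpool min_size 2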

3. Create the first RBD image

Now that we have a pool, we can create a first RBD image:

root@PXC1:~# rbd -p mysqlpool create PXC --size 10240 --image-format 2

and “map” the RBD image to a host block device:

root@PXC1:~# rbd -p mysqlpool map PXC
/dev/rbd1

The commands return the local RBD block device that corresponds to the RBD image.

The rest of the steps are not specific to RBD images. We need to create a filesystem and prepare the mount points:

mkfs.xfs /dev/rbd1
mount /dev/rbd1 /var/lib/mysql -o rw,noatime,nouuid
chown mysql.mysql /var/lib/mysql
mysql_install_db --datadir=/var/lib/mysql --user=mysql
mkdir /var/lib/galera
chown mysql.mysql /var/lib/galera

You need to mount the RBD device and run the mysql_install_db tool only on the bootstrap node. You need to create the directories /var/lib/mysql and /var/lib/galera on the other nodes and adjust the permissions similarly.
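
On those other nodes, the preparation amounts to something like this (the SST script takes care of mounting a clone there later):

mkdir -p /var/lib/mysql /var/lib/galera
chown mysql.mysql /var/lib/mysql /var/lib/galera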

4. Modify the my.cnf files

You will need to set or adjust the specific wsrep_sst_ceph settings in the my.cnf file of all the servers. Here are the relevant lines from the my.cnf file of one of my cluster nodes:

[mysqld]
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_provider_options="base_dir=/var/lib/galera"
wsrep_cluster_address=gcomm://10.0.5.120,10.0.5.47,10.0.5.48
wsrep_node_address=10.0.5.48
wsrep_sst_method=ceph
wsrep_cluster_name=ceph_cluster

[sst]
cephlocalpool=mysqlpool
cephmountoptions=rw,noatime,nodiratime,nouuid
cephkeyring=/etc/ceph/ceph.client.admin.keyring
cephcleanup=1

At this point, we can bootstrap the cluster on the node where we mounted the initial RBD image:

/etc/init.d/mysql bootstrap-pxc

5. Start the other XtraDB Cluster nodes

The first node does not perform an SST, so nothing exciting so far. With the patched version of MySQL (the above patch), starting MySQL on a second node triggers a Ceph SST operation. In my test environment, the SST takes about five seconds to complete on low-powered VMs. Interestingly, the duration is not directly related to the dataset size. Because of this, a much larger dataset, on a quiet database, should take about the same time. A very busy database may need more time, since an SST requires a “flush tables with read lock” at some point.
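
Assuming the same init-script layout as on the bootstrap node, starting each remaining node is simply:

/etc/init.d/mysql start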

So, after their respective Ceph SST, the other two nodes have:

root@PXC2:~# mount | grep mysql
/dev/rbd1 on /var/lib/mysql type xfs (rw,noatime,nodiratime,nouuid)
root@PXC2:~# rbd showmapped
id pool      image           snap device
1  mysqlpool PXC2-1463776424 -    /dev/rbd1
root@PXC3:~# mount | grep mysql
/dev/rbd1 on /var/lib/mysql type xfs (rw,noatime,nodiratime,nouuid)
root@PXC3:~# rbd showmapped
id pool      image           snap device
1  mysqlpool PXC3-1464118729 -    /dev/rbd1

The original RBD image now has two snapshots that are mapped to the clones mounted by the other two nodes:

root@PXC3:~# rbd -p mysqlpool ls
PXC
PXC2-1463776424
PXC3-1464118729
root@PXC3:~# rbd -p mysqlpool info PXC2-1463776424
rbd image 'PXC2-1463776424':
        size 10240 MB in 2560 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.108b4246146651
        format: 2
        features: layering
        flags:
        parent: mysqlpool/PXC@1463776423
        overlap: 10240 MB

Discussion

Apart from allowing faster SST, what other benefits do we get from using Ceph with Percona XtraDB Cluster?

The first benefit is that the inherent data replication over the network removes the need for local data replication. Thus, instead of using RAID-10 or RAID-5 with an array of disks, we could use a simple RAID-0 stripe set if the data is already replicated to more than one server.

The second benefit is a bit less obvious: you don’t need as much storage. Why? A Ceph clone only stores the delta from its original snapshot. So, for large, read intensive datasets, the disk space savings can be very significant. Of course, over time, the clone will drift away from its parent snapshot and will use more and more space. When we determine that a Ceph clone uses too much disk space, we can simply refresh the clone by restarting MySQL and forcing a full SST. The SST script will automatically drop the old clone and snapshot when the cephcleanup option is set, and it will create a new fresh clone. You can easily evaluate how much space is consumed by the clone using the following commands:

root@PXC2:~# rbd -p mysqlpool du PXC2-1463776424
warning: fast-diff map is not enabled for PXC2-1463776424. operation may be slow.
NAME            PROVISIONED USED
PXC2-1463776424      10240M 164M
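
As mentioned above, refreshing an oversized clone just means forcing a full SST on that node. A minimal sketch, assuming the standard init scripts and the default grastate.dat location:

/etc/init.d/mysql stop
rm -f /var/lib/mysql/grastate.dat    # invalidates the local state, forcing an SST on restart
/etc/init.d/mysql start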

Also, nothing prevents you from using a different configuration of Ceph pools within the same XtraDB Cluster. Therefore, a Ceph clone can use a different pool than its parent snapshot. That’s the whole purpose of the cephlocalpool parameter. Strictly speaking, you only need one node to use a replicated pool, as the other nodes could run on clones that store their data in a non-replicated pool (saving a lot of storage space). Furthermore, we can define the OSD affinity of the non-replicated pool in a way that it stores data on the host where it is used, reducing the cross-node network latency.
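
As an example, a read-scaling node could point cephlocalpool to a pool with a single copy of the data (the pool name and placement group count are illustrative):

ceph osd pool create mysqlfastpool 128 128 replicated
ceph osd pool set mysqlfastpool size 1

and then use cephlocalpool=mysqlfastpool in the [sst] section of that node's my.cnf.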

Using Ceph for XtraDB Cluster SST operation demonstrates one of the array of possibilities offered to MySQL by Ceph. We continue to work with the Red Hat team and Red Hat Ceph Storage architects to find new and useful ways of addressing database issues in the Ceph environment. There are many more posts to come, so stay tuned!

DISCLAIMER: The wsrep_sst_ceph script isn’t officially supported by Percona.

pt-online-schema-change causing transactions to wait for AUTO-INC lock and crashing server

Latest Forum Posts - August 4, 2016 - 12:19pm
Fact:
-----------------
1. ~2 Million rows, 37G table
2. inserts only table
3. inserts 500~1000 rows per min

Problem:
-----------------
We are trying to use the pt-online-schema-change tool to add a column to an existing index, but it is causing lots of transactions to wait for the auto-inc lock, eventually overloading the server and shutting it down.

We tried making chunk-time smaller but it did not help. The parameters we use are: --no-check-replication-filters --chunk-time 0.05 --sleep 1


Processlist:
------------------------
LATEST DETECTED DEADLOCK
------------------------
2016-07-28 21:00:16 2b06d3430700TOO DEEP OR LONG SEARCH IN THE LOCK TABLE WAITS-FOR GRAPH, WE WILL ROLL BACK FOLLOWING TRANSACTION


*** TRANSACTION:
TRANSACTION 18459639428, ACTIVE 19 sec setting auto-inc lock
mysql tables in use 2, locked 2
3 lock struct(s), heap size 360, 0 row lock(s), undo log entries 2


REPLACE INTO xxxx

*** WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `my_db`.`_the_table_new` trx id 18459639428 lock mode AUTO-INC waiting
*** WE ROLL BACK TRANSACTION (2)


Question:
-------------
1. From the processlist, we found there's a bulk insert ("INSERT LOW_PRIORITY IGNORE INTO ") that inserts about 800 rows but takes 173 secs. Not sure why it takes that long. Any explanations?

2. It seems there is no need to hold the auto-inc lock for `_the_table_new` because the rows copied from `_the_table_old` will always have a value for the auto-increment column, right? Is there any way to prevent using the auto-inc lock for `_the_table_new`?

3. any suggestions for solving the issue?

Thanks!

Very slow database response. I suspect InnoDB.

Latest Forum Posts - August 4, 2016 - 11:03am
Hello,

I have two database servers with almost the same hardware and software configurations. The load of the servers is almost equal. The problem is that the first server works as snail compared with the second server.

For example, if I run this query on the problematic server:

mysql> SELECT table_schema "Database Name", SUM(data_length+index_length)/1024/1024 "Database Size (MB)" FROM information_schema.TABLES GROUP BY table_schema;

Before I get to see the result like this:

+----------------------------+--------------------+
| Database Name              | Database Size (MB) |
+----------------------------+--------------------+
| activtra_b2fx              |       260.60385895 |
| activtra_coper             |         0.06334019 |
................................................................................
| ziwitrad_wp99              |         4.09765244 |
+----------------------------+--------------------+
127 rows in set (34.59 sec)

mysql>

I need to wait between 5 and 35 seconds. I don't know why the server sometimes returns the response in 5 seconds and why at other times I need to wait more than 30 seconds.

The same query, executed on the second server, returns a response with 93 rows in less than 1 second. Usually the responses are returned in between 0.1 and 0.3 seconds.

As you can see, the difference is colossal and I don't know why.

Before executing the above query on the problematic server I checked the load, and everything looked good. Below you can see the check results:


mysql> show processlist;
+--------+-----------+-----------+-----------+---------+------+-------+------------------+-----------+---------------+-----------+
| Id     | User      | Host      | db        | Command | Time | State | Info             | Rows_sent | Rows_examined | Rows_read |
+--------+-----------+-----------+-----------+---------+------+-------+------------------+-----------+---------------+-----------+
| 103183 | eximstats | localhost | eximstats | Sleep   |   38 |       | NULL             |         0 |             0 |         0 |
| 103327 | root      | localhost | NULL      | Query   |    0 | NULL  | show processlist |         0 |             0 |         0 |
+--------+-----------+-----------+-----------+---------+------+-------+------------------+-----------+---------------+-----------+
2 rows in set (0.00 sec)

# top
top - 20:54:13 up 259 days, 9:16, 1 user, load average: 0.50, 2.05, 2.16
Tasks: 309 total, 1 running, 307 sleeping, 0 stopped, 1 zombie
Cpu(s): 2.6%us, 0.6%sy, 0.0%ni, 96.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 12191156k total, 10825468k used, 1365688k free, 1277832k buffers
Swap: 5242872k total, 607648k used, 4635224k free, 5509472k cached

# ./tuning-primer.sh
-- MYSQL PERFORMANCE TUNING PRIMER --
- By: Matthew Montgomery -

MySQL Version 5.5.45-37.4-log x86_64

Uptime = 2 days 2 hrs 56 min 29 sec
Avg. qps = 11
Total Questions = 2042286
Threads Connected = 2

Server has been running for over 48hrs.
It should be safe to follow these recommendations

To find out more information on how each of these
runtime variables effects performance visit:
http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html
Visit http://www.mysql.com/products/enterprise/advisors.html
for info about MySQL's Enterprise Monitoring and Advisory Service

SLOW QUERIES
The slow query log is enabled.
Current long_query_time = 10.000000 sec.
You have 352 out of 2042307 that take longer than 10.000000 sec. to complete
Your long_query_time seems to be fine

BINARY UPDATE LOG
The binary update log is NOT enabled.
You will not be able to do point in time recovery
See http://dev.mysql.com/doc/refman/5.5/en/point-in-time-recovery.html

WORKER THREADS
Current thread_cache_size = 8
Current threads_cached = 6
Current threads_per_sec = 0
Historic threads_per_sec = 0
Your thread_cache_size is fine

MAX CONNECTIONS
Current max_connections = 151
Current threads_connected = 2
Historic max_used_connections = 24
The number of used connections is 15% of the configured maximum.
Your max_connections variable seems to be fine.

INNODB STATUS

So far I see the above output in a split second. Then I wait an additional 20-30 seconds before the command continues with the output below:

Current InnoDB index space = 324 M
Current InnoDB data space = 999 M
Current InnoDB buffer pool free = 0 %
Current innodb_buffer_pool_size = 512 M
Depending on how much space your innodb indexes take up it may be safe
to increase this value to up to 2 / 3 of total system memory

MEMORY USAGE
Max Memory Ever Allocated : 3.26 G
Configured Max Per-thread Buffers : 3.90 G
Configured Max Global Buffers : 2.64 G
Configured Max Memory Limit : 6.54 G
Physical Memory : 11.62 G
Max memory limit seem to be within acceptable norms

KEY BUFFER
Current MyISAM index space = 203 M
Current key_buffer_size = 2.00 G
Key cache miss rate is 1 : 137
Key buffer free ratio = 78 %
Your key_buffer_size seems to be fine

QUERY CACHE
Query cache is enabled
Current query_cache_size = 128 M
Current query_cache_used = 125 M
Current query_cache_limit = 16 M
Current Query cache Memory fill ratio = 98.16 %
Current query_cache_min_res_unit = 4 K
However, 2150 queries have been removed from the query cache due to lack of memory
Perhaps you should raise query_cache_size
MySQL won't cache query results that are larger than query_cache_limit in size

SORT OPERATIONS
Current sort_buffer_size = 2 M
Current read_rnd_buffer_size = 256 K
Sort buffer seems to be fine

JOINS
Current join_buffer_size = 16.00 M
You have had 4748 queries where a join could not use an index properly
join_buffer_size >= 4 M
This is not advised
You should enable "log-queries-not-using-indexes"
Then look for non indexed joins in the slow query log.

OPEN FILES LIMIT
Current open_files_limit = 32930 files
The open_files_limit should typically be set to at least 2x-3x
that of table_cache if you have heavy MyISAM usage.
Your open_files_limit value seems to be fine

TABLE CACHE
Current table_open_cache = 16384 tables
Current table_definition_cache = 20480 tables
You have a total of 6569 tables
You have 12470 open tables.
The table_cache value seems to be fine

TEMP TABLES
Current max_heap_table_size = 768 M
Current tmp_table_size = 768 M
Of 206436 temp tables, 30% were created on disk
Perhaps you should increase your tmp_table_size and/or max_heap_table_size
to reduce the number of disk-based temporary tables
Note! BLOB and TEXT columns are not allow in memory tables.
If you are using these columns raising these values might not impact your
ratio of on disk temp tables.

TABLE SCANS
Current read_buffer_size = 8 M
Current table scan ratio = 1249 : 1
read_buffer_size seems to be fine

TABLE LOCKING
Current Lock Wait ratio = 1 : 16787
Your table locking seems to be fine

On the good working server, tuning-primer.sh needs only 2-3 seconds to complete with all of its output, without any delays in the InnoDB part.

This is my /etc/my.cnf on problematic server:

[mysqld]
socket=/tmp/mysql.sock
#set-variable = max_connections=500
max_user_connections=16
log_slow_queries=/var/log/mysql-slow.log
#log-slow-queries
#safe-show-database
join_buffer_size=16M
max_allowed_packet=268435456
open_files_limit=32768
# Skip reverse DNS lookup of clients
skip-name-resolve
query_cache_size=128M
query_cache_limit=16M
key_buffer=2048M
table_cache=16384
table_definition_cache=20480
tmp_table_size=768M
max_heap_table_size=768M
read_buffer_size=8M
innodb_buffer_pool_size=512M
innodb_file_per_table=1
thread_cache_size=8
low_priority_updates=1

[client]
socket=/tmp/mysql.sock

Do you have any ideas on what could be the reason for this delay?

IST receiver unable to bind address to public IP

Latest Forum Posts - August 4, 2016 - 3:34am
In AWS EC2, applications can only listen on the private IP rather than the public IP. The public IP is not visible on the host; anything sent to the public IP is forwarded to the corresponding private IP. The IST receiver tries to bind the address specified by wsrep_node_address. For Galera clusters spanning data centers, we must set wsrep_node_address to the node's public IP address.
Here's the problem:
No NIC has public IP.
Setting wsrep_node_address to the public IP address makes the IST receiver try to bind that IP and fail:

2016-08-04 06:06:16 19852 [Warning] WSREP: Failed to prepare for incremental state transfer: Failed to open IST listener at tcp://52.207.12.34:4568', asio error 'Cannot assign requested address': 99 (Cannot assign requested address) at galera/src/ist.cpprepare():326. IST will be unavailable.

Setting wsrep_node_address to the private IP address causes the donor node to try to reach the joiner node via the private IP address, which is not possible.

We need to separate the listening address (private) and the access address (public). Setting wsrep_node_incoming_address does not help in this case. The donor still tries to access the joiner via wsrep_node_address (private).

Difference between query_time and response_time

Latest Forum Posts - August 3, 2016 - 10:33pm
Version | 5.5.45-37.4-log Percona Server (GPL), Release 37.4, Revision 042e02b

Hi,

I am a bit confused about the difference between query_time as reported in the MySQL slow logs and the response time as reported by the Response Time Distribution (SHOW QUERY_RESPONSE_TIME).

The confusion stems from the observation that the mysql slow logs shows the maximum query_time as 347s (when checked with all logs).

While the distribution has entries
+----------------+-----------+----------------+
| time           | count     | total          |
+----------------+-----------+----------------+
| 0.000001 | 914 | 0.000000 |
| 0.000010 | 11716670 | 25.632509 |
| 0.000100 | 183210221 | 6030.333252 |
| 0.001000 | 30203398 | 8305.915448 |
| 0.010000 | 55028382 | 202486.117777 |
| 0.100000 | 51674821 | 1644526.564266 |
| 1.000000 | 7102115 | 2539651.042982 |
| 10.000000 | 278069 | 572907.555653 |
| 100.000000 | 6846 | 111699.020993 |
| 1000.000000 | 31 | 9921.227099 |
| 10000.000000 | 6 | 50372.869107 |
| 100000.000000 | 47 | 520054.762721 |

| 1000000.000000 | 0 | 0.000000 |
| TOO LONG | 0 | TOO LONG |
+----------------+-----------+----------------+

As seen from the last two non-zero rows (the 10000.000000 and 100000.000000 buckets), there are 6 + 47 = 53 queries above the range of the maximum query_time reported in the slow logs.

The above would either imply that the response_time and query_time from the two are different. If so what is the difference between them?
Or, if they are the same, then why did the MySQL slow log not report/log those 6 + 47 = 53 queries?

Testing Docker multi-host network performance

Latest MySQL Performance Blog posts - August 3, 2016 - 12:26pm

In this post, I’ll review Docker multi-host network performance.

In a past post, I tested Docker network. The MySQL Server team provided their own results, which are in line with my observations.

For this set of tests, I wanted to focus more on Docker networking using multiple hosts. Mostly because when we set up a high availability (HA) environment (using Percona XtraDB Cluster, for example) the expectation is that instances are running on different hosts.

Another reason for this test is that Docker recently announced the 1.12 release, which supports Swarm Mode. Swarm Mode is quite interesting by itself — with this release, Docker targets going deeper on Orchestration deployments in order to compete with Kubernetes and Apache Mesos. I would say Swarm Mode is still rough around the edges (expected for a first release), but I am sure Docker will polish this feature in the next few releases.

Swarm Mode also expects that you run services on different physical hosts, and that the services communicate over the Docker network. I wanted to see how much of a performance hit we get when we run over the Docker network on multiple hosts.

Network performance is especially important for clustering setups like Percona XtraDB Cluster and  MySQL Group Replication (which just put out another Lab release).

For my setup, I used two physical servers connected over a 10Gb network. Both servers have a total of 56 Intel CPU cores.

Sysbench setup: data fits into memory, and I will only use primary key lookups. Testing over the network gives the worst case scenario for network round trips, but it also gives a good visibility on performance impacts.

The following are options for Docker network:

  • No Docker containers (marked as “direct” in the following results)
  • Docker container uses “host” network (marked as “host”)
  • Docker container uses “bridge” network, where service port exposed via port forwarding (marked as “bridge”)
  • Docker container uses “overlay” network, both client and server are started in containers connected via the overlay network (marked as “overlay” in the results). For the “overlay” network it is possible to use third-party plugins with different implementations of the network; the best known include Calico and Weave, both of which appear in the results below.

For a multi-host networking setup, only “overlay” (and its plugin implementations) is feasible. I used “direct”, “host” and “bridge” only for reference and as a comparison to measure the overhead of the overlay implementations.
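
For reference, the “overlay” runs boil down to attaching both containers to a user-defined overlay network, roughly as sketched below. The names, subnet and image are illustrative, not the exact benchmark commands, and a native overlay network also requires Swarm mode or an external key-value store to be configured first:

# Sketch only: an overlay network shared by the two hosts
docker network create -d overlay --subnet=10.10.0.0/24 benchnet
# On host A: the MySQL server container
docker run -d --net=benchnet --name=mysql-server -e MYSQL_ROOT_PASSWORD=secret percona:5.7
# On host B: the benchmark client container, reaching the server by name over the overlay
docker run -it --net=benchnet --name=bench-client ubuntu:14.04 bash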

The results I observed are:

Client         | Server         | Throughput, tps | Ratio to “direct-direct”
Direct         | Direct         | 282780          | 1.0
Direct         | Host           | 280622          | 0.99
Direct         | Bridge         | 250104          | 0.88
Bridge         | Bridge         | 235052          | 0.83
overlay        | overlay        | 120503          | 0.43
Calico overlay | Calico overlay | 246202          | 0.87
Weave overlay  | Weave overlay  | 11554           | 0.044

 

Observations
  • “Bridge” network added overhead, about 12%, which is in line with my previous benchmark. I wonder, however, if this is Docker overhead or just the Linux implementation of bridge networks. Docker should be using the setup that I described in Running Percona XtraDB Cluster nodes with Linux Network namespaces on the same host, and I suspect that the Linux network namespaces and bridges add overhead. I need to do more testing to verify.
  • Native “overlay” Docker network suffered from performance problems. I observed issues with ksoftirqd using 100% of one CPU core, and I see similar reports. It seems that network interrupts in Docker “overlay” are not distributed properly across multiple CPUs. This is not the case with the “direct” and “bridge” configurations. I believe this is a problem with the Docker “overlay” network (hopefully, it will eventually be fixed).
  • Weave network showed absolutely terrible results. I see a lot of CPU allocated to “weave” containers, so I think there are serious scalability issues in their implementation.
  • The Calico plugin showed the best result for multi-host containers, even better than the “bridge-bridge” network setup.
Conclusion

If you need to use Docker “overlay” network — which is a requirement if you are looking to deploy a multi-host environment or use Docker Swarm mode — I recommend you consider using the Calico network plugin for Docker. Native Docker “overlay” network can be used for prototype or quick testing cases, but at this moment it shows performance problems on high-end hardware.

 

Error when preparing a full backup

Latest Forum Posts - August 3, 2016 - 8:29am
Hello,

A while ago my provider had a power outage which caused a MySQL corruption. Here is what I did to fix it:
- I was able to start MySQL using innodb_force_recovery = 2 or 3, can't really remember
- I moved my table files somewhere else and deleted all content inside the mysql folder
- I started the MySQL service and it created the ibdata1 file, etc.
- I then copied back the table files and I was able to start MySQL with innodb_force_recovery = 0, and everything seemed to be normal.

Today I tried to take a backup of my Database using innobackupex, and it seems to be working as it returned the following
innobackupex: Connection to database server closed
innobackupex: completed OK!

However, when I tried to prepare the backup that I just took, I got the following error (using the command innobackupex --apply-log /data/backname/):

160803 11:24:28 innobackupex: Starting ibbackup with command: xtrabackup_56 --defaults-file="/data/2016-08-03_11-22-40/backup-my.cnf" --defaults-group="mysqld" --prepare --target-dir=/data/2016-08-03_11-22-40 --tmpdir=/tmp

xtrabackup_56 version 2.1.9 for MySQL server 5.6.17 Linux (i686) (revision id: 746)
xtrabackup: cd to /data/2016-08-03_11-22-40
xtrabackup: This target seems to be not prepared yet.
xtrabackup: xtrabackup_logfile detected: size=2097152, start_lsn=(1369520463)
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup: innodb_data_home_dir = ./
xtrabackup: innodb_data_file_path = ibdata1:10M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 1
xtrabackup: innodb_log_file_size = 2097152
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup: innodb_data_home_dir = ./
xtrabackup: innodb_data_file_path = ibdata1:10M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 1
xtrabackup: innodb_log_file_size = 2097152
xtrabackup: Starting InnoDB instance for recovery.
xtrabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
InnoDB: Using atomics to ref count buffer pool pages
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: Compressed tables use zlib 1.2.3
InnoDB: Not using CPU crc32 instructions
InnoDB: Initializing buffer pool, size = 100.0M
InnoDB: Completed initialization of buffer pool
InnoDB: Highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 1369520463
InnoDB: Database was not shutdown normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages
InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 1369523291 (0%)
InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 2016-08-03 11:24:28 ad7dfb70 InnoDB: Assertion failure in thread 2910714736 in file log0recv.cc line 1271
InnoDB: Failing assertion: !page || (ibool)!!page_is_comp(page) == dict_table_is_comp(index->table)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/...-recovery.html
InnoDB: about forcing recovery.
innobackupex: Error:
innobackupex: ibbackup failed at /usr/bin/innobackupex line 2560.

Any idea how can I fix this issue?

Thank you

How to prepare Streaming and Compressing Backups and Log Sequence Error

Latest Forum Posts - August 3, 2016 - 12:00am
Hi everybody,

I'm using Streaming and Compressing Backups to perform my backup on a MariaDB 10.19:
$ innobackupex --user=$db_user --password=$db_pass --compact --stream=tar ./ --no-timestamp | gzip - > $WORKING_DIR/$BCK_FILE

All seems OK. Even when I restore the backup, the innobackupex output shows:

...
innobackupex: Finished copying back files.

160801 13:52:31 innobackupex: completed OK!

but when I start mysqld I find the following message in its log:

2016-08-01 15:52:33 7f08d8ed87c0 InnoDB: Error: page 371 log sequence number 2340146
InnoDB: is in the future! Current system log sequence number 2337804.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.
...

repeated a lot of times! In some cases I can't initialize MySQL at all.

On the page 23 of the manual 'PerconaXtraBackup-2.1.9.pdf' you can read:
'Note that the streamed backup will need to be prepared before restoration. Streaming mode does not prepare the backup.'

I have no idea how to prepare the backup before the restoration in this situation.
The manual explains how to prepare a backup using the --apply-log argument, but those backups aren't taken in streaming and compressing mode.

Does anybody know how to prepare it correctly?
How can I restore it?
Should I use --apply-log to prepare it? How?

Thanks for your time.

Best rgrds,
Boadrius

Take Percona’s One-Click Database Security Downtime Poll

Latest MySQL Performance Blog posts - August 2, 2016 - 3:12pm

Take Percona’s database security downtime poll.

As Peter Zaitsev mentioned recently in his blog post on database support, data breach costs can hit both your business reputation and your bottom line. Costs vary depending on the company size and market, but recent studies estimate direct costs ranging on average from $1.6M to $7.01M. Everyone agrees leaving rising security risks and costs unchecked is a recipe for disaster.

Reducing security-based outages doesn’t have a simple answer, but can be a combination of internal and external monitoring, support contracts, enhanced security systems, and a better understanding of security configuration settings.

Please take a few seconds and answer the following poll. It will help the community get an idea of how security breaches can impact their critical database environments.

If you’ve faced  specific issues, feel free to comment below. We’ll post a follow-up blog with the results!

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

You can see the results of our last blog poll on high availability here.

High Availability Poll Results

Latest MySQL Performance Blog posts - August 2, 2016 - 3:11pm

This blog reports the results of Percona’s high availability poll.

High availability (HA) is always a hot topic. The reality is that if your data is not available, your customers cannot do business with you. In fact, estimates show the average cost of downtime is about $5K per minute. With an average outage taking 40 minutes to correct, you could be looking at a potential cost of $200K if your MySQL instance goes down. Whether your database is on premise, or in public or private clouds, it is critical that your database deployment does not have a potentially devastating single point of failure.

The results from Percona’s high availability poll responses are in:

Note: There is a poll embedded within this post, please visit the site to participate in this post's poll.

With over 700 participants, MySQL replication was the clear frontrunner when it comes to high availability solutions.

Percona has HA solutions available, come find out more at our website.

If you’re using other solutions or have specific issues, feel free to comment below.

Check out the latest Percona one-click poll on database security here.

Not able to add query analytics using pmm-admin tool

Latest Forum Posts - August 2, 2016 - 6:35am
Hello,
I installed pmm server and client on two different machines, following the instructions given here >> https://www.percona.com/doc/percona-...t/install.html
Both pmm-server and client got installed properly but I was not able to see the Queries in the Query Analytics window. The pmm server GUI gave errors (PFA).

Also on the client side on giving the command "sudo pmm-admin add queries --user abc --password xyz" gave the following error:

"Error adding queries: "service" failed: exit status 1, Starting pmm-queries-exporter-42001
Unable to start, see /var/log/pmm-queries-exporter-42001.log"

The content of the file pmm-queries-exporter-42001.log:

flag needs an argument: -pid-file
Usage of /usr/local/percona/qan-agent/bin/percona-qan-agent:
-basedir string
Agent basedir (default "/usr/local/percona/qan-agent")
-listen string
Agent interface address (default "127.0.0.1:9000")
-pid-file string
PID file (default "percona-agent.pid")
-ping
Ping API
-version
Print version

Also, on one other client I got the following error:


[MySQL] 2016/08/02 00:25:59 packets.go:118: write unix @->/var/run/mysqld/mysqld.sock: write: broken pipe
Cannot connect to MySQL: Error 1045: Access denied for user 'root'@'localhost' (using password: NO)

Please tell what I am doing wrong. Any help will be highly appreciated.

Thanks.



Introduction into storage engine troubleshooting: Q & A

Latest MySQL Performance Blog posts - August 1, 2016 - 3:43pm

In this blog, I will provide answers to the Q & A for the “Introduction into storage engine troubleshooting” webinar.

First, I want to thank everybody for attending the July 14 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:

Q: At which isolation level do pt-online-schema-change and pt-archiver copy data from a table?

A: Both tools do not change the server’s default transaction isolation level. Use either REPEATABLE READ or set it in my.cnf.

Q: Can I create an index to optimize a query which has group by A and order by B, both from different tables and A column is from the first table in the two table join?

A: Do you mean a query like SELECT ... FROM a, b GROUP BY a.A ORDER BY b.B ? Yes, this is possible:

mysql> explain select A, B, count(*) from a join b on(a.A=b.id) WHERE b.B < 4 GROUP BY a.A, b.B ORDER BY b.B ASC;
+----+-------------+-------+-------+---------------+------+---------+-----------+------+------------------------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref       | rows | Extra                                                      |
+----+-------------+-------+-------+---------------+------+---------+-----------+------+------------------------------------------------------------+
|  1 | SIMPLE      | b     | range | PRIMARY,B     | B    | 5       | NULL      |   15 | Using where; Using index; Using temporary; Using filesort  |
|  1 | SIMPLE      | a     | ref   | A             | A    | 5       | test.b.id |    1 | Using index                                                |
+----+-------------+-------+-------+---------------+------+---------+-----------+------+------------------------------------------------------------+
2 rows in set (0.00 sec)

Q: Where can I find recommendations on what kind of engine to use for different application types or use cases?

A: Storage engines are always being actively developed, therefore I suggest that you don’t search for generic recommendations. These can be outdated just a few weeks after they are written. Study engines instead. For example, just a few years ago MyISAM was the only engine (among those officially supported) that could work with FULLTEXT indexes and SPATIAL columns. Now InnoDB supports both: FULLTEXT indexes since version 5.6 and GIS features in 5.7. Today I can recommend InnoDB as a general-purpose engine for all installations, and TokuDB for write-heavy workloads when you cannot use high-speed disks.

Alternative storage engines can help to realize specific business needs. For example, CONNECT brings data to your server from many sources, SphinxSE talks to the Sphinx daemon, etc.

Other alternative storage engines increase the speed of certain workloads. Memory, for example, can be a good fit for temporary tables.

Q: Can you please explain how we find the full text of the query when we query the view ‘statements_with_full_table_Scans’?

A: Do you mean a view in the sys schema? Sys schema views take information from the summary_* and digest tables in Performance Schema, therefore they do not contain full queries (only digests). The full text of the query can be found in the events_statements_* tables in the Performance Schema. Note that even the events_statements_history_long table can be rewritten very quickly, and you may want to save data from it periodically.

Q: Hi is TokuDB for the new document protocol?

A: As Alex Rubin showed in his detailed blog post, the new document protocol just converts NoSQL queries into SQL, and is thus not limited to any storage engine. To use documents and collections, a storage engine must support generated columns (which TokuDB currently does not). So support of X Protocol for TokuDB is limited to relational tables access.

Q: Please comment on “read committed” versus “repeatable read.”
Q: Repeatable read holds the cursor on the result set for the client versus read committed where the cursor is updated after a transaction.

A: READ COMMITTED and REPEATABLE READ are transaction isolation levels, whose details are explained here.
I would not correlate locks set on table rows in different transaction isolation modes with the result set. A transaction with isolation level REPEATABLE READ  instead creates a snapshot of rows that are accessed by the transaction. Let’s consider a table:

mysql> create table ti(id int not null primary key, f1 int) engine=innodb;
Query OK, 0 rows affected (0.56 sec)

mysql> insert into ti values(1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9);
Query OK, 9 rows affected (0.03 sec)
Records: 9  Duplicates: 0  Warnings: 0

Then start the transaction and select a few rows from this table:

mysql1> begin;
Query OK, 0 rows affected (0.00 sec)

mysql1> select * from ti where id < 5;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.04 sec)

Now let’s update another set of rows in another transaction:

mysql2> update ti set f1 = id*2 where id > 5;
Query OK, 4 rows affected (0.06 sec)
Rows matched: 4  Changed: 4  Warnings: 0

mysql2> select * from ti;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
|  6 |   12 |
|  7 |   14 |
|  8 |   16 |
|  9 |   18 |
+----+------+
9 rows in set (0.00 sec)

You see that the first four rows – which we accessed in the first transaction – were not modified, and the last four were modified. If InnoDB only saved the cursor (as someone answered above) we would expect to see the same result if we ran the SELECT * ... query in our old transaction, but it actually shows the whole table content before modification:

mysql1> select * from ti;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
|  6 |    6 |
|  7 |    7 |
|  8 |    8 |
|  9 |    9 |
+----+------+
9 rows in set (0.00 sec)

So “snapshot”  is a better word than “cursor” for the result set. In the case of READ COMMITTED, the first transaction would see modified rows:

mysql1> drop table ti;
Query OK, 0 rows affected (0.11 sec)

mysql1> create table ti(id int not null primary key, f1 int) engine=innodb;
Query OK, 0 rows affected (0.38 sec)

mysql1> insert into ti values(1,1), (2,2), (3,3), (4,4), (5,5), (6,6), (7,7), (8,8), (9,9);
Query OK, 9 rows affected (0.04 sec)
Records: 9  Duplicates: 0  Warnings: 0

mysql1> set transaction isolation level read committed;
Query OK, 0 rows affected (0.00 sec)

mysql1> begin;
Query OK, 0 rows affected (0.00 sec)

mysql1> select * from ti where id < 5;
+----+------+
| id | f1   |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.00 sec)

Let’s update all rows in the table this time:

mysql2> update ti set f1 = id*2;
Query OK, 9 rows affected (0.04 sec)
Rows matched: 9  Changed: 9  Warnings: 0

Now the first transaction sees both the modified rows with id >= 5 (not in the initial result set), but also the modified rows with id < 5 (which existed in the initial result set):

mysql1> select * from ti;
+----+------+
| id | f1   |
+----+------+
|  1 |    2 |
|  2 |    4 |
|  3 |    6 |
|  4 |    8 |
|  5 |   10 |
|  6 |   12 |
|  7 |   14 |
|  8 |   16 |
|  9 |   18 |
+----+------+
9 rows in set (0.00 sec)

After Dynamic expire_logs_days Change, Only Manual Flushing Works

Latest Forum Posts - July 31, 2016 - 4:28am
Hello!

I've been using Percona Server 5.6 for years, absolutely pleased with it.

Because of some unanticipated increase in write activity, on Jul 27 I dynamically changed the global expire_logs_days variable from 5 days (as originally configured since startup, which has always worked fine) to a temporary new value of 2. I then successfully issued FLUSH BINARY LOGS, which removed the binlogs I expected, but since then the server has not been flushing the binary logs on its own as I would expect (see the list below). The index file perfectly matches the actual directory contents (same file order, too), as I've used no OS commands outside of MySQL.

We're using standard MySQL asynchronous replication in a Master-Master setup, each Master has a Slave, and there is no replication lag at all.

5.6.31-77.0-log
Percona Server (GPL), Release 77.0, Revision 5c1061c
debian-linux-gnu (x86_64)
Ubuntu 14


Here's what MySQL currently says:

mysql> select @@expire_logs_days;
+---------------------------+
| @@global.expire_logs_days |
+---------------------------+
|                         2 |
+---------------------------+
1 row in set (0.00 sec)

mysql> show binary logs;
+------------------+------------+
| Log_name         | File_size  |
+------------------+------------+
| mysql-bin.000012 | 1073741948 |
| mysql-bin.000013 | 1073741882 |
| mysql-bin.000014 | 1073742160 |
| mysql-bin.000015 |  670091458 |
| mysql-bin.000016 |  744025466 |
+------------------+------------+
5 rows in set (0.00 sec)
And here's my current binlog directory:

drwxr-xr-x 2 mysql mysql 4.0K Jul 29 06:40 .
drwxr-xr-x 4 mysql mysql 4.0K Jul 20 14:23 ..
-rw-rw---- 1 mysql mysql 1.1G Jul 27 08:51 mysql-bin.000012
-rw-rw---- 1 mysql mysql 1.1G Jul 27 12:36 mysql-bin.000013
-rw-rw---- 1 mysql mysql 1.1G Jul 27 18:12 mysql-bin.000014
-rw-rw---- 1 mysql mysql 640M Jul 29 06:40 mysql-bin.000015
-rw-rw---- 1 mysql mysql 710M Jul 31 06:47 mysql-bin.000016
-rw-rw---- 1 mysql mysql  225 Jul 29 06:40 mysql-bin.index
So I guess my big question is, does MySQL respect the dynamically defined value of expire_logs_days for automatic flushing/rotation?

Thank you so much in advance for your kind feedback.

MariaDB 10.2 CHECK and DEFAULT clauses

Latest MySQL Performance Blog posts - July 29, 2016 - 12:35pm

In this blog post, we’ll look at the MariaDB 10.2 CHECK and DEFAULT clauses.

MariaDB 10.2 includes some long-awaited features. In this blog, we are going to discuss the improvements to some table definitions: the DEFAULT clause and the CHECK constraints. These clauses define column default values and rules for data validation.

Note that MariaDB 10.2 is still in alpha stage. This article describes the current state of these features, which could change before MariaDB 10.2 becomes GA.

The DEFAULT clause

The DEFAULT clause has always been supported in MariaDB/MySQL, but traditionally it only accepted literal values (like “hello world” or “2”). MariaDB 10.2 removes this limitation, so DEFAULT can now accept most SQL expressions. For example:

  • fiscal_year SMALLINT DEFAULT (YEAR(NOW()))
  • valid_until DATE DEFAULT (NOW() + INTERVAL 1 YEAR)
  • owner VARCHAR(100) DEFAULT (USER())

Additionally, MariaDB 10.2 allows you to set a DEFAULT value for the TEXT and BLOB columns. This was not possible in previous versions. While this might look like a small detail, it can be hard to add a column to an existing table that is used by production applications, if it cannot have a default value.

The DEFAULT clause has some very reasonable limitations. For example, it cannot contain a subquery or a stored function. An apparently strange limitation is that we can mention another column in DEFAULT only if it comes first in the CREATE TABLE command.

Note that DEFAULT can make use of non-deterministic functions even if the binary log uses the STATEMENT format. In this case, default non-deterministic values will be logged in the ROW format.

CHECK constraints

CHECK constraints are SQL expressions that are checked when a row is inserted or updated. If this expression result is false (0, empty string, empty date) or NULL, the statement will fail with an error. The error message states which CHECK failed in a way that is quite easy to parse:

ERROR 4022 (23000): CONSTRAINT `consistent_dates` failed for `test`.`author`

Some examples of CHECK constraints (a quick command-line illustration follows the list):

  • CONSTRAINT non_empty_name CHECK (CHAR_LENGTH(name) > 0)
  • CONSTRAINT consistent_dates CHECK (birth_date IS NULL OR death_date IS NULL OR birth_date < death_date)
  • CONSTRAINT past_date CHECK (birth_date < NOW())
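
As a quick illustration of the failure mode (the table and data below are made up), the consistent_dates constraint from the list above can be exercised from the command line:

mysql -e "CREATE TABLE author (
            name VARCHAR(100),
            birth_date DATE,
            death_date DATE,
            CONSTRAINT consistent_dates CHECK (birth_date IS NULL OR death_date IS NULL OR birth_date < death_date)
          ) ENGINE=InnoDB" test
mysql -e "INSERT INTO author VALUES ('A. Writer', '2000-01-01', '1999-01-01')" test
# fails with: ERROR 4022 (23000): CONSTRAINT `consistent_dates` failed for `test`.`author`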

A possible trick is checking that a column is different from its default value. This forces users to assign values explicitly.

CHECK constraints cannot be added or altered. It is only possible to drop them. This is an important limitation for production servers.

Another limitation is that CHECK metadata are not accessible via the INFORMATION_SCHEMA. The only way to find out if a table has CHECK clauses is parsing the output of SHOW CREATE TABLE.

The exact behavior of CHECK constraints in a replication environment depends on the master binary log format. If it is STATEMENT, the slaves will apply CHECK constraints to events received from the master. If it is ROW, only the master will need to apply constraints, because failed statements will not be replicated.

Thus, in all cases, we recommend having identical constraints on master and slaves, and only using deterministic constraints.

Performance

While I didn’t run a professional benchmark, I can say that both DEFAULT and CHECK clauses don’t have a noticeable impact on a simple test where we insert one million rows (on my local machine).

However, these clauses evaluate an SQL expression each time a row is inserted or updated. The overhead is at least equal to the SQL expression performance. If high-performing writes are important, you will probably not want to use complex data validation.

To check how fast an expression is, we can use the BENCHMARK() function:

MariaDB [(none)]> SELECT BENCHMARK(10000000, (555 / 100 * 20));
+---------------------------------------+
| BENCHMARK(10000000, (555 / 100 * 20)) |
+---------------------------------------+
|                                     0 |
+---------------------------------------+
1 row in set (1.36 sec)

MariaDB [(none)]> SELECT BENCHMARK(100000000, MD5('hello world'));
+------------------------------------------+
| BENCHMARK(100000000, MD5('hello world')) |
+------------------------------------------+
|                                        0 |
+------------------------------------------+
1 row in set (14.84 sec)

In this example, we executed the specified expressions ten million and one hundred million times, respectively. BENCHMARK() always returns 0, but what we want to check is the execution time. We can see for example that evaluating MD5(‘hello world’) takes less than 0.000002 seconds. In some cases, we may want to retry the same expressions with different parameters (longer strings, higher numbers, etc.) to check if the execution time varies.

Unfortunately, we don’t have a status variable which tells us how many times MariaDB evaluated CHECK clauses. If our workload performs many writes, that variable could help us to find out if CHECK constraints are slowing down inserts. Maybe the MariaDB team can take this as a suggestion for the future.

MySQL Query Analytics configuration not found

Latest Forum Posts - July 29, 2016 - 2:54am
Hi, all.
I've started queries:
pmm-admin list
queries host 42001 YES root:***@unix(/var/lib/mysql/mysql.sock) query_source=slowlog
mysql host 42002-42004 YES root:***@unix(/var/lib/mysql/mysql.sock)

But in QueryAnalytics I don't see any queries, instead, I see message:

QAN API error: "MySQL Query Analytics configuration not found".
Check the /var/log/qan-api.log file in docker container for more information.

But:
docker exec 1af5acecd98b cat /var/log/qan-api.log
Percona Query Analytics API is running...
2016/07/29 08:50:34 config.go:79: Loading /usr/local/percona/qan-api/src/github.com/percona/qan-api/conf/prod.conf
2016/07/29 08:50:34 config.go:119: API root dir /usr/local/percona/qan-api/src/github.com/percona/qan-api
INFO 2016/07/29 08:50:34 revel.go:206: Initialized Revel v0.12.0dev (2015-01-19) for >= go1.3
INFO 2016/07/29 08:50:34 main.go:31: Running revel server
Listening on 0.0.0.0:9001...
INFO 2016/07/29 09:30:28 ws.go:107: [Agent.Cmd] agent_id=2 0.0.9 http://host: connected

What did I do wrong?

Can't start a joiner due to SST problems

Latest Forum Posts - July 29, 2016 - 1:58am
I've successfully bootstraped a Cluster (mysqld Ver 5.6.30-76.3-56 for debian-linux-gnu on x86_64 (Percona XtraDB Cluster (GPL), Release rel76.3, Revision aa929cb, WSREP version 25.16, wsrep_25.16)).

When I start a second node there is a problem with SST via xtrabackup-v2:
2016-07-29 09:31:58 12359 [Note] WSREP: New cluster view: global state: 3f3843a3-54aa-11e6-af38-5b145f899741:195614, view# 4: Primary, number of nodes: 2, my index: 1, protocol version 3
2016-07-29 09:31:58 12359 [Warning] WSREP: Gap in state sequence. Need state transfer.
2016-07-29 09:31:58 12359 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address 'XYZ.40.39.13' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '12359' '' '
2016-07-29 09:31:58 12359 [ERROR] execlp() failed: Permission denied
2016-07-29 09:31:58 12359 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address 'XYZ.40.39.13' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '12359' ''
        Read: '(null)'
2016-07-29 09:31:58 12359 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address 'XYZ.40.39.13' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '12359' '' : 1 (Operation not permitted)
2016-07-29 09:31:58 12359 [ERROR] WSREP: Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
2016-07-29 09:31:58 12359 [ERROR] Aborting

There is no special config, just:
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_cluster_name = "legacy_cluster"
wsrep_cluster_address = "gcomm://ZYX.4.123.242,XYZ.40.39.13"
wsrep_node_name = legacy6
wsrep_node_address = XYZ.40.39.13
#wsrep_sst_method = xtrabackup-v2
#wsrep_sst_auth = sst:secret

How could I further debug this problem? I've also tried SST with rsync, but got the same error messages.

Percona Monitoring and Management 1.0.2 Beta

Latest MySQL Performance Blog posts - July 28, 2016 - 12:39pm

Percona announces the release of Percona Monitoring and Management 1.0.2 Beta on 28 July 2016.

Like prior versions, PMM is distributed through Docker Hub and is free to download. Full instructions for download and installation of the server and client are available in the documentation.

Notable changes to the tool include:

  • Upgraded to Grafana 3.1.0.
  • Upgraded to Prometheus 1.0.1.
  • Set default metrics retention to 30 days.
  • Eliminated port 9001. Now the container uses only one configurable port, 80 by default.
  • Eliminated the need to specify ADDRESS variable when creating Docker container.
  • Completely re-wrote pmm-admin with more functions.
  • Added ability to stop all services using the new pmm-admin.
  • Added support to name instances using the new pmm-admin.
  • Query Analytics Application updates:
    • Redesigned queries profile table
    • Redesigned metrics table
    • Redesigned instance settings page
    • Added sparkline charts
    • Added ability to show more than ten queries
  • Various updates for MongoDB dashboards.

The full release notes are available in the documentation. The documentation also includes details on installation and architecture.

A demonstration of the tool has been set up at pmmdemo.percona.com.

We have also implemented forums for the discussion of PMM.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Some screen shots of the updates:

Note the new sparkline that shows the current load in context (so you know if the number is higher/normal/lower than normal), and the option to “Load next 10 queries” at the bottom of the listing.

Our admin tool was completely re-written with new functions:

pmm-admin --help output

 

pmm-admin list command output

 

pmm-admin check-network output, which provides information on the status of the client’s network connection to the server.

Upcoming Webinar August 2 10:00 am PDT: MySQL and Ceph

Latest MySQL Performance Blog posts - July 28, 2016 - 11:27am

Join Brent Compton, Kyle Bader and Yves Trudeau on August 2, 2016 at 10 am PDT (UTC-7) for a MySQL and Ceph webinar.

Many operators select OpenStack as their control plane of choice for providing both internal and external IT services. The OpenStack user survey repeatedly shows Ceph as the dominant backend for providing persistent storage volumes through OpenStack Cinder. When building applications and repatriating old workloads, developers are discovering the need to provide OpenStack infrastructure database services. Given MySQL’s ubiquity, and its reliance on persistent storage, it is of utmost importance to understand how to achieve the performance demanded by today’s applications. Databases like MySQL can be incredibly IO intensive, and Ceph offers a great opportunity to go beyond the limitations presented by a single scale-up system. Since Ceph provides a mutable object store with atomic operations, could MySQL store InnoDB pages directly in Ceph?

This talk reviews the general architecture of Ceph, and then discusses benchmark results from small to mid-size Ceph clusters. These benchmarks lead to the development of prescriptive guidance around tuning Ceph storage nodes (OSDs), the impact the amount of physical memory, and the presence of SSDs, high-speed networks or RAID controllers.

Click here to register now.

Speakers:

Brent Compton
Director Storage Solution Architectures, Red Hat
Brent Compton is Director Storage Solution Architectures at Red Hat. He leads the team responsible for building Ceph and Gluster storage reference architectures with Red Hat Storage partners. Before Red Hat, Brent was responsible for emerging non-volatile memory software technologies at Fusion-io. Previous enterprise software leadership roles include VP Product Management at Micromuse (now IBM Tivoli Netcool) and Product Marketing Director within HP’s OpenView software division. Brent also served as Director Middleware Development Platforms at the LDS Church and as CIO at Joint Commission International. Brent has a tight-knit family, and can be found on skis or a mountain bike whenever possible.

Kyle Bader
Sr Solution Architect, Red Hat
Kyle Bader, a Red Hat senior architect, provides expertise in the design and operation of petabyte-scale storage systems using Ceph. He joined Red Hat as part of the 2014 Inktank acquisition. As a senior systems engineer at DreamHost, he helped implement, operate, and design Ceph and OpenStack-based systems for DreamCompute and DreamObjects cloud products.

Yves Trudeau
Principal Architect
Yves is a Principal Consultant at Percona, specializing in MySQL High-Availability and scaling solutions. Before joining Percona in 2009, he worked as a senior consultant for MySQL AB and Sun Microsystems, assisting customers across North America with NDB Cluster and Heartbeat/DRBD technologies. Yves holds a Ph.D. in Experimental Physics from Université de Sherbrooke. He lives in Québec, Canada with his wife and three daughters.

