]]>
]]>

You are here

Latest MySQL Performance Blog posts

Subscribe to Latest MySQL Performance Blog posts feed
Percona's MySQL & InnoDB performance and scalability blog
Updated: 26 min 23 sec ago

Using YUM to install specific MySQL/Percona Server versions

January 5, 2015 - 6:52am

Sometimes it is desired to use particular software versions in production, and not necessary the latest ones. There may be several reasons for that, where I think the most common is when a new version should spend some time in testing or a staging environment before getting to production. In theory each new version is supposed to be better as usually it contains a handful of bug fixes and even new or improved functionality. However there is also a risk of some regression or a new bug introduction as a side effect of code changes.

Quite often DBAs want the same MySQL version to be installed on all database instances, regardless of what actually is the latest version available in the software provider’s repository. There are several ways to achieve this:
* download specific version packages manually and then install them,
* have custom local repository mirror where you decide when and which version gets there, and just update from there using yum/apt,
* have database instance images with all software prepared,
* point to a particular version just using default package-management utility.

My idea was to remind about this last method as maybe the least known one.
In this article I will focus on YUM as it seems this is the only one currently offering multiple versions from official repositories of Oracle and Percona MySQL variants. APT theoretically is also able to install older versions, but command “apt-cache madison …” returns only the latest one for me. For example using Oracle repo:

root@ubuntu-14:~# apt-cache madison mysql-community-server mysql-community-server | 5.6.22-2ubuntu14.04 | http://repo.mysql.com/apt/ubuntu/ trusty/mysql-5.6 amd64 Packages mysql-community | 5.6.22-2ubuntu14.04 | http://repo.mysql.com/apt/ubuntu/ trusty/mysql-5.6 Sources

So let’s see how it looks like for YUM repositories. I have installed repositories from Oracle, MariaDB and Percona on Centos 6 test machine. This is what they offer for the main server package versions:

[root@localhost ~]# yum repolist repo id repo name status base CentOS-6 - Base 6,518 extras CentOS-6 - Extras 36 mariadb MariaDB 17 mysql-connectors-community MySQL Connectors Community 12 mysql-tools-community MySQL Tools Community 18 mysql56-community MySQL 5.6 Community Server 112 percona-release-noarch Percona-Release YUM repository - noarch 26 percona-release-x86_64 Percona-Release YUM repository - x86_64 432 updates CentOS-6 - Updates 530 repolist: 7,701 [root@localhost ~]# yum -q list available --showduplicates mysql-community-server.x86_64 Available Packages mysql-community-server.x86_64 5.6.15-1.el6 mysql56-community mysql-community-server.x86_64 5.6.16-1.el6 mysql56-community mysql-community-server.x86_64 5.6.17-4.el6 mysql56-community mysql-community-server.x86_64 5.6.19-2.el6 mysql56-community mysql-community-server.x86_64 5.6.20-4.el6 mysql56-community mysql-community-server.x86_64 5.6.21-2.el6 mysql56-community mysql-community-server.x86_64 5.6.22-2.el6 mysql56-community [root@localhost ~]# [root@localhost ~]# yum -q list available --showduplicates MariaDB-server.x86_64 Available Packages MariaDB-server.x86_64 10.0.15-1.el6 mariadb [root@localhost ~]# [root@localhost ~]# yum -q list available --showduplicates Percona-Server-server-56.x86_64 Available Packages Percona-Server-server-56.x86_64 5.6.13-rel61.0.461.rhel6 percona-release-x86_64 Percona-Server-server-56.x86_64 5.6.14-rel62.0.483.rhel6 percona-release-x86_64 Percona-Server-server-56.x86_64 5.6.15-rel63.0.519.rhel6 percona-release-x86_64 Percona-Server-server-56.x86_64 5.6.16-rel64.0.el6 percona-release-x86_64 Percona-Server-server-56.x86_64 5.6.16-rel64.1.el6 percona-release-x86_64 Percona-Server-server-56.x86_64 5.6.16-rel64.2.el6 percona-release-x86_64 Percona-Server-server-56.x86_64 5.6.17-rel65.0.el6 percona-release-x86_64 Percona-Server-server-56.x86_64 5.6.17-rel66.0.el6 percona-release-x86_64 Percona-Server-server-56.x86_64 5.6.19-rel67.0.el6 percona-release-x86_64 Percona-Server-server-56.x86_64 5.6.20-rel68.0.el6 percona-release-x86_64 Percona-Server-server-56.x86_64 5.6.21-rel69.0.el6 percona-release-x86_64 Percona-Server-server-56.x86_64 5.6.21-rel70.0.el6 percona-release-x86_64 Percona-Server-server-56.x86_64 5.6.21-rel70.1.el6 percona-release-x86_64

So at least for both Oracle and Percona packages we can use yum to install several versions back (12 in case of Percona Server 5.6).

How can we do that? Let’s install Percona Server version 5.6.19. To get a full package name with it’s version, we join it’s name with version but the CPU family part needs to be removed or replaced to the end. So Percona-Server-server-56.x86_64 +  5.6.19-rel67.0.el6 -> Percona-Server-server-56-5.6.19-rel67.0.el6 or Percona-Server-server-56-5.6.19-rel67.0.el6.x86_64:

[root@localhost ~]# yum -q install Percona-Server-server-56-5.6.19-rel67.0.el6 Percona-Server-client-56-5.6.19-rel67.0.el6 Percona-Server-shared-56-5.6.19-rel67.0.el6 ============================================================================================================================= Package Arch Version Repository Size ============================================================================================================================= Installing: Percona-Server-client-56 x86_64 5.6.19-rel67.0.el6 percona-release-x86_64 6.8 M Percona-Server-server-56 x86_64 5.6.19-rel67.0.el6 percona-release-x86_64 19 M Percona-Server-shared-56 x86_64 5.6.19-rel67.0.el6 percona-release-x86_64 721 k Transaction Summary ============================================================================================================================= Install 3 Package(s) Is this ok [y/N]: y (...) [root@localhost ~]# rpm -qa|grep Percona Percona-Server-server-56-5.6.19-rel67.0.el6.x86_64 Percona-Server-client-56-5.6.19-rel67.0.el6.x86_64 Percona-Server-shared-56-5.6.19-rel67.0.el6.x86_64

But what if it happens that we have to revert to previous version? We can actually do that with YUM very quickly:

[root@localhost ~]# service mysql status SUCCESS! MySQL (Percona Server) running (1998) [root@localhost ~]# service mysql stop Shutting down MySQL (Percona Server).. SUCCESS! [root@localhost ~]# yum -q downgrade Percona-Server-server-56.x86_64 Percona-Server-client-56.x86_64 Percona-Server-shared-56.x86_64 ================================================================================================================================ Package Arch Version Repository Size ================================================================================================================================ Downgrading: Percona-Server-client-56 x86_64 5.6.17-rel66.0.el6 percona-release-x86_64 6.8 M Percona-Server-server-56 x86_64 5.6.17-rel66.0.el6 percona-release-x86_64 19 M Percona-Server-shared-56 x86_64 5.6.17-rel66.0.el6 percona-release-x86_64 720 k Transaction Summary ================================================================================================================================ Downgrade 3 Package(s) Is this ok [y/N]: y Giving mysqld 5 seconds to exit nicely (...) [root@localhost ~]# rpm -qa|grep Percona Percona-Server-shared-56-5.6.17-rel66.0.el6.x86_64 Percona-Server-server-56-5.6.17-rel66.0.el6.x86_64 Percona-Server-client-56-5.6.17-rel66.0.el6.x86_64

What if we want to downgrade, but let’s say few versions down? It is also possible with a single YUM command. By the way, after the last step we are on Percona Server 5.6.17 rel66.0 version and YUM nicely shows that when listing all available packages, see the screen shot below:

So let’s downgrade to Percona Server 5.6.15 rel63.0 – it’s actually very easy as the “downgrade” option also understands the version attribute:

[root@localhost ~]# yum -q downgrade Percona-Server-server-56-5.6.15-rel63.0.519.rhel6 Percona-Server-client-56-5.6.15-rel63.0.519.rhel6 Percona-Server-shared-56-5.6.15-rel63.0.519.rhel6 ===================================================================================================================================  Package                             Arch              Version                             Repository                         Size =================================================================================================================================== Downgrading:  Percona-Server-client-56            x86_64            5.6.15-rel63.0.519.rhel6            percona-release-x86_64            6.5 M  Percona-Server-server-56            x86_64            5.6.15-rel63.0.519.rhel6            percona-release-x86_64             18 M  Percona-Server-shared-56            x86_64            5.6.15-rel63.0.519.rhel6            percona-release-x86_64            691 k Transaction Summary =================================================================================================================================== Downgrade     3 Package(s) Is this ok [y/N]: y Giving mysqld 5 seconds to exit nicely (...)

We can do the same, or more complicated package operations, using the YUM transaction feature:

[root@localhost ~]# yum shell Loaded plugins: fastestmirror, security Setting up Yum Shell > remove Percona-Server-shared-56 Percona-Server-server-56 Percona-Server-client-56 Setting up Remove Process > install Percona-Server-server-56-5.6.15-rel63.0.519.rhel6 Percona-Server-client-56-5.6.15-rel63.0.519.rhel6 Percona-Server-shared-56-5.6.15-rel63.0.519.rhel6 (...) Setting up Install Process > run --> Running transaction check ---> Package Percona-Server-client-56.x86_64 0:5.6.15-rel63.0.519.rhel6 will be installed ---> Package Percona-Server-client-56.x86_64 0:5.6.17-rel66.0.el6 will be erased ---> Package Percona-Server-server-56.x86_64 0:5.6.15-rel63.0.519.rhel6 will be installed ---> Package Percona-Server-server-56.x86_64 0:5.6.17-rel66.0.el6 will be erased ---> Package Percona-Server-shared-56.x86_64 0:5.6.15-rel63.0.519.rhel6 will be obsoleting ---> Package Percona-Server-shared-56.x86_64 0:5.6.17-rel66.0.el6 will be erased (...) ================================================================================================================================ Package Arch Version Repository Size ================================================================================================================================ Installing: Percona-Server-client-56 x86_64 5.6.15-rel63.0.519.rhel6 percona-release-x86_64 6.5 M Percona-Server-server-56 x86_64 5.6.15-rel63.0.519.rhel6 percona-release-x86_64 18 M Percona-Server-shared-51 x86_64 5.1.73-rel14.12.624.rhel6 percona-release-x86_64 2.1 M replacing mysql-libs.x86_64 5.1.73-3.el6_5 Percona-Server-shared-56 x86_64 5.6.15-rel63.0.519.rhel6 percona-release-x86_64 691 k replacing mysql-libs.x86_64 5.1.73-3.el6_5 Removing: Percona-Server-client-56 x86_64 5.6.17-rel66.0.el6 @percona-release-x86_64 33 M Percona-Server-server-56 x86_64 5.6.17-rel66.0.el6 @percona-release-x86_64 86 M Percona-Server-shared-56 x86_64 5.6.17-rel66.0.el6 @percona-release-x86_64 3.4 M Transaction Summary ================================================================================================================================ Install 4 Package(s) Remove 3 Package(s) Total download size: 27 M Is this ok [y/N]: y (...) Removed: Percona-Server-client-56.x86_64 0:5.6.17-rel66.0.el6 Percona-Server-server-56.x86_64 0:5.6.17-rel66.0.el6 Percona-Server-shared-56.x86_64 0:5.6.17-rel66.0.el6 Installed: Percona-Server-client-56.x86_64 0:5.6.15-rel63.0.519.rhel6 Percona-Server-server-56.x86_64 0:5.6.15-rel63.0.519.rhel6 Percona-Server-shared-51.x86_64 0:5.1.73-rel14.12.624.rhel6 Percona-Server-shared-56.x86_64 0:5.6.15-rel63.0.519.rhel6 Replaced: mysql-libs.x86_64 0:5.1.73-3.el6_5 Finished Transaction > quit Leaving Shell [root@localhost ~]# rpm -qa|grep Percona Percona-Server-shared-56-5.6.15-rel63.0.519.rhel6.x86_64 Percona-Server-client-56-5.6.15-rel63.0.519.rhel6.x86_64 Percona-Server-shared-51-5.1.73-rel14.12.624.rhel6.x86_64 Percona-Server-server-56-5.6.15-rel63.0.519.rhel6.x86_64

The fact that we can use single command or single YUM transaction, instead of a manual rpm download and installation, is not the only advantage of this method. The other very important one is that we don’t have to worry about breaking package dependencies when we uninstall current version manually in order to install different one, as there may be many packages that depend on MySQL.

However, if anyone does “yum update” on this system, our packages will be upgraded to the latest version, which is what we probably don’t want to happen yet, and surely not without our supervision:

[root@localhost ~]# yum -q update =================================================================================================================================== Package Arch Version Repository Size =================================================================================================================================== Updating: Percona-Server-client-56 x86_64 5.6.21-rel70.1.el6 percona-release-x86_64 6.4 M Percona-Server-server-56 x86_64 5.6.21-rel70.1.el6 percona-release-x86_64 19 M Percona-Server-shared-56 x86_64 5.6.21-rel70.1.el6 percona-release-x86_64 721 k Transaction Summary =================================================================================================================================== Upgrade 3 Package(s) Is this ok [y/N]: N Exiting on user Command

To prevent that, we should lock our packages on current version. To achieve that, we need yum-plugin-versionlock package which allows us to do this:

[root@localhost ~]# yum versionlock Percona-Server-server-56 Percona-Server-client-56 Percona-Server-shared-56 Loaded plugins: fastestmirror, security, versionlock Adding versionlock on: 0:Percona-Server-server-56-5.6.15-rel63.0.519.rhel6 Adding versionlock on: 0:Percona-Server-client-56-5.6.15-rel63.0.519.rhel6 Adding versionlock on: 0:Percona-Server-shared-56-5.6.15-rel63.0.519.rhel6 versionlock added: 3 [root@localhost ~]# yum update Loaded plugins: fastestmirror, security, versionlock Setting up Update Process (...) No Packages marked for Update

Now these packages cannot be updated unless you clear the locks with

yum versionlock clear

(clear any locks) or particular locks:

[root@localhost ~]# yum -q versionlock list 0:Percona-Server-server-56-5.6.15-rel63.0.519.rhel6.* 0:Percona-Server-client-56-5.6.15-rel63.0.519.rhel6.* 0:Percona-Server-shared-56-5.6.15-rel63.0.519.rhel6.* [root@localhost ~]# yum versionlock delete '0:Percona-Server-client-56-5.6.15-rel63.0.519.rhel6.*' Loaded plugins: fastestmirror, security, versionlock Deleting versionlock for: 0:Percona-Server-client-56-5.6.15-rel63.0.519.rhel6.* versionlock deleted: 1 [root@localhost ~]# yum -q versionlock list 0:Percona-Server-server-56-5.6.15-rel63.0.519.rhel6.* 0:Percona-Server-shared-56-5.6.15-rel63.0.519.rhel6.* [root@localhost ~]# yum -q update ===================================================================================================================================  Package                              Arch               Version                          Repository                          Size =================================================================================================================================== Updating:  Percona-Server-client-56             x86_64             5.6.21-rel70.1.el6               percona-release-x86_64             6.4 M Transaction Summary =================================================================================================================================== Upgrade       1 Package(s) Is this ok [y/N]:

I think the fact that you can install particular MySQL and Percona Server versions using YUM lets you simplify software management tools recipes, like Chef, Puppet or Ansible. For example, the Chef’s yum_package resource has version attribute, and an example recipe for Percona Server 5.6.20 installation may look like this:

pkgs = ["Percona-Server-client-56","Percona-Server-shared-56","Percona-Server-server-56","Percona-Server-56-debuginfo"] pkgs.each do |pkg| yum_package pkg do version "5.6.20-rel68.0.el6" allow_downgrade true end end

Hopefully we will see similar functionality from APT MySQL repositories eventually.

The post Using YUM to install specific MySQL/Percona Server versions appeared first on MySQL Performance Blog.

The MySQL Query Cache: How it works, plus workload impacts (good and bad)

January 2, 2015 - 12:00am

Query caching is one of the prominent features in MySQL and a vital part of query optimization. It is important to know how it works as it has the potential to cause significant performance improvements – or a slowdown – of your workload.

The MySQL query cache is a global one shared among the sessions. It caches the select query along with the result set, which enables the identical selects to execute faster as the data fetches from the in memory. It is important to have everything identical, no new comments, spaces, or most significantly differences in the WHERE clause. Basically when you trigger a select query, if it is available in the cache; it fetches from there or it considers the query as a new one and will go to the parser.

Even though it has some nice advantages, the MySQL query cache has its own downsides too. Well, let’s think about this: If you are frequently updating the table, you are then invalidating the query cache for ALL queries cached for that table. So really, anytime you have a “frequently updated table” means you’re probably not going to get any sort of good usage from the query cache. See the below example.

mysql> SHOW STATUS LIKE "qcache%"; +-------------------------+---------+ | Variable_name | Value | +-------------------------+---------+ | Qcache_free_blocks | 1 | | Qcache_free_memory | 1031320 | | Qcache_hits | 0 | | Qcache_inserts | 0 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 1 | | Qcache_queries_in_cache | 0 | | Qcache_total_blocks | 1 | +-------------------------+---------+ 8 rows in set (0.06 sec) mysql> select * from d.t1; 405 rows in set (0.05 sec) mysql> select * from d.t1 where id=88995159; 1 row in set (0.00 sec) mysql> SHOW STATUS LIKE "qcache%"; +-------------------------+---------+ | Variable_name | Value | +-------------------------+---------+ | Qcache_free_blocks | 1 | | Qcache_free_memory | 1020600 | | Qcache_hits | 0 | | Qcache_inserts | 2 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 1 | | Qcache_queries_in_cache | 2 | | Qcache_total_blocks | 6 | +-------------------------+---------+ 8 rows in set (0.00 sec)

From the above  we are sure the queries are cached. Let us try an insert and see the status, it will invalidate the query cache and reclaim the memory.

mysql> insert into d.t1 (data)value('Welcome'); Query OK, 1 row affected (0.05 sec) mysql> SHOW STATUS LIKE "qcache%"; +-------------------------+---------+ | Variable_name | Value | +-------------------------+---------+ | Qcache_free_blocks | 1 | | Qcache_free_memory | 1031320 | | Qcache_hits | 0 | | Qcache_inserts | 2 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 1 | | Qcache_queries_in_cache | 0 | | Qcache_total_blocks | 1 | +-------------------------+---------+ 8 rows in set (0.00 sec)

Now let us think about how to decide the query cache size:

To exemplify:-  I am having a mysql instance with two tables “t” and “t1″. Table “t” is with numerous records and “t1″ is with a fewer records. Let’s restart the mysql and see the query cache details.

mysql> show variables like 'query_cache_size'; +------------------+---------+ | Variable_name | Value | +------------------+---------+ | query_cache_size | 1048576 | +------------------+---------+ 1 row in set (0.00 sec) mysql> SHOW STATUS LIKE "qcache%"; +-------------------------+---------+ | Variable_name | Value | +-------------------------+---------+ | Qcache_free_blocks | 1 | | Qcache_free_memory | 1031320 | | Qcache_hits | 0 | | Qcache_inserts | 0 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 1 | | Qcache_queries_in_cache | 0 | | Qcache_total_blocks | 1 | +-------------------------+---------+ 8 rows in set (0.01 sec)

From the above status note the below four points.

1) There is around 1 MB free space with Qcache.

2) The queries in Qcache are zero.

3) There is no Qcache hits.

4) Qcache lowmem prunes is zero.

mysql> select * from d.t1; 405 rows in set (0.03 sec) mysql> SHOW STATUS LIKE "qcache%"; +-------------------------+---------+ | Variable_name | Value | +-------------------------+---------+ | Qcache_free_blocks | 1 | | Qcache_free_memory | 1021624 | | Qcache_hits | 0 | | Qcache_inserts | 1 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 1 | | Qcache_queries_in_cache | 1 | | Qcache_total_blocks | 4 | +-------------------------+---------+ 8 rows in set (0.01 sec)

From the aforesaid status it is clear the query has been cached and it should execute much faster in the next try and increase the Qcache hits status variable by one.

mysql> select * from d.t1; 405 rows in set (0.00 sec). mysql> SHOW STATUS LIKE "%Qcache_hits%"; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | Qcache_hits | 1 | +---------------+-------+ 1 row in set (0.00 sec)

Now let us see how the data is getting pruned from the Qcache. For this I will execute a select on table “t” which is having massive records.

mysql> select * from d.t where id > 78995159; mysql> SHOW STATUS LIKE "Qcache_lowmem_prunes"; +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | Qcache_lowmem_prunes | 1 | +----------------------+-------+ 1 row in set (0.00 sec) mysql> select * from d.t1; 405 rows in set (0.02 sec) mysql> SHOW STATUS LIKE "qcache%"; +-------------------------+---------+ | Variable_name | Value | +-------------------------+---------+ | Qcache_free_blocks | 1 | | Qcache_free_memory | 1021624 | | Qcache_hits | 1 | | Qcache_inserts | 2 | | Qcache_lowmem_prunes | 1 | | Qcache_not_cached | 6 | | Qcache_queries_in_cache | 1 | | Qcache_total_blocks | 4 | +-------------------------+---------+ 8 rows in set (0.01 sec)

The Qcache_lowmem_prunes is the status variable which indicates how many times MySQL had to clear/prune some data from the Qcache to make space for the outputs of other queries. We need to observe the Qcache_lowmem_prunes  status variable and try to increase/adjust the size of the cache till we get a very low value ratio for the variable.

It is also undesirable to keep the query cache relatively high value at 256 MB as the Qcache invalidation becomes costly. For details, Peter Zaitsev wrote about this a few years ago in a post that’s still relevant today titled, “Beware large Query_Cache sizes.”

Contention often makes query cache the bottleneck instead of help when you have many CPU cores. Generally, query cache should be off unless proven useful for your workload. So it is important to know your environment well to enable the query cache and to decide what the query cache size should be.

There will also be circumstances where there is no chance of identical selects and in this case it is important to set the query_cache_size and query_cache_type variable to zero. The query_cache_type variable controls the query cache and  setting the query_cache_type to zero will reduce the significant overhead in query execution. On a highly concurrent environment there are chances of query cache mutex, which may become the source of a bottleneck. Setting the query_cache_type to zero will avoid the query cache mutex, as the query cache cannot be enabled at runtime which reduces the overhead in query execution. Please go through the details of QUERY CACHE ENHANCEMENTS with Percona Server.

The post The MySQL Query Cache: How it works, plus workload impacts (good and bad) appeared first on MySQL Performance Blog.

How small changes impact complex systems – MySQL example

December 31, 2014 - 3:00am

If you’ve been studying complex systems you know what minor changes might cause consequences of much greater proportions, sometimes causing some effects that are not easily explained at first. I recently ran across a great illustration of such behavior while doing MySQL benchmarks which I thought would be interesting to share.

I’m using a very simple benchmark – Sysbench 0.5 on Percona Server 5.6.21-70.1 just running update queries:

sysbench --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=pareto --oltp-table-size=1000000000 --mysql-user=root --mysql-password= --mysql-db=sbinnodb --test=/usr/share/doc/sysbench/tests/db/update_index.lua run

Some people frown upon such benchmarks due to their triviality and being irrelevant to workloads. I like them because they often allow you to study already complex system behavior in a much more controlled environment than “real” workloads – and so you can understand internal dynamics better.

For this test I set innodb_purge_threads=8, innodb_max_purge_lag=1000000, innodb_max_purge_lag_delay=5000 as otherwise chances are that the history will just be growing forever not allowing the system to reach the steady state. I wrote about InnoDB history problems in detail recently in a post titled “InnoDB transaction history often hides dangerous ‘debt’.

The only change I was looking for is changing innodb_flush_log_at_trx_commit from 1 to 2. I then went back to 1 a few hours later, after the system had reached the steady state. What I would normally expect in such a case is a performance increase because there is less work to do on commit and that is pretty much it. Looking at the graphs however I can see much more changes in the system. Let me go through them and explain the ones which I can:

The throughput graph is probably the most interesting here. We can see that indeed the throughput has substantially improved when set innodb_flush_log_at_trx_commit=2 and then went back to the old one after it was set back to 1. We see however in 2 some interesting effects here – a spike of a few minutes and later a dip below the original performance level for a few hours, until finally recovery back to the baseline. I’ll explain this with different graphs.

CPU usage is interesting, too. We can see there is user CPU increase correlating to the throughput increase but also a drastic IOWait increase. Why would it increase when we’re actually making workload less IO bound by not flushing log as much? The reason is because there are more threads now that can perform IO concurrently, hence increasing what is reporting IO wait and deceasing idle time.

InnoDB writing much more data is expected and it very well matches throughput. Not much of anything unexpected.

As we see setting innodb_flush_log_at_trx_commit=2 causes the amount of dirty pages in the buffer pool to go down significantly. It recovers when it is put back. The reason I suspect in this case is the undo space – as we will see in later graph history length spikes quickly – obviously requiring some space in the buffer pool.

This is perhaps one of the most interesting graphs here. You can see for some reason when log flush is enabled that purge threads are able to keep up. But if you disable it they cannot, which makes history grow very quickly until the max allowed amount. This is by the way the time we see the spike in the throughput as there is no throttling being done to keep history in check. This also explains why after I set innodb_flush_log_at_trx_commit back to 1 that performance first drops well below original levels – InnoDB needs to go through and purge the history, which takes resources. Another interesting thing you can notice is how uneven history cleanup is. Specifically you can note some 75% percent of the history has been cleaned up almost instantly while initial 25% took a lot of time to work out. I do not have an exact explanation here yet I believe it has to do with the fact initially history purging requires a significant amount of IO while at a certain point in time a purge happens completely in the buffer pool and hence being much faster.

Why would purge threads have a hard time keeping up when log flush is disabled? I suspect there are 2 reasons. First, the system can provide much more throughput in such configuration (not keeping purging into account) while purge subsystem is limited and can’t keep up with such a high load. The other reason is – eliminating serialization on the log flush exposes other parts of InnoDB to contention which further suppresses purge system progress.

This is another relevant graph to show the throttling done because purging can’t keep up. As you see we got about 2.5ms delay injected for updates to give purging some time to keep up. It is good to watch it to understand if the max delay you specified is enough to prevent purge lag while still avoid having the system grind to a halt. Had I set innodb_max_purge_lag_delay to 1000 it would not be enough to prevent history length growing to infinity, having I set it to 100000 the system could grind to a halt in case there is some long-running transaction causing history length to grow to about 1 million.

Lets look at Physical IO – it is also quite interesting. As you can see in the first and last parts of the graph there are more pages flushed than pages read, while when we disable log flushing we get much more reads than writes which only stops after history purging catch up is complete. I find this very interesting – this means that working with even modest history – 1M in this case causes a lot of IO – I assume because due to access pattern to the undo space it is constantly being washed out from the buffer pool. This graph is a very good illustration why large history might be a death spiral, as once it is there the system needs to do much more work.

This IO utilization graph might be a bit hard to understand but what it generally shows is that during the time the hard drive has been 100% utilized, there was still at least one outstanding IO to serve. However the concurrency was different with more average IO requests outstanding when we had log flushing disabled. This makes sense. We can also see it was highest when we enabled log flushing back but there was history to go through and catch up.

Changes to contention are very curious – we can see there is a lot more spin rounds and specifically shared lock spin rounds when we have log flush disabled. This is what I believe comes from less serialization on the log flushing causes a lot more internal contention.

It might not be very easy to see but this graph shows that innodb_row_lock_times spikes 1000x when log flushing is disabled. Why would that be? I suspect the reason is injected delay for purging. I also suspect the design of this throttling so it only is done when an actual change is attempted to be made, this is where update statement already holds some locks, which causes more row lock contention for other statements.

InnoDB has something called Server Activity count. As you can see from this graph it is not really a good indicator – for some reason it shows a lower value when the server was doing much more work. I do not understand other than this variable is incremented enough to explain it but find it curious.

LRU flushing also is drastically different between enabled and disabled log flushing. We get a lot more calls to search free pages and a lot more scans. I’m not sure why exactly this is happening but I think this has something to do with undo space caching which is constantly washed out from buffer pool which increased IO seems to indicate.

Finally lets look at perhaps the most counter-intuitive result – the amount of pending pwrites to the log file – we allowed to cache more log files in the OS cache but the amount of waits on writes to the log files actually increased? I’m not very sure why this is happening – perhaps there is a lot of IO pressure on OS so it does not delay writes to the log files for long time?

This is the same graph showing log fsyncs and log pwrites together. We can see when we set innodb_flush_log_at_trx_commit=2 that we see a lot more waits on pwrites, while less waits on fsync (which now should only be happening once per second or so). In general these two graphs tell me in some cases assuming log writes with innodb_flush_log_at_trx_commit=2 will be “free” because the operating system will instantly acknowledge them and put in the cache is not always correct.

Summary: The main point I’m looking to illustrate here is what changes you’re doing to the MySQL configuration (same can be said about your application OS configuration, hardware, etc.) can cause much larger changes to system behavior than you would guess. If you have the time it is good to validate whenever there are any other changes to system behavior beyond what you would think. If not at least keep an open mind and assume what any change, even a small one, can cause dramatic system behavior changes.

If you want to explore similar graphs check out Percona Cloud Tools – it’s easy to get going and allows you to build graphs like these absolutely free and perform a lot of in depth analyses. Now in free beta, Percona Cloud Tools is a hosted service providing access to query performance insights for all MySQL uses.

The post How small changes impact complex systems – MySQL example appeared first on MySQL Performance Blog.

Diagnosing SST errors with Percona XtraDB Cluster for MySQL

December 30, 2014 - 12:00am

State Snapshot Transfer (SST) is used in Percona XtraDB Cluster (PXC) when a new node joins the cluster or to resync a failed node if Incremental State Transfer (IST) is no longer available. SST is triggered automatically but there is no magic: If it is not configured properly, it will not work and new nodes will never be able to join the cluster. Let’s have a look at a few classic issues.

Port for SST is not open

The donor and the joiner communicate on port 4444, and if the port is closed on one side, SST will always fail.

You will see in the error log of the donor that SST is started:

[...] 141223 16:08:48 [Note] WSREP: Node 2 (node1) requested state transfer from '*any*'. Selected 0 (node3)(SYNCED) as donor. 141223 16:08:48 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 6) 141223 16:08:48 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 141223 16:08:48 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6'' [...]

But then nothing happens, and some time later you will see a bunch of errors:

[...] 2014/12/23 16:09:52 socat[2965] E connect(3, AF=2 192.168.234.101:4444, 16): Connection timed out WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 0 1 (20141223 16:09:52.057) WSREP_SST: [ERROR] Cleanup after exit with status:32 (20141223 16:09:52.064) WSREP_SST: [INFO] Cleaning up temporary directories (20141223 16:09:52.068) 141223 16:09:52 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6' [...]

On the joiner side, you will see a similar sequence: SST is started, then hangs and is finally aborted:

[...] 141223 16:08:48 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 6) 141223 16:08:48 [Note] WSREP: Requesting state transfer: success, donor: 0 141223 16:08:49 [Note] WSREP: (f9560d0d, 'tcp://0.0.0.0:4567') turning message relay requesting off 141223 16:09:52 [Warning] WSREP: 0 (node3): State transfer to 2 (node1) failed: -32 (Broken pipe) 141223 16:09:52 [ERROR] WSREP: gcs/src/gcs_group.cpp:long int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():717: Will never receive state. Need to abort.

The solution is of course to make sure that the ports are open on both sides.

SST is not correctly configured

Sometimes you will see an error like this on the donor:

141223 21:03:15 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.102:4444/xtrabackup_sst' --auth 'sstuser:s3cretzzz' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid 'e63f38f2-8ae6-11e4-a383-46557c71f368:0'' [...] WSREP_SST: [ERROR] innobackupex finished with error: 1. Check /var/lib/mysql//innobackup.backup.log (20141223 21:03:26.973)

And if you look at innobackup.backup.log:

41223 21:03:26 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'sstuser' (using password: YES). innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 2995 main::mysql_connect('abort_on_error', 1) called at /usr//bin/innobackupex line 1530 innobackupex: Error: Failed to connect to MySQL server: DBI connect(';mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock','sstuser',...) failed: Access denied for user 'sstuser'@'localhost' (using password: YES) at /usr//bin/innobackupex line 2979

What happened?

The default SST method is xtrabackup-v2 and for it to work, you need to specify a username/password in the my.cnf file:

[mysqld] wsrep_sst_auth=sstuser:s3cret

And you also need to create the corresponding MySQL user:

mysql> GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY 's3cret';

So you should check that the user has been correctly created in MySQL and that wsrep_sst_auth is correctly set.

Galera versions do not match

Here is another set of errors you may see in the error log of the donor:

141223 21:14:27 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101 141223 21:14:30 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101 141223 21:14:33 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101

Here the issue is that you try to connect a node using Galera 2.x and a node running Galera 3.x. This can happen if you try to use a PXC 5.5 node and a PXC 5.6 node.

The right solution is probably to understand why you ended up with such inconsistent versions and make sure all nodes are using the same Percona XtraDB Cluster version and Galera version.

But if you know what you are doing, you can also instruct the node using Galera 3.x that it will communicate with Galera 2.x nodes by specifying in the my.cnf file:

[mysqld] wsrep_provider_options="socket.checksum=1"

Conclusion

SST errors can have multiple reasons for occurring, and the best way to diagnose the issue is to have a look at the error log of the donor and the joiner. Galera is in general quite verbose so you can follow the progress of SST on both nodes and see where it fails. Then it is mostly about being able to interpret the error messages.

The post Diagnosing SST errors with Percona XtraDB Cluster for MySQL appeared first on MySQL Performance Blog.

How well does your table fit in the InnoDB buffer pool in MySQL 5.6+?

December 29, 2014 - 6:31am

Some time ago, Peter Zaitsev posted a blog titled “How well does your table fits in innodb buffer pool?” He used some special INFORMATION_SCHEMA tables developed for Percona Server 5.1 to report how much of each InnoDB table and index resides in your buffer pool.

As Peter pointed out, you can use this view into the buffer pool to watch a buffer pool warm up with pages as you run queries. You can also use it for capacity planning. If you expect some tables need to be fully loaded in the buffer pool to be used efficiently, but the buffer pool isn’t large enough to hold them, then it’s time to increase the size of the buffer pool.

The problem, however, was that system tables change from version to version. Specifically, the INNODB_BUFFER_POOL_PAGES_INDEX table no longer exists in Percona Server 5.6, and the INNODB_INDEX_STATS table changed some column names in Percona Server 5.5.8, and the table no longer exists in Percona Server 5.6. So many of the comments on Peter’s blog rightly pointed out that the example query didn’t work on subsequent versions of Percona Server. And MySQL Community Edition at the time didn’t have the feature at all. They asked for an update to the blog post.

So here’s an updated, simplified query to report the content of your buffer pool, tested on the most recent versions.

Percona Server 5.1 and 5.5:

USE information_schema; SET @page_size = @@innodb_page_size; SET @bp_pages = @@innodb_buffer_pool_size/@page_size; SELECT P.TABLE_NAME, P.PAGE_TYPE, CASE WHEN P.INDEX_NAME IS NULL THEN NULL WHEN P.TABLE_NAME LIKE '`SYS_%' THEN P.INDEX_NAME WHEN P.INDEX_NAME <> 'PRIMARY' THEN 'SECONDARY' ELSE 'PRIMARY' END AS INDEX_TYPE, COUNT(DISTINCT P.PAGE_NUMBER) AS PAGES, ROUND(100*COUNT(DISTINCT P.PAGE_NUMBER)/@bp_pages,2) AS PCT_OF_BUFFER_POOL, CASE WHEN P.TABLE_NAME IS NULL THEN NULL WHEN P.TABLE_NAME LIKE 'SYS_%' THEN NULL ELSE ROUND(100*COUNT(DISTINCT P.PAGE_NUMBER)/CASE P.INDEX_NAME WHEN 'PRIMARY' THEN TS.DATA_LENGTH/@page_size ELSE TS.INDEX_LENGTH/@page_size END, 2) END AS PCT_OF_INDEX FROM INNODB_BUFFER_PAGE AS P JOIN INNODB_SYS_TABLES AS T ON P.SPACE = T.SPACE JOIN TABLES AS TS ON (T.SCHEMA, T.NAME) = (TS.TABLE_SCHEMA, TS.TABLE_NAME) WHERE TS.TABLE_SCHEMA <> 'mysql' GROUP BY TABLE_NAME, PAGE_TYPE, INDEX_TYPE;

MySQL 5.6 and 5.7 (this also works on Percona Server 5.6):

USE information_schema; SET @page_size = @@innodb_page_size; SET @bp_pages = @@innodb_buffer_pool_size/@page_size; SELECT P.TABLE_NAME, P.PAGE_TYPE, CASE WHEN P.INDEX_NAME IS NULL THEN NULL WHEN P.TABLE_NAME LIKE '`SYS_%' THEN P.INDEX_NAME WHEN P.INDEX_NAME <> 'PRIMARY' THEN 'SECONDARY' ELSE 'PRIMARY' END AS INDEX_TYPE, COUNT(DISTINCT P.PAGE_NUMBER) AS PAGES, ROUND(100*COUNT(DISTINCT P.PAGE_NUMBER)/@bp_pages,2) AS PCT_OF_BUFFER_POOL, CASE WHEN P.TABLE_NAME IS NULL THEN NULL WHEN P.TABLE_NAME LIKE 'SYS_%' THEN NULL ELSE ROUND(100*COUNT(DISTINCT P.PAGE_NUMBER)/CASE P.INDEX_NAME WHEN 'PRIMARY' THEN TS.DATA_LENGTH/@page_size ELSE TS.INDEX_LENGTH/@page_size END, 2) END AS PCT_OF_INDEX FROM INNODB_BUFFER_PAGE AS P JOIN INNODB_SYS_TABLES AS T ON P.SPACE = T.SPACE JOIN TABLES AS TS ON T.NAME = CONCAT(TS.TABLE_SCHEMA, '/', TS.TABLE_NAME) WHERE TS.TABLE_SCHEMA <> 'mysql' GROUP BY TABLE_NAME, PAGE_TYPE, INDEX_TYPE;

In both cases, the output looks something like the following (if I have read from a single table called test.foo):

+--------------+-------------------+------------+-------+--------------------+--------------+ | TABLE_NAME | PAGE_TYPE | INDEX_TYPE | PAGES | PCT_OF_BUFFER_POOL | PCT_OF_INDEX | +--------------+-------------------+------------+-------+--------------------+--------------+ | NULL | FILE_SPACE_HEADER | NULL | 1 | 0.00 | NULL | | NULL | IBUF_BITMAP | NULL | 1 | 0.00 | NULL | | NULL | INODE | NULL | 1 | 0.00 | NULL | | `test`.`foo` | INDEX | PRIMARY | 2176 | 3.32 | 98.37 | | `test`.`foo` | INDEX | SECONDARY | 2893 | 4.41 | 88.47 | +--------------+-------------------+------------+-------+--------------------+--------------+

Unfortunately, the INFORMATION_SCHEMA tables report total size of secondary indexes for a table, but not the size of each index individually. Therefore this query shows the percent of index only for the primary index (which is also the clustered index, i.e. the table itself), and then all other secondary indexes grouped together.

PERFORMANCE_SCHEMA solution

The PERFORMANCE_SCHEMA also includes some information about the contents of the buffer pool. The MySQL SYS Schema makes it easy to query this. But this view doesn’t calculate the percentage of each table in the buffer pool, nor the percentage of the buffer pool occupied by each table.

mysql> SELECT * FROM sys.innodb_buffer_stats_by_table; +---------------+----------------------+------------+------------+-------+--------------+-----------+-------------+ | object_schema | object_name | allocated | data | pages | pages_hashed | pages_old | rows_cached | +---------------+----------------------+------------+------------+-------+--------------+-----------+-------------+ | test | foo | 149.64 MiB | 106.19 MiB | 9577 | 9577 | 9577 | 1050490 | | InnoDB System | SYS_TABLES | 160.00 KiB | 91.24 KiB | 10 | 10 | 10 | 594 | | InnoDB System | SYS_INDEXES | 128.00 KiB | 93.59 KiB | 8 | 8 | 8 | 1345 | | InnoDB System | SYS_COLUMNS | 80.00 KiB | 47.13 KiB | 5 | 5 | 5 | 761 | | InnoDB System | SYS_DATAFILES | 48.00 KiB | 16.40 KiB | 3 | 3 | 3 | 246 | | InnoDB System | SYS_FIELDS | 48.00 KiB | 16.02 KiB | 3 | 3 | 3 | 377 | | InnoDB System | SYS_FOREIGN | 48.00 KiB | 0 bytes | 3 | 3 | 3 | 0 | | InnoDB System | SYS_TABLESPACES | 48.00 KiB | 15.83 KiB | 3 | 3 | 3 | 242 | | InnoDB System | SYS_FOREIGN_COLS | 16.00 KiB | 0 bytes | 1 | 1 | 1 | 0 | . . .

The post How well does your table fit in the InnoDB buffer pool in MySQL 5.6+? appeared first on MySQL Performance Blog.

InnoDB crash recovery speed in MySQL 5.6

December 24, 2014 - 6:43am

It has been a while since I have looked at InnoDB crash recovery. A lot has change in the last few years – we have serious crash recovery performance improvements in MySQL 5.5 and MySQL 5.6, we have solid state drives raising as typical high performance IO subsystem and we also have the ability to set much larger log files and often have a much larger InnoDB Buffer Pool to work with.

First let me revisit the challenge with have with InnoDB configuration. For write-intensive workloads it is extremely important to size innodb_log_file_size for good performance, however the longer log file size you have the longer you might have to wait for InnoDB to complete crash recovery, which impacts your recovery strategy.

How much can innodb_log_file_size impact performance? Massively! Doing intensive writes to a database that well fits in memory, I’d say there’s a 10x difference between having combined size of log files of 32GB vs 64MB.

Before we look at some performance numbers let’s talk about what InnoDB Crash Recovery time depends on:

Combined Innodb Log File Size innodb_log_file_size*innodb_log_files_in_group is what really matters. It does not really matter which of those two you change. I prefer to keep innodb_log_files_in_group as default and only work with innodb_log_file_size. The larger size you have allocated the longer recovery will take.

innodb_checkpoint_age – Combined size of InnoDB log files defines how many changes not reflected in the tablespace we may have where innodb_checkpoint_age shows how much changes we actually have at the current moment, being an actual driving factor of recovery time. If you have very large log files allocated but for your workload innodb_checkpoint_age stays low chances are recovery will be quick. Be careful however – intensive writes can cause innodb_checkpoint_age to go much higher than the average for your workload causing recovery time from crashes at that time to be much longer.

Innodb Buffer Pool Size – This is another very important factor. During recovery, InnoDB has to redo changes to the unflushed/dirty pages from buffer pool, which is obviously limited by buffer pool size. This also means innodb_max_dirty_pages_pct can be used to impact recovery speed. This is the number of dirty pages being the true driving factor. With small buffer pool, a limited number of dirty pages based on the workload you might not have innodb_checkpoint_age to go high even if you have allowed for large log space.

Data Structure matters a lot for recovery speed. Generally shorter rows being updated will mean longer recovery time for the same log file size. This should make sense as shorter row changes means there is less log space produced for the same amount of page changes. If you do a lot of blob writes InnoDB crash recovery can be short even with relatively large log files.

Access Pattern is another key factor – the more “random” access is the more distinct pages you will have touched during the same innodb_checkpoint_age the longer recovery can take.

Hardware - Better hardware means recovery goes faster, as much is obvious. More specifically you will be looking for storage performance at low concurrency (both reads and writes are important) as well as fast CPU cores – crash recovery at this point is not able to use multiple cores effectively.

Let’s now look at the test….
I am running Sysbench on an 11GB table, designed to fit in the 12GB buffer pool. Here is the exact command:

sysbench --tx-rate=4000 --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=uniform --oltp-table-size=40000000 --mysql-user=root --mysql-password=password --test=/usr/share/doc/sysbench/tests/db/update_index.lua run

The box is rather low end i3-4010U (4 threads) CPU with a Samsung EVO 840GB SSD, so numbers are expected to be higher on real server hardware.

In my first test I’m injecting 4000 updates/sec which is about half of what the box can do at sustained load. I do this to illustrate more common load scenario as we rarely run systems at their saturation point in real world. The uniform distribution should mean worse case scenarios for in-memory workloads through I think recovery speed would be slower if u use random writes to the database much larger than the amount of memory.

At this workload I’m getting innodb_checkpoint_age of 15GB even though total log file size is 32GB. Crashing the system makes for about 40 minutes recovery time so the log was processed at the 6.25MB/sec

Here are some interesting graphs:

As you can see recovery is essentially close to single core. It also can be CPU bound at times (and will be more so with faster storage) – at certain times of recovery when logs are being scanned it can be completely CPU bound (see how IO wait essentially goes to zero at some times)

Over time as recovery progresses more and more blocks become cached, so they do not have to be read from the disk for log records to be applied, meaning the workload becomes more and more write bound.

This is an unweighted IO utilization graph where 1000 corresponds to 100% of time where at least one IO request was outstanding. As you can see from this and the previous drive, InnoDB does not keep the IO well saturated all the time during crash recovery.

Additionally to the first sysbench crash test I did two more – one running a system completely saturated with updates. This made innodb_checkpoint_age to go as high as 23.2GB and crash recovery took 1h 20 minutes, showing some 4.8MB/sec The thing to consider in this case is that MySQL was not able to keep up with purging the history so it was growing quickly meaning crash recovery had to cover a lot of undo space modifications.

Finally I also did a run with a more skewed pareto distribution which resulted in 9.8G innodb_checkpoint_age 33min crash recovery time and 4.94MB/sec of log processing speed.

As I explained above there are a lot of moving parts so your numbers are likely to be quite different, yet I hope this can provide some reasonable baseline you can use for calculation.

Note also waiting for the server to recover from the crash is only one way to deal with recovery. Even if you size log files to be very small you will likely need to deal with Operating System boot and when warmup which will take a few minutes. It is often much better to use a different primary method of crash recovery, such as failover to the MySQL Replication Slave or using Percona XtraDB Cluster. If you use these methods you can often use quite a high combined InnoDB log file size to optimize for performance.

Final Thoughts: Even though InnoDB Crash Recovery has improved in MySQL 5.5 and MySQL 5.6 there is still room to improve it even more. As we see from the resource usage graphs during recovery there is an opportunity to both use multiple CPU cores more effectively as well as drive IO subsystem with higher concurrency and in more sustained fashion.

The post InnoDB crash recovery speed in MySQL 5.6 appeared first on MySQL Performance Blog.

File carving methods for the MySQL DBA

December 23, 2014 - 8:31am

This is a long overdue blog post from London’s 44con Cyber Security conference back in September. A lot of old memories were brought to the front as it were; the one I’m going to cover in this blog post is: file carving.

So what is file carving? despite the terminology it’s not going to be a full roast dinner; unless you have an appetite for data which as you’re here I’m assuming you have.

The TL;DR of “what is file carving” is taking a target blob of data (often a multi GB / TB file) and reducing it in to targeted pieces of data, this could be for instance grabbing all the jpeg images in a packet capture / mysqldump; or pulling that single table/schema out of a huge mysqldump with –all-databases (if you’re not using mydumper you really should it avoids issues like this!) aka “Sorting the wheat from the chaff”.

Let’s take for example at the time of writing this post I am looking to extract a single schema out of one such mysqldump –all-database file of around 2GB (2GB of course isn’t large however it’s large enough to give a practical example; the methods for larger files are of course the same). So where to start?

You’ll need the following tools installed:

  1. xxd (you can substitute xxd for od, hexer or any other hex editing / viewing tool you are comfortable with, just make sure it can handle very large files)
  2. grep

Let’s carve out the mysql schema

dbusby@kali:~$ xxd yourdumpfile.sql | grep 'mysql' -B5 | grep 'ASE' -A2 -B2
00003c0: 6e74 2044 6174 6162 6173 653a 2060 6d79 nt Database: my
00003d0: 7371 6c60 0a2d 2d0a 0a43 5245 4154 4520 sql.--..CREATE
00003e0: 4441 5441 4241 5345 202f 2a21 3332 3331 DATABASE /*!3231
00003f0: 3220 4946 204e 4f54 2045 5849 5354 532a 2 IF NOT EXISTS*
0000400: 2f20 606d 7973 716c 6020 2f2a 2134 3031 / mysql /*!40

Wonderful so we have some hex representation of the sql dumpfile why on earth do we want the hex? we need to define our offsets. In short our offsets are the position of the start and end of the chunk we intend to carve from the file.

From the above our start offset is 00003d9 at the start of CREATE DATABASE; for those unfamiliar with hexdump outputs I recommend looking at the tool hexer a vi like tool and pressing v to enter visual selection mode select a few characters and you’ll not something as follows “visual selection:  0x000003d9 – …”.

You can of course work out the range visually from the above, 00003d0 is the start of the line, each alphanumeric pair is a single byte the byte offset notation is hexedecimal 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f.

00003d0: 7371 6c60 0a2d 2d0a 0a43 5245 4154 4520 sql.--..CREATE
00003d0 == s, 00003d1 == q, 00003d2 == l And so on, we can easily verify this using xxd
dbusby@kali:~$ xxd -s 0x3d9 yourdumpfile.sql | head -n3
00003d9: 4352 4541 5445 2044 4154 4142 4153 4520 CREATE DATABASE
00003e9: 2f2a 2133 3233 3132 2049 4620 4e4f 5420 /*!32312 IF NOT
00003f9: 4558 4953 5453 2a2f 2060 6d79 7371 6c60 EXISTS*/ mysql

right so now we need the end offset, as above we establish a search pattern as the schema data we're carving is in the midst of a larger file we can look for the start of the dump for the next schema.


dbusby@kali:~$ xxd -s 0x3d9 yourdumpfile.sql | grep '--' -A5 | grep C -A2 -B2 | less
...
0083b19: 2043 7572 7265 6e74 2044 6174 6162 6173 Current Databas
0083b29: 653a 2060 7065 7263 6f6e 6160 0a2d 2d0a e: nextschema`.--.
...

I’ve piped into less here as there were many matches to the grep patterns.

From the above we can see a potential offset of 0x83b19 however we want to “backtrack” a few bytes to before the — comment start.


dbusby@kali:~$ xxd -s 0x83b14 yourdumpfile.sql | head -n1
0083b14: 2d2d 0a2d 2d20 4375 7272 656e 7420 4461 --.-- Current Da

Excellent we have our offsets starting at 0x3d9 ending at 0x83b14 we need to now convert base16 (hexidecimal) into base10 fortunatly we can do this usinc the bc utility very easily however we will need to fully expand and make upper case our offsets.


dbusby@kali:~$ echo 'ibase=16;00003D9' | bc
985
dbusby@kali:~$ echo 'ibase=16;0083B14' | bc
539412
dbusby@kali:~$ echo '539412-985' | bc
538427
dbusby@kali:~$ dd if=yourdumpfile.sql of=mysql.sql skip=985 bs=1 count=538427
538427+0 records in
538427+0 records out
538427 bytes (538 kB) copied, 1.08998 s, 494 kB/s

Let’s discuss this a little; what we have done here is convert our start offset to a base10 count of bytes to offset by when using dd (skip=985) we then convert the end offset to its base10 byte position, and by removing the startoffset base10 value this gives us the size of the chunk we are carving.

We now put this into a dd command line, and voila! we have a mysql.sql file which contains only the mysqldump data.

I hope this post helps somewhat to demystify file carving; the above techniques can be applied to any for of file carving need and is not limited only to mysql files.

The post File carving methods for the MySQL DBA appeared first on MySQL Performance Blog.

Testing backup locks during Xtrabackup SST on Percona XtraDB Cluster

December 22, 2014 - 5:00am
Background on Backup Locks

I was very excited to see Backup locks support in release notes for the latest Percona XtraDB Cluster 5.6.21 release. For those who are not aware, backup locks offer an alternative to FLUSH TABLES WITH READ LOCK (FTWRL) in Xtrabackup. While Xtrabackup can hot-copy Innodb, everything else in MySQL must be locked (usually briefly) to get a consistent snapshot that lines up with Innodb. This includes all other storage engines, but also things like table schemas (even on Innodb) and async replication binary logs. You can skip this lock, but it isn’t generally considered a ‘safe’ backup in every case.

Until recently, Xtrabackup (like most other backup tools) used FTWRL to accomplish this. This worked great, but had the unfortunate side-effect of locking every single table, even the Innodb ones.  This functionally meant that even a hot-backup tool for Innodb had to take a (usually short) global lock to get a consistent backup with MySQL overall.

Backup locks change that by introducing a new locking command on Percona Server called ‘LOCK TABLES FOR BACKUP’.  This works by locking writes to non-transactional tables, as well as locking DDL on all tables (including Innodb).  If Xtrabackup (of a recent vintage) detects that it’s backing up a Percona Server (also of recent vintage), it will automatically use LOCK TABLES WITH BACKUP instead of FLUSH TABLES WITH READ LOCK.

The TL;DR of this is that you can keep on modifying your Innodb data through the entire backup, since we don’t need to use FTWRL any longer.

This feature was introduced in Percona Server 5.6.16-64.0 and Percona XtraBackup 2.2.  I do not believe you will find it in any other MySQL variant, though I could be corrected.

What this means for Percona XtraDB Cluster (PXC)

The most common (and logical) SST method for Percona XtraDB Cluster is using Xtrabackup. This latest release of PXC includes support for backup locks, meaning that Xtrabackup donor nodes will no longer need to get a global lock. Practically for PXC users, this means that your Donor nodes can stay in rotation without causing client interruptions due to FTWRL.

Seeing it in action

To test this out, I spun up a 3-node cluster on AWS and fired up a sysbench run on the first node. I forced and SST on the node. Here is a snippet of the innobackup.backup.log (generated by all Xtrabackup donors in Percona XtraDB Cluster):

InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oy and Percona LLC and/or its affiliates 2009-2013. All Rights Reserved. This software is published under the GNU GENERAL PUBLIC LICENSE Version 2, June 1991. Get the latest version of Percona XtraBackup, documentation, and help resources: http://www.percona.com/xb/p 141218 19:22:01 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtraback up;mysql_socket=/var/lib/mysql/mysql.sock' as 'sst' (using password: YES). 141218 19:22:01 innobackupex: Connected to MySQL server 141218 19:22:01 innobackupex: Starting the backup operation IMPORTANT: Please check that the backup run completes successfully. At the end of a successful backup run innobackupex prints "completed OK!". innobackupex: Using server version 5.6.21-70.1-56 innobackupex: Created backup directory /tmp/tmp.Rm0qA740U3 141218 19:22:01 innobackupex: Starting ibbackup with command: xtrabackup --defaults-file="/etc/my.cnf" --defaults-group="mysqld" --backup --suspe nd-at-end --target-dir=/tmp/tmp.dM03LgPHFY --innodb_data_file_path="ibdata1:12M:autoextend" --tmpdir=/tmp/tmp.dM03LgPHFY --extra-lsndir='/tmp/tmp.dM 03LgPHFY' --stream=xbstream innobackupex: Waiting for ibbackup (pid=21892) to suspend innobackupex: Suspend file '/tmp/tmp.dM03LgPHFY/xtrabackup_suspended_2' xtrabackup version 2.2.7 based on MySQL server 5.6.21 Linux (x86_64) (revision id: ) xtrabackup: uses posix_fadvise(). xtrabackup: cd to /var/lib/mysql xtrabackup: open files limit requested 0, set to 5000 xtrabackup: using the following InnoDB configuration: xtrabackup: innodb_data_home_dir = ./ xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend xtrabackup: innodb_log_group_home_dir = ./ xtrabackup: innodb_log_files_in_group = 2 xtrabackup: innodb_log_file_size = 1073741824 xtrabackup: using O_DIRECT >> log scanned up to (10525811040) xtrabackup: Generating a list of tablespaces [01] Streaming ./ibdata1 >> log scanned up to (10529368594) >> log scanned up to (10532685942) >> log scanned up to (10536422820) >> log scanned up to (10539562039) >> log scanned up to (10543077110) [01] ...done [01] Streaming ./mysql/innodb_table_stats.ibd [01] ...done [01] Streaming ./mysql/innodb_index_stats.ibd [01] ...done [01] Streaming ./mysql/slave_relay_log_info.ibd [01] ...done [01] Streaming ./mysql/slave_master_info.ibd [01] ...done [01] Streaming ./mysql/slave_worker_info.ibd [01] ...done [01] Streaming ./sbtest/sbtest1.ibd >> log scanned up to (10546490256) >> log scanned up to (10550321726) >> log scanned up to (10553628936) >> log scanned up to (10555422053) [01] ...done ... [01] Streaming ./sbtest/sbtest17.ibd >> log scanned up to (10831343724) >> log scanned up to (10834063832) >> log scanned up to (10837100278) >> log scanned up to (10840243171) [01] ...done xtrabackup: Creating suspend file '/tmp/tmp.dM03LgPHFY/xtrabackup_suspended_2' with pid '21892' >> log scanned up to (10843312323) 141218 19:24:06 innobackupex: Continuing after ibbackup has suspended 141218 19:24:06 innobackupex: Executing LOCK TABLES FOR BACKUP... 141218 19:24:06 innobackupex: Backup tables lock acquired 141218 19:24:06 innobackupex: Starting to backup non-InnoDB tables and files innobackupex: in subdirectories of '/var/lib/mysql/' innobackupex: Backing up files '/var/lib/mysql//mysql/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (74 files) >> log scanned up to (10846683627) >> log scanned up to (10847773504) innobackupex: Backing up files '/var/lib/mysql//sbtest/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (21 files) innobackupex: Backing up file '/var/lib/mysql//test/db.opt' innobackupex: Backing up files '/var/lib/mysql//performance_schema/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (53 files) >> log scanned up to (10852976291) 141218 19:24:09 innobackupex: Finished backing up non-InnoDB tables and files 141218 19:24:09 innobackupex: Executing LOCK BINLOG FOR BACKUP... 141218 19:24:09 innobackupex: Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS... 141218 19:24:09 innobackupex: Waiting for log copying to finish >> log scanned up to (10856996124) xtrabackup: The latest check point (for incremental): '9936050111' xtrabackup: Stopping log copying thread. .>> log scanned up to (10856996124) xtrabackup: Creating suspend file '/tmp/tmp.dM03LgPHFY/xtrabackup_log_copied' with pid '21892' 141218 19:24:10 innobackupex: Executing UNLOCK BINLOG 141218 19:24:10 innobackupex: Executing UNLOCK TABLES 141218 19:24:10 innobackupex: All tables unlocked 141218 19:24:10 innobackupex: Waiting for ibbackup (pid=21892) to finish xtrabackup: Transaction log of lsn (9420426891) to (10856996124) was copied. innobackupex: Backup created in directory '/tmp/tmp.Rm0qA740U3' 141218 19:24:30 innobackupex: Connection to database server closed 141218 19:24:30 innobackupex: completed OK!

We can see the LOCK TABLES FOR BACKUP issued at 19:24:06 and unlocked at 19:24:10. Let’s see Galera apply stats from this node during that time:

mycluster / ip-10-228-128-220 (idx: 0) / Galera 3.8(rf6147dd) Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Window Flow time P cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst appl comm p_ms 19:23:55 P 5 3 Dono 698µs 0 72 0 5418 0.0 3.5M 0 0 187k 94 3k 3 2 0 19:23:56 P 5 3 Dono 701µs 0 58 0 5411 0.0 3.5M 0 0 188k 229 3k 3 2 0 19:23:57 P 5 3 Dono 701µs 0 2 0 5721 0.0 3.7M 0 0 188k 120 3k 3 2 0 19:23:58 P 5 3 Dono 689µs 0 5 0 5643 0.0 3.6M 0 0 188k 63 3k 3 2 0 19:23:59 P 5 3 Dono 679µs 0 55 0 5428 0.0 3.5M 0 0 188k 115 3k 3 2 0 19:24:01 P 5 3 Dono 681µs 0 1 0 4623 0.0 3.0M 0 0 188k 104 3k 3 2 0 19:24:02 P 5 3 Dono 690µs 0 0 0 4301 0.0 2.7M 0 0 188k 141 3k 3 2 0 19:24:03 P 5 3 Dono 688µs 0 2 0 4907 0.0 3.1M 0 0 188k 227 3k 3 2 0 19:24:04 P 5 3 Dono 692µs 0 44 0 4894 0.0 3.1M 0 0 188k 116 3k 3 2 0 19:24:05 P 5 3 Dono 706µs 0 0 0 5337 0.0 3.4M 0 0 188k 63 3k 3 2 0

Initially the node is keeping up ok with replication. The Down Queue (wsrep_local_recv_queue) is sticking around 0. We’re applying 4-5k transactions per second (Ops Dn). When the backup lock kicks in, we do see an increase in the queue size, but note that transactions are still applying on this node:

19:24:06 P 5 3 Dono 696µs 0 170 0 5671 0.0 3.6M 0 0 187k 130 3k 3 2 0 19:24:07 P 5 3 Dono 695µs 0 2626 0 3175 0.0 2.0M 0 0 185k 2193 3k 3 2 0 19:24:08 P 5 3 Dono 692µs 0 1248 0 6782 0.0 4.3M 0 0 186k 1800 3k 3 2 0 19:24:09 P 5 3 Dono 693µs 0 611 0 6111 0.0 3.9M 0 0 187k 651 3k 3 2 0 19:24:10 P 5 3 Dono 708µs 0 93 0 5316 0.0 3.4M 0 0 187k 139 3k 3 2 0

So this node isn’t locked from innodb write transactions, it’s just suffering a bit of IO load while the backup finishes copying its files and such. After this, the backup finished up and the node goes back to a Synced state pretty quickly:

19:24:11 P 5 3 Dono 720µs 0 1 0 4486 0.0 2.9M 0 0 188k 78 3k 3 2 0 19:24:12 P 5 3 Dono 715µs 0 0 0 3982 0.0 2.5M 0 0 188k 278 3k 3 2 0 19:24:13 P 5 3 Dono 1.2ms 0 0 0 4337 0.0 2.8M 0 0 188k 143 3k 3 2 0 19:24:14 P 5 3 Dono 1.2ms 0 1 0 4901 0.0 3.1M 0 0 188k 130 3k 3 2 0 19:24:16 P 5 3 Dono 1.1ms 0 0 0 5289 0.0 3.4M 0 0 188k 76 3k 3 2 0 19:24:17 P 5 3 Dono 1.1ms 0 42 0 4998 0.0 3.2M 0 0 188k 319 3k 3 2 0 19:24:18 P 5 3 Dono 1.1ms 0 15 0 3290 0.0 2.1M 0 0 188k 75 3k 3 2 0 19:24:19 P 5 3 Dono 1.1ms 0 0 0 4124 0.0 2.6M 0 0 188k 276 3k 3 2 0 19:24:20 P 5 3 Dono 1.1ms 0 4 0 1635 0.0 1.0M 0 0 188k 70 3k 3 2 0 19:24:21 P 5 3 Dono 1.1ms 0 0 0 5026 0.0 3.2M 0 0 188k 158 3k 3 2 0 19:24:22 P 5 3 Dono 1.1ms 0 20 0 4100 0.0 2.6M 0 0 188k 129 3k 3 2 0 19:24:23 P 5 3 Dono 1.1ms 0 0 0 5412 0.0 3.5M 0 0 188k 159 3k 3 2 0 19:24:24 P 5 3 Dono 1.1ms 0 315 0 4567 0.0 2.9M 0 0 187k 170 3k 3 2 0 19:24:25 P 5 3 Dono 1.0ms 0 24 0 5535 0.0 3.5M 0 0 188k 131 3k 3 2 0 19:24:26 P 5 3 Dono 1.0ms 0 0 0 5427 0.0 3.5M 0 0 188k 71 3k 3 2 0 19:24:27 P 5 3 Dono 1.0ms 0 1 0 5221 0.0 3.3M 0 0 188k 256 3k 3 2 0 19:24:28 P 5 3 Dono 1.0ms 0 0 0 5317 0.0 3.4M 0 0 188k 159 3k 3 2 0 19:24:29 P 5 3 Dono 1.0ms 0 1 0 5491 0.0 3.5M 0 0 188k 163 3k 3 2 0 19:24:30 P 5 3 Sync 1.0ms 0 0 0 5540 0.0 3.5M 0 0 188k 296 3k 3 2 0 19:24:31 P 5 3 Sync 992µs 0 106 0 5594 0.0 3.6M 0 0 187k 130 3k 3 2 0 19:24:33 P 5 3 Sync 984µs 0 19 0 5723 0.0 3.7M 0 0 188k 275 3k 3 2 0 19:24:34 P 5 3 Sync 976µs 0 0 0 5508 0.0 3.5M 0 0 188k 182 3k 3 2 0

Compared to Percona XtraDB Cluster 5.5

The Backup Locking is only a feature of Percona XtraDB Cluster 5.6, so if we repeat the experiment on 5.5, we can see a more severe lock:

141218 20:31:19 innobackupex: Executing FLUSH TABLES WITH READ LOCK... 141218 20:31:19 innobackupex: All tables locked and flushed to disk 141218 20:31:19 innobackupex: Starting to backup non-InnoDB tables and files innobackupex: in subdirectories of '/var/lib/mysql/' innobackupex: Backing up files '/var/lib/mysql//sbtest/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (21 files) innobackupex: Backing up files '/var/lib/mysql//mysql/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (72 files) >> log scanned up to (6633554484) innobackupex: Backing up file '/var/lib/mysql//test/db.opt' innobackupex: Backing up files '/var/lib/mysql//performance_schema/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (18 files) 141218 20:31:21 innobackupex: Finished backing up non-InnoDB tables and files 141218 20:31:21 innobackupex: Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS... 141218 20:31:21 innobackupex: Waiting for log copying to finish xtrabackup: The latest check point (for incremental): '5420681649' xtrabackup: Stopping log copying thread. .>> log scanned up to (6633560488) xtrabackup: Creating suspend file '/tmp/tmp.Cq5JRZEFki/xtrabackup_log_copied' with pid '23130' 141218 20:31:22 innobackupex: All tables unlocked

Our lock lasts from 20:31:19 until 20:31:21, so it’s fairly short. Note that with larger databases with more schemas and tables, this can be quite a bit longer. Let’s see the effect on the apply rate for this node:

mycluster / ip-10-229-68-156 (idx: 0) / Galera 2.11(r318911d) Wsrep Cluster Node Repl Queue Ops Bytes Conflct Gcache Window Flow time P cnf # Stat Laten Up Dn Up Dn Up Dn lcf bfa ist idx dst appl comm p_ms 20:31:13 P 5 3 Dono N/A 0 73 0 3493 0.0 1.8M 0 0 1.8m 832 746 2 2 0.0 20:31:14 P 5 3 Dono N/A 0 29 0 3578 0.0 1.9M 0 0 1.8m 850 749 3 2 0.0 20:31:15 P 5 3 Dono N/A 0 0 0 3513 0.0 1.8M 0 0 1.8m 735 743 2 2 0.0 20:31:16 P 5 3 Dono N/A 0 0 0 3651 0.0 1.9M 0 0 1.8m 827 748 2 2 0.0 20:31:17 P 5 3 Dono N/A 0 27 0 3642 0.0 1.9M 0 0 1.8m 840 762 2 2 0.0 20:31:18 P 5 3 Dono N/A 0 0 0 3840 0.0 2.0M 0 0 1.8m 563 776 2 2 0.0 20:31:19 P 5 3 Dono N/A 0 0 0 4368 0.0 2.3M 0 0 1.8m 823 745 2 1 0.0 20:31:20 P 5 3 Dono N/A 0 3952 0 339 0.0 0.2M 0 0 1.8m 678 751 1 1 0.0 20:31:21 P 5 3 Dono N/A 0 7883 0 0 0.0 0.0 0 0 1.8m 678 751 0 0 0.0 20:31:22 P 5 3 Dono N/A 0 4917 0 5947 0.0 3.1M 0 0 1.8m 6034 3k 7 6 0.0 20:31:24 P 5 3 Dono N/A 0 10 0 8238 0.0 4.3M 0 0 1.8m 991 1k 7 6 0.0 20:31:25 P 5 3 Dono N/A 0 0 0 3016 0.0 1.6M 0 0 1.8m 914 754 2 1 0.0 20:31:26 P 5 3 Dono N/A 0 0 0 3253 0.0 1.7M 0 0 1.8m 613 766 1 1 0.0 20:31:27 P 5 3 Dono N/A 0 1 0 3600 0.0 1.9M 0 0 1.8m 583 777 2 1 0.0 20:31:28 P 5 3 Dono N/A 0 0 0 3640 0.0 1.9M 0 0 1.8m 664 750 2 2 0.0

The drop here is more severe and the apply rate hits 0 (and stays there for the duration of the FTWRL).

Implications

Obviously Xtrabackup running on a PXC node will cause some load on the node itself, so there still maybe good reasons to keep a Donor node out of rotation from your application.  However, this is less of an issue than it was in the past, where writes would definitely stall on a Donor node and present potentially intermittent stalls on the application.

How you allow applications to start using a Donor node automatically (or not) depends on how you have your HA between the application and cluster setup.  If you use HAproxy or similar with clustercheck, you can either modify the script itself or change a command line argument. The node is in the Donor/Desynced state below:

[root@ip-10-229-64-35 ~]# /usr/bin/clustercheck clustercheckuser clustercheckpassword! HTTP/1.1 503 Service Unavailable Content-Type: text/plain Connection: close Content-Length: 44 Percona XtraDB Cluster Node is not synced. [root@ip-10-229-64-35 ~]# /usr/bin/clustercheck clustercheckuser clustercheckpassword! 1 HTTP/1.1 200 OK Content-Type: text/plain Connection: close Content-Length: 40 Percona XtraDB Cluster Node is synced.

For those doing their own custom health checking, you basically just need to pass nodes that have a wsrep_local_state_comment of either ‘Synced’ or ‘Donor/Desynced’.

The post Testing backup locks during Xtrabackup SST on Percona XtraDB Cluster appeared first on MySQL Performance Blog.

Store UUID in an optimized way

December 19, 2014 - 6:00am

A few years ago Peter Zaitsev, in a post titled “To UUID or not to UUID,” wrote: There is timestamp based part in UUID which has similar properties to auto_increment and which could be used to have values generated at same point in time physically local in BTREE index.”

For this post I’ve rearranged the timestamp part of UUID (Universal Unique Identifier) and did some benchmarks.

Many people store UUID as char (36) and use as row identity value (PRIMARY KEY) because it is unique across every table, every database and every server and allow easy merging of records from different databases. But here comes the problem, using it as PRIMARY KEY causes the problems described below.

Problems with UUID
  • UUID has 36 characters which makes it bulky.
  • InnoDB stores data in the PRIMARY KEY order and all the secondary keys also contain PRIMARY KEY. So having UUID as PRIMARY KEY makes the index bigger which can not be fit into the memory
  • Inserts are random and the data is scattered.

Despite the problems with UUID, people still prefer it because it is UNIQUE across every table, can be generated anywhere. In this blog, I will explain how to store UUID in an efficient way by re-arranging timestamp part of UUID.

Structure of UUID

MySQL uses UUID version 1 which is a 128-bit number represented by a utf8 string of five hexadecimal numbers

  • The first three numbers are generated from a timestamp.
  • The fourth number preserves temporal uniqueness in case the timestamp value loses monotonicity (for example, due to daylight saving time).
  • The fifth number is an IEEE 802 node number that provides spatial uniqueness. A random number is substituted if the latter is not available (for example, because the host computer has no Ethernet card, or we do not know how to find the hardware address of an interface on your operating system). In this case, spatial uniqueness cannot be guaranteed. Nevertheless, a collision should have very low probability.

The timestamp is mapped as follows:
When the timestamp has the (60 bit) hexadecimal value: 1d8eebc58e0a7d7. The following parts of the UUID are set:: 58e0a7d7-eebc-11d8-9669-0800200c9a66. The 1 before the most significant digits (in 11d8) of the timestamp indicates the UUID version, for time-based UUIDs this is 1.

Fourth and Fifth parts would be mostly constant if it is generated from a single server. First three numbers are based on timestamp, so they will be monotonically increasing. Lets rearrange the total sequence making the UUID closer to sequential. This makes the inserts and recent data look up faster. Dashes (‘-‘) make no sense, so lets remove them.
58e0a7d7-eebc-11d8-9669-0800200c9a66 => 11d8eebc58e0a7d796690800200c9a66

Benchmarking

I created created three tables

  • events_uuid – UUID binary(16) PRIMARY KEY
  • events_int – Additional BIGINT auto increment column and made it as primary key and index on UUID column
  • events_uuid_ordered – Rearranged UUID binary(16) as PRIMARY KEY

I created three stored procedures which insert 25K random rows at a time into the respective tables. There are three more stored procedures which call the random insert-stored procedures in a loop and also calculate the time taken to insert 25K rows and data and index size after each loop. Totally I have inserted 25M records.

    • Data Size
      Horizontal Axis – Number of inserts x 25,000
      Vertical Axis – Data Size in MB

      The data size for UUID table is more than other two tables.
    • Index Size
      Horizontal axis – Number of inserts x 25,000
      Vertical axis – Index Size in MB
    • Total Size
      Horizontal Axis – Number of inserts x 25,000
      Vertical Axis – Total Size in MB
    • Time taken
      Horizontal axis – Number of inserts x 25,000
      Vertical axis – Time Taken in seconds

For the table with UUID as PRIMARY KEY, you can notice that as the table grows big, the time taken to insert rows is increasing almost linearly. Whereas for other tables, the time taken is almost constant.

The size of UUID table is almost 50% bigger than Ordered UUID table and 30% bigger than table with BIGINT as PRIMARY KEY. Comparing the Ordered UUID table BIGINT table, the time taken to insert rows and the size are almost same. But they may vary slightly based on the index structure.

root@localhost:~# ls -lhtr /media/data/test/ | grep ibd -rw-rw---- 1 mysql mysql  13G Jul 24 15:53 events_uuid_ordered.ibd -rw-rw---- 1 mysql mysql  20G Jul 25 02:27 events_uuid.ibd -rw-rw---- 1 mysql mysql  15G Jul 25 07:59 events_int.ibd

Table Structure

#1 events_int CREATE TABLE `events_int` (  `count` bigint(20) NOT NULL AUTO_INCREMENT,  `id` binary(16) NOT NULL,  `unit_id` binary(16) DEFAULT NULL,  `event` int(11) DEFAULT NULL,  `ref_url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,  `campaign_id` binary(16) COLLATE utf8_unicode_ci DEFAULT '',  `unique_id` binary(16) COLLATE utf8_unicode_ci DEFAULT NULL,  `user_agent` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,  `city` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,  `country` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,  `demand_partner_id` binary(16) DEFAULT NULL,  `publisher_id` binary(16) DEFAULT NULL,  `site_id` binary(16) DEFAULT NULL,  `page_id` binary(16) DEFAULT NULL,  `action_at` datetime DEFAULT NULL,  `impression` smallint(6) DEFAULT NULL,  `click` smallint(6) DEFAULT NULL,  `sold_impression` smallint(6) DEFAULT NULL,  `price` decimal(15,7) DEFAULT '0.0000000',  `actioned_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',  `unique_ads` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,  `notification_url` text COLLATE utf8_unicode_ci,  PRIMARY KEY (`count`),  KEY `id` (`id`),  KEY `index_events_on_actioned_at` (`actioned_at`),  KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; #2 events_uuid CREATE TABLE `events_uuid` (  `id` binary(16) NOT NULL,  `unit_id` binary(16) DEFAULT NULL, ~ ~ PRIMARY KEY (`id`),  KEY `index_events_on_actioned_at` (`actioned_at`),  KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci; #3 events_uuid_ordered CREATE TABLE `events_uuid_ordered` (   `id` binary(16) NOT NULL,   `unit_id` binary(16) DEFAULT NULL, ~ ~ PRIMARY KEY (`id`),   KEY `index_events_on_actioned_at` (`actioned_at`),   KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Conclusions

 

    • Create function to rearrange UUID fields and use it

DELIMITER // CREATE DEFINER=`root`@`localhost` FUNCTION `ordered_uuid`(uuid BINARY(36)) RETURNS binary(16) DETERMINISTIC RETURN UNHEX(CONCAT(SUBSTR(uuid, 15, 4),SUBSTR(uuid, 10, 4),SUBSTR(uuid, 1, 8),SUBSTR(uuid, 20, 4),SUBSTR(uuid, 25))); // DELIMITER ;

Inserts

INSERT INTO events_uuid_ordered VALUES (ordered_uuid(uuid()),'1','M',....);

Selects

SELECT HEX(uuid),is_active,... FROM events_uuid_ordered ;

    • Define UUID as binary(16) as binary does not have any character set

 

References

 

The post Store UUID in an optimized way appeared first on MySQL Performance Blog.

Making HAProxy 1.5 replication lag aware in MySQL

December 18, 2014 - 7:48am

HAProxy is frequently used as a software load balancer in the MySQL world. Peter Boros, in a past post, explained how to set it up with Percona XtraDB Cluster (PXC) so that it only sends queries to available nodes. The same approach can be used in a regular master-slaves setup to spread the read load across multiple slaves. However with MySQL replication, another factor comes into play: replication lag. In this case the approach mentioned for Percona XtraDB Cluster does not work that well as the check we presented only returns ‘up’ or ‘down’. We would like to be able to tune the weight of a replica inside HAProxy depending on its replication lag. This is what we will do in this post using HAProxy 1.5.

Agent checks in HAProxy

HAProxy 1.5 allows us to run an agent check, which is a check that can be added to a regular health check. The benefit of agent checks is that the return value can be ‘up’ or ‘down’, but also a weight.

What is an agent? It is simply a program that can be accessed from a TCP connection on a given port. So if we want to run an agent on a MySQL server that will:

  • Mark the server as down in HAProxy if replication is not working
  • Set the weight to 100% if the replication lag is < 10s
  • Set the weight to 50% if the replication lag is >= 10s and < 60s
  • Set the weight to 5% in all other situations

We can use a script like this:

$ less agent.php = 10 && $lag < 60){ return "up 50%"; } else return "up 5%"; } set_time_limit(0); $socket = stream_socket_server("tcp://127.0.0.1:$port", $errno, $errstr); if (!$socket) { echo "$errstr ($errno) n"; } else { while ($conn = stream_socket_accept($socket,9999999999999)) { $cmd = "$mysql -h127.0.0.1 -u$user -p$password -P$mysql_port -Ee "$query" | grep Seconds_Behind_Master | cut -d ':' -f2 | tr -d ' '"; exec("$cmd",$lag); $weight = set_weight($lag[0]); unset($lag); fputs ($conn, $weight); fclose ($conn); } fclose($socket); } ?>

If you want the script to be accessible from port 6789 and connect to a MySQL instance running on port 3306, run:

$ php agent.php 6789 3306

You will also need a dedicated MySQL user:

mysql> GRANT REPLICATION CLIENT ON *.* TO 'haproxy'@'127.0.0.1' IDENTIFIED BY 'haproxy_pwd';

When the agent is started, you can check that it is working properly:

# telnet 127.0.0.1 6789 Trying 127.0.0.1... Connected to 127.0.0.1. Escape character is '^]'. up 100% Connection closed by foreign host.

Assuming it is run locally on the app server, that 2 replicas are available (192.168.10.2 and 192.168.10.3) and that the application will send all reads on port 3307, you will define a frontend and a backend in your HAProxy configuration like this:

frontend read_only-front bind *:3307 mode tcp option tcplog log global default_backend read_only-back backend read_only-back mode tcp balance leastconn server slave1 192.168.10.2 weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions server slave2 192.168.10.3 weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions

Demo

Now that everything is set up, let’s see how HAProxy can dynamically change the weight of the servers depending on the replication lag.

No lag

# Slave1 $ mysql -Ee "show slave status" | grep Seconds_Behind_Master Seconds_Behind_Master: 0 # Slave2 $ mysql -Ee "show slave status" | grep Seconds_Behind_Master Seconds_Behind_Master: 0 # HAProxy $ echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19 # pxname,svname,status,weight read_only-front,FRONTEND,OPEN, read_only-back,slave1,UP,100 read_only-back,slave2,UP,100 read_only-back,BACKEND,UP,200

Slave1 lagging

# Slave1 $ mysql -Ee "show slave status" | grep Seconds_Behind_Master Seconds_Behind_Master: 25 # Slave2 $ mysql -Ee "show slave status" | grep Seconds_Behind_Master Seconds_Behind_Master: 0 # echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19 # pxname,svname,status,weight read_only-front,FRONTEND,OPEN, read_only-back,slave1,UP,50 read_only-back,slave2,UP,100 read_only-back,BACKEND,UP,150

Slave2 down

# Slave1 $ mysql -Ee "show slave status" | grep Seconds_Behind_Master Seconds_Behind_Master: 0 # Slave2 $ mysql -Ee "show slave status" | grep Seconds_Behind_Master Seconds_Behind_Master: NULL # echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19 # pxname,svname,status,weight read_only-front,FRONTEND,OPEN, read_only-back,slave1,UP,100 read_only-back,slave2,DOWN (agent),100 read_only-back,BACKEND,UP,100

Conclusion

Agent checks are a nice addition in HAProxy 1.5. The setup presented above is a bit simplistic though: for instance, if HAProxy fails to connect to the agent, it will not mark the corresponding as down. It is then recommended to keep a regular health check along with the agent check.

Astute readers will also notice that in this configuration, if replication is broken on all nodes, HAProxy will stop sending reads. This may not be the best solution. Possible options are: stop the agent and mark the servers as UP using the stats socket or add the master as a backup server.

And as a final note, you can edit the code of the agent so that replication lag is measured with Percona Toolkit’s pt-heartbeat instead of Seconds_Behind_Master.

The post Making HAProxy 1.5 replication lag aware in MySQL appeared first on MySQL Performance Blog.

InnoDB’s multi-versioning handling can be Achilles’ heel

December 17, 2014 - 8:05am

I believe InnoDB storage engine architecture is great for a lot of online workloads, however, there are no silver bullets in technology and all design choices have their trade offs. In this blog post I’m going to talk about one important InnoDB limitation that you should consider.

InnoDB is a multiversion concurrency control (MVCC) storage engine which means many versions of the single row can exist at the same time. In fact there can be a huge amount of such row versions. Depending on the isolation mode you have chosen, InnoDB might have to keep all row versions going back to the earliest active read view, but at the very least it will have to keep all versions going back to the start of SELECT query which is currently running.

In most cases this is not a big deal – if you have many short transactions happening you will have only a few row versions to deal with. If you just use the system for reporting queries but do not modify data aggressively at the same time you also will not have many row versions. However, if you mix heavy updates with slow reporting queries going at the same time you can get into a lot of trouble.

Consider for example an application with a hot row (something like actively updated counter) which has 1000 updates per second together with some heavy batch job that takes 1000 to run. In such case we will have 1M of row versions to deal with.

Let’s now talk about how those old-row versions are stored in InnoDB – they are stored in the undo space as an essentially linked list where each row version points to the previous row version together with transaction visibility information that helps to decide which version will be visible by this query. Such design favors short new queries that will typically need to see one of the newer rows, so they do not have to go too far in this linked list. This might not be the case with reporting queries that might need to read rather old row version which correspond to the time when the query was started or logical backups that use consistent reads (think mysqldump or mydumper) which often would need to access such very old row versions.

So going through the linked list of versions is expensive, but how expensive it can get? In this case a lot depends upon whenever UNDO space fits in memory, and so the list will be traversed efficiently – or it does not, in which case you might be looking at the massive disk IO. Keep in mind undo space is not clustered by PRIMARY key, as normal data in InnoDB tables, so if you’re updating multiple rows at the same time (typical case) you will be looking at the row-version chain stored in many pages, often as little as one row version per page, requiring either massive IO or a large amount of UNDO space pages to present in the InnoDB Buffer pool.

Where it can get even worse is Index Scan. This is because Indexes are structured in InnoDB to include all row versions corresponding to the key value, current and past. This means for example the index for KEY=5 will contain pointers to all rows that either have value 5 now or had value 5 some time in the past and have not been purged yet. Now where it can really bite is the following – InnoDB needs to know which of the values stored for the key are visible by the current transaction – and that might mean going through all long-version chains for each of the keys.

This is all theory, so lets see how we can simulate such workloads and see how bad things really can get in practice.

I have created 1Bil rows “sysbench” table which takes some 270GB space and I will use a small buffer pool – 6GB. I will run sysbench with 64 threads pareto distribution (hot rows) while running a full table scan query concurrently: select avg(k) from sbtest1 Here is exact sysbench run done after prepare.

sysbench --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=pareto --oltp-table-size=1000000000 --mysql-user root --mysql-password=password --test /usr/share/doc/sysbench/tests/db/oltp.lua run

Here is the explain for the “reporting” query that you would think to be a rather efficient index scan query. With just 4 bytes 1 Billion of values would be just 4G (really more because of InnoDB overhead) – not a big deal for modern systems:

mysql> explain select avg(k) from sbtest1 G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: sbtest1 type: index possible_keys: NULL key: k_1 key_len: 4 ref: NULL rows: 953860873 Extra: Using index 1 row in set (0.00 sec)

2 days have passed and the “reporting” query is still running… furthermore the foreground workload started to look absolutely bizarre:

[207850s] threads: 64, tps: 0.20, reads: 7.40, writes: 0.80, response time: 222481.28ms (95%), errors: 0.70, reconnects: 0.00 [207860s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [207870s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [207880s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [207890s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [207900s] threads: 64, tps: 2.70, reads: 47.60, writes: 11.60, response time: 268815.49ms (95%), errors: 0.00, reconnects: 0.00 [207910s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [207920s] threads: 64, tps: 2.30, reads: 31.60, writes: 9.50, response time: 294954.28ms (95%), errors: 0.00, reconnects: 0.00 [207930s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [207940s] threads: 64, tps: 2.90, reads: 42.00, writes: 12.20, response time: 309332.04ms (95%), errors: 0.00, reconnects: 0.00 [207950s] threads: 64, tps: 0.20, reads: 4.60, writes: 1.00, response time: 318922.41ms (95%), errors: 0.40, reconnects: 0.00 [207960s] threads: 64, tps: 0.20, reads: 1.90, writes: 0.50, response time: 335170.09ms (95%), errors: 0.00, reconnects: 0.00 [207970s] threads: 64, tps: 0.60, reads: 13.20, writes: 2.60, response time: 292842.88ms (95%), errors: 0.60, reconnects: 0.00 [207980s] threads: 64, tps: 2.60, reads: 37.60, writes: 10.20, response time: 351613.43ms (95%), errors: 0.00, reconnects: 0.00 [207990s] threads: 64, tps: 5.60, reads: 78.70, writes: 22.10, response time: 186407.21ms (95%), errors: 0.00, reconnects: 0.00 [208000s] threads: 64, tps: 8.10, reads: 120.20, writes: 32.60, response time: 99179.05ms (95%), errors: 0.00, reconnects: 0.00 [208010s] threads: 64, tps: 19.50, reads: 280.50, writes: 78.90, response time: 27559.69ms (95%), errors: 0.00, reconnects: 0.00 [208020s] threads: 64, tps: 50.70, reads: 691.28, writes: 200.70, response time: 5214.43ms (95%), errors: 0.00, reconnects: 0.00 [208030s] threads: 64, tps: 77.40, reads: 1099.72, writes: 311.31, response time: 2600.66ms (95%), errors: 0.00, reconnects: 0.00 [208040s] threads: 64, tps: 328.20, reads: 4595.40, writes: 1313.40, response time: 911.36ms (95%), errors: 0.00, reconnects: 0.00 [208050s] threads: 64, tps: 538.20, reads: 7531.90, writes: 2152.10, response time: 484.46ms (95%), errors: 0.00, reconnects: 0.00 [208060s] threads: 64, tps: 350.70, reads: 4913.45, writes: 1404.09, response time: 619.42ms (95%), errors: 0.00, reconnects: 0.00 [208070s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208080s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208090s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208100s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208110s] threads: 64, tps: 1.60, reads: 24.20, writes: 6.80, response time: 42385.40ms (95%), errors: 0.10, reconnects: 0.00 [208120s] threads: 64, tps: 0.80, reads: 28.20, writes: 3.40, response time: 51381.54ms (95%), errors: 2.80, reconnects: 0.00 [208130s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208140s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208150s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00 [208160s] threads: 64, tps: 0.60, reads: 14.20, writes: 2.40, response time: 93248.04ms (95%), errors: 0.80, reconnects: 0.00

As you can see we have long stretches of times when there are no queries completed at all… going to some spikes of higher performance. This is how it looks on the graph:

Corresponding CPU usage:

This shows what we are not only observing something we would expect with InnoDB design but also there seems to be some serve starvation happening in this case which we can confirm:

Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357 --Thread 139790809552640 has waited at btr0cur.cc line 3852 for 194.00 seconds the semaphore: S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock' a writer (thread id 139790814770944) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file row0sel.cc line 4125 Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357 --Thread 139790804735744 has waited at row0sel.cc line 3506 for 194.00 seconds the semaphore: S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock' a writer (thread id 139790814770944) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file row0sel.cc line 4125 Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357 --Thread 139790810756864 has waited at row0sel.cc line 4125 for 194.00 seconds the semaphore: S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock' a writer (thread id 139790814770944) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file row0sel.cc line 4125 Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357 --Thread 139790811158272 has waited at btr0cur.cc line 3852 for 194.00 seconds the semaphore: S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock' a writer (thread id 139790814770944) has reserved it in mode wait exclusive

Waiting for the given buffer pool block to become available for more than 3 minutes is a big issue – this lock should never be held by more than a few microseconds.

SHOW PROCESSLIST confirms even most basic selects by primary key can get stalled for long time

| 5499 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5500 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5501 | root | localhost | sbtest | Query | 185 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5502 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5503 | root | localhost | sbtest | Query | 14 | statistics | SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 1 AND 1+99 ORDER BY c | 0 | 0 | | 5504 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5505 | root | localhost | sbtest | Query | 14 | updating | UPDATE sbtest1 SET k=k+1 WHERE id=1 | 0 | 0 | | 5506 | root | localhost | sbtest | Query | 236 | updating | DELETE FROM sbtest1 WHERE id=1 | 0 | 0 | | 5507 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5508 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1+99 | 0 | 0 | | 5509 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 | | 5510 | root | localhost | sbtest | Query | 14 | updating | UPDATE sbtest1 SET c='80873149502-45132831358-41942500598-17481512835-07042367094-39557981480-593123 | 0 | 0 | | 5511 | root | localhost | sbtest | Query | 236 | updating | UPDATE sbtest1 SET k=k+1 WHERE id=18 | 0 | 1 | | 5512 | root | localhost | sbtest | Query | 14 | statistics | SELECT c FROM sbtest1 WHERE id=7 | 0 | 0 | | 6009 | root | localhost | sbtest | Query | 195527 | Sending data | select avg(k) from sbtest1 | 0 | 0 |

How do I know it is UNDO space related issue in this case? Because it ends up taking majority of buffer pool

mysql> select page_type,count(*) from INNODB_BUFFER_PAGE group by page_type; +-------------------+----------+ | page_type | count(*) | +-------------------+----------+ | EXTENT_DESCRIPTOR | 1 | | FILE_SPACE_HEADER | 1 | | IBUF_BITMAP | 559 | | IBUF_INDEX | 855 | | INDEX | 2186 | | INODE | 1 | | SYSTEM | 128 | | UNDO_LOG | 382969 | | UNKNOWN | 6508 | +-------------------+----------+ 9 rows in set (1.04 sec)

And it does so in a very crazy way – when there is almost no work being done UNDO_LOG contents of the buffer pool is growing very rapidly while when we’re getting some work done the INDEX type pages take a lot more space. To me this seems like as the index scan is going it touches some hot rows and some not-so-hot ones, containing less row versions and so does not put much pressure on “undo space.”

Take Away: Now you might argue that this given workload and situation is rather artificial and rather narrow. It well might be. My main point here is what if you’re looking at just part of your workload, such as your main short application queries, and not taking reporting or backups into account “because their performance is not important.” In this case you might be in for a big surprise. Those background activities might be taking much more than you would expect, and in addition, they might have much more of a severe impact to your main application workload, like in this case above.

The post InnoDB’s multi-versioning handling can be Achilles’ heel appeared first on MySQL Performance Blog.

OpenStack Live tutorials & sessions to bring OpenStack users up to speed

December 16, 2014 - 9:28am

I attended the OpenStack Paris summit last month (Percona had a booth there). It was my first opportunity to meet face-to-face with this thriving community of developers and users. I’m proud that Percona is part of this open source family and look forward to reconnecting with many of the developers and users I met in Paris – as well as meeting new faces – at OpenStack Live in Silicon Valley April 13-14.

OpenStack summits, generally held twice a year, are the place where (for the most part) developers meet and design “in the open,” as the OpenStack organization says. OpenStack Live 2015, held in parallel with the annual Percona Live MySQL Conference and Expo, will be a unique opportunity for users and enthusiasts to learn from leading OpenStack experts in the field about top cloud strategies, improving overall cloud performance, and operational best practices for managing and optimizing OpenStack and its MySQL database core.

OpenStack Live will also provide some serious classroom-style learning. Percona announced the OpenStack Live tutorials sessions a couple days ago. Most sessions are three hours long and because they really are “hands-on” require that you bring your laptop – and a power cord (not to be confused with a “power chord,” though those also welcome”).

Let’s take a closer look at the OpenStack Live tutorial sessions.

Barbican: Securing Your Secrets.” Join Rackspace gurus Douglas Mendizábal, Chelsea Winfree and Steve Heyman on a tour through the magical world of Barbican (yes, they are dedicated members of the Barbican project).

Don’t be intimidated if don’t have any previous experience with Barbican (and if you’ve never heard of it, more the reason to attend!). A basic understanding of security components (such as keys and certificates) and a basic understanding of ReST is helpful, but not required.

By the end of the class you will know:
1)   Importance of secret storage
2)   How to store & retrieve secrets with Barbican
3)   How to submit an order with Barbican
4)   How to create a container
5)   Use cases for Barbican / Examples
6)   The future of Barbican –Ordering SSL Certs

Deploying, Configuring and Operating OpenStack Trove.” As the title suggests, these three hours focus squarely on Trove. The tutorial – led by Tesora founder & CTO Amrith Kumar, along with Doug Shelley, the company’s vice president of product development – will begin with a quick overview of OpenStack and the various services.

If you attend this tutorial you’ll actually deploy your own OpenStack environment – and create and manage a Nova (compute) instance using a command line and a graphical user interface (Horizon). And the fun continues! You’ll then install and configure Trove, and create and manage a single MySQL instance. Finally, pupils will create and operate a simple replicated MySQL instance pair and ensure that data is being properly replicated from master to slave.

Essential DevStack.” DevStack is an opinionated script to quickly create an OpenStack development environment. It can also be used to demonstrate starting/running OpenStack services and provide examples of using them from a command line. The power of DevStack lies within small trick that if people understand can hugely improve the contribution effectiveness, quality and required time. This three-hour tutorial will be led by Red Hat senior software engineer Swapnil Kulkarni.

OpenStack Networking Introduction,” with PLUMgrid’s Valentina Alaria and Brendan Howes. Buckle your seat belts! Designed for IT professionals looking to expand their OpenStack “networking” (no, not the LinkedIn sort of networking) knowledge, OpenStack Networking Fundamentals will be a comprehensive and fast-paced course.

This half-day training provides an overview of OpenStack, its components and then dives deep into OpenStack Networking – the features and plugin model and its role in building an OpenStack Cloud. The training is concluded with a hands-on lab to bring all the concepts together.

OpenStack Networking (Neutron) Introduction [1 hour]
– Goal of Neutron
– Architecture of Neutron
– Plugin Architecture
– Use cases for Neutron
– What’s new in Juno & what’s planned for Kilo

OpenStack Networking (Neutron) Advanced [1 hour]
– Interaction with other OpenStack components (Compute & Storage)
– Designing Neutron for HA
– Installing Neutron
– Troubleshooting Neutron

Hands-on Lab [1 hour]
– Creation of tenant networks
– Configuration of external connectivity
– Advanced Features Configurati

Percona’s director of conferences, Kortney Runyan, offered a sneak peek at the OpenStack sessions last week. Attendees of the Percona Live MySQL Conference and Expo 2015 (April 13-16, 2015) with full-pass access are also welcome to attend OpenStack Live sessions. The two conferences are running in parallel, which is very exciting since there will be crossover opportunities between them.

I hope to see you next April! And be sure to take advantage of Early Bird pricing for OpenStack Live (register here).
And if you are an organizer of an OpenStack (or MySQL) Meetup and need some financial help, Percona is happy to chip in as a sponsor. Just let me know and I’ll work with you to set that up! You can drop me a note in the comments and I’ll contact you offline.

The post OpenStack Live tutorials & sessions to bring OpenStack users up to speed appeared first on MySQL Performance Blog.

MySQL Tutorials: A time to learn at Percona Live 2015

December 15, 2014 - 12:00am

The many hours of intensive tutorials, led by some of the top minds in MySQL, have always been a major draw each year to the Percona Live MySQL Conference and Expo. And 2015’s event will be no exception.

Percona Live 2015 runs April 13-16 in Santa Clara, Calif. and the first day is dedicated to the classroom – so bring your laptops for the combined 45 hours of learning. MySQL tutorials are included with the full-conference pass but a “tutorial-only pass” is also available. Super-saver registration discounts have been extended until Dec. 19. Here’s a look at this year’s tutorials lineup. (There will be a couple more announced in January).

And that’s just on Monday! There will be much more over the four days of the Percona Live MySQL Conference and Expo 2015. Percona conference director Kortney Runyan shared a sneak peek at the full schedule a few days ago. And remember, super-saver registration discounts have been extended until Dec. 19 so register now – and don’t forgot your laptop (and power cord)!

The post MySQL Tutorials: A time to learn at Percona Live 2015 appeared first on MySQL Performance Blog.

Percona XtraBackup 2.2.7 is now available

December 10, 2014 - 7:03am

Percona is glad to announce the release of Percona XtraBackup 2.2.7 on December 10, 2014. Downloads are available from our download site or Percona Software Repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

Bugs Fixed:

  • Non-default value for innodb_log_block_size variable would cause assertion when preparing the backup. Bug fixed #1391216.
  • When Percona XtraBackup would run FLUSH ENGINE LOGS during the backup process on GTID master, command was recorded to the slave’s binary log as well, which lead to inconsistency between master and slave. Fixed by adding the NO_WRITE_TO_BINLOG clause to FLUSH ENGINE LOGS to avoid interfering with binary log and inconsistency with coordinates. Bug fixed #1394632.
  • Exporting tables was inefficient when backup contained a large (and unrelated) change buffer. Bug fixed #1366065 (Davi Arnaut).
  • innobackupex was printing the GTID even if the GTID mode was disabled which could cause confusion since it wasn’t incrementing. Now it prints only GTID when GITD mode is enabled and when GTID mode is disabled it prints only filename and position. innobackupex still prints GTID, filename and positions if MariaDB server is being backed up. Bug fixed #1391041.

Other bugs fixed: bug #1386157.

Release notes with all the bugfixes for Percona XtraBackup 2.2.7 are available in our online documentation. Bugs can be reported on the launchpad bug tracker. Percona XtraBackup is an open source, free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases.

The post Percona XtraBackup 2.2.7 is now available appeared first on MySQL Performance Blog.

Recover MySQL root password without restarting MySQL (no downtime!)

December 10, 2014 - 12:00am

Disclaimer: Do this at your own risk! It doesn’t apply if you’re using Pluggable authentication and certainly won’t be usable if/when MySQL system tables are stored on InnoDB

Recover your root password with care!

What is the situation?

The situation is the classic “need to recover MySQL root password” but you cannot restart MySQL (because it is the master production server, or any other reason), which makes the –skip-grant-tables solution as a no-no possibility.

 What can I do?

There is a workaround, which is the following:

  •  Launch another instance of mysqld, a small one (without innodb).
  •  Copy your user.[frm|MYD|MYI] files from the original datadir to the datadir of the new instance.
  • Modify them and then copy them back to the original location.

That simple? No, but close. Here is the step by step:

Step by step recovery
  1. Create a new datadir and run mysql_install_db for the new datadir. This one will be removed at the end. Don’t forget to change ownership to mysql user and group:
    [root@machina dbdata]# mkdir datadir [root@machina dbdata]# chown -R mysql:mysql datadir/ [root@machina dbdata]# mysql_install_db --datadir=/dbdata/datadir/ --user=mysql Installing MySQL system tables...OK Filling help tables...OK
  2. Launch the new instance. Be careful with the datadir path, the socket file and the port number. Also, disable InnoDB, you won’t need it, just add –skip-innodb AND –default-storage-engine=myisam:
    [root@machina datadir]# /usr/sbin/mysqld --basedir=/usr --datadir=/dbdata/datadir --plugin-dir=/usr/lib/mysql/plugin --skip-innodb --default-storage-engine=myisam --socket=/var/run/mysqld/mysql2.sock --port=3307 --user=mysql --log-error=/dblogs/log/error2.log --pid-file=/dbdata/data/mysql.pid &
  3. Copy the user.* files from the original mysql instance (the ones that you need to modify) to the new instance’s datadir and login to this instance of mysql:
    [root@machina ~]# cp /dbdata/data/mysql/user.* /dbdata/datadir/mysql/cp: overwrite `/dbdata/datadir/mysql/user.frm'? y cp: overwrite `/dbdata/datadir/mysql/user.MYD'? y cp: overwrite `/dbdata/datadir/mysql/user.MYI'? y [root@machina datadir]# mysql --socket=/var/run/mysqld/mysql2.sock -p Enter password: Welcome to the MySQL monitor.  Commands end with ; or g.
  4. Execute a “flush tables” command, so the user table will be “reopened” and you can see the data and verify:
    mysql2> flush tables; mysql2> select user, host, password from user where user like 'root'; +------+--------------------------------------+------------------------------------------+ | user | host                                 | password                                 | +------+--------------------------------------+------------------------------------------+ | root | localhost                            | 696D727429CC43695423FA5F2F0155D92A0AAC08 | | root | 127.0.0.1                            | 696D727429CC43695423FA5F2F0155D92A0AAC08 | | root | %                                    | 696D727429CC43695423FA5F2F0155D92A0AAC08 | +------+--------------------------------------+------------------------------------------+ 3 rows in set (0.00 sec)
  5. Now, update the password field with the desired value:
    mysql2> update mysql.user set password='*696D727429CC43695423FA5F2F0155D92A0AAC08' where user like 'root'; Query OK, 3 rows affected (0.00 sec) Rows matched: 3  Changed: 3  Warnings: 0
  6. Verify again:
    mysql2> select user, host, password from user where user like 'root'; +------+--------------------------------------+-------------------------------------------+ | user | host                                 | password                                  | +------+--------------------------------------+-------------------------------------------+ | root | localhost                            | *696D727429CC43695423FA5F2F0155D92A0AAC08 | | root | 127.0.0.1                            | *696D727429CC43695423FA5F2F0155D92A0AAC08 | | root | %                                    | *696D727429CC43695423FA5F2F0155D92A0AAC08 | +------+--------------------------------------+-------------------------------------------+ 3 rows in set (0.00 sec)
  7. Flush privileges and verify that the new password is correct, by logging in again:
    mysql2> flush privileges; Query OK, 0 rows affected (0.00 sec)
  8. Now that we have made the changes, we can move back the user.* files to the original location, being extremely careful with owner and privileges:
    [root@machina ~]# cd /dbdata/datadir/mysql/ [root@machina mysql]# cp user.* /dbdata/data/mysql/; chown mysql:mysql /dbdata/data/mysql/user.*; chmod 660 /dbdata/data/mysql/user.* cp: overwrite `/dbdata/data/mysql/user.frm'? y cp: overwrite `/dbdata/data/mysql/user.MYD'? y cp: overwrite `/dbdata/data/mysql/user.MYI'? y
  9. At this moment, you can shutdown the new mysql instance since is no longer needed. Be very very careful so you don’t end up shutting down your original mysqld!:
    [root@machina datadir]# mysqladmin --socket=/var/run/mysqld/mysql2.sock -p shutdown Enter password: 141120 06:59:14 mysqld_safe mysqld from pid file /dbdata/data/mysql.pid ended
  10. Now, the last step is to execute a “FLUSH PRIVILEGES” in the original mysqld. Since we cannot yet access it, we need to send a SIGHUP signal to mysqld. MySQL responds to this signal by reloading the grant tables and flushing tables, logs, the thread cache, and the host cache, so choose wisely the moment of the day when you want to send the SIGHUP since the performance might be degraded (look at “flush tables” ).The way to send SIGHUP is to execute “kill” command with the -1 flag:
    [root@machina datadir]# kill -1 $(/sbin/pidof mysqld)
  11. Finally, login into MySQL as root!:
    [root@machina datadir]# mysql -p Enter password: Welcome to the MySQL monitor.  Commands end with ; or g. Your MySQL connection id is 101208 mysql1> select user, host, password from mysql.user where user like 'root'; +------+--------------------------------------+-------------------------------------------+ | user | host                                 | password                                  | +------+--------------------------------------+-------------------------------------------+ | root | localhost                            | *696D727429CC43695423FA5F2F0155D92A0AAC08 | | root | 127.0.0.1                            | *696D727429CC43695423FA5F2F0155D92A0AAC08 | | root | %                                    | *696D727429CC43695423FA5F2F0155D92A0AAC08 | +------+--------------------------------------+-------------------------------------------+ 3 rows in set (0.00 sec)
    You can see your schemas? of course you can! your databases are okay!
    mysql1> show databases; +--------------------+ | Database           | +--------------------+ | information_schema | | mysql              | | percona            | | testing            | +--------------------+ 4 rows in set (0.03 sec)

We’ve successfully recovered the MySQL root password without the need to restart MySQL and thus avoid downtime.

I hope you never face this situation, but in case you do, there’s a workaround to recover your access! Is there another way to perform this?

Share it with the world!

The post Recover MySQL root password without restarting MySQL (no downtime!) appeared first on MySQL Performance Blog.

OpenStack Live 2015: Sneak peak of the April conference

December 9, 2014 - 9:00am

On behalf of the OpenStack Live Conference Committee, I am excited to announce the sneak peek schedule for the inaugural OpenStack Live 2015 Conference! This new annual conference, running in parallel with the already established Percona Live MySQL Conference and Expo, will feature one day of tutorials followed by a full day of breakout sessions April 13-14, in Santa Clara, Calif.

Though the entire conference schedule won’t be finalized until early January, this initial list of talks is sure to spark interest! So without further ado, here is he OpenStack Live 2015 SNEAK PEEK SCHEDULE!

Deploying an OpenStack Cloud at Scale at Time Warner Cable
-Matthew Fischer, Principal Software Engineer for OpenStack DevOps at Time Warner Cable, and Clayton O’Neill, Principal Software Engineer for OpenStack Cloud at Time Warner Cable

An Introduction to Database as a Service with an Emphasis on OpenStack Using Trove
-Amrith Kumar, Founder and CTO of Tesora, Inc., and Tushar Katarki, Director of Product Management at Percona

Lightweight OpenStack Benchmarking Service with Rally and Docker
-Swapnil Kulkarni, Senior Software Engineer at Red Hat

MySQL and OpenStack Deep Dive
-Peter Boros, Principal Architect at Percona

This is just a small taste of what will be presented at OpenStack Live 2015 conference this spring. Take advantage of this unique opportunity to hear from leading experts in the field about top cloud strategies, improving overall cloud performance, and operational best practices for managing and optimizing OpenStack and its MySQL database core.

As a special bonus, OpenStack Live attendees attendees will also have access to the Percona Live MySQL Conference & Expo keynotes, receptions, exhibition hall, and Birds of a Feather sessions on April 13 and 14, allowing them to dive deeper into MySQL topics such as high availability, security, performance optimization, and much more.

Registration for OpenStack Live 2015 is now open… register now with Early Bird pricing! Hope to see you there!

The post OpenStack Live 2015: Sneak peak of the April conference appeared first on MySQL Performance Blog.

Pages

]]>