Percona's Data performance and scalability blog

Percona Server 5.6.27-76.0 is now available

December 4, 2015 - 8:54am

Percona is glad to announce the release of Percona Server 5.6.27-76.0 on December 4, 2015. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.27, including all the bug fixes in it, Percona Server 5.6.27-76.0 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – and this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release can be found in the 5.6.27-76.0 milestone on Launchpad.

New Features:

  • SHOW SLAVE STATUS NOLOCK syntax in 5.6 has been undeprecated. Both SHOW SLAVE STATUS NOLOCK and SHOW SLAVE STATUS NONBLOCKING are now supported (see the example after this list). Percona Server originally used the SHOW SLAVE STATUS NOLOCK syntax for this feature. As of the 5.6.20-68.0 release, Percona Server implements the SHOW SLAVE STATUS NONBLOCKING syntax, which came from early MySQL 5.7. Current MySQL 5.7 no longer has this syntax; its regular SHOW SLAVE STATUS is non-blocking.
  • TokuDB tables can now be automatically analyzed in the background based on a measured change in data.
  • Percona Server has implemented a new tokudb_strip_frm_data variable that can be used to assist in TokuDB data recovery. WARNING: Use this variable only if you know what you’re doing, otherwise it could lead to data loss.
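
As a quick illustration of the first item (a minimal sketch using only the statement names listed above), either form now returns the slave status without waiting on a lock held by a long-running operation:

mysql> SHOW SLAVE STATUS NOLOCK\G
mysql> SHOW SLAVE STATUS NONBLOCKING\G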

Bugs Fixed:

  • Setting the tokudb_backup_last_error_string and tokudb_backup_last_error values manually could cause a server assertion. Bug fixed #1512464.
  • Fixed invalid memory accesses when mysqldump was running with --innodb-optimize-keys option. Bug fixed #1517444.
  • Fixed incorrect filename specified in storage/tokudb/PerconaFT/buildheader/CMakeLists.txt which could cause subsequent builds to fail. Bug fixed #1510085 (Sergei Golubchik).
  • Fixed multiple issues with TokuDB CMake scripts. Bugs fixed #1510092, #1509219 and #1510081 (Sergei Golubchik).
  • An upstream fix for upstream bug #76135 might cause the server to stall or hang. Bug fixed #1519094 (upstream #79185).
  • ps_tokudb_admin now prevents Percona TokuBackup activation if there is no TokuDB storage engine on the server. Bug fixed #1520099.
  • Percona TokuBackup plugin now gets removed during the TokuDB storage engine uninstall process. Bug fixed #1520472.
  • A new --defaults-file option has been implemented for ps_tokudb_admin to specify the MySQL configuration file if it’s not in the default location (see the sketch after this list). Bug fixed #1517021.
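
A hypothetical invocation might look like the following; only --defaults-file is taken from the release notes above, while the --enable flag and the file path are assumptions for illustration:

$ ps_tokudb_admin --enable --defaults-file=/etc/mysql/my.cnf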

Other bugs fixed: #1425480, #1517523, #1515741 and #1513178.

Release notes for Percona Server 5.6.27-76.0 are available in the online documentation. Please report any bugs on the Launchpad bug tracker.

The post Percona Server 5.6.27-76.0 is now available appeared first on MySQL Performance Blog.

Percona Server 5.5.46-37.6 is now available

December 4, 2015 - 8:52am


Percona is glad to announce the release of Percona Server 5.5.46-37.6 on December 4, 2015. Based on MySQL 5.5.46, including all the bug fixes in it, Percona Server 5.5.46-37.6 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. Details of the release can be found in the 5.5.46-37.6 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:

  • An upstream fix for upstream bug #76135 might cause the server to stall or hang. Bug fixed #1519094 (upstream #79185).
  • Fixed invalid memory accesses when mysqldump was running with --innodb-optimize-keys option. Bug fixed #1517444.

Other bugs fixed: #1517523.

Release notes for Percona Server 5.5.46-37.6 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.5.46-37.6 is now available appeared first on MySQL Performance Blog.

Amazon Aurora in sysbench benchmarks

December 3, 2015 - 9:22am

In my previous post Amazon Aurora – Looking Deeper, I promised benchmark results on Amazon Aurora.

There are already some results available from Amazon itself: https://d0.awsstatic.com/product-marketing/Aurora/RDS_Aurora_Performance_Assessment_Benchmarking_v1-2.pdf.

There are also some from Marco Tusa: http://www.tusacentral.net/joomla/index.php/mysql-blogs/175-aws-aurora-benchmarking-blast-or-splash.html.

Amazon used quite a small dataset in their benchmark: 250 tables with 25,000 rows each, which by my calculation corresponds to about 4.5GB of data. For this data size, Amazon used r3.8xlarge instances, which provide 32 virtual CPUs and 244GB of memory. So I can’t say their benchmark is particularly illustrative, as all the data fits nicely into the available memory.

In my benchmark, I wanted to try different datasizes, and also compare Amazon Aurora with Percona Server 5.6 in identical cloud instances.

You can find my full report here: http://benchmark-docs.readthedocs.org/en/latest/benchmarks/aurora-sysbench-201511.html

Below is a short description of the benchmark:

Data Sizes

  • Initial dataset. 32 sysbench tables, 50 million (mln) rows each. It corresponds to about 400GB of data.
  • Testing sizes. For this benchmark, we vary the maximum number of rows used by sysbench: 1mln, 2.5mln, 5mln, 10mln, 25mln, 50mln.

In the chart, the results are marked in thousands of rows: 1000, 2500, 5000, 10000, 25000, 50000. In other words, “1000” corresponds to 1mln rows.
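
For reference, preparing a dataset like this with sysbench 0.5 would look roughly like the following; the connection parameters and the path to the OLTP Lua script are assumptions, only the table count and row count come from the description above:

$ sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
    --oltp-tables-count=32 --oltp-table-size=50000000 \
    --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
    prepare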

Instance Sizes
It is actually very complicated to find an equivalent configuration (in both performance and price) to compare Percona Server running on an EC2 instance against Amazon Aurora.

Amazon Aurora:

  • db.r3.xlarge instance (4 virtual CPUs + 30GB memory)
  • Monthly computing cost (1-YEAR TERM, No Upfront): $277.40
  • Monthly storage cost: $0.100 per GB-month * 400 GB = $40
  • Extra $0.200 per 1 million IO requests

Total cost (per month, excluding extra per IO requests): $311.40

Percona Server:

  • r3.xlarge instance (4 virtual CPUs + 30GB memory)
  • Monthly computing cost (1-YEAR TERM, No Upfront): $160.60

For the storage we will use 3 options:

  • General purpose SSD volume (marked as “ps” in the charts), 500GB, 1500/3000 IOPS, cost: $0.10 per GB-month * 500 = $50
  • Provisioned IOPS SSD volume (marked as “ps-io3000”), 500GB, 3000 IOPS: $0.125 per GB-month * 500 + $0.065 per provisioned IOPS-month * 3000 = $62.50 + $195 = $257.50
  • Provisioned IOPS SSD volume (marked as “ps-io2000”), 500GB, 2000 IOPS: $0.125 per GB-month * 500 + $0.065 per provisioned IOPS-month * 2000 = $62.50 + $130 = $192.50

So the corresponding total costs (per month) for the EC2 instances are: $210.60 (general purpose SSD), $418.10 (3000 IOPS), and $353.10 (2000 IOPS).

Results

More graphs, including timelines, are available at http://benchmark-docs.readthedocs.org/en/latest/benchmarks/aurora-sysbench-201511.html

Summary results of Amazon Aurora vs. Percona Server across the different data sizes (see the charts in the full report linked above):

Observations
There are a few important points to highlight:

  • Even in long runs (2 hours) I didn’t see any fluctuation in results. The throughput is stable.
  • I actually made one run for 48 hours. There were still no fluctuations.
  • For Percona Server, as expected, better storage gives better throughput. The 3000 IOPS volume is better than Amazon Aurora, especially for IO-heavy cases.
  • Amazon Aurora shows worse results with smaller data sizes. Aurora outperforms Percona Server (with general purpose SSD and provisioned SSD 2000 IOPS volumes) when it comes to big data sizes.
  • It appears that Amazon Aurora does not benefit from extra memory – the throughput does not grow much with small data sizes. I think this supports my assumption that Aurora has some kind of write-through cache, which shows better results in IO-heavy workloads.
  • Provisioned IOPS volumes indeed give much better performance than the general purpose volume, though they are more expensive.
  • From a cost perspective, 3000 IOPS is more cost-efficient than 2000 IOPS for this particular case (your workload might differ), in the sense that it gives more throughput per dollar.

The post Amazon Aurora in sysbench benchmarks appeared first on MySQL Performance Blog.

Upcoming Webinar: MySQL 5.7 in a Nutshell

December 2, 2015 - 2:35pm

Join Alexander Rubin, Principal Consultant at Percona, as he provides an overview of MySQL 5.7 features: Wednesday, December 9, 2015, 10:00 AM PST.

MySQL® 5.7 is a great release that has a lot to offer, especially in the areas of development and replication. It provides many new optimizer features for developers, a much more powerful GIS function, and a high performance JSON data type – allowing for a more powerful store for semi-structured data. It also features a dramatically improved Performance Schema and Parallel and Multi-Source replication – allowing you to scale much further than ever before.

Primary manageability enhancements within MySQL 5.7 include:

  • New Native JSON Data Type and JSON Functions: Allows for efficient and flexible storage, search, and manipulation of schema-less data.
  • Performance Schema: Enables instrumentation for memory, transactions, stored routines, prepared statements, replication, and locks.
  • MySQL SYS Schema: Provides helper objects that answer common performance, health, usage, and monitoring questions.
  • Improved Security: Delivers easier and safer instance initialization, setup and management.
  • Expanded Geographic Information System (GIS) Support for Mobile Applications: Provides spatial index support in InnoDB, GeoJSON, and GeoHash.

Key enhancements in MySQL 5.7 for greater performance and scalability include:

  • Enhanced Speed: In benchmark tests MySQL 5.7 delivered 1,600,000 queries per second (QPS) – 3x faster than MySQL 5.6.
  • Optimized InnoDB: New capabilities include increased performance and concurrency, enhanced on-line operations, spatial indexes, and native partitioning.
  • More Robust Replication: Enhancements to MySQL’s replication features include multi-source replication, enhanced Global Transaction Identifiers (GTIDs), and improved multi-threaded slaves for better scalability and availability.
  • Enhanced Optimizer: A new MySQL optimizer dynamic cost model provides better query performance and greater user control.

This webinar will be part of a 3-part series which will include MySQL 5.7 for developers and MySQL 5.7 for DBAs.

Register here:

https://www.percona.com/resources/webinars/mysql-57-nutshell

The post Upcoming Webinar: MySQL 5.7 in a Nutshell appeared first on MySQL Performance Blog.

Fixing errant transactions with mysqlslavetrx prior to a GTID failover

December 2, 2015 - 10:05am

Errant transactions are a major issue when using GTID replication. Although this isn’t something new, the drawbacks are more noticeable with GTID than with regular replication.

The situation where errant transactions bite you is a common DBA task: failover. Now that tools like MHA support GTID replication (starting from version 0.56), this protocol is becoming more popular, and so are the issues with errant transactions. Luckily, the fix is as simple as injecting an empty transaction into the databases that lack the transaction. You can easily do this through the master, and it will be propagated to all the slaves.

Let’s consider the following situations:

  • What happens when the master blows up into the air and is out of the picture?
  • What happens when there’s not just one but dozens of errant transactions?
  • What happens when you have a high number of slaves?

Things start to become a little more complex.

A side note for the first case: when your master is no longer available, how can you find errant transactions? Well, you can’t. In this case, you should check for errant transactions between your slaves and the former slave that is about to become the new master.

Let’s think about alternatives. What’s the alternative to injecting empty transactions for every single errant transaction on every single slave? The MySQL utility mysqlslavetrx. Basically, this utility allows us to skip multiple transactions on multiple slaves in a single step.

One way to install the MySQL utilities is by executing the following steps:

  • wget http://dev.mysql.com/get/Downloads/MySQLGUITools/mysql-utilities-1.6.2.tar.gz
  • tar -xvzf mysql-utilities-1.6.2.tar.gz
  • cd mysql-utilities-1.6.2
  • python ./setup.py build
  • sudo python ./setup.py install

And you’re ready.

What about some examples? Let’s say we have this master/slave pair with GTID replication, with the current status as follows:

mysql> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                        |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000002 |      530 |              |                  | 66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)

mysql> show slave status\G
...
Executed_Gtid_Set: 66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2
Auto_Position: 1
1 row in set (0.00 sec)

Add chaos to the slave in form of a new schema:

mysql> create database percona;
Query OK, 1 row affected (0.00 sec)

Now we have an errant transaction!

The slave status looks different:

mysql> show slave status\G
...
Executed_Gtid_Set: 66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2,
674a625e-976e-11e5-a8fb-125cab082fc3:1
Auto_Position: 1
1 row in set (0.00 sec)

By using the GTID_SUBSET function we can confirm that things go from “all right” to “not good”:

Before:

mysql> select gtid_subset('66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2','66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2') as is_subset;
+-----------+
| is_subset |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

After:

mysql> select gtid_subset('66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2,674a625e-976e-11e5-a8fb-125cab082fc3:1','66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2') as is_subset;
+-----------+
| is_subset |
+-----------+
|         0 |
+-----------+
1 row in set (0.00 sec)

All right, it’s a mess, got it. What’s the errant transaction? The GTID_SUBTRACT function will tell us:

mysql> select gtid_subtract('66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2,674a625e-976e-11e5-a8fb-125cab082fc3:1','66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2') as errand;
+----------------------------------------+
| errand                                 |
+----------------------------------------+
| 674a625e-976e-11e5-a8fb-125cab082fc3:1 |
+----------------------------------------+
1 row in set (0.00 sec)

The classic way to fix this is by injecting an empty transaction:

mysql> SET GTID_NEXT='674a625e-976e-11e5-a8fb-125cab082fc3:1';
Query OK, 0 rows affected (0.00 sec)
mysql> begin;
Query OK, 0 rows affected (0.00 sec)
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
mysql> SET GTID_NEXT='AUTOMATIC';
Query OK, 0 rows affected (0.00 sec)

After this, the errant transaction won’t be errant anymore.

mysql> show master status;
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                 |
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
| mysql-bin.000002 |      715 |              |                  | 66fbd3be-976e-11e5-a8fb-1256731a26b7:1-2, 674a625e-976e-11e5-a8fb-125cab082fc3:1  |
+------------------+----------+--------------+------------------+-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Okay, let’s add another slave to the mix. This is the moment where mysqlslavetrx becomes very handy.

What you need to know is:

  • The slave’s IP address
  • The GTID set

Then it is simple to execute:

mysqlslavetrx --gtid-set=6aa9a742-8284-11e5-a09b-12aac3869fc9:1 --verbose --slaves=user:password@172.16.1.143:3306,user:password@172.16.1.144

The verbose output will look something like this:

# GTID set to be skipped for each server:
# - 172.16.1.143@3306: 6aa9a742-8284-11e5-a09b-12aac3869fc9:1
# - 172.16.1.144@3306: 6aa9a742-8284-11e5-a09b-12aac3869fc9:1
#
# Injecting empty transactions for '172.16.1.143:3306'...
# - 6aa9a742-8284-11e5-a09b-12aac3869fc9:1
# Injecting empty transactions for '172.16.1.144:3306'...
# - 6aa9a742-8284-11e5-a09b-12aac3869fc9:1
#
#...done.
#

You can run mysqlslavetrx from anywhere (the master or any slave). You just need to be sure that the user and password used are valid and have the SUPER privilege, which is required to set the gtid_next variable.

To summarize: take advantage of the MySQL utilities. In this particular case, mysqlslavetrx is extremely useful when you use GTID replication and want to perform a clean failover. It can be added as a pre-script for MHA failover (which supports GTID since version 0.56) or simply be used to maintain consistency between master and slaves.

The post Fixing errant transactions with mysqlslavetrx prior to a GTID failover appeared first on MySQL Performance Blog.

How to upgrade your master server with minimal downtime

December 1, 2015 - 9:31am

Here’s a step-by-step guide on how to invert roles for master and slave so you can perform a master server upgrade, and then switch roles back to the original setup.

* While following this guide, consider server-A your original master and server-B your original slave. We will assume server-B already produces binlogs and that both nodes have log-slave-updates=1 in my.cnf (see the minimal sketch after the link below).
https://dev.mysql.com/doc/refman/5.0/en/replication-options-slave.html#option_mysqld_log-slave-updates
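
A minimal my.cnf fragment satisfying these assumptions might look like this (the server-id value and log file name are hypothetical):

[mysqld]
server-id         = 2            # unique per server
log-bin           = mysql-bin    # server-B must produce binlogs
log-slave-updates = 1            # so replicated events are also written to B's binlogs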

1. Prepare the original Slave to invert roles.

Make sure the slave is not in read-only mode. read_only should be 0; if it isn’t, change it:

server-B> SELECT @@global.read_only;
server-B> SET GLOBAL read_only=0;


* For a critical service, you might also want to make sure the slave is actually a perfect replica by running a checksum of the tables with pt-table-checksum, for example as sketched below.
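
A hedged example of such a check, run from the master (the DSN and credentials are placeholders; by default the results are stored in the percona.checksums table):

server-A$ pt-table-checksum h=localhost,u=checksum_user,p=checksum_pass --replicate percona.checksums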

 

Check SHOW SLAVE STATUS\G on server-B until you see that Seconds_Behind_Master is zero.

server-B> SHOW SLAVE STATUS\G

 

When server-B is fully caught up, issue FLUSH TABLES WITH READ LOCK; in a mysql client prompt on server-A.

At this point your users will suffer a service request interruption and downtime starts ticking.

* DO NOT CLOSE this mysql client; otherwise the lock will be lost. We’ll use this same session to run the rest of the commands in server-A.

server-A> FLUSH TABLES WITH READ LOCK;

 

Get master coordinates:

server-A> SHOW MASTER STATUS;

 

Run SHOW SLAVE STATUS\G on server-B.
Repeat until Relay_Master_Log_File and Exec_Master_Log_Pos match the file and position from the previous step.

server-B> SHOW SLAVE STATUS\G

 

Check that SHOW MASTER STATUS; is not changing on server-B (to ensure that there are no local writes on server-B), then stop the slave:

server-B> SHOW MASTER STATUS;
server-B> STOP SLAVE;
server-B> RESET SLAVE ALL;

 

Run SHOW MASTER STATUS\G on server-B and save this information in a safe place.

Also make sure binlogs from that position and onwards are kept until you bring server-A back online. Otherwise you’ll need to rebuild server-A.

server-B> SHOW MASTER STATUS\G


2. Reverse roles for Master – Slave and upgrade original Master

Direct traffic to server-B (point the VIP to server-B, through DNS (not really recommended), or by updating the configuration files of API services).

In a separate shell session on server-A, stop mysql.

Now server-A can be shut down and serviced.

* At this point you’re safe to upgrade server-A.

* After you’ve finished upgrading your server you can continue this guide to put server-A back as master.

 

Start MySQL in server-A (with skip-slave-start!) and run RESET SLAVE ALL;

server-A> RESET SLAVE ALL;

 

Reconfigure the slave on server-A with CHANGE MASTER TO ... and start it.
Make sure to use the same MASTER_LOG_FILE and MASTER_LOG_POS values you previously got from running SHOW MASTER STATUS\G on server-B.

 

server-A> CHANGE MASTER TO
    MASTER_HOST='<IP-of-server-B>',
    MASTER_USER='<your-replication-user>',
    MASTER_PASSWORD='<replication-user-password>',
    MASTER_LOG_FILE='<file-from-serverB>',
    MASTER_LOG_POS=<pos-from-serverB>;
server-A> SET GLOBAL read_only=1;
server-A> START SLAVE;

 

Make sure replication is running. Repeat until you see Seconds_Behind_Master is 0.

server-A> SHOW SLAVE STATUS\G

* Here is the moment where the roles are fully inverted, and A is an up-to-date slave of B.

 

3. Prepare the original Master to set back the original roles.


When server-A is fully caught up, issue FLUSH TABLES WITH READ LOCK; in a mysql client prompt on server-B.

DO NOT CLOSE this mysql client on server-B; otherwise the lock will be lost. We’ll use this same session to run the rest of the commands in server-B.

server-B> FLUSH TABLES WITH READ LOCK;

* DO NOT CLOSE THIS CLIENT SESSION! *

 

Check master status

server-B> SHOW MASTER STATUS;

 

Run SHOW SLAVE STATUS\G on server-A. Repeat until Relay_Master_Log_File and Exec_Master_Log_Pos match the file and position from SHOW MASTER STATUS; on server-B.

server-A> SHOW SLAVE STATUS\G

 

Get master coordinates:

server-A> SHOW MASTER STATUS\G
server-A> SET GLOBAL read_only=0;

4. Set back the original roles for Master – Slave

Direct traffic to server-A (point the VIP to server-A, through DNS (not really recommended), or by updating the configuration files of API services).

In a separate shell session in server-B, restart mysql.

Make server-B again a slave of server-A:

server-B> CHANGE MASTER TO
    MASTER_HOST='<IP-of-server-A>',
    MASTER_USER='<your-replication-user>',
    MASTER_PASSWORD='<replication-user-password>',
    MASTER_LOG_FILE='<file-from-server-A>',
    MASTER_LOG_POS=<pos-from-server-A>;
server-B> SET GLOBAL read_only=1;
server-B> START SLAVE;
server-B> SHOW SLAVE STATUS\G

 

Stop the slave threads in server-A:

server-A> STOP SLAVE;
server-A> RESET SLAVE ALL;

 

* If you have a chained replication setup, you should take the log_slave_updates variable into consideration; it is relied upon in this procedure and should be enabled.
For example, in a replication chain A > B > C, for C to receive updates from A, B needs to log them to its binary logs, and that is what this option does.

For more information regarding this variable, please check the following link:
https://dev.mysql.com/doc/refman/5.0/en/replication-options-binary-log.html#sysvar_log_slave_updates

The post How to upgrade your master server with minimal downtime appeared first on MySQL Performance Blog.

Transporting tablespace from MySQL 5.6 to MySQL 5.7 (case study)

November 30, 2015 - 11:32pm

Recently, I was working on a MySQL support ticket where a customer was facing an issue while transporting a tablespace from MySQL 5.6 to MySQL 5.7.
After closely reviewing the situation, I saw that while importing the tablespace they were getting errors such as:

ERROR 1808 (HY000): Schema mismatch (Table flags don't match, server table has 0x10 and the meta-data file has 0x1)

After some research, I found that there is a similar bug reported to MySQL for this issue (https://bugs.mysql.com/bug.php?id=76142), but no solution is mentioned. I tested the scenario locally and found a solution that I will detail in this post.

First, I created a table on MySQL 5.6:

nilnandan.joshi@bm-support01:~$ mysql -uroot -p --socket=/tmp/mysql_sandbox5624.sock
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 35
Server version: 5.6.24 MySQL Community Server (GPL)
...
mysql> create database nil;
Query OK, 1 row affected (0.02 sec)

mysql> use nil;
Database changed
mysql> create table nil(id int, name varchar(10));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into nil values (1, 'nilnandan'),(2, 'niljoshi'),(3, 'njoshi'),(4,'joshi');
Query OK, 4 rows affected (0.10 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from nil;
+------+-----------+
| id   | name      |
+------+-----------+
|    1 | nilnandan |
|    2 | niljoshi  |
|    3 | njoshi    |
|    4 | joshi     |
+------+-----------+
4 rows in set (0.02 sec)

Then I took a backup of the nil database and exported it:

nilnandan.joshi@bm-support01:~$ innobackupex --defaults-file=/home/njoshi/sandboxes/msb_5_6_24/my.sandbox.cnf --user=root --password=msandbox --databases="nil" /home/njoshi/backup/
151127 01:29:14 innobackupex: Starting the backup operation
....
151127 01:29:16 Backup created in directory '/home/njoshi/backup//2015-11-27_01-29-14'
151127 01:29:16 [00] Writing backup-my.cnf
151127 01:29:16 [00]        ...done
151127 01:29:16 [00] Writing xtrabackup_info
151127 01:29:16 [00]        ...done
xtrabackup: Transaction log of lsn (1695477) to (1695477) was copied.
151127 01:29:16 completed OK!

nilnandan.joshi@bm-support01:~$ innobackupex --apply-log --export backup/2015-11-27_01-29-14
151127 01:32:25 innobackupex: Starting the apply-log operation
...
xtrabackup: export option is specified.
xtrabackup: export metadata of table 'nil/nil' to file `./nil/nil.exp` (1 indexes)
xtrabackup:     name=GEN_CLUST_INDEX, id.low=31, page=3
...
InnoDB: Starting shutdown...
InnoDB: Shutdown completed; log sequence number 1706518
151127 01:32:28 completed OK!

nilnandan.joshi@bm-support01:~/backup/2015-11-27_01-29-14/nil$ ll
total 140
drwx------ 2 nilnandan.joshi percona  4096 Nov 27 01:32 ./
drwx------ 3 nilnandan.joshi percona  4096 Nov 27 01:32 ../
-rw-r----- 1 nilnandan.joshi percona    65 Nov 27 01:29 db.opt
-rw-r--r-- 1 nilnandan.joshi percona   421 Nov 27 01:32 nil.cfg
-rw-r--r-- 1 nilnandan.joshi percona 16384 Nov 27 01:32 nil.exp
-rw-r----- 1 nilnandan.joshi percona  8586 Nov 27 01:29 nil.frm
-rw-r----- 1 nilnandan.joshi percona 98304 Nov 27 01:29 nil.ibd

Now, on MySQL 5.7, I created the nil table, discarded its tablespace, copied the .cfg and .ibd files from the backup to the datadir, and set the proper permissions.

nilnandan.joshi@bm-support01:~/backup/2015-11-27_01-29-14/nil$ mysql -uroot -p --socket=/tmp/mysql_sandbox13454.sock
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 16
Server version: 5.7.9-log MySQL Community Server (GPL)
...
mysql> create database nil;
Query OK, 1 row affected (0.04 sec)

mysql> use nil
Database changed
mysql> create table nil(id int, name varchar(10));
Query OK, 0 rows affected (0.04 sec)

mysql> ALTER TABLE nil DISCARD TABLESPACE;
Query OK, 0 rows affected (0.00 sec)

mysql> quit
Bye
nilnandan.joshi@bm-support01:/home/nilnandan.joshi/sandboxes/rsandbox_mysql-5_7_9/master/data/nil$ sudo cp /home/njoshi/backup/2015-11-27_01-29-14/nil/nil.cfg .
nilnandan.joshi@bm-support01:/home/nilnandan.joshi/sandboxes/rsandbox_mysql-5_7_9/master/data/nil$ sudo cp /home/njoshi/backup/2015-11-27_01-29-14/nil/nil.ibd .
nilnandan.joshi@bm-support01:/home/nilnandan.joshi/sandboxes/rsandbox_mysql-5_7_9/master/data/nil$ mysql -uroot -p --socket=/tmp/mysql_sandbox13454.sock
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 17
Server version: 5.7.9-log MySQL Community Server (GPL)
...
mysql> use nil
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+---------------+
| Tables_in_nil |
+---------------+
| nil           |
+---------------+
1 row in set (0.00 sec)

mysql> ALTER TABLE nil IMPORT TABLESPACE;
ERROR 1808 (HY000): Schema mismatch (Table flags don't match, server table has 0x5 and the meta-data file has 0x1)

After all that, I got the same error, with no specific message about the underlying problem. When I deleted the .cfg file and tried again, I got an error message with the exact cause:

mysql> ALTER TABLE nil IMPORT TABLESPACE;
ERROR 1808 (HY000): Schema mismatch (Table has ROW_TYPE_DYNAMIC row format, .ibd file has ROW_TYPE_COMPACT row format.)

So, I dropped the table on MySQL 5.7, created the same table with the  “row_format=compact” option, copied both .cfg and .ibd files again, and this time it worked.

mysql> drop table nil;
Query OK, 0 rows affected (0.00 sec)

mysql> create table nil(id int, name varchar(10)) row_format=compact;
Query OK, 0 rows affected (0.00 sec)

mysql> ALTER TABLE nil DISCARD TABLESPACE;
Query OK, 0 rows affected (0.00 sec)

mysql> quit
Bye
nilnandan.joshi@bm-support01:/home/nilnandan.joshi/sandboxes/rsandbox_mysql-5_7_9/master/data/nil$ sudo cp /home/njoshi/backup/2015-11-27_01-29-14/nil/nil.ibd .
nilnandan.joshi@bm-support01:/home/nilnandan.joshi/sandboxes/rsandbox_mysql-5_7_9/master/data/nil$ sudo cp /home/njoshi/backup/2015-11-27_01-29-14/nil/nil.cfg .

mysql> ALTER TABLE nil IMPORT TABLESPACE;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from nil;
+------+-----------+
| id   | name      |
+------+-----------+
|    1 | nilnandan |
|    2 | niljoshi  |
|    3 | njoshi    |
|    4 | joshi     |
+------+-----------+
4 rows in set (0.01 sec)

I’m not sure if the same issue occurs every time, but one thing is sure: removing the .cfg file gives you the exact cause of the problem and lets you resolve it.

Why does moving a tablespace from MySQL 5.6 to MySQL 5.7 give you this error? The answer is that the default innodb_file_format changed in MySQL 5.7 from Antelope to Barracuda. A quick way to see the difference on both servers is sketched after the release-note quote below.

Important Change; InnoDB: The following changes were made to InnoDB configuration parameter default values:
  • The innodb_file_format default value was changed to Barracuda. The previous default value was Antelope. This change allows tables to use Compressed or Dynamic row formats.

https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-7.html
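
As a quick check (a minimal sketch, not from the original post), you can compare the defaults and the resulting row format on each server:

mysql> SELECT @@innodb_file_format;      -- Antelope by default on 5.6, Barracuda on 5.7
mysql> SHOW TABLE STATUS LIKE 'nil'\G    -- compare the Row_format column on both servers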

I’ve already submitted a bug report to MySQL for this error and its unhelpful error message: https://bugs.mysql.com/bug.php?id=79469

By the way, here is the link for how to transport a tablespace with XtraBackup:
https://www.percona.com/doc/percona-xtrabackup/2.3/howtos/recipes_ibkx_partition.html

The post Transporting tablespace from MySQL 5.6 to MySQL 5.7 (case study) appeared first on MySQL Performance Blog.

Smallest MySQL Server in the World (under $60) which can even make you toast while fixing MySQL bug #2

November 25, 2015 - 7:36am

Introduction
In my last blog post, Internet of Things, Messaging and MySQL, I showed how to start your own Internet of Things with the Particle Photon board. That implementation is great, but it requires constant internet (Wi-Fi) access, as the Particle Photon board does not have any local storage. If you do not have reliable network access (i.e. in some remote places) or need something really small to store your data, you can now use the Intel Edison. I’ve even installed MySQL on the Edison, which makes it the smallest (in size) MySQL server in the world! Other options include:

MySQL Configuration

The Intel Edison is a tiny computer based on a 22 nm Silvermont dual-core Intel Atom CPU running at 500MHz, running Linux (a Yocto-based distribution). To program the Edison we will need a breakout board. Options include an Arduino-compatible breakout board (which includes an SD card slot) and a small Intel breakout board.

The installation and configuration are straightforward. I used the Get Started with Yocto Project on the Intel Edison Board guide to set up and configure the board. First we need to connect to the Edison via a serial connection and configure sshd and Wi-Fi; when that is done we can connect to the Edison using SSH.

The MySQL installation is relatively easy, as generic Linux binaries are compatible with Yocto Linux (so you do not have to compile anything). There are two challenges, though:

  • By default Yocto Linux (as well as the official repository) does not include libraries needed for MySQL: libaio1, libcrypto, libssl.
  • The internal storage is tiny, and the MySQL 5.7 binaries did not even fit into any partition. I had to remove some “tests” and other things I did not need. For a real installation one can use an SD card (an SD slot is available on some boards).

To install the libraries I used the unofficial Edison repositories, following the excellent Edison package repo configuration guide. Setup is simple:

To configure your Edison to fetch packages from this repo, replace anything you have in /etc/opkg/base-feeds.conf with the following (other opkg config files don’t need any changes):

src/gz all http://repo.opkg.net/edison/repo/all
src/gz edison http://repo.opkg.net/edison/repo/edison
src/gz core2-32 http://repo.opkg.net/edison/repo/core2-32

Then we need to install the libraries:

# opkg install libaio1_0.3 libcrypto1.0.0 libssl1.0.0

Finally we can download Percona Server 5.6 and place it somewhere (use basedir in my.cnf to point to the installation path, as sketched below):

# wget https://www.percona.com/downloads/Percona-Server-5.6/Percona-Server-5.6.25-73.1/binary/tarball/Percona-Server-5.6.25-rel73.1-Linux.i686.ssl100.tar.gz
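
A minimal my.cnf fragment for such a tarball install might look like this (the paths are hypothetical and depend on where you unpack the tarball and where your SD card is mounted):

[mysqld]
basedir = /usr/local/Percona-Server-5.6.25-rel73.1-Linux.i686.ssl100
datadir = /media/sdcard/mysql-data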

Please note that the latest Percona Server 5.6 depends on the NUMA library, and there is no such library for Yocto (it does not make sense for the Edison). So 5.6.25 is the latest Percona Server you can install here.

The simple (and rather useless) benchmark on Intel Edison:

root@edison:/usr/local/mysql# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 74
model name  : Genuine Intel(R) CPU 4000 @ 500MHz
...
mysql> SELECT BENCHMARK(10000000,ENCODE('hello','goodbye'));
+-----------------------------------------------+
| BENCHMARK(10000000,ENCODE('hello','goodbye')) |
+-----------------------------------------------+
|                                             0 |
+-----------------------------------------------+
1 row in set (18.77 sec)

Can MySQL make you toast?

The famous MySQL Bug #2, submitted 12 Sep 2002, states that “MySQL Connector/J doesn’t make toast”. With the Intel Edison and an Arduino-compatible breakout board it is now trivial to fix this bug: not only MySQL Connector/J but also the MySQL server itself can make you toast! This can be done via a UDF or, in MySQL 5.7, with the Query Rewrite Plugins, so you can execute the MySQL query:

mysql> make toast;

For the actual implementation you can either “hack” an existing toaster to interface with the breakout board pins or use an Arduino-compatible robotic arm. OK, MySQL, make me toast!

The post Smallest MySQL Server in the World (under $60) which can even make you toast while fixing MySQL bug #2 appeared first on MySQL Performance Blog.

Amazon Aurora – Looking Deeper

November 16, 2015 - 1:26pm

Recently my colleague at Percona, Yves Trudeau, and industry colleague Marco Tusa published their materials on Amazon Aurora. Indeed, Amazon Aurora is a hot topic these days, and we have a stream of customer inquiries regarding this technology. I’ve decided to form my own opinion, and nothing is better than personal, hands-on experience, which I am going to share.

The materials I will refer to:

Presentation [1] gives a good general initial overview. However, there is one statement the presenter made I’d like to highlight. It is “Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases.”

This does not claim that Amazon Aurora is an open source database, but certainly the goal is to make Amazon look comparable to open source.

I would like to make clear that Amazon Aurora is not open source. Amazon Aurora is based on the Open Source MySQL Edition, but is not open source by itself; it is a proprietary, closed-source database.

By choosing Amazon Aurora you are fully dependent on Amazon for bug fixes and upgrades. In this regard you are locked in to Amazon RDS, though it is not the same lock-in as with a commercial database like Oracle or MS SQL Server. You should still be able to migrate your application to community MySQL relatively easily.

Amazon uses a GPLv2 loophole that allows it not to publish source code in a cloud model. How is Amazon Aurora different from Percona Server or MariaDB, both of which are required to publish their sources? It comes down to the distribution model. GPLv2 places its requirements on the traditional distribution model: if you download software or receive a hard copy of software binaries, you also have the right to request the corresponding source code. This is not the case with cloud computing: there you do not download anything, you just launch an instance. GPLv2 places no requirements on this case, so Amazon is in compliance with GPLv2.

Bug fixes

Speaking of bug fixes, Amazon Aurora exposes itself as “version: 5.6.10, version_comment: MySQL Community Server (GPL)”. MySQL 5.6.10 was released on 2013-02-05 – 2.5 years ago. It is not clear whether Aurora includes 2.5 years’ worth of bug fixes and just did not update the version, or whether these really are binaries based on a 2.5-year-old code base.

For example, let’s take the MySQL bug http://bugs.mysql.com/bug.php?id=70577, which was fixed in 5.6.15. I do not see the bug fix present in Amazon Aurora 5.6.10.

Another bug, http://bugs.mysql.com/bug.php?id=68041, was fixed in MySQL 5.6.13 but is still present in Amazon Aurora.

What about InnoDB’s code base? The bug http://bugs.mysql.com/bug.php?id=72548, in InnoDB full-text search, was fixed in MySQL 5.6.20 (released more than a year ago, on 2014-07-31) and is still present in Amazon Aurora.
This leaves me with the impression that the general Aurora codebase has not been updated recently.

It seems Amazon has changed innodb_version, though. Right now it is 1.2.10; a couple of weeks ago it was 1.2.8.

My question here is: does Amazon have the ability to keep up with MySQL bug fixes and regularly update their software? So far it does not seem so.

Amazon Aurora architecture:

My understanding of Amazon Aurora in simplified form is the following:

That is, all Aurora instances share the same storage, which makes it very easy to start new “Reader” instances over the same data.

Communication between the Writer (only one Writer is allowed) and the Readers is done by transferring records similar to InnoDB redo log records. This really limits how many Writers you can have (only one): I do not believe it is possible to implement a proper transaction coordinator between two Writers based on redo records.

A similar way is used to update data stored on shared storage: Aurora just applies redo log records to data.

So, updating data this way, Aurora is able to:

  • Avoid data writes to storage
  • Avoid binary logs
  • Avoid InnoDB transactional logs
  • Disable doublewrites
  • Disable InnoDB checksums

Aurora makes claims about significant performance improvements, but we need to keep in mind that EACH WRITE goes directly to storage and it has to be acknowledged by 4 out of 6 copies (synchronous writes). Aurora Writer works in some kind of “write through” mode – this is needed, as I understand, to make Reader see changes immediately. I expect it also comes with a performance penalty, so whether the performance gain is bigger than the performance penalty will depend on the workload.

Now, I should give credit to the Amazon engineering team for a proper implementation of shipping and applying transactional redo logs. It must have required a lot of work to change the InnoDB engine, and as we see it took probably a year (from the announcement to the general availability) for Amazon to stabilize Aurora. Too bad Amazon keeps their changes closed, even when the main codebase is an open source database.

Work with transactional isolation

Distributed computing is especially complicated from a transactional processing point of view (see for example a story), so I also wanted to check how Amazon Aurora handles transactional isolation levels on different instances.
Fortunately for Aurora, they have an easy way out, allowing only read statements on Readers, but we still check isolation in some cases.

It seems that the only TRANSACTION ISOLATION level supported is REPEATABLE-READ. When I try to change to SERIALIZABLE or READ-COMMITTED, Aurora accepts this without an error, but silently ignores it. tx_isolation stays REPEATABLE-READ.

mysql> SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@GLOBAL.tx_isolation, @@tx_isolation;
+-----------------------+-----------------+
| @@GLOBAL.tx_isolation | @@tx_isolation  |
+-----------------------+-----------------+
| REPEATABLE-READ       | REPEATABLE-READ |
+-----------------------+-----------------+
1 row in set (0.00 sec)

mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@GLOBAL.tx_isolation, @@tx_isolation;
+-----------------------+-----------------+
| @@GLOBAL.tx_isolation | @@tx_isolation  |
+-----------------------+-----------------+
| REPEATABLE-READ       | REPEATABLE-READ |
+-----------------------+-----------------+
1 row in set (0.00 sec)

Actually, here I ran into another worrisome behavior: Aurora changes silently, without notification.

I am pretty sure that when I tried to use the SERIALIZABLE level a couple of weeks ago, it failed with an error: “SERIALIZABLE is not supported”. Now it is just silently ignored. So I assume Amazon continues to make changes; I guess this is one of the changes from innodb_version 1.2.8 to 1.2.10. Is there a full changelog we can see?

The lack of SERIALIZABLE level is not a big problem in my opinion. In the end, we know that Percona XtraDB Cluster does not support it either.

But not being able to use READ-COMMITTED might be an issue for some applications; you need to check if your application is working properly with READ-COMMITTED silently set as REPEATABLE-READ.

I found another unusual behavior between the Reader and Writer when I tried to execute an ALTER TABLE statement on the Writer (keeping the data dictionary synchronized is another hard area for clusters).
Scenario:
READER: execute a long SELECT col1 FROM tab1
WRITER: while the SELECT is running, execute ALTER TABLE tab1 ADD COLUMN col2;
Effect: the SELECT on the READER fails immediately with an error: “ERROR 1866 (HY000): Query execution was interrupted on a read-only database because of a metadata change on the master”

So there again I think Aurora does its best given architectural limitations and one-directional communication: it just chooses to kill read statements on Readers.

Query cache
I should highlight the improvements to the query cache as a good enhancement. The query cache is enabled by default, and Amazon fixed the major issue with the MySQL query cache: update queries may stall for a long time waiting on the invalidation of query cache entries. This problem does not exist in Amazon Aurora. Amazon also adjusted the query cache to work properly on the Writer-Reader pair: the query cache on a Reader gets invalidated when data is changed on the Writer. (A quick way to inspect the query cache settings is shown below.)
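
As a generic check (standard MySQL statements, nothing Aurora-specific), you can inspect the query cache configuration and its activity counters:

mysql> SHOW VARIABLES LIKE 'query_cache%';
mysql> SHOW GLOBAL STATUS LIKE 'Qcache%';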

Config review

Let’s make a quick review of the MySQL configuration that Aurora proposes.

These are variables that are set by default and that you can’t change:

| innodb_change_buffering        | none |
| innodb_checksum_algorithm      | none |
| innodb_checksums               | OFF  |
| innodb_doublewrite             | OFF  |
| innodb_flush_log_at_trx_commit | 1    |

The disabled doublewrite and checksums are not a surprise; I mentioned this above.
Also, innodb_flush_log_at_trx_commit is locked to 1; I think this is also related to how Aurora deals with InnoDB redo log records.

The disabled innodb_change_buffering is also interesting, and it can’t be good for performance. I guess Amazon had to disable any buffering of updates so that changes are immediately written to the shared storage.

Diagnostic

Traditionally, RDS does not provide good access to system metrics like vmstat and iostat, which makes troubleshooting quite challenging. The MySQL slow log is also not available, which leaves us with only PERFORMANCE_SCHEMA (OFF by default).

One good thing is that we can access SHOW GLOBAL STATUS, and this can also be used for monitoring software.

Interestingly, after a heavy write load these counters always stay at 0:

| Innodb_buffer_pool_pages_dirty   | 0 |
| Innodb_buffer_pool_bytes_dirty   | 0 |
| Innodb_buffer_pool_pages_flushed | 0 |
| Innodb_pages_written             | 0 |

I think this supports my guess about the architecture: Aurora does not keep any changed pages in memory and just writes everything directly to storage.

Final thoughts

I will follow up this post with my benchmarks, as I expect the proposed architecture comes with serious performance penalties, although removing binary logs and eliminating doublewrites should show a positive improvement.

In general I think Amazon Aurora is a quite advanced proprietary version of MySQL. It is not revolutionary, however, and indeed not the “reimagined relational database” Amazon presents it as. This technology does not address the problem of scaling writes or sharding, and it does not handle cross-node transactions.

As for other shortcomings: there is no public bug database, no detailed user documentation, and Aurora’s code is based on an old MySQL code base.

The post Amazon Aurora – Looking Deeper appeared first on MySQL Performance Blog.

pt-online-schema-change and innodb_stats_persistent on MySQL 5.6

November 13, 2015 - 8:46am

Recently we released an updated version of Percona Toolkit with an important update for pt-online-schema-change if you use MySQL 5.6. A new parameter has been added, --analyze-before-swap. What does it do? Let me explain it in more detail.

A bug not a bug

A customer contacted us because pt-online-schema-change caused hundreds of queries to get stuck after the ALTER was done. All those queries were trying to read from the altered table, but for some reason they never finished. Of course, this caused downtime. The reason behind it is this “bug not a bug”:

http://bugs.mysql.com/bug.php?id=78289

As a summary: if you are running 5.6 with persistent statistics enabled (which they are by default), the optimizer in some cases can choose a bad execution plan because it lacks the statistics to make a good decision. Even simple queries with a WHERE condition looking up a value on the PK can switch to a full table scan, because the optimizer has no idea there is a PK.

There are two ways to force the index statistics to be calculated:

  • Wait until the background thread recalculates the statistics of the new table. This could take longer than expected if the load on the server is really high.
  • Run ANALYZE TABLE.

So, --analyze-before-swap does exactly that: it runs ANALYZE TABLE on the new table before the table swap is done. That means the ANALYZE on the new table (before the rename) does NOT affect queries against the table we are altering, but it could affect the triggers’ operations for the new table.

Analyze table is not the best solution either. Check out this blog post from Peter: https://www.percona.com/blog/2008/09/02/beware-of-running-analyze-in-production/

So, in case you want to disable this feature, you need to add --no-analyze-before-swap (an example invocation is sketched below). If pt-online-schema-change finds that the running version is 5.6 and that persistent statistics are enabled, the option is enabled by default.
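
A hedged example of how that might look on the command line (the database, table, and ALTER clause are placeholders; the option names come from the text above):

$ pt-online-schema-change --alter "ADD COLUMN c1 INT" D=mydb,t=mytable --no-analyze-before-swap --execute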

The post pt-online-schema-change and innodb_stats_persistent on MySQL 5.6 appeared first on MySQL Performance Blog.

Logical MySQL backup tool Mydumper 0.9.1 now available

November 12, 2015 - 7:32am

The new Mydumper 0.9.1 version, which includes many new features and bug fixes, is now available.  You can download the code from here.

A significant change included in this version enables Mydumper to handle all schema objects! So there is no longer a dependency on mysqldump to ensure complex schemas are backed up alongside the data.

Let’s review some of the new features:

Full schema support for Mydumper/Myloader

Mydumper now takes care of backing up the schema, including views and merge tables. As a result, we now have these new associated options:

-d, --no-data Do not dump table data
-G, --triggers Dump triggers
-E, --events Dump events
-R, --routines Dump stored procedures and functions

These options are not enabled by default, to keep backward compatibility with existing mixed solutions that use mysqldump for DDL.

Lock reduction options

--trx-consistency-only      Transactional consistency only

You can think of this as --single-transaction for mysqldump, but still with the binlog position. Obviously this position only applies to transactional tables (TokuDB included). One of the advantages of using this option is that the global read lock is only held for thread coordination, so it’s released as soon as the transactions are started. (A sample invocation is sketched below.)
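
A hedged example of a dump combining this with the new schema options (the database name and output directory are placeholders; --trx-consistency-only, --triggers, --events, and --routines come from the text above):

$ mydumper --trx-consistency-only --triggers --events --routines -B mydb -o /backups/mydb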

GTIDs and Multisource Slave 

GTIDs are now recorded in the metadata file. Mydumper is also now able to detect a multisource slave (MariaDB 10.1.x) and will record all of the slaves’ coordinates.

Myloader single database restore

Until now the only option was to copy the database files to a different directory and restore from it. However, we now have a new option available:

-s, --source-db                   Database to restore

It can also be used in combination with -B, --database to restore to a different database name, as in the sketch below.
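
For example, a hypothetical restore of a single database from a dump directory into a new schema (paths and names are placeholders; -s and -B come from the text above, and -d is myloader’s dump directory option):

$ myloader -d /backups/mydb -s mydb -B mydb_restored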

Full list of Bug Fixes:

#1431410 innodb stats tables
#1440403 *-post and *-triggers compressed files corrupt
#1470891 functions may be needed by SP and views
#1390437 segmentation fault against Percona MySQL 5.6.15-63.0
#1446280 Segmentation fault on Debian Wheezy
#1399715 Typo in –tables-list option in manpage
#1428608 missing -K option in mydumper manpage
#1440437 myloader: wrong database name in message when -B used
#1457091 tokudb detection doesn’t work
#1481747 Unable to compile r179 WITH_BINLOG=ON (undeclared ‘bj’)
#1507574 Assertion when broken mrg tables
#841651 dump view definitions
#1485688 make compile error myloader.c:209

 

The post Logical MySQL backup tool Mydumper 0.9.1 now available appeared first on MySQL Performance Blog.

Percona Toolkit 2.2.16 is now available

November 9, 2015 - 1:44pm

Percona is pleased to announce the availability of Percona Toolkit 2.2.16, released on November 9, 2015. Percona Toolkit is a collection of advanced command-line tools to perform a variety of MySQL server and system tasks that are too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

This release is the current GA (Generally Available) stable release in the 2.2 series. It includes new features and bug fixes as well as continued preparation for MySQL 5.7 compatibility. Full details are below. Downloads are available here and from the Percona Software Repositories.

New Features:

  • 1491261: When using MySQL 5.6 or later with the innodb_stats_persistent option enabled (it is enabled by default), pt-online-schema-change will now run with the --analyze-before-swap option. This ensures that queries continue to use the correct execution path instead of switching to a full table scan, which could cause downtime. If you do not want pt-online-schema-change to run ANALYZE on new tables before the swap, you can disable this behavior using the --no-analyze-before-swap option.
  • 1402051: pt-online-schema-change will now wait forever for slaves to be available and not be lagging. This ensures that the tool does not abort during faults and connection problems on slaves.
  • 1452895: pt-archiver now issues ‘keepalive’ queries during and after a long-running bulk insert/delete process (see the sketch after this list). This keeps the connection alive even if the innodb_kill_idle_transaction variable is set to a low value.
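
A hedged example of a bulk purge where such keepalives matter (the table, condition, and DSN are placeholders; --bulk-delete, --limit, and --purge are standard pt-archiver options):

$ pt-archiver --source h=localhost,D=mydb,t=big_table --where "created < NOW() - INTERVAL 90 DAY" --bulk-delete --limit 1000 --purge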

Bug Fixes:

  • 1488685: The --filter option for pt-kill now works correctly.
  • 1494082: The pt-stalk tool no longer uses the -warn option when running find, because the option is not supported on FreeBSD.

Details of the release can be found in the release notes and the 2.2.16 milestone on Launchpad. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

The post Percona Toolkit 2.2.16 is now available appeared first on MySQL Performance Blog.

Percona Server 5.6.27-75.0 is now available

November 5, 2015 - 12:45pm

Percona is glad to announce the release of Percona Server 5.6.27-75.0 on November 5, 2015. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.27, including all the bug fixes in it, Percona Server 5.6.27-75.0 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – and this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release can be found in the 5.6.27-75.0 milestone on Launchpad.

New Features:

  • Percona Server is now available for Ubuntu 15.10 (Wily).
  • TokuDB MTR tests have been integrated into Percona Server.
  • Linux thread ID is now available in the PROCESSLIST table.
  • Percona Server has now re-enabled savepoints in triggers and stored functions.
  • Variables innodb_buffer_pool_populate and numa_interleave have been mapped to the upstream implementation of the new innodb_numa_interleave option.

Bugs Fixed:

  • Fixed transactional inconsistency with rollback in TokuDB. Rolling back a large transaction with many inserts/updates/deletes could result in some of the changes being committed rather than rolled back. Bug fixed #1512455.
  • Variable tokudb_backup_exclude was not excluding files correctly. Bug fixed #1512457.
  • TokuDB could crash under load if transaction-isolation level READ-COMMITTED was used. Bug fixed #1501633.
  • TokuDB thread pool names were missing in the SHOW ENGINE tokudb STATUS which caused duplicate entries. Bug fixed #1512452.
  • Manipulating the innodb_track_redo_log_now variable dynamically would crash the server if it was started without innodb_track_changed_pages enabled. This variable is available on debug builds only. Bug fixed #1368530.
  • If the user had duplicate pid-file options in config files when running yum upgrade, the upgrade would stop with an error because it would think it had found a duplicate pid-file setting, while it was actually the same setting specified twice. Bug fixed #1454917.
  • On some filesystems the server would not start if the XtraDB changed page tracking feature was enabled and the innodb_flush_method variable was set to O_DIRECT. Bugs fixed #1500720 and #1498891.
  • When User Statistics were enabled, executing any statement of the SHOW family with a non-empty result would erroneously bump the USER_STATISTICS.ROWS_FETCHED column values. Bug fixed #1510953.
  • A write operation with innodb_fake_changes enabled could cause a server assertion if it followed the pessimistic B-tree update path internally. Bug fixed #1192898.
  • An online DDL operation could have caused a server crash with fake changes enabled. Bug fixed #1226532.
  • Fixed the conflicting meta packages between 5.1, 5.5, and 5.6 release series in Debian and Ubuntu distributions. percona-server-server and percona-server-client meta packages now point to the latest 5.6 release. Bug fixed #1292517.
  • INNODB_CHANGED_PAGES table was unavailable with non-default innodb_data_home_dir setting if the variable had a trailing slash. Bug fixed #1364315.
  • Changing the innodb_fake_changes variable value in the middle of a transaction would have an immediate effect, that is, making part of the transaction run with fake changes enabled and the rest with fake changes disabled, resulting in a broken transaction. Fixed by making any innodb_fake_changes value change become effective at the start of the next transaction instead of the next statement. Bug fixed #1395579.
  • UPDATE statement could crash the server with Support for Fake Changes enabled. Bug fixed #1395706.
  • Startup would fail due to a small hard-coded timeout value in the init script waiting for the pid file to appear. This has been fixed by creating a default file for Debian init script timeout parameters in /etc/default/mysql. Bug fixed #1434022.
  • CMakeLists.txt for tokudb-backup-plugin was missing Valgrind dependency. Bug fixed #1494283.
  • Percona Server would fail to install on CentOS 7 if mariadb-devel package was already installed. Bug fixed #1499721.
  • Fixed suboptimal Support for Fake Changes handling in online ALTER storage engine API. Bug fixed #1204422.
  • The upstream bug #76627 was not fixed for the ALL_O_DIRECT case. Bug fixed #1500741.
  • Fixed multiple TokuDB clang build issues. Bug fixed #1512449.

Other bugs fixed: #1204443, #1384632, #1475117, #1512301, #1452397, #1160960, #1495965, and #1497942.

Release notes for Percona Server 5.6.27-75.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

The post Percona Server 5.6.27-75.0 is now available appeared first on MySQL Performance Blog.

Percona Server 5.5.46-37.5 is now available

November 5, 2015 - 12:44pm


Percona is glad to announce the release of Percona Server 5.5.46-37.5 on November 5, 2015. Based on MySQL 5.5.46, including all the bug fixes in it, Percona Server 5.5.46-37.5 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. Details of the release can be found in the 5.5.46-37.5 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

New Features:

  • Percona Server is now available for Ubuntu 15.10 (Wily).
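
Assuming the Percona apt repository is already configured on the Wily host, installation is the usual pair of commands (package name shown for the 5.5 series):

sudo apt-get update
sudo apt-get install percona-server-server-5.5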

Bugs Fixed:

  • Manipulating the innodb_track_redo_log_now variable dynamically would crash the server if it was started without innodb_track_changed_pages enabled. This variable is available on debug builds only. Bug fixed #1368530.
  • A potential crash in handling corrupted tables with innodb_corrupt_table_action warn or salvage values has been fixed. Bug fixed #1426610.
  • If the user had duplicate pid-file options in the configuration files when running yum upgrade, the upgrade would stop with an error because the same pid file specified twice was mistaken for a conflicting duplicate. Bug fixed #1454917.
  • On some filesystems, the server would not start if the XtraDB changed page tracking feature was enabled and the innodb_flush_method variable was set to O_DIRECT. Bugs fixed #1500720 and #1498891.
  • When User Statistics were enabled, executing any statement of the SHOW family with a non-empty result would erroneously bump the USER_STATISTICS.ROWS_FETCHED column values. Bug fixed #1510953.
  • Fixed the conflicting meta packages between 5.1, 5.5, and 5.6 release series in Debian and Ubuntu distributions. percona-server-server and percona-server-client meta packages now point to the latest 5.6 release. Bug fixed #1292517.
  • INNODB_CHANGED_PAGES table was unavailable with non-default innodb_data_home_dir setting if the variable had a trailing slash. Bug fixed #1364315.
  • An UPDATE statement could crash the server with Support for Fake Changes enabled. Bug fixed #1395706.
  • Changing the innodb_fake_changes variable value in the middle of a transaction would take effect immediately, making part of the transaction run with fake changes enabled and the rest with fake changes disabled, resulting in a broken transaction. Fixed by making any innodb_fake_changes value change take effect at the start of the next transaction instead of the next statement. Bug fixed #1395579.
  • Startup would fail due to a small hard-coded timeout in the init script while waiting for the pid file to appear. This has been fixed by creating a defaults file for the Debian init script timeout parameters in /etc/default/mysql. Bug fixed #1434022.
  • Percona Server would fail to install on CentOS 7 if the mariadb-devel package was already installed. Bug fixed #1499721.
  • The upstream bug #76627 was not fixed for the ALL_O_DIRECT case. Bug fixed #1500741.

Other bugs fixed: #1512301, #1160960, and #1497942.

Release notes for Percona Server 5.5.46-37.5 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.5.46-37.5 is now available appeared first on MySQL Performance Blog.

MySQL 5.7: CHANGE REPLICATION FILTER online

November 4, 2015 - 5:16pm

MySQL 5.7 has a lot of enhancements and new features. I summarized this list previously in this blog post.

Adding replication filters online is one of the MySQL 5.7 features described in this manual. In this post I will walk through and summarize a few examples.

Filtering replication events is also known as Partial Replication. Partial replication can be done on the master or on the slave. Filtering events on the master server via binlog-do-db and binlog-ignore-db is not a good idea, as explained in this post; if you really need partial replication, filtering on the slave is the better option. Our CEO Peter Zaitsev wrote a detailed blog post some time back on filtered MySQL replication that you may find useful.

Partial replication works differently for statement-based and row-based replication. You can find the details in the manual and on this manual page. It is useful to know how MySQL evaluates partial replication rules.
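
As a small, hedged illustration of why the binlog format matters, assume a slave configured with replicate-do-db = db1 and consider this statement issued on the master (db3 and t1 are placeholders):

USE db3;
UPDATE db1.t1 SET c1 = c1 + 1;
-- statement-based replication: NOT applied on the slave, because the filter is
-- evaluated against the default database (db3), not the table's database
-- row-based replication: applied, because the filter is evaluated against the
-- database the changed rows actually belong to (db1)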

Prior to MySQL 5.7, adding or changing replication filter rules required bouncing the MySQL server. In MySQL 5.7 this becomes an online operation: the CHANGE REPLICATION FILTER command changes the rules without restarting the MySQL server. Let me share a few examples:

Currently, the slave server runs without filtered replication, which can be verified from the slave status output: all of the Replicate_* variables are empty, meaning no replication filter rules are set.

mysql> show slave statusG *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.0.130 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000002 Read_Master_Log_Pos: 351 Relay_Log_File: centos59-relay-bin.000003 Relay_Log_Pos: 566 Relay_Master_Log_File: master-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: .

The master database server contains db1-db4. Let's replicate only db1 and db2 out of the four databases. This can be done with the help of the replicate-do-db option.

mysql> CHANGE REPLICATION FILTER REPLICATE_DO_DB = (db1, db2); ERROR 3017 (HY000): This operation cannot be performed with a running slave sql thread; run STOP SLAVE SQL_THREAD first

This error occurs because CHANGE REPLICATION FILTER is not supported while the slave SQL thread is running. We have to stop the slave SQL thread first and re-run the command to set the replicate-do-db option. Checking the slave status afterwards confirms that the slave now replicates only db1 and db2 out of the four databases from the master.

mysql> STOP SLAVE SQL_THREAD; Query OK, 0 rows affected (0.00 sec) mysql> CHANGE REPLICATION FILTER REPLICATE_DO_DB = (db1, db2); Query OK, 0 rows affected (0.00 sec) mysql> show slave statusG *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.0.130 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000002 Read_Master_Log_Pos: 505 Relay_Log_File: centos59-relay-bin.000003 Relay_Log_Pos: 720 Relay_Master_Log_File: master-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: db1,db2 Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: .

In order to remove that filter, you set it to an empty value; i.e., replicate-do-db as below. The slave status then verifies that no replication filter is set for the replicate-do-db variable.

mysql> STOP SLAVE SQL_THREAD; Query OK, 0 rows affected (0.03 sec) mysql> CHANGE REPLICATION FILTER REPLICATE_DO_DB = (); Query OK, 0 rows affected (0.00 sec) mysql> START SLAVE SQL_THREAD; Query OK, 0 rows affected (0.00 sec) mysql> SHOW SLAVE STATUSG *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.0.130 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000002 Read_Master_Log_Pos: 1629 Relay_Log_File: centos59-relay-bin.000003 Relay_Log_Pos: 1844 Relay_Master_Log_File: master-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: .

Moreover, multiple different replication filters can be set in one command, separated by commas, as below:

mysql> STOP SLAVE SQL_THREAD; Query OK, 0 rows affected (0.03 sec) mysql> CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ('db1.db1_new%'), REPLICATE_WILD_IGNORE_TABLE = ('db1.db1_old%'); mysql> START SLAVE SQL_THREAD; Query OK, 0 rows affected (0.00 sec) mysql> SHOW SLAVE STATUSG *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.0.130 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000003 Read_Master_Log_Pos: 448 Relay_Log_File: centos59-relay-bin.000006 Relay_Log_Pos: 663 Relay_Master_Log_File: master-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: db1.db1_new% Replicate_Wild_Ignore_Table: db1.db1_old1% .

The slave status verifies that a couple of replication filters are now set: binary log events for tables matching db1.db1_new% are replicated, while events for db1.db1_old% tables are ignored on the slave as per the Replicate_Wild_Ignore_Table filter. Also, if a database or table name doesn't contain any special characters, it doesn't need to be quoted as a filter value. However, Replicate_Wild_Do_Table and Replicate_Wild_Ignore_Table take string expressions that may contain wildcards, so their values must be quoted.
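
To make the quoting rule concrete, here is a minimal sketch (database and table names are placeholders): bare identifiers work for plain names, while wildcard patterns must be quoted strings.

mysql> STOP SLAVE SQL_THREAD;
mysql> CHANGE REPLICATION FILTER
    ->   REPLICATE_DO_DB = (db1, db2),                 -- bare names: no special characters, quotes optional
    ->   REPLICATE_WILD_DO_TABLE = ('db1.db1_new%');   -- wildcard pattern: must be quoted
mysql> START SLAVE SQL_THREAD;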

With the CHANGE REPLICATION FILTER command you can't set the same filter type multiple times, unlike in the my.cnf file, where you can set multiple filters for the same rule by specifying the option several times. If you list the same filter type more than once in a CHANGE REPLICATION FILTER command, only the last rule is activated and all the preceding ones are ignored, as illustrated in the following example:

mysql> SELECT * FROM db1.db1_old; Empty set (0.00 sec) mysql> STOP SLAVE SQL_THREAD; Query OK, 0 rows affected (0.03 sec) mysql> CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ('db2.db2_tbl1%'), REPLICATE_WILD_DO_TABLE = ('db2.db2_tbl2%'); mysql> START SLAVE SQL_THREAD; Query OK, 0 rows affected (0.00 sec) mysql> SHOW SLAVE STATUSG *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.0.130 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000003 Read_Master_Log_Pos: 980 Relay_Log_File: centos59-relay-bin.000006 Relay_Log_Pos: 1195 Relay_Master_Log_File: master-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: db2.db2_tbl2% Replicate_Wild_Ignore_Table: .

As you can see, the rule for the db2.db2_tbl1 table is ignored and only the last rule, for the db2.db2_tbl2 table, is activated.
As I mentioned before, to unset filters of a given type you need to set that particular filter to an empty value. The example below unsets the Replicate_Wild_Do_Table filter.

mysql> CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ();

However, you may set multiple values for a single filter in one command by separating them with commas, as in the earlier example with Replicate_Do_DB. Let's set multiple rules for the Replicate_Wild_Do_Table option via the CHANGE REPLICATION FILTER command.

mysql> STOP SLAVE SQL_THREAD; Query OK, 0 rows affected (0.03 sec) mysql> CHANGE REPLICATION FILTER REPLICATE_WILD_DO_TABLE = ('db2.db2_tbl1%','db2.db2_tbl2%'); Query OK, 0 rows affected (0.00 sec) mysql> START SLAVE SQL_THREAD; Query OK, 0 rows affected (0.00 sec) mysql> SHOW SLAVE STATUSG *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.0.130 Master_User: repl Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000008 Read_Master_Log_Pos: 154 Relay_Log_File: centos59-relay-bin.000013 Relay_Log_Pos: 369 Relay_Master_Log_File: master-bin.000008 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: db2.db2_tbl1%,db2.db2_tbl2% Replicate_Wild_Ignore_Table: Last_Errno: 0 . .

Conclusion:
Partial replication is not a great solution in most cases. The replication options replicate-do-table, replicate-ignore-table, replicate-wild-do-table and replicate-wild-ignore-table all work differently: whether an event is filtered can depend on the default database in use and on the binary log format. Filters other than replicate-wild-do-table and replicate-wild-ignore-table might not work as expected, and with wildcard filters, events involving stored procedures and stored functions may leave the slave inconsistent.

The post MySQL 5.7: CHANGE REPLICATION FILTER online appeared first on MySQL Performance Blog.

A first look at RDS Aurora

November 2, 2015 - 12:23pm

Recently I had an onsite engagement whose goal was to move a database service to RDS Aurora. Like probably most of you, I knew the service by name but couldn't say much about it, so I Googled, listened to talks and read about it. Now that the engagement is over, here's my first impression of Aurora.

First, let's describe the service itself. It is part of RDS and, at first glance, very similar to a regular RDS instance. In order to set up an Aurora instance, you go to the RDS console and either launch a new instance choosing Aurora as the type, or create a snapshot of an RDS 5.6 instance and migrate it to Aurora. While with a regular MySQL RDS instance you can create slaves, with Aurora you can add reader nodes to an existing cluster. An Aurora cluster minimally consists of a writer node, but you can add up to 15 reader nodes (only one writer, though). It is at the storage level that things become interesting. Aurora doesn't rely on filesystem-type storage, at least not from the database's standpoint; it has its own storage service that is replicated locally and to two other AZs automatically, for a total of six copies. Furthermore, you pay only for what you use, and the storage grows/shrinks automatically in increments of 10 GB, which is pretty cool. You can have up to 64 TB in an Aurora cluster.
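
For readers who prefer scripting to the console, roughly the same setup can be done with the aws CLI. The sketch below is my own assumption of the equivalent calls (identifiers, instance class and credentials are placeholders; check the current RDS CLI reference before relying on the exact flags):

# create the Aurora cluster (the shared storage layer and cluster endpoint)
aws rds create-db-cluster \
  --db-cluster-identifier my-aurora-cluster \
  --engine aurora \
  --master-username admin \
  --master-user-password 'S3cretPassw0rd'

# add the writer instance; repeat with other identifiers to add reader nodes
aws rds create-db-instance \
  --db-instance-identifier my-aurora-node1 \
  --db-instance-class db.r3.large \
  --engine aurora \
  --db-cluster-identifier my-aurora-cluster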

Now, all that is fine, but what are the benefits of using Aurora? I must say I barely used Aurora; one week is not field-proven experience. What follows are Amazon's claims but, as we will discuss, there are some good arguments in their favor.

The first claim is that write capacity is increased by up to 4x. So, even though only a single instance is used as the writer in Aurora, you get up to 400% of the write capacity of a normal MySQL instance. That's quite huge and amazing, but it basically means replication is asynchronous at the storage level, at least for the multi-AZ part, since the latency would otherwise be a performance killer. Locally, Aurora uses a quorum-based approach with the storage nodes. Given that the object store is a separate service with its own high-availability configuration, that is a reasonable trade-off. By contrast, Galera-based clustering solutions like Percona XtraDB Cluster typically lower the write capacity, since all nodes must synchronize on commit. Other claims are that reader performance is unaffected by the clustering and that the readers have almost no lag behind the writer. Furthermore, as if that were not enough, readers can't diverge from the master. Finally, since there's no lag, any reader can replace the writer very quickly, so in terms of failover, all is right.

That seems almost too good to be true; how can it be possible? I happen to be interested in object stores, Ceph especially, and I was toying with the idea of using Ceph to store InnoDB pages. It appears the Amazon team did a great job at putting an object store under InnoDB, and they went way further than what I was thinking. Here I may be speculating a bit, and I would be happy to be proven wrong. The writer never writes dirty pages back to the store… it only writes fragments of the InnoDB log to the object store as objects, one per transaction, and notifies the readers of the set of pages that have been updated by this log fragment object. Just have a look at the show global status of an Aurora instance and you'll see what I mean… Said otherwise, it is like having an infinitely large set of InnoDB log files; you can't reach the max checkpoint age. Also, if the object store supports atomic operations, there's no need for the doublewrite buffer, a big source of contention in MySQL. Those two aspects alone are enough, in my opinion, to explain the up-to-4x write capacity claim; and since the log fragments are essentially a binary diff, they usually amount to much less data to write than whole pages.

Something is needed to remove the fragment log objects, since over time, the accumulation of these log objects and the need to apply them would impact performance, a phenomenon called log amplification. With Aurora, that seems to be handled at the storage level and the storage system is wise enough to know that a requested page is dirty and apply the log fragments before sending it back to the reader. The shared object store can also explain why the readers have almost no lag and why they can’t diverge. The only lag the readers can have is the notification time which has to be short if within the same AZ.

So, how does Aurora compare to a technology like Galera?

Pros:

  • Higher write capacity, writer is unaffected by the other nodes
  • Simpler logic, no need for certification
  • No need for an SST to provision a new node
  • Readers can't diverge from the writer
  • Scales IOPS tremendously
  • Fast failover
  • No need for quorum (handled by the object store)
  • Simple to deploy

Cons:

  • Likely asynchronous at the storage level
  • Only one node is writable
  • Not open source

Aurora is a mind shift in terms of databases and a jewel in Amazon's hands. OpenStack currently has no database service that can offer similar features. I wonder how hard it would be to produce an equivalent solution using well-known open-source components, like Ceph for the object store and Corosync, ZooKeeper or ZeroMQ for the communication layer. Also, would there be a use case?

The post A first look at RDS Aurora appeared first on MySQL Performance Blog.

MySQL and Trojan.Chikdos.A

October 29, 2015 - 9:23am

Symantec published a blog post yesterday regarding MySQL and Trojan.Chikdos.A, as can be seen here.

The Symantec post gives detail on the behavior of the Trojan and its effects on the Windows system registry, yet gives little detail as to how the required first stage (namely a malicious UDF) is injected, citing:

“In the latest Chikdos campaign that we observed, the attackers likely used an automated scanner or possibly a worm to compromise MySQL servers and install the UDF.”

I'm going to give my thoughts here on the prerequisites for infection.

  1. The MySQL server has poor network isolation (i.e., it is likely accessible from anywhere; see the shodan.io examples)
  2. The MySQL user has poor (read: overly permissive) grants, e.g. “GRANT ALL”
  3. The MySQL @@GLOBAL.plugin_dir is writable (likely due to poor file ACLs and no mandatory access control such as SELinux / AppArmor)

This is pretty much the exact same method I discussed at PLUK13 and updated for discussion again at PLMCE14. There are YouTube videos embedded in each showing the attack aimed at a Linux deployment. The above three prerequisite points, however, should still hold true on Windows.
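
A quick way to check whether your own servers exhibit points 2 and 3 takes only a couple of statements (the user name is a placeholder, and the shell line assumes a local client):

mysql> SHOW GRANTS FOR 'app_user'@'%';             -- is this really GRANT ALL ... WITH GRANT OPTION?
mysql> SHOW GLOBAL VARIABLES LIKE 'plugin_dir';    -- where would a malicious UDF have to land?

$ ls -ld $(mysql -NBe "SELECT @@GLOBAL.plugin_dir")   # is that directory writable by the mysqld user?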

So what is the real takeaway here?

  1. Ensure you properly isolate your MySQL service on the network, allowing access only from known hosts that require it.
    1. There really is no reason at all to have the service accessible from everywhere on the Internet.
  2. Ensure your user grants follow the Principle of Least Privilege; that is, grant only the access a user needs to do the job it is supposed to be doing.
  3. Never EVER chmod 777 / setenforce 0

I can see that last point raising some arguments; please keep in mind we’re talking about the security of your MySQL data here. Simply turning off security measures “to make it work” is a recipe for disaster.
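
As a hedged sketch of what the first two takeaways can look like in practice (interface address, subnet, schema and user names are all placeholders):

# my.cnf: bind the service to an internal interface instead of every address
[mysqld]
bind-address = 10.0.0.5

-- least privilege: an application account restricted to its schema and subnet
CREATE USER 'app_user'@'10.0.0.%' IDENTIFIED BY 'a-strong-password';
GRANT SELECT, INSERT, UPDATE, DELETE ON app_db.* TO 'app_user'@'10.0.0.%';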

The post MySQL and Trojan.Chikdos.A appeared first on MySQL Performance Blog.

MySQL 5.7 first impressions on group-replication

October 29, 2015 - 6:55am

During the last few weeks I've been testing and playing a bit with the new group-replication plugin available for MySQL 5.7. Before continuing, I'd like to clarify some aspects: the plugin is only available in labs and is not yet ready for production. The current version is 0.6. I used 5.7.9 GA running in a three-node Vagrant cluster with CentOS 7.
As an additional note, I've tested the previous plugin version, 0.5, against 5.7.8-rc, and there have been some good changes since then, so I recommend starting with the GA version.

For my tests I followed the instructions from this post. It's not as straightforward as it looks; there were some issues that needed to be handled, but I finally managed to get a consistent three-node cluster running:

mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE | | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE | | group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 3 rows in set (0.00 sec)

My impressions about installation

1- The previous version relied solely on Corosync for group communication, meaning that you needed to install and configure an extra piece of software. For those who aren't very familiar with it (like me), that can be a bit tricky and complex. In version 0.6.0 for MySQL 5.7.9, a new communication engine called XCom was added. It is now the default communication framework and is included in the plugin, so no extra software is needed.

2- When you initialize the MySQL database (the mysqld --initialize option now replaces the mysql_install_db script), you need to disable binary logging in the configuration file; otherwise statements such as 'create database mysql' will be written to the binary logs and cause issues for nodes joining the cluster, due to errors like the following (a minimal workaround sketch follows this list):
2015-10-21T20:18:52.059231Z 8 [Warning] Slave: Can't create database 'mysql'; database exists Error_code: 1007

3- In group replication there is no concept like SST (State Snapshot Transfer), which in Galera basically drops and recreates the datadir when it finds data discrepancies. With group replication you may end up with different datasets on different members and replication will continue working (more on this later in the post).

4- For incremental state transfer (a.k.a. IST in Galera), group replication relies on the binary logs present on any of the potential donors (at the moment the donor is selected randomly). So, if a node is disconnected, when it comes back online it requests binary logs from the donor using the same IO thread as regular replication. The problem here is that if the binary log has been purged on the donor, the joiner can't be recovered and a full rebuild is needed. This is similar to the gcache in Galera, except that when the gcache cannot provide the transactions needed for IST, an SST is performed instead; group replication can't do this (yet?).
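
Regarding point 2, here is a minimal sketch of the workaround I mean (paths and file names are examples): run the initialization step with a configuration that leaves binary logging out, then re-enable log-bin before starting the server and the plugin.

# my-init.cnf: used only for the --initialize step
[mysqld]
datadir = /data
# log-bin deliberately not set here

$ mysqld --defaults-file=/etc/my-init.cnf --initialize --user=mysql
# afterwards, start mysqld with the real my.cnf (log-bin and the group
# replication settings enabled)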

These are some of the installation issues I faced. Now that we have the cluster running, what works? Well let’s try some samples.

Simple write tests
I tried running simple write operations like a few inserts, create tables and so on using sysbench like this:

sysbench --test=/vagrant/sysb/doc/sysbench/tests/db/update_index.lua --mysql-user=root --mysql-db=test --mysql-socket=/data/mysql.sock --oltp-table-size=25000 --oltp-tables-count=8 --num-threads=8 --report-interval=1 --max-requests=0 run

And checked the status on the other nodes. It does what it is supposed to do; the data and records are found on the rest of the nodes, just like this:
Node1:

mysql> select * from sbtest1 order by 1 limit 4; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 8 | 12410 | 05879259692-02024081996-76022818191-82933803603-81845875017-31928300264-16934042125-67052432228-92123768050-95121478647 | 79362588344-09017007031-35411714211-52054317597-26283585383 | | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | | 22 | 12544 | 57481185690-89398636500-16888148413-67987678267-15604944838-94210794401-18107184012-91338377776-83386272438-09451188763 | 35227182905-15234265621-59793845249-15413569710-23749555118 | | 29 | 12436 | 29279855805-99348203463-85191104223-39587263726-81794340135-73817557808-54578801760-64404111877-55434439976-37212880746 | 59222897263-22759023974-22020489960-93434521232-77981152534 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 4 rows in set (0.08 sec)

Node2

mysql> select * from sbtest1 order by 1 limit 4; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 8 | 12410 | 05879259692-02024081996-76022818191-82933803603-81845875017-31928300264-16934042125-67052432228-92123768050-95121478647 | 79362588344-09017007031-35411714211-52054317597-26283585383 | | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | | 22 | 12544 | 57481185690-89398636500-16888148413-67987678267-15604944838-94210794401-18107184012-91338377776-83386272438-09451188763 | 35227182905-15234265621-59793845249-15413569710-23749555118 | | 29 | 12436 | 29279855805-99348203463-85191104223-39587263726-81794340135-73817557808-54578801760-64404111877-55434439976-37212880746 | 59222897263-22759023974-22020489960-93434521232-77981152534 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 4 rows in set (0.08 sec)

Well we expected this so, yay, we are fine.

What about trying to write to two nodes at the same time? This should exercise things like conflict resolution during certification; in a nutshell, if we expect to use group replication to write to multiple nodes at the same time, we need a way to resolve conflicts in the data. These are most commonly PK/UK violations; in other words, two transactions trying to insert the same record/id. This is not recommended because it is not an approach that scales up writes (same as with Galera), but it's still possible to do.

An easier way to test is to run sysbench in more than one member of a cluster and wait for a failure. As expected, it does what it is supposed to do:
Node1

[root@node1 data]# sysbench --test=/vagrant/sysb/doc/sysbench/tests/db/update_index.lua --mysql-user=root --mysql-db=test --mysql-socket=/data/mysql.sock --oltp-table-size=25000 --oltp-tables-count=8 --num-threads=8 --report-interval=1 --max-requests=0 run |grep tps [ 1s] threads: 8, tps: 0.00, reads: 0.00, writes: 1416.75, response time: 25.65ms (95%), errors: 0.00, reconnects: 0.00 [ 2s] threads: 8, tps: 0.00, reads: 0.00, writes: 1893.78, response time: 20.94ms (95%), errors: 0.00, reconnects: 0.00 [ 3s] threads: 8, tps: 0.00, reads: 0.00, writes: 1421.19, response time: 28.44ms (95%), errors: 0.00, reconnects: 0.00 [ 4s] threads: 8, tps: 0.00, reads: 0.00, writes: 1397.83, response time: 34.92ms (95%), errors: 0.00, reconnects: 0.00 [ 5s] threads: 8, tps: 0.00, reads: 0.00, writes: 1734.31, response time: 22.75ms (95%), errors: 0.00, reconnects: 0.00

Node2

[root@node2 vagrant]# sysbench --test=/vagrant/sysb/doc/sysbench/tests/db/update_index.lua --mysql-user=root --mysql-db=test --mysql-socket=/data/mysql.sock --oltp-table-size=25000 --oltp-tables-count=8 --num-threads=8 --report-interval=1 --max-requests=0 run WARNING: Both max-requests and max-time are 0, running endless test sysbench 0.5: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 8 Report intermediate results every 1 second(s) Random number generator seed is 0 and will be ignored Threads started! ALERT: failed to execute MySQL query: `UPDATE sbtest6 SET k=k+1 WHERE id=12608`: ALERT: Error 1180 Got error 149 during COMMIT FATAL: failed to execute function `event': (null) ALERT: failed to execute MySQL query: `UPDATE sbtest6 SET k=k+1 WHERE id=12468`: ALERT: Error 1180 Got error 149 during COMMIT FATAL: failed to execute function `event': (null)

Node2 eventually failed, but what happened? Let's check the error log to see what it reports:

2015-10-27T17:12:33.894759Z 3 [Note] InnoDB: Blocking transaction: ID: 223804 - Blocked transaction ID: 223820 - MySQL thread id 59, OS thread handle 139838350866176, query id 197913 localhost root query end UPDATE sbtest5 SET k=k+1 WHERE id=12510

At the commit stage there was a conflict with an already committed transaction in Node1, so it forced a failure and a rollback of the operation. So far, so good.

What about a node going down?
One of the tests I ran was to kill one of the nodes during operations to see if it resumes replication properly when it comes back to life. For this we need to set some variables in the configuration file as follows:

loose-group_replication_group_name="8a94f357-aab4-11df-86ab-c80aa9429562" loose-group_replication_start_on_boot=1 loose-group_replication_recovery_user='rpl_user' loose-group_replication_recovery_password='rpl_pass'

Note: interestingly, the replication credentials are not saved into a table (as is done with slave_master_info in regular replication). I guess this is still on the to-do list, but it's something to keep in mind since it implies a security risk.

Back to our test. I ran the regular sysbench command on one of my nodes, then went to node2 and killed the mysqld daemon. Once it is started again, after the regular crash recovery messages we can see:

[root@node2 data]# killall mysqld [root@node2 data]# tail -500 error.log 2015-10-27T17:15:26.460674Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active. 2015-10-27T17:15:26.460711Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active. 2015-10-27T17:15:26.461001Z 0 [Note] InnoDB: Waiting for purge to start 2015-10-27T17:15:26.514015Z 0 [Note] InnoDB: 5.7.9 started; log sequence number 77370253 2015-10-27T17:15:26.515417Z 0 [Note] Plugin 'FEDERATED' is disabled. 2015-10-27T17:15:26.525466Z 0 [Note] InnoDB: not started 2015-10-27T17:15:26.525914Z 0 [Note] InnoDB: Loading buffer pool(s) from /data/ib_buffer_pool [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configuring Xcom group: XCom Group ID=1827295128 Name=8a94f357-aab4-11df-86ab-c80aa9429562 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Peer Nodes: 192.168.70.2:10300 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Peer Nodes: 192.168.70.3:10300 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Total number of peers: 2 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Local Node: 192.168.70.3:10300 [XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Bootstrap: false 2015-10-27T17:15:26.723392Z 0 [Note] InnoDB: Buffer pool(s) load completed at 151027 17:15:26 2015-10-27T17:15:27.135089Z 0 [Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key 2015-10-27T17:15:27.136449Z 0 [Note] Server hostname (bind-address): '*'; port: 3306 2015-10-27T17:15:27.145198Z 0 [Note] IPv6 is available. 2015-10-27T17:15:27.145247Z 0 [Note] - '::' resolves to '::'; 2015-10-27T17:15:27.145265Z 0 [Note] Server socket created on IP: '::'. 2015-10-27T17:15:27.171615Z 0 [Warning] 'user' entry 'root@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.171711Z 0 [Warning] 'user' entry 'mysql.sys@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.172447Z 0 [Warning] 'db' entry 'sys mysql.sys@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.173089Z 0 [Warning] 'proxies_priv' entry '@ root@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.192881Z 0 [Warning] 'tables_priv' entry 'sys_config mysql.sys@localhost' ignored in --skip-name-resolve mode. 2015-10-27T17:15:27.205764Z 0 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=node2-relay-bin' to avoid this problem. 2015-10-27T17:15:27.676222Z 1 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. 2015-10-27T17:15:27.685374Z 3 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './node2-relay-bin-group_replication_applier.000002' position: 51793711 2015-10-27T17:15:27.685985Z 0 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!' 
2015-10-27T17:15:27.686009Z 0 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7' 2015-10-27T17:15:27.686017Z 0 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 2' [XCOM BINDING DEBUG] ::join() connecting to 192.168.70.3 10300 state 0 action xa_init connecting to 192.168.70.3 10300 connected to 192.168.70.3 10300 connecting to 192.168.70.3 10300 connected to 192.168.70.3 10300 ... [XCOM BINDING DEBUG] ::join():: I am NOT the boot node. [XCOM BINDING DEBUG] ::join():: xcom_client_open_connection to 192.168.70.2:10300 connecting to 192.168.70.2 10300 connected to 192.168.70.2 10300 [XCOM BINDING DEBUG] ::join():: Calling xcom_client_add_node cli_err 0 state 3489 action xa_snapshot new state x_recover state 3505 action xa_complete new state x_run get_nodeno(get_site_def()) = 2 task_now() = 1445966128.920615 n = 0 median_time() = 1.000000 executed_msg={84bca5ce 12385 1} [XCOM BINDING DEBUG] ::join():: GCS_OK [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. 2015-10-27T17:15:28.926806Z 0 [Note] Event Scheduler: Loaded 0 events 2015-10-27T17:15:28.928485Z 0 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.7.9-log' socket: '/data/mysql.sock' port: 3306 MySQL Community Server (GPL) [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. [XCOM_BINDING_DEBUG] : Rejecting this message. I'm not in a view yet. .... [XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message [XCOM_BINDING_DEBUG] ::process_possible_control_message()::Install new view [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data 2015-10-27T17:15:30.084101Z 0 [Note] Plugin group_replication reported: 'Starting group replication recovery with view_id 46750407:7' [XCOM_BINDING_DEBUG] ::install_view():: View delivered to client handler= 1382963399 2015-10-27T17:15:30.091414Z 5 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/86400' 2015-10-27T17:15:30.107078Z 5 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='node3', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. 2015-10-27T17:15:30.117379Z 5 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor e5263489-7cb7-11e5-a8ee-0800275ff74d at node3 port: 3306.' 2015-10-27T17:15:30.118109Z 6 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information. 
2015-10-27T17:15:30.130001Z 7 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log './node2-relay-bin-group_replication_recovery.000001' position: 4 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12456 message_id.node= 0 [XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 684195134 2015-10-27T17:15:30.169817Z 6 [ERROR] Slave I/O for channel 'group_replication_recovery': error connecting to master 'rpl_user@node3:3306' - retry-time: 60 retries: 1, Error_code: 1130 2015-10-27T17:15:30.169856Z 6 [Note] Slave I/O thread for channel 'group_replication_recovery' killed while connecting to master 2015-10-27T17:15:30.169862Z 6 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4 2015-10-27T17:15:30.174955Z 5 [ERROR] Plugin group_replication reported: 'There was an error when connecting to the donor server. Check group replication recovery's connection credentials.' 2015-10-27T17:15:30.175573Z 5 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/86400' 2015-10-27T17:15:30.178016Z 5 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='node3', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='node1', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. 2015-10-27T17:15:30.189233Z 5 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor a0ef74a1-7cb3-11e5-845e-0800275ff74d at node1 port: 3306.' 
2015-10-27T17:15:30.190787Z 8 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information. 2015-10-27T17:15:30.610531Z 8 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'rpl_user@node1:3306',replication started in log 'FIRST' at position 4 get_nodeno(get_site_def()) = 2 task_now() = 1445966131.000425 n = 74 (n - old_n) / (task_now() - old_t) = 11.136156

During this process we can check the status on any node as follows:

mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | RECOVERING | | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE | | group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 3 rows in set (0.00 sec) mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE | | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE | | group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 3 rows in set (0.00 sec)

Again, as expected, the node connected to the cluster, requested binary logs from the latest executed GTID position and applied the remaining changes to come back online.
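
To double-check that the recovered member really is caught up, and not merely marked as such, a simple sanity check (just a sketch) is to compare the executed GTID set on every member and glance at the group statistics table; once recovery completes, the GTID sets should converge:

mysql> SELECT @@GLOBAL.gtid_executed\G
mysql> SELECT * FROM performance_schema.replication_group_member_stats\G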

The final test I've done so far is about data consistency. For example, what if I stop group replication on a node and make some data changes? When the node resumes replication, will it send those changes?

Let’s see a very simple example:
Node2:

mysql> select * from sbtest1 where id=15; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1 row in set (0.12 sec)

Node1:

mysql> select * from sbtest1 where id=15; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> stop group_replication; Query OK, 0 rows affected (0.03 sec) mysql> delete from sbtest1 where id=15; Query OK, 1 row affected (0.02 sec) mysql> start group_replication; Query OK, 0 rows affected (0.02 sec) mysql> select * from sbtest1 where id=15; Empty set (0.00 sec)

And now Node2 again:

mysql> select * from sbtest1 where id=15; +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | id | k | c | pad | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ | 15 | 12593 | 86109785327-21666363106-71483869798-96435839840-61691707322-34771340621-77881281887-34657680322-46195586522-19795644808 | 74925946566-05246863347-88207617885-76849666822-72200234338 | +----+-------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1 row in set (0.00 sec)

Hmmmm, not cool, what if I try to remove a row from Node2?

mysql> delete from sbtest1 where id=15; Query OK, 1 row affected (0.01 sec) mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE | | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE | | group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 3 rows in set (0.00 sec)

Hmmmm, strange, everything seems to be working correctly. Is it? Let’s check node1 again:

mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | 80cc27ce-7cb4-11e5-9055-0800275ff74d | node2 | 3306 | ONLINE | | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | ONLINE | | group_replication_applier | e5263489-7cb7-11e5-a8ee-0800275ff74d | node3 | 3306 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 3 rows in set (0.01 sec) mysql> stop group_replication; Query OK, 0 rows affected (4.01 sec) mysql> start group_replication; Query OK, 0 rows affected (2.41 sec) mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+--------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ | group_replication_applier | a0ef74a1-7cb3-11e5-845e-0800275ff74d | node1 | 3306 | OFFLINE | +---------------------------+--------------------------------------+-------------+-------------+--------------+ 1 row in set (0.00 sec) mysql> exit Bye [root@node1 data]# tail -100 error.log 2015-10-27T17:52:50.075274Z 15 [ERROR] Slave SQL for channel 'group_replication_applier': Could not execute Delete_rows event on table test.sbtest1; Can't find record in 'sbtest1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 346, Error_code: 1032 2015-10-27T17:52:50.075274Z 15 [Warning] Slave: Can't find record in 'sbtest1' Error_code: 1032 2015-10-27T17:52:50.075274Z 15 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 0 2015-10-27T17:52:50.075294Z 2 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!' 2015-10-27T17:52:50.075308Z 2 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7' 2015-10-27T17:52:50.075312Z 2 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 1' [XCOM BINDING DEBUG] ::join() connecting to 192.168.70.2 10300 state 3489 action xa_init connecting to 192.168.70.2 10300 .... XCOM BINDING DEBUG] ::join():: I am NOT the boot node. [XCOM BINDING DEBUG] ::join():: Skipping own address. 
[XCOM BINDING DEBUG] ::join():: xcom_client_open_connection to 192.168.70.3:10300 connecting to 192.168.70.3 10300 connected to 192.168.70.3 10300 [XCOM BINDING DEBUG] ::join():: Calling xcom_client_add_node cli_err 0 state 3489 action xa_snapshot new state x_recover state 3505 action xa_complete new state x_run [XCOM BINDING DEBUG] ::join():: GCS_OK [XCOM_BINDING_DEBUG] ::cb_xcom_receive_global_view():: message_id.group= 2226955726 message_id.msgno= 12585 message_id.node= 0 [XCOM_BINDING_DEBUG] ::cb_xcom_receive_global_view():: node set: peer: 0 flag: 1 peer: 1 flag: 1 peer: 2 flag: 1 [XCOM_BINDING_DEBUG] ::xcom_receive_global_view():: Processing new view on Handler [XCOM_BINDING_DEBUG] ::xcom_receive_global_view():: My node_id is 2 [XCOM_BINDING_DEBUG] ::xcom_receive_global_view():: State Exchange started. get_nodeno(get_site_def()) = 2 task_now() = 1445968372.450627 n = 0 median_time() = 1.000000 executed_msg={84bca5ce 12585 1} [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12585 message_id.node= 1 [XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message [XCOM_BINDING_DEBUG] ::process_possible_control_message()::Still waiting for more State Exchange messages [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12585 message_id.node= 2 [XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message [XCOM_BINDING_DEBUG] ::process_possible_control_message()::Still waiting for more State Exchange messages [XCOM_BINDING_DEBUG] ::xcom_receive_data_internal():: xcom_receive_data message_id.group= 2226955726 message_id.msgno= 12586 message_id.node= 0 [XCOM_BINDING_DEBUG] ::process_possible_control_message():: Received a control message [XCOM_BINDING_DEBUG] ::process_possible_control_message()::Install new view [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data [XCOM_BINDING_DEBUG] ::install_view():: Processing exchanged data 2015-10-27T17:52:52.455340Z 0 [Note] Plugin group_replication reported: 'Starting group replication recovery with view_id 46750407:11' 2015-10-27T17:52:52.456474Z 16 [ERROR] Plugin group_replication reported: 'Can't evaluate the group replication applier execution status. Group replication recovery will shutdown to avoid data corruption.' 2015-10-27T17:52:52.456503Z 16 [ERROR] Plugin group_replication reported: 'Fatal error during the Recovery process of Group Replication. The server will leave the group.' [XCOM BINDING DEBUG] ::leave() [XCOM BINDING DEBUG] ::leave():: Skipping own address. 
[XCOM BINDING DEBUG] ::leave():: xcom_client_open_connection to 192.168.70.3:10300 connecting to 192.168.70.3 10300 [XCOM_BINDING_DEBUG] ::install_view():: View delivered to client handler= 750756806 connected to 192.168.70.3 10300 [XCOM BINDING DEBUG] ::leave():: Calling xcom_client_remove_node cli_err 0 handle_remove_node /export/home2/pb2/build/sb_0-16846472-1445524610.82/build/BUILD/mysql-server/plugin/group_replication/gcs/src/bindings/xcom/xcom/xcom_base.c:1987 nodes: 0x3d05fa8 nodes->node_list_len = 1 nodes->node_list_val: 0x3da7da0 node_address n.address: 0x3d238d0 192.168.70.2:10300 getstart group_id 84bca5ce state 3551 action xa_terminate new state x_start state 3489 action xa_exit Exiting xcom thread new state x_start [XCOM BINDING DEBUG] ::leave():: Installing Leave view [XCOM_BINDING_DEBUG] ::install_view():: No exchanged data [XCOM_BINDING_DEBUG] ::install_view():: View delivered to client handler= 750756806 [XCOM BINDING DEBUG] ::leave():: Exiting with error=GCS_OK

So it looks like a member with data inconsistencies might erroneously be reported as ONLINE, but whenever group replication is restarted on it, it will fail and won't be able to join the cluster. It seems there should be better error handling when a data inconsistency is found.
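
Until that improves, a crude way to catch this kind of silent drift early is to periodically compare table checksums across the members (the table name comes from this test; pt-table-checksum is the more robust tool, but a plain CHECKSUM TABLE, run during a quiet period, already reveals divergence):

mysql> CHECKSUM TABLE test.sbtest1;

If the value differs between members, the diverged node should be rebuilt before the difference grows.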

What about the operational perspective?
It looks very limited: just a few variables and status counters, plus some status tables in Performance Schema, as follows:

mysql> show global variables like '%group_repli%'; +---------------------------------------------------+---------------------------------------+ | Variable_name | Value | +---------------------------------------------------+---------------------------------------+ | group_replication_allow_local_lower_version_join | OFF | | group_replication_auto_increment_increment | 7 | | group_replication_bootstrap_group | OFF | | group_replication_components_stop_timeout | 31536000 | | group_replication_gcs_engine | xcom | | group_replication_group_name | 8a94f357-aab4-11df-86ab-c80aa9429562 | | group_replication_local_address | 192.168.70.3:10300 | | group_replication_peer_addresses | 192.168.70.2:10300,192.168.70.3:10300 | | group_replication_pipeline_type_var | STANDARD | | group_replication_recovery_complete_at | TRANSACTIONS_CERTIFIED | | group_replication_recovery_password | | | group_replication_recovery_reconnect_interval | 60 | | group_replication_recovery_retry_count | 86400 | | group_replication_recovery_ssl_ca | | | group_replication_recovery_ssl_capath | | | group_replication_recovery_ssl_cert | | | group_replication_recovery_ssl_cipher | | | group_replication_recovery_ssl_crl | | | group_replication_recovery_ssl_crlpath | | | group_replication_recovery_ssl_key | | | group_replication_recovery_ssl_verify_server_cert | OFF | | group_replication_recovery_use_ssl | OFF | | group_replication_recovery_user | rpl_user | | group_replication_start_on_boot | ON | +---------------------------------------------------+---------------------------------------+ 24 rows in set (0.00 sec) mysql> show global status like '%group_repli%'; +-----------------------------+-------+ | Variable_name | Value | +-----------------------------+-------+ | Com_group_replication_start | 8 | | Com_group_replication_stop | 7 | +-----------------------------+-------+ 2 rows in set (0.01 sec) mysql> show tables from performance_schema like '%group%'; +----------------------------------------+ | Tables_in_performance_schema (%group%) | +----------------------------------------+ | replication_group_member_stats | | replication_group_members | +----------------------------------------+ 2 rows in set (0.00 sec)

Most of the values above are self-descriptive. I still need to dig a bit deeper to work out exactly what some of them do.
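For a quick health check of the group, those two performance_schema tables can be queried directly. A minimal sketch, assuming the column names used by this labs/RC build (they may change before GA):

mysql> SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT, MEMBER_STATE FROM performance_schema.replication_group_members;
mysql> SELECT * FROM performance_schema.replication_group_member_stats\G

A member that is healthy from the group's point of view should show up as ONLINE in MEMBER_STATE, which, as noted above, does not necessarily mean its data is consistent with the rest of the group.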

Conclusions:

So far the work done with group replication is very impressive. Of course there is still a long road to travel, but it wouldn't be fair to judge group replication against Galera in a direct side-by-side comparison just yet.

Even if I like the idea of relying on a legacy component, I don't like having to install and configure Corosync: it's another piece of software that could eventually fail. Fortunately this can be avoided with the newer version of the plugin, which uses the new XCom communication framework. I tested both versions, and XCom is far easier to set up and configure; however, the error log file can become very verbose, maybe too verbose in my opinion.
With regards to installation and configuration, it's pretty easy once you find the proper way to do it. There are only a few variables to configure to get a working cluster, and most of the settings work just fine with their defaults (like group_replication_auto_increment_increment).
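To illustrate how little is needed, here is a minimal sketch of the kind of my.cnf fragment behind the variables shown above. The group name, addresses and recovery user are the ones from this test setup; the recovery password is a placeholder, and the plugin library name/path may vary by build:

[mysqld]
# load the group replication plugin (library name may differ per build)
plugin-load = group_replication.so
# every member must use the same group name (a valid UUID)
loose-group_replication_group_name = "8a94f357-aab4-11df-86ab-c80aa9429562"
# this member's XCom endpoint and the list of peers
loose-group_replication_local_address = "192.168.70.3:10300"
loose-group_replication_peer_addresses = "192.168.70.2:10300,192.168.70.3:10300"
# credentials used for distributed recovery (password is a placeholder)
loose-group_replication_recovery_user = rpl_user
loose-group_replication_recovery_password = some_password
# only the first member should bootstrap the group; keep this OFF afterwards
loose-group_replication_bootstrap_group = OFF
loose-group_replication_start_on_boot = ON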

I would still like to see some automatic handling of data inconsistencies (like SST in Galera), but in my opinion this new feature is a promising approach to consider in the future when looking for high-availability solutions. A lot of testing still needs to be done, and I'd also like to see some benchmarks. These are just my first impressions, and we should expect to wait some time before seeing this feature go GA. Paraphrasing that song: "it's a long way to the top if you wanna rock 'n' roll."

The post MySQL 5.7 first impressions on group-replication appeared first on MySQL Performance Blog.

How Big Can Your Galera Transactions Be

October 26, 2015 - 11:17am

While we should be aiming for small and fast transactions with Galera, it is always possible that at some point you will want to run a single large transaction. But what is involved?

First, this is supposed to be controlled by two settings, wsrep_max_ws_rows and wsrep_max_ws_size. The first variable is not yet enforced and has no effect (see here and here), so don't bother tuning this knob just yet. In my opinion, only one of them is needed anyway: a limit by rows is hard to control as a DBA, since row sizes can differ a lot between workloads.

The second variable restricts the writeset size in bytes and gives better control over cluster performance: if your network and CPUs can only process N bytes per second, this is a good variable to enforce. The maximum allowed value for this setting is only 2GB, with a default of 1GB. You can actually set it higher than 2GB, but in my tests only 2GB was enforced.
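Both limits can be checked in one go before touching anything; the values below are the defaults on my test build and may differ in yours:

node1 mysql> show global variables like 'wsrep_max_ws%';
+-------------------+------------+
| Variable_name     | Value      |
+-------------------+------------+
| wsrep_max_ws_rows | 131072     |
| wsrep_max_ws_size | 1073741824 |
+-------------------+------------+
2 rows in set (0.00 sec)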

Let’s see what happens when we adjust this setting to large values. I have a sysbench table with 10M rows below and will update all rows in a single transaction.
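Before running the update, the table size can also be checked from inside MySQL; a quick sketch (the 'test' schema name is an assumption, and table_rows is only an estimate for InnoDB):

node1 mysql> select table_name, round((data_length+index_length)/1024/1024/1024,2) as size_gb, table_rows from information_schema.tables where table_schema='test' and table_name='t';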

-rw-rw---- 1 mysql mysql 8.5K Oct 20 03:25 t.frm
-rw-rw---- 1 mysql mysql 2.4G Oct 20 03:46 t.ibd

node1 mysql> set global wsrep_max_ws_size=1024*1024*1024*2;
Query OK, 0 rows affected (0.00 sec)

node1 mysql> update t set k=k+1;
ERROR 1180 (HY000): Got error 5 during COMMIT

On the MySQL error log this translates to:

2015-10-20 04:27:14 10068 [Warning] WSREP: transaction size limit (2147483648) exceeded: 2147516416
2015-10-20 04:27:14 10068 [ERROR] WSREP: rbr write fail, data_len: 0, 2

Let’s see what happens if we set this to 4GB:

node1 mysql> set global wsrep_max_ws_size=1024*1024*1024*4;
Query OK, 0 rows affected, 1 warning (0.00 sec)

node1 mysql> show warnings;
+---------+------+------------------------------------------------------------+
| Level   | Code | Message                                                    |
+---------+------+------------------------------------------------------------+
| Warning | 1292 | Truncated incorrect wsrep_max_ws_size value: '4294967296'  |
+---------+------+------------------------------------------------------------+
1 row in set (0.00 sec)

node1 mysql> select @@wsrep_max_ws_size;
+---------------------+
| @@wsrep_max_ws_size |
+---------------------+
|          4294901759 |
+---------------------+
1 row in set (0.00 sec)

We got a warning that the value exceeds what is allowed, and it was truncated, not to 2GB but to something slightly below the 32-bit unsigned maximum (4294901759 is 65536 short of 4294967295). So let's try the transaction again:

node1 mysql> update t set k=k+1;
ERROR 1180 (HY000): Got error 5 during COMMIT

We got the same error on the client, but in the MySQL error log it looks a bit different:

2015-10-20 04:54:14 10068 [ERROR] WSREP: Maximum writeset size exceeded by 1737995426: 90 (Message too long) at galera/src/write_set_ng.hpp:check_size():662
2015-10-20 04:54:14 10068 [ERROR] WSREP: transaction size exceeded

If you really must process a large number of rows, one way to reduce the writeset size is to set binlog_row_image to minimal. Let's test this again with a 2GB wsrep_max_ws_size and see how it goes:

node1 mysql> set global wsrep_max_ws_size=1024*1024*1024*2;
Query OK, 0 rows affected (0.00 sec)

node1 mysql> select @@wsrep_max_ws_size;
+---------------------+
| @@wsrep_max_ws_size |
+---------------------+
|          2147483648 |
+---------------------+
1 row in set (0.00 sec)

node1 mysql> SET GLOBAL binlog_row_image=minimal;
Query OK, 0 rows affected (0.00 sec)

node1 mysql> select @@binlog_row_image;
+--------------------+
| @@binlog_row_image |
+--------------------+
| MINIMAL            |
+--------------------+
1 row in set (0.00 sec)

node1 mysql> show global status like 'wsrep_replicated_bytes';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| wsrep_replicated_bytes | 13211964556 |
+------------------------+-------------+
1 row in set (0.00 sec)

node1 mysql> update t set k=k+1;
Query OK, 10000000 rows affected (11 min 18.33 sec)
Rows matched: 10000000  Changed: 10000000  Warnings: 0

node1 mysql> show global status like 'wsrep_replicated_bytes';
+------------------------+-------------+
| Variable_name          | Value       |
+------------------------+-------------+
| wsrep_replicated_bytes | 13402597135 |
+------------------------+-------------+
1 row in set (0.00 sec)

node1 mysql> select (13402597135-13211964556)/1024/1024 as ws_size;
+--------------+
| ws_size      |
+--------------+
| 181.80139446 |
+--------------+
1 row in set (0.01 sec)

In our tests, the minimal row image reduced the writeset to well under a tenth of its original size: about 182MB replicated for the 10M-row update, versus the 2GB+ writeset that failed earlier. Thanks to my colleague Przemyslaw for pointing this out.

Now we know how big we can go in terms of size, and how to go further with a minimal row image, but that does not mean you can or should allow it. Make sure to set a sane limit on writeset sizes depending on your workload and realistic performance expectations. Large transactions will not only cause unexpected performance issues in your cluster, they will also lead to usability issues in the form of increased deadlocks. Lastly, make sure to review the limitations that come with switching to Galera-based clusters (see here) for an enjoyable experience :).
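If you do settle on a limit, persist it in the configuration so it survives restarts; a minimal sketch (the 128MB value is only an example, not a recommendation):

[mysqld]
# cap Galera writesets; pick a value that matches your workload
wsrep_max_ws_size = 134217728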

The post How Big Can Your Galera Transactions Be appeared first on MySQL Performance Blog.

State of Percona Server 5.6, MySQL 5.6 and MySQL 5.7 RC

October 26, 2015 - 7:39am

This week Oracle will release MySQL 5.7 GA, so it's a perfect time to do a quick review of the current state of Percona Server 5.6.26, MySQL 5.6.26 and MySQL 5.7.8 RC. We used two boxes from our benchmark lab for this:
– Box 1: 16 cores + HT (32 virtual cores) / fast PCIe SSD card / RAM: 192GB
– Box 2: 24 cores + HT (48 virtual cores) / fast PCIe SSD card / RAM: 128GB

Dataset: sysbench/uniform, 32 tables with 12M rows each, ~95GB
Tests: sysbench – point select, oltp read only, oltp read/write
Test sequence: start server, warmup, series of the tests (each lasts 5 minutes) from 1 to 4096 threads

Tests were run for two setups:
– CPU bound (in memory) – innodb_buffer_pool_size=100GB
– IO bound – innodb_buffer_pool_size=25GB
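For reference, each individual step was driven by sysbench; below is a hedged sketch of the kind of invocation used for an OLTP read/write run (the Lua script path, host and credentials are placeholders that depend on your sysbench 0.5 installation):

sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
  --oltp-tables-count=32 --oltp-table-size=12000000 \
  --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
  --num-threads=64 --max-time=300 --max-requests=0 \
  --report-interval=10 run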

Observations:

* CPU bound
– It's clear that MySQL 5.7 RC, in both read-only scenarios (ad-hoc and transactional), outperforms MySQL 5.6/Percona Server 5.6 and scales very well up to 4k threads, especially on Box 2 with 48 cores. It shows great improvements over 5.6 in read-only scalability. In the read-write scenario there are still some problems in the 5.7 RC: it shows stable results on the 16-core box, but degrades notably at high thread counts on Box 2 with 48 cores. Percona Server 5.6 holds up to 1024/2048 threads on both types of boxes, and then tps drops as well. MySQL 5.6 in this test scales only up to 512 threads and then tps decreases dramatically.

In general, in the CPU-bound scenario, 5.7 RC on Box 1 with 16 cores showed a bit worse results than 5.6. It looks like it is limited by something, and this may require additional analysis. We will recheck that after GA.

* IO bound
– Again, 5.7 RC shines in the read-only scenarios. On Box 1 with 32 cores, Percona Server 5.6 competes with 5.7 RC, but on Box 2 with 48 cores the difference becomes quite notable at higher thread counts. The read/write workload in the IO-bound scenario is the most problematic case for 5.7: it shows almost the same pattern as MySQL 5.6 on Box 1 and is only slightly better on Box 2. We checked that case with Performance Schema for all three servers on each box, and according to that data (see the charts below), the most notable waits in 5.7 are caused by the doublewrite mutex, MySQL 5.6 is affected by contention on the buffer pool mutex, and for Percona Server 5.6 the log_sys mutex is the hottest one.
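For reference, this kind of mutex data can be pulled from Performance Schema; a query along these lines (a sketch, relying on the wait/synch instruments being enabled, as in the server settings below) shows the hottest mutexes:

mysql> SELECT event_name, count_star, sum_timer_wait/1000000000000 AS wait_seconds
         FROM performance_schema.events_waits_summary_global_by_event_name
        WHERE event_name LIKE 'wait/synch/mutex/%'
        ORDER BY sum_timer_wait DESC LIMIT 10;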

The charts with mutex info cover the OLTP_RW test runs with 64 and 1024 threads for Percona Server 5.6.26, MySQL 5.7.8 and MySQL 5.6.26.

MySQL server settings:

innodb_log_file_size=10G
innodb_doublewrite=1
innodb_flush_log_at_trx_commit=1
innodb_buffer_pool_instances=8
innodb_change_buffering=none
innodb_adaptive_hash_index=OFF
innodb_flush_method=O_DIRECT
innodb_flush_neighbors=0
innodb_read_io_threads=8
innodb_write_io_threads=8
innodb_lru_scan_depth=8192
innodb_io_capacity=15000
innodb_io_capacity_max=25000
loose-innodb-page-cleaners=4
table_open_cache_instances=64
table_open_cache=5000
loose-innodb-log_checksum-algorithm=crc32
loose-innodb-checksum-algorithm=strict_crc32
max_connections=50000
skip_name_resolve=ON
loose-performance_schema=ON
loose-performance-schema-instrument='wait/synch/%=ON'

The post State of Percona Server 5.6, MySQL 5.6 and MySQL 5.7 RC appeared first on MySQL Performance Blog.


