
Feed aggregator


Percona Server 5.5.37-35.1 is now available

Latest MySQL Performance Blog posts - June 3, 2014 - 7:47am

Percona Server version 5.5.37-35.1

Percona is glad to announce the release of Percona Server 5.5.37-35.1 on June 3rd, 2014 (downloads are available here and from the Percona Software Repositories). Based on MySQL 5.5.37, and including all of its bug fixes, Percona Server 5.5.37-35.1 is now the current stable release in the 5.5 series. All of Percona's software is open source and free; all the details of the release can be found in the 5.5.37-35.1 milestone at Launchpad.

Bugs Fixed:

  • InnoDB could crash if workload contained writes to compressed tables. Bug fixed #1305364.
  • GUI clients such as MySQL Workbench could not authenticate with a user defined with auth_pam_compat plugin. Bug fixed #1166938.
  • Help in Percona Server 5.5 command line client was linking to Percona Server 5.1 manual. Bug fixed #1198775.
  • Audit Log wasn’t parsing escape characters correctly in the OLD format. Bug fixed #1313696.
  • Percona Server version was reported incorrectly in Debian/Ubuntu packages. Bug fixed #1319670.

Other bugs fixed: #1272732, #1219833, #1271178, and #1314568.

Release notes for Percona Server 5.5.37-35.1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.


Do not trust vmstat IOwait numbers

Latest MySQL Performance Blog posts - June 3, 2014 - 3:00am

I’ve been running a benchmark today on my old test box with conventional hard drives (no raid with BBU) and noticed something unusual in the CPU utilization statistics being reported.

The benchmark was run like this:

sysbench --num-threads=64 --max-requests=0 --max-time=600000 --report-interval=10 --test=oltp --db-driver=mysql --oltp-dist-type=special --oltp-table-size=1000000 --mysql-user=root --mysql-password=password run

This means: create 64 threads and hammer the database with queries as quickly as possible. As the test was run on localhost, I would expect the benchmark to completely saturate the system – either using CPU or being blocked on IO. By its nature this benchmark does not spend a lot of time on database locks, especially as this system has just 2 cores.

Looking at vmstat, however, I noticed this:

[root@smt1 mysql]# vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
13  0      0 880156  40060 2140824   0    0    12   750  895  1045 32  8 54  6  0
 0  0      0 877116  40060 2141312   0    0     0  1783 2185 23112 44 10 41  5  0
15  0      0 872648  40068 2141960   0    0     0  1747 2204 25743 41 11 46  2  0
 0  0      0 868056  40068 2142604   0    0     0  1803 2164 26224 40 11 44  5  0
17  1      0 863216  40068 2143160   0    0     0  1875 1948 23020 36  9 50  5  0
 0  0      0 858384  40168 2143656   0    0     0  1063 1855 21116 32  9 45 14  0
23  0      0 855848  40176 2144232   0    0     0  1755 2036 23181 36 10 48  6  0
49  0      0 851248  40184 2144648   0    0     0  1679 2313 22832 45 10 40  5  0
10  0      0 846292  40192 2145248   0    0     0  1911 1980 23185 36  9 50  4  0
 0  0      0 844260  40196 2145868   0    0     0  1757 2152 26387 39 11 45  5  0
 0  3      0 839296  40196 2146520   0    0     0  1439 2104 25096 38 10 50  1  0

Looking at the last few columns, we have IO wait typically in single-digit percentages while the "idle" CPU floats in the 40-50% range. You can also see it visually in the graph from Percona Cloud Tools.

So the system should be completely saturated with load, yet instead we have around 50% of the CPU idle and iowait typically in single digits?

Running SHOW PROCESSLIST gives us a good idea of what's happening in this case:

mysql> show processlist;
+-----+---------------+-----------------+--------+---------+------+----------------+--------------------------------------------------------------------+-----------+---------------+-----------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
+-----+---------------+-----------------+--------+---------+------+----------------+--------------------------------------------------------------------+-----------+---------------+-----------+
| 1 | root | localhost:39624 | NULL | Sleep | 0 | | NULL | 0 | 0 | 0 |
...
| 32 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 |
| 33 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 |
| 34 | root | localhost | sbtest | Execute | 0 | Sorting result | SELECT c from sbtest where id between 365260 and 365359 order by c | 0 | 0 | 0 |
| 35 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 |
| 36 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 |
| 37 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 |
| 38 | root | localhost | sbtest | Execute | 0 | Writing to net | DELETE from sbtest where id=496460 | 0 | 1 | 1 |
| 39 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 |
...
| 89 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 |
| 90 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 |
| 91 | root | localhost | sbtest | Execute | 0 | NULL | COMMIT | 0 | 0 | 0 |
| 268 | root | localhost | NULL | Query | 0 | NULL | show processlist | 0 | 0 | 0 |
+-----+---------------+-----------------+--------+---------+------+----------------+--------------------------------------------------------------------+-----------+---------------+-----------+
68 rows in set (0.00 sec)

So most of the queries are stalling in the COMMIT stage. Checking the settings, we can see:

mysql> select @@innodb_flush_log_at_trx_commit;
+----------------------------------+
| @@innodb_flush_log_at_trx_commit |
+----------------------------------+
| 1                                |
+----------------------------------+
1 row in set (0.00 sec)

This means Percona Server is running in fully ACID mode, which with slow hard drives and no BBU means each commit forces a sync of data to disk – a very expensive operation.

So we figured out what is really happening… yet if we're really waiting on disk so much, why doesn't iowait show it?
To answer that, let's see how iowait is actually computed.

IOwait is a type of idle time (and in earlier Linux versions it was included in idle): when a process/thread would be runnable if it were not waiting on IO, we count that time as iowait; if it is not runnable because of something else (waiting on a mutex, the network, etc.), the time is counted as idle.
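If you want to see where these numbers come from, the kernel exposes the same counters vmstat reads in /proc/stat. A minimal sketch (the counters are cumulative jiffies since boot, so for rates you would diff two samples):

# fields after "cpu" are: user nice system idle iowait irq softirq steal
head -1 /proc/stat
awk '/^cpu /{t=0; for(i=2;i<=NF;i++) t+=$i;
     printf "idle: %.1f%%  iowait: %.1f%%\n", 100*$5/t, 100*$6/t}' /proc/stat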

Writes to the InnoDB log file are serialized – only one process does the log write (and sync) while all the others waiting to commit block on some form of synchronization mechanism, to be notified when the commit is complete as part of group commit.

What is the takeaway? Take iowait% numbers from vmstat/iostat with a grain of salt. It is possible there is much more to gain from a faster IO subsystem (or proper configuration) than the iowait numbers show. Consider iowait%+idle% of the CPU as "CPU available" which could potentially be put to good use by improving the IO subsystem and minimizing contention.

To illustrate that, let's take a look at the CPU graph again – now with innodb_flush_log_at_trx_commit=2 set.
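The setting is dynamic, so the switch can be made at runtime – a quick sketch (add it to my.cnf as well if you want it to survive a restart):

mysql> SET GLOBAL innodb_flush_log_at_trx_commit = 2;
mysql> SELECT @@innodb_flush_log_at_trx_commit;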

As we can see from the graph, we have reclaimed much more than the 6-7% of iowait we had – user CPU usage increased to 70% and idle+iowait is now 10-15%; the throughput of the system has increased by about 60% – much more than "eliminating iowait" alone would allow.


Using InfiniDB MySQL server with Hadoop cluster for data analytics

Latest MySQL Performance Blog posts - June 2, 2014 - 9:58am

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala.

This time I've tried InfiniDB for Hadoop (the open-source version) on modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InfiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses a MySQL server to execute queries. This allows for an easy "migration" of existing analytical tools. The results are quite interesting and promising.

Quick How-To

The InfiniDB documentation is not very clear on step-by-step instructions, so I've created this quick guide:

  1. Install the Hadoop cluster (a minimal install will work). I've used Cloudera Manager (CDH5) so I could compare the speed of InfiniDB to Cloudera Impala. Install the tools listed in the "Pre-requirements" section of the InfiniDB for Hadoop Manual.
  2. Install the InfiniDB for Hadoop binaries on 1 Hadoop node (you can choose any node).  This will install InfiniDB and its version of MySQL (based on MySQL 5.1).
  3. After installation it will tell you the variables to set and run the postConfigure script. Example:
    export JAVA_HOME=/usr/java/jdk1.6.0_31
    export LD_LIBRARY_PATH=/usr/java/jdk1.6.0_31/jre/lib/amd64/server
    . /root/setenv-hdfs-20
    /usr/local/Calpont/bin/postConfigure
  4. The postConfigure script will ask a series of questions. A couple of important notes:
  • Make sure to use HDFS as a “type of Data Storage”.
  • The performance module 1 (pm1) should point to the host (hostname and IP) you are running the postConfigure script on. Other pm(s) should point to other Hadoop nodes.

When the installation is finished you will be able to log into the MySQL server; the install provides a script called ibdmysql which calls the mysql CLI with the correct socket and port. Check that InfiniDB is enabled by running "show engines"; InfiniDB should be in the list.
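A quick sanity check along these lines (a sketch, assuming the ibdmysql wrapper passes options through to the stock mysql client):

[root@n0 ~]# ibdmysql -e "SHOW ENGINES" | grep -i infinidb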

The next step will be importing data.

Data import

First we will need to create a MySQL table with “engine=InfiniDB”:

CREATE TABLE `ontime` (
  `YearD` int(11) NOT NULL,
  `Quarter` tinyint(4) DEFAULT NULL,
  `MonthD` tinyint(4) DEFAULT NULL,
  `DayofMonth` tinyint(4) DEFAULT NULL,
  `DayOfWeek` tinyint(4) DEFAULT NULL,
  `FlightDate` date DEFAULT NULL,
  ...
) ENGINE=InfiniDB DEFAULT CHARSET=latin1

Second, I used cpimport to load the data. It turned out to be much more efficient and easier to load 1 big file rather than 20×12 smaller files (the original "ontime" data is 1 file per month), so I exported the "ontime" data from the MySQL table into 1 big file, "ontime.psv".
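A sketch of that export step, matching the '|' separator cpimport expects below (the schema/table name and output path are illustrative, and the path must be writable by mysqld):

mysql> SELECT * FROM ontime.ontime INTO OUTFILE '/tmp/ontime.psv' FIELDS TERMINATED BY '|';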

I used the following command to import the data into InfiniDB:

[root@n0 ontime]# /usr/local/Calpont/bin/cpimport -s '|' ontime ontime ontime.psv
2014-05-20 15:12:58 (18787) INFO : Running distributed import (mode 1) on all PMs...
2014-05-20 15:25:28 (18787) INFO : For table ontime.ontime: 155083620 rows processed and 155083620 rows inserted.
2014-05-20 15:25:28 (18787) INFO : Bulk load completed, total run time : 751.561 seconds

The data is stored in Hadoop:

[root@n0 ontime]# hdfs dfs -du -h /usr/local/Calpont
1.4 G  /usr/local/Calpont/data1
1.4 G  /usr/local/Calpont/data2
1.4 G  /usr/local/Calpont/data3
1.4 G  /usr/local/Calpont/data4
1.4 G  /usr/local/Calpont/data5
1.4 G  /usr/local/Calpont/data6
1.4 G  /usr/local/Calpont/data7
1.4 G  /usr/local/Calpont/data8

The total size of the data is 8×1.4G = 11.2G (compressed). For comparison, the size of the same dataset in Impala's Parquet format is 3.6G. The original size was ~60G.

[root@n0 ontime]# hdfs dfs -du -h /user/hive/warehouse
3.6 G  /user/hive/warehouse/ontime_parquet_snappy

Now we can run the 2 queries I’ve tested before:

1. Simple group-by

mysql> select yeard, count(*) from ontime group by yeard order by yeard;
+-------+----------+
| yeard | count(*) |
+-------+----------+
| 1988 | 5202096 |
| 1989 | 5041200 |
| 1990 | 5270893 |
| 1991 | 5076925 |
| 1992 | 5092157 |
| 1993 | 5070501 |
| 1994 | 5180048 |
| 1995 | 5327435 |
| 1996 | 5351983 |
| 1997 | 5411843 |
| 1998 | 5384721 |
| 1999 | 5527884 |
| 2000 | 5683047 |
| 2001 | 5967780 |
| 2002 | 5271359 |
| 2003 | 6488540 |
| 2004 | 7129270 |
| 2005 | 7140596 |
| 2006 | 7141922 |
| 2007 | 7455458 |
| 2008 | 7009726 |
| 2009 | 6450285 |
| 2010 | 6450117 |
| 2011 | 6085281 |
| 2012 | 6096762 |
| 2013 | 6369482 |
| 2014 | 1406309 |
+-------+----------+
27 rows in set (0.22 sec)

2. The complex query from my original post:

mysql> select min(yeard), max(yeard), Carrier, count(*) as cnt,
       sum(if(ArrDelayMinutes>30, 1, 0)) as flights_delayed,
       round(sum(if(ArrDelayMinutes>30, 1, 0))/count(*),2) as rate
       FROM ontime
       WHERE DayOfWeek not in (6,7)
       and OriginState not in ('AK', 'HI', 'PR', 'VI')
       and DestState not in ('AK', 'HI', 'PR', 'VI')
       and flightdate < '2010-01-01'
       GROUP by carrier
       HAVING cnt > 100000 and max(yeard) > 1990
       ORDER by rate DESC, cnt desc
       LIMIT 1000;
+------------+------------+---------+----------+-----------------+------+
| min(yeard) | max(yeard) | Carrier | cnt      | flights_delayed | rate |
+------------+------------+---------+----------+-----------------+------+
| 2003 | 2009 | EV | 1454777 | 237698 | 0.16 |
| 2003 | 2009 | FL | 1082489 | 158748 | 0.15 |
| 2006 | 2009 | YV | 740608 | 110389 | 0.15 |
| 2006 | 2009 | XE | 1016010 | 152431 | 0.15 |
| 2003 | 2009 | B6 | 683874 | 103677 | 0.15 |
| 2001 | 2009 | MQ | 3238137 | 448037 | 0.14 |
| 2003 | 2005 | DH | 501056 | 69833 | 0.14 |
| 2004 | 2009 | OH | 1195868 | 160071 | 0.13 |
| 2003 | 2006 | RU | 1007248 | 126733 | 0.13 |
| 1988 | 2009 | UA | 9593284 | 1197053 | 0.12 |
| 2003 | 2006 | TZ | 136735 | 16496 | 0.12 |
| 1988 | 2001 | TW | 2656286 | 280283 | 0.11 |
| 1988 | 2009 | AA | 10568437 | 1183786 | 0.11 |
| 1988 | 2009 | CO | 6023831 | 673354 | 0.11 |
| 1988 | 2009 | DL | 11866515 | 1156048 | 0.10 |
| 2003 | 2009 | OO | 2654259 | 257069 | 0.10 |
| 1988 | 2009 | AS | 1506003 | 146920 | 0.10 |
| 2007 | 2009 | 9E | 577244 | 59440 | 0.10 |
| 1988 | 2009 | US | 10276862 | 990995 | 0.10 |
| 1988 | 2009 | NW | 7601727 | 725460 | 0.10 |
| 1988 | 2005 | HP | 2607603 | 235675 | 0.09 |
| 1988 | 2009 | WN | 12722174 | 1107840 | 0.09 |
| 2005 | 2009 | F9 | 307569 | 28679 | 0.09 |
| 1988 | 1991 | PA | 203401 | 19263 | 0.09 |
+------------+------------+---------+----------+-----------------+------+
24 rows in set (0.86 sec)

The same query in Impala (on the same hardware) runs in 7.18 seconds:

[n8.local:21000] > select min(yeard), max(yeard), Carrier, count(*) as cnt,
                   sum(if(ArrDelayMinutes>30, 1, 0)) as flights_delayed,
                   round(sum(if(ArrDelayMinutes>30, 1, 0))/count(*),2) as rate
                   FROM ontime_parquet_snappy
                   WHERE DayOfWeek not in (6,7)
                   and OriginState not in ('AK', 'HI', 'PR', 'VI')
                   and DestState not in ('AK', 'HI', 'PR', 'VI')
                   and flightdate < '2010-01-01'
                   GROUP by carrier
                   HAVING cnt > 100000 and max(yeard) > 1990
                   ORDER by rate DESC LIMIT 1000;
+------------+------------+---------+----------+-----------------+------+
| min(yeard) | max(yeard) | carrier | cnt      | flights_delayed | rate |
+------------+------------+---------+----------+-----------------+------+
| 2003 | 2009 | EV | 1454777 | 237698 | 0.16 |
| 2003 | 2009 | FL | 1082489 | 158748 | 0.15 |
| 2006 | 2009 | XE | 1016010 | 152431 | 0.15 |
| 2006 | 2009 | YV | 740608 | 110389 | 0.15 |
| 2003 | 2009 | B6 | 683874 | 103677 | 0.15 |
| 2001 | 2009 | MQ | 3238137 | 448037 | 0.14 |
| 2003 | 2005 | DH | 501056 | 69833 | 0.14 |
| 2004 | 2009 | OH | 1195868 | 160071 | 0.13 |
| 2003 | 2006 | RU | 1007248 | 126733 | 0.13 |
| 1988 | 2009 | UA | 9593284 | 1197053 | 0.12 |
| 2003 | 2006 | TZ | 136735 | 16496 | 0.12 |
| 1988 | 2001 | TW | 2656286 | 280283 | 0.11 |
| 1988 | 2009 | CO | 6023831 | 673354 | 0.11 |
| 1988 | 2009 | AA | 10568437 | 1183786 | 0.11 |
| 1988 | 2009 | US | 10276862 | 990995 | 0.10 |
| 2007 | 2009 | 9E | 577244 | 59440 | 0.10 |
| 1988 | 2009 | DL | 11866515 | 1156048 | 0.10 |
| 2003 | 2009 | OO | 2654259 | 257069 | 0.10 |
| 1988 | 2009 | NW | 7601727 | 725460 | 0.10 |
| 1988 | 2009 | AS | 1506003 | 146920 | 0.10 |
| 1988 | 1991 | PA | 203401 | 19263 | 0.09 |
| 1988 | 2009 | WN | 12722174 | 1107840 | 0.09 |
| 1988 | 2005 | HP | 2607603 | 235675 | 0.09 |
| 2005 | 2009 | F9 | 307569 | 28679 | 0.09 |
+------------+------------+---------+----------+-----------------+------+
Returned 24 row(s) in 7.18s

Conclusion and charts

To summarize, I've created the following charts:

Simple query:

As we can see, InfiniDB looks pretty good here. It also uses the MySQL protocol, so an existing application that uses MySQL will be able to work here without any additional "connectors".

One note regarding my query example: the "complex" query is designed in a way that makes it hard to use any particular set of indexes; this query has to scan >70% of the table to generate the result set. That is why it is so slow in MySQL compared to columnar databases. Another "issue" is that the table is very wide and most of the columns are declared as varchar (the table is not normalized), which makes it large in MySQL. All this makes it ideal for columnar storage and compression. Other cases may not show such a huge difference.

So far I have been testing with small data (60G); I plan to run a big data benchmark next.


High Availability with MySQL Fabric: Part II

Latest MySQL Performance Blog posts - May 30, 2014 - 12:00am

This is the third post in our MySQL Fabric series. If you missed the previous two, we started with an overall introduction, and then a discussion of MySQL Fabric’s high-availability (HA) features. MySQL Fabric was RC when we started this series, but it went GA recently. You can read the press release here, and see this blog post from Oracle’s Mats Kindahl for more details. In our previous post, we showed a simple HA setup managed with MySQL Fabric, including some basic failure scenarios. Today, we’ll present a similar scenario from an application developer’s point of view, using the Python Connector for the examples. If you’re following the examples on these posts, you’ll notice that the UUID for servers will be changing. That’s because we rebuild the environment between runs. Symbolic names stay the same though. That said, here’s our usual 3 node setup:

[vagrant@store ~]$ mysqlfabric group lookup_servers mycluster
Command :
{ success     = True
  return      = [{'status': 'SECONDARY', 'server_uuid': '3084fcf2-df86-11e3-b46c-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': '192.168.70.101'},
                 {'status': 'SECONDARY', 'server_uuid': '35cc3529-df86-11e3-b46c-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': '192.168.70.102'},
                 {'status': 'PRIMARY', 'server_uuid': '3d3f6cda-df86-11e3-b46c-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': '192.168.70.103'}]
  activities  =
}

For our tests, we will be using this simple script:

import mysql.connector
from mysql.connector import fabric
from mysql.connector import errors
import time

config = {
    'fabric': {
        'host': '192.168.70.100',
        'port': 8080,
        'username': 'admin',
        'password': 'admin',
        'report_errors': True
    },
    'user': 'fabric',
    'password': 'f4bric',
    'database': 'test',
    'autocommit': 'true'
}

fcnx = None
print "starting loop"
while 1:
    if fcnx == None:
        print "connecting"
        fcnx = mysql.connector.connect(**config)
        fcnx.set_property(group='mycluster', mode=fabric.MODE_READWRITE)
    try:
        print "will run query"
        cur = fcnx.cursor()
        cur.execute("select id, sleep(0.2) from test.test limit 1")
        for (id) in cur:
            print id
        print "will sleep 1 second"
        time.sleep(1)
    except errors.DatabaseError:
        print "sleeping 1 second and reconnecting"
        time.sleep(1)
        del fcnx
        fcnx = mysql.connector.connect(**config)
        fcnx.set_property(group='mycluster', mode=fabric.MODE_READWRITE)
        fcnx.reset_cache()
        try:
            cur = fcnx.cursor()
            cur.execute("select 1")
        except errors.InterfaceError:
            fcnx = mysql.connector.connect(**config)
            fcnx.set_property(group='mycluster', mode=fabric.MODE_READWRITE)
            fcnx.reset_cache()

This simple script requests a MODE_READWRITE connection and then issues selects in a loop. The reason it requests a RW connection is that this makes it easier for us to provoke a failure: we have two SECONDARY nodes that could serve queries if we requested a MODE_READONLY connection. The select includes a short sleep to make it easier to catch in SHOW PROCESSLIST. In order to work, this script needs the test.test table to exist in the mycluster group. Running the following statements on the PRIMARY node will create it:

mysql> create database if not exists test;
mysql> create table if not exists test.test (id int unsigned not null auto_increment primary key) engine = innodb;
mysql> insert into test.test values (null);

Dealing with failure

With everything set up, we can start the script and then cause a PRIMARY failure. In this case, we’ll simulate a failure by shutting down mysqld on it:

mysql> select @@hostname;
+-------------+
| @@hostname  |
+-------------+
| node3.local |
+-------------+
1 row in set (0.00 sec)

mysql> show processlist;
+----+--------+--------------------+------+------------------+------+-----------------------------------------------------------------------+----------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+--------+--------------------+------+------------------+------+-----------------------------------------------------------------------+----------------------------------------------+
| 5 | fabric | store:39929 | NULL | Sleep | 217 | | NULL |
| 6 | fabric | node1:37999 | NULL | Binlog Dump GTID | 217 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
| 7 | fabric | node2:49750 | NULL | Binlog Dump GTID | 216 | Master has sent all binlog to slave; waiting for binlog to be updated | NULL |
| 16 | root | localhost | NULL | Query | 0 | init | show processlist |
| 20 | fabric | 192.168.70.1:55889 | test | Query | 0 | User sleep | select id, sleep(0.2) from test.test limit 1 |
+----+--------+--------------------+------+------------------+------+-----------------------------------------------------------------------+----------------------------------------------+
5 rows in set (0.00 sec)

[vagrant@node3 ~]$ sudo service mysqld stop
Stopping mysqld: [ OK ]

While this happens, here’s the output from the script:

will sleep 1 second
will run query
(1, 0)
will sleep 1 second
will run query
(1, 0)
will sleep 1 second
will run query
(1, 0)
will sleep 1 second
will run query
sleeping 1 second and reconnecting
will run query
(1, 0)
will sleep 1 second
will run query
(1, 0)
will sleep 1 second
will run query
(1, 0)

The ‘sleeping 1 second and reconnecting’ line means the script got an exception while running a query (when the PRIMARY node was stopped), waited one second and then reconnected. The next lines confirm that everything went back to normal after the reconnection. The relevant piece of code that handles the reconnection is this:

fcnx = mysql.connector.connect(**config)
fcnx.set_property(group='mycluster', mode=fabric.MODE_READWRITE)
fcnx.reset_cache()

If fcnx.reset_cache() is not invoked, the driver won't connect to the xml-rpc server again, but will use its local cache of the group's status instead. As the PRIMARY node is offline, this will cause the reconnect attempt to fail. By resetting the cache, we're forcing the driver to connect to the xml-rpc server and fetch up-to-date group status information. If more failures happen and there is no PRIMARY (or candidate for promotion) node in the group, the following error is received:

will run query
(1, 0)
will sleep 1 second
will run query
sleeping 1 second and reconnecting
will run query
Traceback (most recent call last):
  File "./reader_test.py", line 34, in <module>
    cur = fcnx.cursor()
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 1062, in cursor
    self._connect()
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 1012, in _connect
    exc))
mysql.connector.errors.InterfaceError: Error getting connection: No MySQL server available for group 'mycluster'

Running without MySQL Fabric

As we have discussed in previous posts, the XML-RPC server can become a single point of failure under certain circumstances. Specifically, there are at least two problem scenarios when this server is down:

  • When a node goes down
  • When new connection attempts are made

The first case is obvious enough. If MySQL Fabric is not running and a node fails, no action will be taken, and clients will get an error whenever they send a query to the failed node. This is worse if the PRIMARY fails, as failover won't happen and the cluster will be unavailable for writes. The second case means that while MySQL Fabric is not running, no new connections to the group can be established. This is because when connecting to a group, MySQL Fabric-aware clients first connect to the XML-RPC server to get a list of nodes and roles, and only then use their local cache for decisions. A way to mitigate this is to use connection pooling, which reduces the need for establishing new connections, and therefore minimises the chance of failure due to MySQL Fabric being down. This, of course, assumes that something is monitoring MySQL Fabric to ensure some host provides the XML-RPC service. If that is not the case, failure will be delayed, but it will eventually happen anyway. Here is an example of what happens when MySQL Fabric is down and the PRIMARY node goes down:

Traceback (most recent call last):
  File "./reader_test.py", line 35, in <module>
    cur.execute("select id, sleep(0.2) from test.test limit 1")
  File "/Library/Python/2.7/site-packages/mysql/connector/cursor.py", line 491, in execute
    self._handle_result(self._connection.cmd_query(stmt))
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 1144, in cmd_query
    self.handle_mysql_error(exc)
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 1099, in handle_mysql_error
    self.reset_cache()
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 832, in reset_cache
    self._fabric.reset_cache(group=group)
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 369, in reset_cache
    self.get_group_servers(group, use_cache=False)
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 478, in get_group_servers
    inst = self.get_instance()
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 390, in get_instance
    if not inst.is_connected:
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 772, in is_connected
    self._proxy._some_nonexisting_method()  # pylint: disable=W0212
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/Library/Python/2.7/site-packages/mysql/connector/fabric/connection.py", line 272, in request
    raise InterfaceError("Connection with Fabric failed: " + msg)
mysql.connector.errors.InterfaceError: Connection with Fabric failed:

This happens when a new connection attempt is made after resetting the local cache.

Making sure MySQL Fabric stays up

As of this writing, it is the user's responsibility to make sure MySQL Fabric is up and running. This means you can use whatever you feel comfortable with in terms of HA, like Pacemaker. While it does add some complexity to the setup, the XML-RPC server is very simple to manage, so a simple resource manager should work. For the backend, MySQL Fabric is storage engine agnostic, so an easy way to resolve this could be to use a small MySQL Cluster setup to ensure the backend is available. MySQL's team blogged about such a setup here. We think the ndb approach is probably the simplest for providing HA at the MySQL Fabric store level, but believe that MySQL Fabric itself should provide, or make it easy to achieve, HA at the XML-RPC server level. If ndb is used as the store, any node can take a write, which in turn means multiple XML-RPC instances should be able to write to the store concurrently. In theory, then, improving this could be as easy as allowing Fabric-aware drivers to get a list of Fabric servers instead of a single IP and port to connect to.

What’s next

In the past two posts, we’ve presented MySQL Fabric’s HA features, seen how it handles failures at the node level, how to use MySQL databases with a MySQL Fabric-aware driver, and what remains unresolved for now. In our next post, we’ll review MySQL Fabric’s Sharding features.


How MySQL ‘queries’ and ‘questions’ are measured

Latest MySQL Performance Blog posts - May 29, 2014 - 3:00am

MySQL has status variables “questions” and “queries” which are rather close but also a bit different, making it confusing for many people. The manual describing it might not be very easy to understand:

Queries
The number of statements executed by the server. This variable includes statements executed within stored programs, unlike the Questions variable. It does not count COM_PING or COM_STATISTICS commands.

Questions
The number of statements executed by the server. This includes only statements sent to the server by clients and not statements executed within stored programs, unlike the Queries variable. This variable does not count COM_PING, COM_STATISTICS, COM_STMT_PREPARE, COM_STMT_CLOSE, or COM_STMT_RESET commands.

In a nutshell, if you're not using prepared statements, the big difference between the two is that "Questions" counts a stored procedure call as a single statement, while "Queries" also counts all the statements executed inside the stored procedure.
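You can see this for yourself with a throwaway stored procedure (a sketch; the procedure name p is made up for illustration):

mysql> DELIMITER //
mysql> CREATE PROCEDURE test.p() BEGIN SELECT 1; SELECT 2; END//
mysql> DELIMITER ;
mysql> CALL test.p();
-- The CALL adds 1 to Questions, but 3 to Queries
-- (the CALL itself plus the two SELECTs inside the procedure).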

There also seems to be a more subtle difference between them:

Running MySQL 5.6.17, having created a new connection, I see…

mysql> show status like "questions";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Questions     | 2     |
+---------------+-------+
1 row in set (0.00 sec)

mysql> show status like "queries";
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| Queries       | 21241470 |
+---------------+----------+
1 row in set (0.00 sec)

This tells me that "queries" is a global status variable while "questions" is a session one, and can be used to see how many statements were issued to the server through the current connection.

There is also a global "questions" value that shows the number for the server since it started:

mysql> show global status like "questions";
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| Questions     | 23375398 |
+---------------+----------+
1 row in set (0.00 sec)

When it comes to global values neither “queries” nor “questions” are reset when FLUSH STATUS is called.

What the manual does not describe in detail, though, is: when exactly are those counters incremented? This might look irrelevant, but really it is not, especially if you're capturing those values at high resolution and using them to diagnose non-trivial MySQL performance incidents.

If you count queries when they start, a spike in the number of queries in a given second can only be due to a spike in traffic. If you measure queries at completion, however, spikes can also mean that some critical resource became available, which allowed many queries to complete. Think, for example, of table-level locks, or row-level locks with InnoDB, as very common causes.

So what is the case with MySQL? MySQL increments this counter before executing the query, so you may well see a very uniform rate of questions when in fact a lot of queries were started but were not completing quickly due to waiting on some resource.

To check for an unusual number of queries running concurrently and struggling to complete in time, looking at the Threads_running status variable is a great idea.
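For example, from the mysql client, or watched at one-second resolution from the shell:

mysql> SHOW GLOBAL STATUS LIKE 'Threads_running';

mysqladmin extended-status -i 1 | grep Threads_running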


From zero to full visibility of MySQL in 3 minutes with Percona Cloud Tools

Latest MySQL Performance Blog posts - May 28, 2014 - 12:16pm

First, I would like to invite you to my webinar, "Monitoring All (Yes, All!) MySQL Metrics with Percona Cloud Tools," on Wednesday, June 25 at 10 a.m. Pacific Daylight Time, where I will talk about the new features in Percona Cloud Tools, including its monitoring capabilities.

In this post I’d like to show the cool and interesting things we’ve implemented in Percona Cloud Tools, including the recently released agent that Daniel also talks about here in this post.

Basically, our agent allows users to collect ALL MySQL metrics plus important environment metrics, like CPU, memory and IO stats.

And when I say all MySQL metrics, that means:

  • Metrics from SHOW GLOBAL STATUS (I counted 571 entries on my Percona Server 5.6 with TokuDB)
  • Metrics from INFORMATION_SCHEMA.INNODB_METRICS (214 entries)
  • Data from SHOW GLOBAL VARIABLES (522 entries).

So you see we collect more than 1,000 data points. We collect data every second and aggregate per minute, so it becomes available at a 1-minute resolution but with descriptive stats – min, max, avg, median, 5th percentile and 95th percentile of the collected data. We found this represents the data better than 1-second points, which can be quite noisy.

So, for example, this is how a chart with MySQL command counters looks:

In fact, if you do not have MySQL you can just monitor your OS without it if you need to.
For example, CPU stats:

Please note you can see a value at any point in the past. We can even go to a week-long range and see values from several days ago.

You can choose a custom time range going back hours, days, weeks, etc., with zoom-in capabilities.

Why do we need avg, min and max stats? Let's look at Peter's graph from a server with periodic stalls.

Averaging a metric smooths the line and really hides the problem, while with min we are able to see that throughput drops to zero – meaning that during some seconds the server did not execute any queries, which is essentially a stall.

More about the agent.

Our percona-agent is open source. This is our way of sharing our Go knowledge, and also you can check that we do not do anything insane on your server (like bitcoin mining or black magic regression modeling math).

You can see the source code here: https://github.com/percona/percona-agent. Pre-compiled binaries are available from our website.

What is also interesting about our technology is that we use a permanent connection (based on WebSockets, so it looks like a connection from a web browser) between the agent and our servers. This way we support bi-directional real-time communication between https://cloud.percona.com/ and the agent.

That means you can manage an agent and receive data updates at the same time. Pretty cool, yeah?

Our agent is built with "minimal effort" in mind:

  • 1. There is minimal effort to install the agent. Basically it takes 3 minutes to download the binaries, install them, and start seeing real-time updates of charts with MySQL metrics.
  • 2. The agent comes with self-update capabilities (not activated at this moment). Later you will not need to worry whether a new version of the agent is available – it will update itself. We thought if Android apps can do that, why can't we?
  • 3. Minimal effort for maintenance: you do not need to install a dedicated server, configure and maintain a database, or care about its backups and availability. Basically no more hassle with Cacti configuration and managing storage for it.

The goal of Percona Cloud Tools is to provide you with FULL visibility into what's going on inside your system right now or at some point in the past, and to give you additional insights into your database server.

Our tools are useful not only if you have hundreds of database servers to manage, but for single installations, too. Of course, we can always run mysql -e "SHOW GLOBAL STATUS", vmstat 10, or iostat -dxm 10 manually when we need to troubleshoot something, but isn't it more useful to collect all this data automatically and be able to go to any point in the past?

You can register for the Beta right now. No invitation is needed, but it may take some time for us to activate your account; we are seeing quite a demand right now and we need to prime our servers.

And what's important: eventually our tools will require payment, but we will always provide a free tier, which will be useful enough for small accounts (this is not a bait-and-switch 30-day-trial approach).


Location for InnoDB tablespace in MySQL 5.6.6

Latest MySQL Performance Blog posts - May 28, 2014 - 3:00am

There is one new feature in MySQL 5.6 that didn't get the attention it deserved (at least from me): "DATA DIRECTORY" for InnoDB tables.

This has been implemented since MySQL 5.6.6 and can be used only at table creation time. It's not possible to change the DATA DIRECTORY with an ALTER for a normal table (but it is possible in some cases with partitioned ones, as you will see below). If you try, the option is simply ignored:

mysql> CREATE TABLE `sales_figures` (
    ->   `region_id` int(11) DEFAULT NULL,
    ->   `sales_date` date DEFAULT NULL,
    ->   `amount` int(11) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    -> DATA DIRECTORY = '/tb1/';
Query OK, 0 rows affected (0.11 sec)

mysql> alter table sales_figures engine=innodb data directory='/tb2/';
Query OK, 0 rows affected, 1 warning (0.21 sec)
Records: 0 Duplicates: 0 Warnings: 1

mysql> show warnings;
+---------+------+---------------------------------+
| Level   | Code | Message                         |
+---------+------+---------------------------------+
| Warning | 1618 | <DATA DIRECTORY> option ignored |
+---------+------+---------------------------------+

You can read more information in the MySQL Manual: Specifying the Location of a Tablespace.

So it's now possible, if for example you use SSD or FusionIO disks, to move large log or archive tables to cheaper disks, as you won't require fast random access for those tables, and thereby save some expensive disk space.

The syntax is very simple:

mysql> CREATE TABLE `sales_figures` (
         `region_id` int(11) DEFAULT NULL,
         `sales_date` date DEFAULT NULL,
         `amount` int(11) DEFAULT NULL
       ) ENGINE=InnoDB DEFAULT CHARSET=latin1
       DATA DIRECTORY='/tmp/tb1/';

mysql> select @@datadir;
+-----------------+
| @@datadir       |
+-----------------+
| /var/lib/mysql/ |
+-----------------+

And in fact if we check on the filesystem:

# ls -lh /var/lib/mysql/fred/
total 20K
-rw-r--r-- 1 mysql mysql 65 May 23 22:30 db.opt
-rw-r--r-- 1 mysql mysql 8.5K May 23 22:30 sales_figures.frm
-rw-r--r-- 1 mysql mysql 31 May 23 22:30 sales_figures.isl

Note the new .isl file (referred to as a link to the RemoteDatafile in the source code), which contains the location of the tablespace:

[root@imac2 tmp]# cat /var/lib/mysql/fred/sales_figures.isl
/tmp/tb1/fred/sales_figures.ibd

And indeed the tablespace is there:

[root@imac2 tmp]# ls -lh /tmp/tb1/fred/
total 96K
-rw-r--r-- 1 mysql mysql 96K May 23 22:30 sales_figures.ibd

This is really great! And something even nicer: it finally works with partitioning too (before, that was only possible for MyISAM tables):

mysql> CREATE TABLE sales_figures (region_id INT, sales_date DATE, amount INT)
       PARTITION BY LIST (region_id) (
         PARTITION US_DATA VALUES IN(100,200,300) DATA DIRECTORY = '/tmp/tb1',
         PARTITION EU_DATA VALUES IN(400,500) DATA DIRECTORY = '/tmp/tb2/'
       );


[root@imac2 mysql]# ls -l /tmp/tb1/fred/sales_figures#P#US_DATA.ibd
-rw-rw---- 1 mysql mysql 98304 May 23 16:19 /tmp/tb1/fred/sales_figures#P#US_DATA.ibd

[root@imac2 mysql]# ls -l /tmp/tb2/fred/sales_figures#P#EU_DATA.ibd
-rw-rw---- 1 mysql mysql 98304 May 23 16:19 /tmp/tb2/fred/sales_figures#P#EU_DATA.ibd

So now you can have some partitions on fast disks and some on slower disks. This is great for historical partitioning.

For example you have a table orders partitioned by years as follow:

create table orders (id int, purchased DATE)
partition by range (YEAR(purchased)) (
  partition pre2012 values less than (2012) DATA DIRECTORY '/hdd/',
  partition pre2013 values less than (2013) DATA DIRECTORY '/hdd/',
  partition pre2014 values less than (2014) DATA DIRECTORY '/hdd/',
  partition current values less than MAXVALUE DATA DIRECTORY '/ssd/'
);

Only the partition handling the orders for the current year is on SSD.
At the end of the year, you can reorganize the partition and move all the data for 2014 to slower disks:

mysql> ALTER TABLE orders REORGANIZE PARTITION `current` INTO (
         partition pre2015 values less than (2015) DATA DIRECTORY '/hdd/',
         partition current values less than MAXVALUE DATA DIRECTORY '/ssd');

Notice that XtraBackup is also aware of these tablespaces on different locations and is able to deal with them.

There is currently only one issue: with --copy-back, you need to have the full path already created for tablespaces that are not in the MySQL data directory.

So in the example above I had to create /tmp/tb1/fred and /tmp/tb2/fred before being able to run innobackupex --copy-back (see bug 1322658).
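In other words, something along these lines before the copy-back (a sketch; the backup path is illustrative):

mkdir -p /tmp/tb1/fred /tmp/tb2/fred
innobackupex --copy-back /path/to/backup
chown -R mysql:mysql /var/lib/mysql /tmp/tb1 /tmp/tb2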

I hope this important feature now gets the visibility it deserves.


Semi-Sync replication performance in MySQL 5.7.4 DMR

Latest MySQL Performance Blog posts - May 27, 2014 - 6:24am

I was interested to hear about semi-sync replication improvements in MySQL's 5.7.4 DMR release and decided to check it out.  I previously blogged about poor semi-sync performance and was pretty disappointed with semi-sync's performance across WAN distances back then, particularly with many client threads.

The Test

The basic environment of these tests was:

  • AWS EC2 m3.medium instances
  • Master in us-east-1, slave in us-west-1 (~78ms ping RTT)
  • CentOS 6.5
  • innodb_flush_log_at_trx_commit=1
  • sync_binlog=1
  • Semi-sync replication plugin installed and enabled.
  • GTIDs enabled (except on 5.5)
  • sysbench 0.5 update_index.lua test, 60 seconds, 250k table size.
  • MySQL 5.7 was tested with both AFTER_SYNC and AFTER_COMMIT settings for rpl_semi_sync_master_wait_point
  • I tested Percona XtraDB Cluster 5.6 / Galera 3.5 as well by means of comparison

Without further ado, here are the TpmC results I got for a single client thread:


These graphs are interactive, so mouse-over for more details. I’m using log scales to better highlight the differences.

The blue bars represent transactions per second (more is better).  The red bars represent average latency per transaction per client (less is better).  Remember these transactions are synchronously being copied across the US before the client can execute another.

The first test is our control:  Async allows ~273 TPS on a single thread.  Once we introduce synchronicity, we clearly see the bulk of the time is that round trip.  Note that MySQL 5.5 and 5.6 are a bit higher than MySQL 5.7 and Percona XtraDB Cluster, the latter of which show pretty similar results.

Adding parallelism

It gets more interesting if we redo the same tests with 32 test threads:

In the MySQL 5.5 and 5.6 tests, we can clearly see nasty serialization.  Both really don’t allow more performance than single threaded sysbench.  I was happy to see, however, that this seems to be dramatically improved in MySQL 5.7, nice job Oracle!

AFTER_SYNC and AFTER_COMMIT vary, but AFTER_SYNC is the default and is preferred over AFTER_COMMIT.  The reasoning here is AFTER_SYNC forces the semi-sync wait BEFORE the transaction is committed on the master.  The client still must wait for the semi-sync in AFTER_COMMIT, but other transactions may see its data on the master BEFORE we confirm the semi-sync slave has received it.  This is potentially bad because if the master crashed at that instant, clients may have read data from the master that did not make it to a failover slave.    This is a type of ‘phantom read’ and Yoshinori explains it in more detail here.
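For reference, the wait point is a regular global variable, so switching between the two behaviors for testing looks along these lines (a sketch):

mysql> SET GLOBAL rpl_semi_sync_master_wait_point = AFTER_SYNC;
mysql> SELECT @@rpl_semi_sync_master_wait_point;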

What about Percona XtraDB Cluster?

I also want to discuss the Percona XtraDB Cluster results: Galera here is somewhat slower than MySQL 5.7 semi-sync.  There may be some enhancements to Galera that can be made (competition is a good thing), but there are still some significant differences here:

  • Galera allows for writing on any and all nodes, semi-sync does not
  • Galera introduces the certification process to check for conflicts, Semi-sync does not
  • Galera is not 2-phase commit and transactions are not committed synchronously anywhere except the node originating the transaction.  So, it is similar to Semi-sync in this way.
  • I ran the Galera tests with no log-bin (Galera does not require it)
  • I ran the Galera tests with innodb_flush_log_at_trx_commit=1
  • I set the fc_limit on the second node really high to eliminate Flow control as a bottleneck.  In a live cluster, it would typically be needed.
  • Galera provides parallel slave threads for faster apply, but it doesn’t matter here because I set the fc_limit so high

TL;DR

Semi-sync in MySQL 5.7 looks like a great improvement.  Any form of synchronicity is always going to be expensive, particularly over 10s and 100s of milliseconds of latency.  With MySQL 5.7, I’d be much more apt to recommend semi-sync as an option than in previous releases.  Thanks to Oracle for investing here.

The post Semi-Sync replication performance in MySQL 5.7.4 DMR appeared first on MySQL Performance Blog.

Installing Percona XtraDB Cluster 5.6 with the Docker open-source engine

Latest MySQL Performance Blog posts - May 26, 2014 - 12:00am

In my previous post, I blogged about using Percona Server with Docker and have shown you how fast and easy it was to create a virtual environment with just a few commands.

This time I will be showing you how to setup a three-node Percona XtraDB Cluster (PXC) 5.6 on the Docker open-source engine. Just to review Docker… “is an open-source engine that automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere.”

In this case we will make use of a Dockerfile. Think of this as being like a Vagrantfile: it is a build script with a set of commands automating the creation of a new Docker container.

For this case, we will use the following Dockerfile contents, with Ubuntu 12.04 instead of CentOS 6.5 as the guest OS:

FROM ubuntu:precise
MAINTAINER Percona.com <info@percona.com>
# Upgrade
RUN apt-get update && apt-get upgrade -y && apt-get dist-upgrade -y
ENV DEBIAN_FRONTEND noninteractive
RUN echo "deb http://repo.percona.com/apt precise main" >> /etc/apt/sources.list.d/percona.list
RUN echo "deb-src http://repo.percona.com/apt precise main" >> /etc/apt/sources.list.d/percona.list
RUN apt-key adv --keyserver keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
RUN apt-get update
RUN apt-get install -y percona-xtradb-cluster-56 qpress xtrabackup
RUN apt-get install -y python-software-properties vim wget curl netcat

Create a my.cnf file and add the following:

[MYSQLD]
user = mysql
default_storage_engine = InnoDB
basedir = /usr
datadir = /var/lib/mysql
socket = /var/run/mysqld/mysqld.sock
port = 3306
innodb_autoinc_lock_mode = 2
log_queries_not_using_indexes = 1
max_allowed_packet = 128M
binlog_format = ROW
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_node_address = {node_IP}
wsrep_cluster_name = "mydockerpxc"
wsrep_cluster_address = gcomm://{node1_ip},{node2_ip},{node3_ip}
wsrep_node_name = {node_name}
wsrep_slave_threads = 4
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = pxcuser:pxcpass

[sst]
streamfmt = xbstream

[xtrabackup]
compress
compact
parallel = 2
compress_threads = 2
rebuild_threads = 2

Build an image from the Dockerfile we just created.

root@Perconallc-Support /home/jericho.rivera/docker-test # docker build -rm -t ubuntu_1204/percona:galera56 .

The ‘docker build’ command will create a new image from the Dockerfile build script. This will take a few minutes to complete. You can check if the new image was successfully built:

root@Perconallc-Support /home/jericho.rivera/docker-test # docker images | grep ubuntu
ubuntu_1204/percona   galera56   c2adc932aaec   9 minutes ago   669.4 MB
ubuntu                13.10      5e019ab7bf6d   13 days ago     180 MB
ubuntu                saucy      5e019ab7bf6d   13 days ago     180 MB
ubuntu                12.04      74fe38d11401   13 days ago     209.6 MB
ubuntu                precise    74fe38d11401   13 days ago     209.6 MB

Now we will launch three containers with Percona XtraDB Cluster using the new docker image we have just created.

root@Perconallc-Support /home/jericho.rivera/docker-test # for n in {1..3}; do docker run --name dockerpxc$n -i -t -d ubuntu_1204/percona:galera56 bash; done

Check if the new containers were created:

root@Perconallc-Support /home/jericho.rivera/docker-test # docker ps
CONTAINER ID   IMAGE                          COMMAND   CREATED         STATUS         PORTS   NAMES
dd5bafb99108   ubuntu_1204/percona:galera56   bash      5 minutes ago   Up 5 minutes           dockerpxc3
01664cfcbeb7   ubuntu_1204/percona:galera56   bash      5 minutes ago   Up 5 minutes           dockerpxc2
2e44d8baee99   ubuntu_1204/percona:galera56   bash      5 minutes ago   Up 5 minutes           dockerpxc1

Get relevant information from the container using ‘docker inspect’ command which by default will show a JSON-format output on the terminal. Since we only need to get the IP address for each container just run the following commands:

root@Perconallc-Support /home/jericho.rivera/docker-test # docker inspect dockerpxc1 | grep IPAddress
        "IPAddress": "172.17.0.2",
root@Perconallc-Support /home/jericho.rivera/docker-test # docker inspect dockerpxc2 | grep IPAddress
        "IPAddress": "172.17.0.3",
root@Perconallc-Support /home/jericho.rivera/docker-test # docker inspect dockerpxc3 | grep IPAddress
        "IPAddress": "172.17.0.4",

Take note of the IP addresses because we will need them later for:

wsrep_cluster_address=gcomm://172.17.0.2,172.17.0.3,172.17.0.4

Do the same for the dockerpxc2 and dockerpxc3 nodes. To attach to a container you need to run 'docker attach {node_name}' (e.g. # docker attach dockerpxc1). To exit without stopping the containers, hit CTRL+p then CTRL+q; an explicit 'exit' command at the prompt will drop you out of the container and stop the container as well, which we try to avoid as much as possible. Also make sure to edit wsrep_node_name and wsrep_node_address accordingly in each container's my.cnf.
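A sketch of filling in those my.cnf placeholders for the first node (the my.cnf path and the sed approach are illustrative; adjust the name and address per node):

root@2e44d8baee99:/# sed -i -e 's/{node_IP}/172.17.0.2/' \
    -e 's/{node_name}/dockerpxc1/' \
    -e 's/{node1_ip},{node2_ip},{node3_ip}/172.17.0.2,172.17.0.3,172.17.0.4/' /etc/mysql/my.cnf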

Bootstrapping the Cluster

The next step is to start the first node in the cluster – bootstrapping it.

root@Perconallc-Support /home/jericho.rivera/docker-test # docker attach dockerpxc1
root@2e44d8baee99:/# /etc/init.d/mysql bootstrap-pxc

After bootstrapping the first node, we can then prepare it for SST. That means we need to create the SST auth user; in this case it is wsrep_sst_auth=pxcuser:pxcpass.
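A minimal sketch of creating that user on the bootstrapped node (the privilege list follows the usual xtrabackup SST requirements; check the PXC documentation for your version):

root@2e44d8baee99:/# mysql
mysql> GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'pxcuser'@'localhost' IDENTIFIED BY 'pxcpass';
mysql> FLUSH PRIVILEGES;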

After adding the SST auth user on the first node the next step would be to start dockerpxc2 and dockerpxc3:

root@Perconallc-Support /home/jericho.rivera/docker-test # docker attach dockerpxc2
root@01664cfcbeb7:/# /etc/init.d/mysql start

After starting all nodes, check the status of the entire cluster:

root@2e44d8baee99:/# mysql -e "show global status like 'wsrep%';"
+------------------------------+--------------------------------------+
| Variable_name                | Value                                |
+------------------------------+--------------------------------------+
| wsrep_local_state_uuid | 7ae2a03e-d71e-11e3-9e00-2f7c2c79edda |
| wsrep_protocol_version | 5 |
| wsrep_last_committed | 1 |
| wsrep_replicated | 1 |
| wsrep_replicated_bytes | 270 |
| wsrep_repl_keys | 1 |
| wsrep_repl_keys_bytes | 31 |
| wsrep_repl_data_bytes | 175 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 18 |
| wsrep_received_bytes | 2230 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.333333 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.000000 |
| wsrep_local_cached_downto | 1 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 1.000000 |
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 1.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 1.000000 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 1 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0.000000 |
| wsrep_incoming_addresses | ,, |
| wsrep_cluster_conf_id | 11 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 7ae2a03e-d71e-11e3-9e00-2f7c2c79edda |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 2 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 3.5(r178) |
| wsrep_ready | ON |
+------------------------------+--------------------------------------+

All three nodes are in the cluster!

Install net-tools to verify the default port for Galera.
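On this Ubuntu 12.04 image that is a one-liner; with it installed, netstat shows the cluster ports:

root@dd5bafb99108:/# apt-get install -y net-tools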

root@dd5bafb99108:/# netstat -ant
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:4567            0.0.0.0:*               LISTEN
tcp        0      0 172.17.0.4:33875        91.189.88.153:80        TIME_WAIT
tcp        0      0 172.17.0.4:56956        172.17.0.2:4567         ESTABLISHED
tcp        0      0 172.17.0.4:57475        172.17.0.3:4567         ESTABLISHED
tcp6       0      0 :::3306                 :::*                    LISTEN

Summary

I’ve shown you the following:

* Created a Dockerfile and my.cnf file
* Built a Docker image using the created Dockerfile
* Made a few changes to some wsrep_ options in my.cnf
* Bootstrapped the first node
* Started the other nodes of the cluster
* Checked the cluster status and confirmed all nodes are in the cluster

There are other ways to set up Docker with Percona XtraDB Cluster 5.6, such as using Vagrant + Docker (which further automates the whole process) or using shell scripts; this article, however, shows you the basics of how to accomplish the task.

On my next post, I will show how to setup Percona ClusterControl on another Docker container and adding this three-node PXC 5.6 cluster to it.

Read related posts here:

* Using MySQL Sandbox with Percona Server
* Testing Percona XtraDB Cluster with Vagrant
* Percona XtraDB Cluster: Setting up a simple cluster


Introducing the 3-Minute MySQL Monitor

Latest MySQL Performance Blog posts - May 23, 2014 - 10:01am

There are many cool, new things happening with Percona Cloud Tools.  To avoid “tl;dr” I will highlight only one new feature after a brief, general announcement.  The new feature is a 3-minute MySQL monitor.  I’ll blog later about other features.

The general announcement is: Last week we quietly released a brand-new agent called percona-agent, and we added MySQL and system monitoring to Percona Cloud Tools.  We also wrote a brand-new API from the ground up.  We call it all "PCT v2".  For you it means a better experience and more features, all still free while we're in beta.

One new feature in Percona Cloud Tools v2 is MySQL monitoring in 3 minutes, i.e. a 3-minute MySQL monitor.  Let's be honest about monitoring: We know we should but we often don't.  It's like software testing and other best practices.  As Monitorama earlier this month highlighted, the problem is not a lack of sufficiently advanced technology.  Products like Grok demonstrate this.  The problem is that setting up a meaningful MySQL monitor can be difficult.  The reason is simple: general-purpose monitoring solutions leave the user to answer and implement important questions like "which MySQL metrics should we monitor?"  To make an analogy: General-purpose monitoring solutions are sledge hammers and MySQL is a nail.  You can drive a nail with a sledge hammer, but it's a lot easier (and probably safer) with a hammer.  The 3-minute MySQL monitor in Percona Cloud Tools v2 is a hammer.

The "3-minute" qualifier means that you can download and install percona-agent and have charts with MySQL metrics 3 minutes later.  The initial setup is really that quick and easy.  Give it a try; it's free!  Just sign up, download percona-agent, and run its install script.

The Percona Cloud Tools MySQL monitor is still new and in development.  It cannot replace a general-purpose monitoring solution like Zabbix, and it does not have alerts, but that’s ok because its raison d’être is different: to make monitoring MySQL quick, easy, and meaningful.

We use Percona Cloud Tools internally for our production servers, but we’re developing it for you. (One reason why percona-agent is free and open-source.)  Try it out and let us know what you think, especially if you run into problems.  Thanks in advance, and stay tuned for more blog posts and webinars about Percona Cloud Tools, like Vadim’s upcoming webinar on June 25th: Monitoring All (Yes, All!) MySQL Metrics with Percona Cloud Tools.

The post Introducing the 3-Minute MySQL Monitor appeared first on MySQL Performance Blog.

How to improve InnoDB performance by 55% for write-bound loads

Latest MySQL Performance Blog posts - May 23, 2014 - 3:00am

During April’s Percona Live MySQL Conference and Expo 2014, I attended a talk on MySQL 5.7 performance and scalability given by Dimitri Kravtchuk, the Oracle MySQL benchmark specialist. He mentioned at some point that the InnoDB double write buffer was a real performance killer. For those who don’t know what the InnoDB double write buffer is, it is a disk buffer where pages are written before being written to the actual data file. Upon restart, pages in the double write buffer are rewritten to their data files if complete. This is to avoid data file corruption from half-written pages. I knew it had an impact on performance (on ZFS, since it is transactional, I always disable it), but I never realized how big the performance impact could be. Back from PLMCE, a friend had dropped off a Dell R320 server at my home, asking me to set up the OS and test it. How better to test a new server than to run benchmarks on it, so here we go!

ZFS is not the only transactional filesystem; ext4, with the option “data=journal”, can also be transactional. So, the question is: is it better to have the InnoDB double write buffer enabled, or to use the ext4 transaction log? Also, if the latter is better, how does it compare with xfs, the filesystem I used to propose but which does not support transactions?
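
For context, full data journaling on ext4 is just a mount option; a minimal sketch (device and mount point are examples):

mount -o rw,data=journal /dev/sda2 /var/lib/mysql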

Methodology

The goal is to stress the double write buffer, so the load has to be write intensive. The server has a simple mirror of two 7.2k rpm drives. There is no controller write cache and the drives’ write caches are disabled. I decided to use the Percona tpcc-mysql benchmark tool with 200 warehouses; the total dataset size was around 18G, fitting entirely within the InnoDB buffer pool (the server has 24GB). Here’s the relevant part of the my.cnf:

innodb_read_io_threads=4
innodb_write_io_threads=8 #To stress the double write buffer
innodb_buffer_pool_size=20G
innodb_buffer_pool_load_at_startup=ON
innodb_log_file_size = 32M #Small log files, more page flush
innodb_log_files_in_group=2
innodb_file_per_table=1
innodb_log_buffer_size=8M
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit=0
skip-innodb_doublewrite #commented or not depending on test

So, I generated the dataset for 200 warehouses, added the keys but not the foreign key constraints, loaded all that into the buffer pool with a few queries and dumped the buffer pool. Then, with MySQL stopped, I did a file level backup to a different partition. I used the MySQL 5.6.16 version that comes with Ubuntu 14.04; at the time, Percona Server was not available for 14.04. Each benchmark followed this procedure:

  1. Stop mysql
  2. umount /var/lib/mysql
  3. comment or uncomment skip-innodb_doublewrite in my.cnf
  4. mount /var/lib/mysql with specific options
  5. copy the reference backup to /var/lib/mysql
  6. Start mysql and wait for the buffer pool load to complete
  7. start tpcc from another server

The tpcc_start command I used is the following:

./tpcc_start -h10.2.2.247 -P3306 -dtpcc -utpcc -ptpcc -w200 -c32 -r300 -l3600 -i60

I used 32 connections, let the tool run for 300s of warm up, enough to reach a steady level of dirty pages, and then, I let the benchmark run for one hour, reporting results every minute.

Results

Test                Double write buffer   File system options   Average NOTPM over 1h
ext4_dw             Yes                   rw                    690
ext4_dionolock_dw   Yes                   rw,dioread_nolock     668
ext4_nodw           No                    rw                    1107
ext4trx_nodw        No                    rw,data=journal       1066
xfs_dw              Yes                   xfs rw,noatime        754


So, from the above table, the first test I did was the common ext4 with the InnoDB double write buffer enabled, and it yielded 690 new order transactions per minute (NOTPM). Reading the ext4 doc, I also wanted to try the “dioread_nolock” setting that is supposed to reduce mutex contention, and this time I got slightly less: 668 NOTPM. The difference is within the measurement error and isn’t significant. Removing the InnoDB double write buffer, although unsafe, boosted the throughput to 1107 NOTPM, a 60% increase! Wow, indeed the double write buffer has a huge impact. But what is the impact of asking the file system to replace the InnoDB double write buffer? Surprisingly, the performance level is only slightly lower at 1066 NOTPM, and vmstat did report twice the amount of writes. I needed to redo the tests a few times to convince myself. Getting a 55% increase in performance with the same hardware is not common, except when some trivial configuration errors are made. Finally, xfs with the InnoDB double write buffer enabled, the setup I used to propose to customers, came in at 754 NOTPM; that’s about 10% higher than ext4 with the InnoDB double write buffer, close to what I was expecting. The graphic below presents the numbers in a more visual form.

TPCC NOTPM for various configurations

In term of performance stability, you’ll find below a graphic of the per minute NOTPM output for three of the tests, ext4 non-transactional with the double write buffer, ext4 transactional without the double write buffer and xfs with the double write buffer. The dispersion is qualitatively similar for all three. The values presented above are just the averages of those data sets.

TPCC NOTPM evolution over time

Safety

Innodb data corruption is not fun and removing the innodb double write buffer is a bit scary. In order to be sure it is safe, I executed the following procedure ten times:

  1. Start mysql and wait for recovery and for the buffer pool load to complete
  2. Check the error log for no corruption
  3. start tpcc from another server
  4. After about 10 minutes, physically unplug the server
  5. Plug back and restart the server

I observed no corruption. I was still a bit preoccupied: what if the test was wrong? I removed the “data=journal” mount option and did a new run. I got corruption the first time. So given the procedure I followed and the number of crash tests, I think it is reasonable to assume it is safe to replace the InnoDB double write buffer with the ext4 transactional journal.

I also looked at the kernel ext4 sources and changelog. Until recently, before kernel 3.2, O_DIRECT wasn’t supported with data=journal, and MySQL would have issued a warning in the error log. With recent kernels, for data=journal, O_DIRECT is mapped to O_DSYNC (O_DIRECT is faked), which is exactly what is needed. Indeed, I tried “innodb_flush_method = O_DSYNC” and found the same results. With older kernels I strongly advise to use the “innodb_flush_method = O_DSYNC” setting to make sure files are opened in a way that will cause them to be transactional for ext4. As always, test thoroughly; I only tested on Ubuntu 14.04.
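
In my.cnf terms, for those older kernels the relevant fragment would be along these lines (a sketch based on the settings used above):

[mysqld]
innodb_flush_method = O_DSYNC  # instead of O_DIRECT, for ext4 data=journal on pre-3.2 kernels
skip-innodb_doublewrite        # the ext4 journal takes over this role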

Impacts on MyISAM

Since we are no longer really using O_DIRECT, even if set in my.cnf, the OS file cache will be used for InnoDB data. If the database is only using InnoDB that’s not a big deal but if MyISAM is significantly used, that may cause performance issues since MyISAM relies on the OS file cache so be warned.

Fast SSDs

If you have a SSD setup that doesn’t offer a transactional file system like the FusionIO directFS, a very interesting setup would be to mix spinning drives and SSDs. For example, let’s suppose we have a mirror of spinning drives handled by a raid controller with a write cache (and a BBU) and an SSD storage on a PCIe card. To reduce the write load to the SSD, we could send the file system journal to the spinning drives using the “journal_path=path” or “journal_dev=devnum” options of ext4. The raid controller write cache would do an awesome job at merging the write operations for the file system journal and the amount of write operations going to the SSD would be cut by half. I don’t have access to such a setup but it seems very promising performance wise.
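
I haven’t tested this, but setting up such an external journal could look roughly like the following (device names are made up; check the mke2fs and mount man pages before trying):

# create a dedicated external journal on the spinning-drive mirror
mke2fs -O journal_dev /dev/md0
# create the data filesystem on the SSD, attached to that external journal
mkfs.ext4 -J device=/dev/md0 /dev/fioa1
# mount with full data journaling; the journal_path/journal_dev mount options
# can help if the journal device numbering changes across reboots
mount -o data=journal /dev/fioa1 /var/lib/mysql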

Conclusion

Like ZFS, ext4 can be transactional, and replacing the InnoDB double write buffer with the file system transaction journal yields a 55% increase in performance for write-intensive workloads. Performance gains are also expected for SSD and mixed spinning/SSD configurations.

The post How to improve InnoDB performance by 55% for write-bound loads appeared first on MySQL Performance Blog.

A technical WebScaleSQL review and comparison with Percona Server

Latest MySQL Performance Blog posts - May 22, 2014 - 3:00am

The recent WebScaleSQL announcement has made quite a splash in the MySQL community over the last few weeks, and with good reason. The collaboration between the major MySQL-at-scale users to develop a single code branch that addresses the needs of, well, web scale, is going to benefit the whole community. But I feel that the majority of community opinions and comments to date have been based on the announcement itself and the organizational matters only. What we have been missing is an actual look at the code. What actual new features and bug-fixes are there? Let’s take a look.

At the same time, as Percona is also a developer of an enhanced MySQL replacement database server, it’s natural to try to compare the two. So let’s try to do that as well, but an important caveat applies here. Both MySQL branches (a branch and an upstream-tracking fork would be more exact) are being developed with different goals and for different end users. WebScaleSQL is all things scale (performance, diagnostics, specific features) for a relatively narrow and highly proficient group of users. There are no binary releases, and the code base is supposed to serve as a basis for further code branches, specific to each corporate contributor. On the other hand, Percona Server is a general-purpose server that is developed with broad input from Percona’s customers, professional services departments, and the general community. Thus, it would be unfair to say that one of the branches should be considered better than the other just because a certain feature is missing or not reaching as far. The software serves different needs.

The rest of this post is an annotated list of WebScaleSQL-specific code commits: user-visible features, performance fixes, general fixes, and finally the stuff of interest to developers.

New features
  • Ability for clients to specify millisecond (as opposed to second in MySQL) read/write/connect timeouts. This patch also carries an internal cleanup to introduce a timeout data type to avoid second-millisecond unit conversion errors.
  • Super read-only when regular read-only is not enough, that is, when writes by SUPER users need to be prevented as well.

All of the above are absent from Percona Server but possible to merge if there is interest. I’d also note that this list is rather short at the moment, with some obvious stuff missing, such as the user statistics patch. I’d expect this to change in the near future.
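
As a quick illustration of the super read-only feature, a hypothetical session could look like the sketch below (the variable name super_read_only is my assumption; check the WebScaleSQL tree for the exact name and semantics):

mysql> SET GLOBAL read_only = ON;        -- blocks writes from non-SUPER users
mysql> SET GLOBAL super_read_only = ON;  -- additionally blocks writes from SUPER users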

Performance-related features and fixes

It is Web Scale, remember.

These two changes require an understanding, on the user’s part, of what the tradeoffs are. They are missing in Percona Server but, again, possible to merge if there is interest in them.

Again, Percona Server does not carry these. Unlike the previous ones, these should be safe for every single user, and we could merge them without having to give any further thought to their merits. Oracle MySQL should do the same.

That is one general performance-related change that Percona Server has too.

InnoDB flushing performance fixes

These would belong to the previous section, but I’d like to highlight them separately. We spent a lot of effort to analyse and improve the 5.6 InnoDB flushing before the Percona Server 5.6 GA release and continue to do so in the point releases. The WebScaleSQL changes below show that we and they have discovered a lot of identical improvement areas independently, and provided different fixes for the same issues. For an overview of XtraDB 5.6 changes in this area, go here and here. Note that these changes are somewhat more extensive, especially for the high-concurrency cases.

  • Back port of 5.7 WL #7047 and a fix for MySQL bug #71411 (fixed in Percona Server too). WL #7047 reduces the buffer list scan complexity. We have identified the same issue but attempted to work around it with flushing heuristic tweaks.
  • Fix for MySQL bugs #70500 and #71988 to remove potential flushing instabilities. Both fixed in Percona Server.
  • Fix for MySQL bug #62534, enabling finer-grained setting of innodb_max_dirty_pages_pct and unbreaking it for value zero. Not fixed in Percona Server, but the Oracle fix should be coming in 5.6.19.
  • Fix for MySQL bug #70899, removing redundant flush on server startup, which should speed up crash recovery with large buffer pools. Not fixed in Percona Server. Oracle fix expected in 5.6.20.
  • Ability to specify idle system flushing rate. Absent in Percona Server. I believe it should be possible to get the same effect by tuning existing variables: setting innodb_io_capacity lower and innodb_io_capacity_max higher, but it needs experimenting before being able to tell for sure.
General fixes

Fixes for assorted MySQL bugs. None of them are present in Percona Server; they might be merged as needed. Our own list of assorted MySQL bugs we have fixed is here. I have omitted fixes for MySQL developer-specific bugs; these are listed in the next section.

  • Fix for MySQL bug #64751 – Make NO_UNSIGNED_SUBTRACTION SQL mode work for additions too, i.e. unifying the cases of “foo - 1” and “foo + -1”.
  • Stop spawning one extra thread on server startup to work around a bug in glibc that was fixed in 2006. Interestingly I was not able to find any MySQL bug report for this. Anyone?
  • Preserve slave I/O thread connection settings if compression is used. Again, is there a MySQL bug report for this?
  • Fix for MySQL bug #64347 –  database option mix-up if lower_case_table_names = 0 and the database names differ only by case.
Build changes

Making MySQL play nicer with the system libs, and other assorted changes.

  • Static linking of semisync replication plugins, based on a MariaDB patch. This might have a performance angle to it – MySQL bug #70218?
  • Do not embed OpenSSL and zlib in the static libraries.
  • Fix building with system OpenSSL and zlib. Using system libraries whenever possible make packaging easier and more conformant to Linux distribution requirements. We have been working on this too.
  • Fix building with system libreadline (MySQL bug #63130, closed without fix). Likewise.
Developer changes

These are patches of interest to MySQL / WebScaleSQL developers and not immediately visible for end users. I’m omitting some things, such as testcase compatibility with various build options, testsuite timeout tweaks, and patches that integrate with tools used for project development: Jenkins, Phabricator, etc.

  • Switch to C++11 and C99, the newer C and C++ language versions. It’s a big change from a development perspective and one that is possible to pull off only if the project does not need to support older systems and their compilers (or even the newest compilers on some platforms). This is precisely the kind of thing that is easier to implement for WebScaleSQL than for everybody else. As for the benefits of the change, the project already makes use of the C++11 memory model – see the next item.
  • An efficient atomic stat counter framework, using C++11. I wonder how its performance compares to that of get_sched_indexer_t, which is present in Oracle MySQL 5.6, but not used?
  • Making the Performance Schema MTR suite slightly more sane by not recording stuff that tests nothing and at the same time is prone to change. Performance Schema MTR bits are something I’m sure every single 5.6 branch developer has encountered. This particular commit fixes MySQL bug #68714. Fixed in Percona Server. This is useful if one configures the build to re-enable the Performance Schema.
  • More of the same. Half of that commit fixes MySQL bug #68635, which is fixed in Percona Server too but unfortunately was considered by upstream as not requiring any fixes.
  • Stabilising the MTR testsuite, SHOW PROCESSLIST bits. Is there a MySQL bug report for this?
  • Stabilising the MTR testsuite, missing ORDER BY in 5.6.17 bits. Likewise, is there a bug report for this one?
  • Fix for MTR breaking if there is a ‘@’ somewhere in the working directories. Jenkins CI likes to put ‘@’ there. Same question re. bug report?
  • Unbreak a bunch of tests in the parts suite. This looks to me like MySQL bug #69252, but it has been already fixed by Oracle. Is the WebScaleSQL fix still required?
  • Re-enable AIO if MTR –mem option is passed.
  • Stress tests in MTR.
  • Fix compilation with Bison 3, based on a MariaDB patch (MySQL bug #71250). Fixed in Percona Server.
  • Fix compilation warnings (more). A bug report?
  • Fix uninitialised variable use warnings as reported by AddressSanitizer. Is there a bug report?
  • Fix a potential out-of-bound access, found by AddressSanitizer. Is there a MySQL bug for this?
  • Fix CMake confusion of two different ways to ask for a debug build resulting in different builds (MySQL bug #70647, fixed in 5.7).

Notice that the last list is quite long, especially if compared to the list of user-visible features added to date. That makes perfect sense for the project at this stage: building a solid development foundation first so that the features can follow in good quality and with reduced maintenance effort. Add a whole bunch of performance fixes and you get the big-picture view for today: a solid foundation for further development; numerous performance fixes; a few general fixes and new features.

As for comparing to Percona Server, currently the biggest overlap is in the performance fixes, specifically the InnoDB flushing fixes. For the rest, we can merge from WebScaleSQL as necessary; if you think that a certain WebScaleSQL feature or fix would benefit you, drop us a line to discuss the options. And of course we invite WebScaleSQL to take any of our fixes or patches they would like.

The post A technical WebScaleSQL review and comparison with Percona Server appeared first on MySQL Performance Blog.

MySQL High Availability With Percona XtraDB Cluster (Percona MySQL Training)

Latest MySQL Performance Blog posts - May 21, 2014 - 3:00am

I’ve had the opportunity to train lots of people on Percona XtraDB Cluster (PXC) over the last few years the product has existed.  This has taken the form of phone calls, emails, blog posts, webinars, and consulting engagements. This doesn’t count all the time I’ve spent grilling Codership on how things actually work.  But it has culminated in the PXC tutorial I have given annually at Percona Live MySQL Conference and Expo in Santa Clara, California for the last two years.  Baron even attended this year and had this say:

“Jay Janssen’s tutorial on Percona XtraDB Cluster was impressive. I can’t imagine how much time he must have spent preparing for that.”

(Baron: Thanks for this compliment, BTW)  The answer is: I haven’t kept track, but I’ve delivered it 3 times at conferences and it gets better each time.  I started working on it during the summer of 2012 for Percona Live NYC that year and have put in countless hours since making it better.  This year, despite my voice problems, I was really happy with how it came out.

We’ve had lots of requests for repeats of this tutorial and with more detail.  As such, I’ll be doing a 2-day training about MySQL HA with Percona XtraDB Cluster in sunny San Jose, California on July 16th-17th, 2014.  This Percona MySQL Training session is appropriately titled, “MySQL High Availability With Percona XtraDB Cluster” and you can get more information and also register here.

This course will include an overview of how Percona XtraDB Cluster fits into the MySQL High Availability world, followed by a deep-dive into PXC.  All the content of the tutorial will be covered, but it will be spread out with more details on all the components and important bits of using PXC and Galera.

If you’re unfamiliar with my tutorial — it is intensively hands-on.  This will not be 2 days of lecture: You’ll be setting up and experimenting with a real cluster on the fly during the class. Expect plenty of command-line work and to leave with a much better sense of how to truly operate and use a Percona XtraDB Cluster.

Once again, you can check out the details on our website and register NOW to attend right here.

The post MySQL High Availability With Percona XtraDB Cluster (Percona MySQL Training) appeared first on MySQL Performance Blog.

Database auditing alternatives for MySQL

Latest MySQL Performance Blog posts - May 20, 2014 - 8:08am

Database auditing is the monitoring of selected actions of database users. It doesn’t protect the database in case privileges are set incorrectly, but it can help the administrator detect mistakes.

Audits are needed for security. You can track data access and be alerted to suspicious activity. Audits are required for data integrity. They are the only way to validate that changes made to data are correct and legal.

There are several regulations that require database audits:

  • Sarbanes-Oxley (SOX) Act of 2002 is a US federal law that regulates how financial data must be handled and protected.
  • Payment Card Industry Data Security Standard, otherwise known as PCI-DSS, is an international standard developed to protect cardholders’ data.
  • Health Insurance Portability and Accountability Act (HIPAA) enacted by the U.S. Congress to protect medical and personal information.

MySQL since version 5.5.3 provides the Audit Plugin API which can be used to write an Audit Plugin. The API provides notification for the following events:

  • messages written to general log (LOG)
  • messages written to error log (ERROR)
  • query results sent to client (RESULT)
  • logins (including failed) and disconnects (CONNECT)

All current audit plugins for MySQL provide an audit log as result of their work. They differ in record format, filtering capabilities and verbosity of log records.

McAfee MySQL Audit Plugin
This plugin is available for MySQL versions 5.1, 5.5 and 5.6. It does not officially support Percona Server and MariaDB. It doesn’t use the Audit API, and it has better verbosity and better filtering features. This is achieved by binary patching the server at runtime, inserting hooks which extract data stored at known offsets in memory. Thus, the plugin is sensitive to any changes in the server code.

Summary:

  • json log format
  • log to file or UNIX socket (allows to log with syslog-ng)
  • filter logged events by users, databases and tables, commands (insert, update, delete)

Oracle Enterprise Audit Log Plugin
Oracle provides this audit plugin as a part of the MySQL Enterprise pack. It uses the MySQL Audit API and is able to log RESULT and CONNECT events. The plugin has support for two XML-based formats.

Summary:

  • XML format
  • log to file
  • filter by event type

MariaDB Audit Plugin
MariaDB developers extended the MySQL Audit API by adding fields for existing events and adding new TABLE event which notifies of operation with tables (read, write, create, drop, alter). The plugin can still be used with MySQL and Percona Server but MariaDB’s additions will not be available.

Summary:

  • CSV log format
  • log to file or syslog
  • filter by users, event types

Percona Server Audit Log feature
Percona has developed an audit log feature that is a part of Percona Server since 5.5.35-37.0 and 5.6.17-65.0. Its goal is to be compatible with Oracle’s Enterprise Audit Plugin, providing a similar set of features for Percona Server users. It asynchronously logs all queries and connections in order to “audit” Percona Server usage, without the overhead of the General Query Log. The Audit Log feature can be very beneficial for web applications that deal with sensitive data (e.g., credit card numbers or medical records) and require security compliance (e.g., HIPAA or SOX). Administrators of multi-tenant applications or MySQL as a service can easily audit data access from a security and performance standpoint when using the Audit Log feature in Percona Server. The Audit Log feature is helpful for investigating and troubleshooting issues and auditing performance, too. The Audit Log feature can be dynamically enabled (does not require a server restart).

The post Database auditing alternatives for MySQL appeared first on MySQL Performance Blog.

Errant transactions: Major hurdle for GTID-based failover in MySQL 5.6

Latest MySQL Performance Blog posts - May 19, 2014 - 12:00am

I have previously written about the new replication protocol that comes with GTIDs in MySQL 5.6. Because of this new replication protocol, you can inadvertently create errant transactions that may turn any failover to a nightmare. Let’s see the problems and the potential solutions.

In short
  • Errant transactions may cause all kinds of data corruption/replication errors when failing over.
  • Detection of errant transactions can be done with the GTID_SUBSET() and GTID_SUBTRACT() functions.
  • If you find an errant transaction on one server, commit an empty transaction with the GTID of the errant one on all other servers.
  • If you are using a tool to perform the failover for you, make sure it can detect errant transactions. At the time of writing, only mysqlfailover and mysqlrpladmin from MySQL Utilities can do that.
What are errant transactions?

Simply stated, they are transactions executed directly on a slave. Thus they only exist on a specific slave. This could result from a mistake (the application wrote to a slave instead of writing to the master) or this could be by design (you need additional tables for reports).

Why can they create problems that did not exist before GTIDs?

Errant transactions have always existed. However, because of the new replication protocol for GTID-based replication, they can have a significant impact on all servers if a slave holding an errant transaction is promoted as the new master.

Compare what happens in this master-slave setup, first with position-based replication and then with GTID-based replication. A is the master, B is the slave:

# POSITION-BASED REPLICATION
# Creating an errant transaction on B
mysql> create database mydb;
# Make B the master, and A the slave
# What are the databases on A now?
mysql> show databases like 'mydb';
Empty set (0.01 sec)

As expected, the mydb database is not created on A.

# GTID-BASED REPLICATION
# Creating an errant transaction on B
mysql> create database mydb;
# Make B the master, and A the slave
# What are the databases on A now?
mysql> show databases like 'mydb';
+-----------------+
| Database (mydb) |
+-----------------+
| mydb            |
+-----------------+

mydb has been recreated on A because of the new replication protocol: when A connects to B, they exchange their own set of executed GTIDs and the master (B) sends any missing transaction. Here it is the create database statement.

As you can see, the main issue with errant transactions is that when failing over you may execute transactions ‘coming from nowhere’ that can silently corrupt your data or break replication.

How to detect them?

If the master is running, it is quite easy with the GTID_SUBSET() function. As all writes should go to the master, the GTIDs executed on any slave should always be a subset of the GTIDs executed on the master. For instance:

# Master
mysql> show master status\G
*************************** 1. row ***************************
             File: mysql-bin.000017
         Position: 376
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 8e349184-bc14-11e3-8d4c-0800272864ba:1-30,
8e3648e4-bc14-11e3-8d4c-0800272864ba:1-7

# Slave
mysql> show slave status\G
[...]
Executed_Gtid_Set: 8e349184-bc14-11e3-8d4c-0800272864ba:1-29,
8e3648e4-bc14-11e3-8d4c-0800272864ba:1-9

# Now, let's compare the 2 sets
mysql> select gtid_subset('8e349184-bc14-11e3-8d4c-0800272864ba:1-29,
    -> 8e3648e4-bc14-11e3-8d4c-0800272864ba:1-9',
    -> '8e349184-bc14-11e3-8d4c-0800272864ba:1-30,
    -> 8e3648e4-bc14-11e3-8d4c-0800272864ba:1-7') as slave_is_subset;
+-----------------+
| slave_is_subset |
+-----------------+
|               0 |
+-----------------+

Hmm, it looks like the slave has executed more transactions than the master; this indicates that the slave has executed at least 1 errant transaction. Could we know the GTID of these transactions? Sure, let’s use GTID_SUBTRACT():

mysql> select gtid_subtract('8e349184-bc14-11e3-8d4c-0800272864ba:1-29,
    -> 8e3648e4-bc14-11e3-8d4c-0800272864ba:1-9',
    -> '8e349184-bc14-11e3-8d4c-0800272864ba:1-30,
    -> 8e3648e4-bc14-11e3-8d4c-0800272864ba:1-7') as errant_transactions;
+------------------------------------------+
| errant_transactions                      |
+------------------------------------------+
| 8e3648e4-bc14-11e3-8d4c-0800272864ba:8-9 |
+------------------------------------------+

This means that the slave has 2 errant transactions.

Now, how can we check errant transactions if the master is not running (like master has crashed, and we want to fail over to one of the slaves)? In this case, we will have to follow these steps:

  • Check all slaves to see if they have executed transactions that are not found on any other slave: this is the list of potential errant transactions.
  • Discard all transactions originating from the master: now you have the list of errant transactions of each slave.

Some of you may wonder how you can know which transactions come from the master as it is not available: SHOW SLAVE STATUS gives you the master’s UUID, which is used in the GTIDs of all transactions coming from the master.
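
For instance, on any slave (a sketch; output trimmed to the relevant field):

mysql> show slave status\G
[...]
             Master_UUID: 8e349184-bc14-11e3-8d4c-0800272864ba
[...]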

How to get rid of them?

This is pretty easy, but it can be tedious if you have many slaves: just inject an empty transaction on all the other servers with the GTID of the errant transaction.

For instance, if you have 3 servers, A (the master), B (slave with an errant transaction: XXX:3), and C (slave with 2 errant transactions: YYY:18-19), you will have to inject the following empty transactions in pseudo-code:

# A
- Inject empty trx(XXX:3)
- Inject empty trx(YYY:18)
- Inject empty trx(YYY:19)

# B
- Inject empty trx(YYY:18)
- Inject empty trx(YYY:19)

# C
- Inject empty trx(XXX:3)
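
In actual SQL, injecting one such empty transaction comes down to setting GTID_NEXT explicitly; a sketch for the hypothetical errant transaction XXX:3 (replace XXX with the real source UUID):

mysql> SET GTID_NEXT='XXX:3';
mysql> BEGIN; COMMIT;
mysql> SET GTID_NEXT='AUTOMATIC';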

Conclusion

If you want to switch to GTID-based replication, make sure to check for errant transactions before any planned or unplanned replication topology change. And be especially careful if you use a tool that reconfigures replication for you: at the time of writing, only mysqlrpladmin and mysqlfailover from MySQL Utilities can warn you if you are trying to perform an unsafe topology change.

The post Errant transactions: Major hurdle for GTID-based failover in MySQL 5.6 appeared first on MySQL Performance Blog.

Introduction to the Percona MySQL Audit Log Plugin

Latest MySQL Performance Blog posts - May 16, 2014 - 6:00am

Percona has developed a MySQL Audit Log plugin that is now included in Percona Server since the recent 5.5 and 5.6 releases. This implementation is an alternative to the MySQL Enterprise Audit Log Plugin: Percona re-implemented the Audit Plugin code under the GPL, as Oracle’s code is closed source. This post is a quick introduction to this plugin.

Installation
There are two ways to install the Percona MySQL Audit Plugin:

INSTALL PLUGIN audit_log SONAME 'audit_log.so';

or in my.cnf

[mysqld] plugin-load="audit_log=audit_log.so"

Verify installation

mysql> SHOW PLUGINS\G
...
*************************** 38. row ***************************
   Name: audit_log
 Status: ACTIVE
   Type: AUDIT
Library: audit_log.so
License: GPL
38 rows in set (0.00 sec)

Configuration
Let’s look at the variables provided by the Percona MySQL Audit Plugin:

mysql> show global variables like 'audit%';
+--------------------------+--------------+
| Variable_name            | Value        |
+--------------------------+--------------+
| audit_log_buffer_size    | 1048576      |
| audit_log_file           | audit.log    |
| audit_log_flush          | OFF          |
| audit_log_format         | OLD          |
| audit_log_policy         | ALL          |
| audit_log_rotate_on_size | 0            |
| audit_log_rotations      | 0            |
| audit_log_strategy       | ASYNCHRONOUS |
+--------------------------+--------------+
8 rows in set (0.00 sec)

The Percona MySQL Audit Plugin can log using a memory buffer to deliver better performance. Messages are written into the memory buffer first and then flushed to the file in the background. A certain number of events can be lost in case of a server crash or power outage. Another option is to log directly to file without using a memory buffer. There is also an option to fsync every event.

Set audit_log_strategy to control log flushing:

  • ASYNCHRONOUS: log using the memory buffer; do not drop events if the buffer is full
  • PERFORMANCE: log using the memory buffer; drop events if the buffer is full
  • SEMISYNCHRONOUS: log directly to the file; do not fsync every event
  • SYNCHRONOUS: log directly to the file; fsync every event

audit_log_buffer_size specifies the size of the memory buffer; it makes sense only for the ASYNCHRONOUS and PERFORMANCE strategies.

The variable audit_log_file specifies the file to log into. Its value can be a path relative to the datadir or an absolute path.

The Percona MySQL Audit Plugin can automatically rotate the log file based on size. Set audit_log_rotate_on_size to enable this feature: the file is rotated when the log grows to the specified number of bytes. Set audit_log_rotations to limit the number of log files to keep.

It is possible to log only logins or only queries by setting the audit_log_policy value.
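
Putting these options together, a my.cnf sketch could look like this (paths and sizes are illustrative; audit_log_policy also accepts QUERIES and NONE):

[mysqld]
plugin-load = "audit_log=audit_log.so"
audit_log_file = /var/log/mysql/audit.log
audit_log_strategy = SEMISYNCHRONOUS   # log directly to file, no fsync per event
audit_log_rotate_on_size = 1073741824  # rotate after ~1GB
audit_log_rotations = 5                # keep at most 5 rotated files
audit_log_policy = LOGINS              # log connections only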

Log file format
Let’s see what audit records look like.

OLD format (audit_log_format = OLD):

<AUDIT_RECORD
  "NAME"="Connect"
  "RECORD"="2_2014-04-21T12:34:32"
  "TIMESTAMP"="2014-04-21T12:34:32 UTC"
  "CONNECTION_ID"="1"
  "STATUS"="0"
  "USER"="root"
  "PRIV_USER"="root"
  "OS_LOGIN"=""
  "PROXY_USER"=""
  "HOST"="localhost"
  "IP"=""
  "DB"=""
/>

NEW format (audit_log_format = NEW):

<AUDIT_RECORD>
  <NAME>Connect</NAME>
  <RECORD>17481_2014-04-21T12:39:03</RECORD>
  <TIMESTAMP>2014-04-21T12:39:05 UTC</TIMESTAMP>
  <CONNECTION_ID>4</CONNECTION_ID>
  <STATUS>0</STATUS>
  <USER>root</USER>
  <PRIV_USER>root</PRIV_USER>
  <OS_LOGIN></OS_LOGIN>
  <PROXY_USER></PROXY_USER>
  <HOST>localhost</HOST>
  <IP></IP>
  <DB>test</DB>
</AUDIT_RECORD>

The difference is that the audit record in the OLD format was written as a single element with attributes, while in the NEW format it is written as a single element with sub-elements.

A good idea of what each sub-element means can be found in Audit Plugin API documentation here: https://dev.mysql.com/doc/refman/5.6/en/writing-audit-plugins.html.

Performance
Let’s compare the performance of the different audit_log_strategy modes. I used read-only sysbench on my laptop for it. The workload is CPU-bound with the dataset fitting in the buffer pool, and I set the number of sysbench threads to the amount that maximizes transactions per second.

I got a TPS drop of around 7% for the PERFORMANCE and ASYNCHRONOUS strategies, 9% for SEMISYNCHRONOUS, and 98% for SYNCHRONOUS, which shows that syncing every logged statement to disk is not the best thing for performance.


Conclusion
Of course any software has bugs and this plugin has plenty of them. Please give it a try and provide us your feedback. Report any issues here: https://bugs.launchpad.net/percona-server.

The post Introduction to the Percona MySQL Audit Log Plugin appeared first on MySQL Performance Blog.

Benchmark: SimpleHTTPServer vs pyclustercheck (twisted implementation)

Latest MySQL Performance Blog posts - May 16, 2014 - 12:00am

Github user Arianlzt provided a python-twisted alternative version of pyclustercheck per discussion on issue 7.

Due to sporadic performance issues noted with the original implementation in SimpleHTTPServer, the benchmarks (which I’ve included as part of the project on GitHub) use the multi-mechanize library, with:

  • cache time 1 sec
  • 2 x 100 thread pools
  • 60s ramp up time
  • 600s total duration
  • testing simulated node fail (always returns 503, rechecks mysql node on cache expiry)
  • AMD FX(tm)-8350 Eight-Core Processor
  • Intel 330 SSD
  • local loop back test (127.0.0.1)

The SimpleHTTPServer instance fared as follows:

Right away we can see around 500TPS throughput; however, as can be seen in both response time graphs, there are “outlying” transactions: something is causing the response time to spike dramatically in SimpleHTTPServer. How does the twisted alternative compare? (Note the benchmarks are from the current HEAD with features re-added to avoid regression: caching and ipv6 support.)


Ouch! We appear to have taken a performance hit, at least in terms of TPS (roughly -19%); however, compare the response time graphs to find a much more consistent plot. We had outliers hitting near 70s for SimpleHTTPServer, while we’re always under 1s with twisted.

Great! So why isn’t this merged into the master branch as the main project and therefore bundled with Percona XtraDB Cluster (PXC)? The issue here is the required version of python-twisted; ipv6 support was introduced in issue 8 by user Nibler999, and to avoid regression I re-added support for ipv6 in this commit for twisted.

ipv6 support for python-twisted is not in the version distributed to main server OSes such as:

  • EL6 python-twisted 8.x
  • Ubuntu 10.04 LTS python-twisted 11.x

What’s the issue here? Attempting to bind / listen to an ipv6 interface yields the following error:

twisted.internet.error.CannotListenError: Couldn't listen on :::8000: [Errno -9] Address family for hostname not supported.

Due to this regression (breaking of ipv6 support), the twisted version cannot at this time be merged into master. The twisted version, however, as can be seen from above, is much more consistent, and if you have the “cycles” to implement it (e.g., install twisted from PyPI via pip / easy_install to get >= 12.x) and test it, it’s a promising alternative.
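
For instance, something along these lines (the version pin is an assumption; any >= 12.x release should do):

pip install 'Twisted>=12.0'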

To illustrate this further, the benchmark was made more grueling:

  • 5 x 100 thread pools
  • 60s ramp up
  • 600s total duration

First, the twisted results; note the initial spike is due to a local python issue where it locked up creating a new thread in multi-mechanize:

Now the SimpleHTTPServer results:

Oh dear, as the load increases clearly we get some stability issues inside SimpleHTTPServer…

Also worth noting are the timeouts:

  • twisted: grep 'timed out' results.csv | wc -l == 0
  • SimpleHTTPServer: grep 'timed out' results.csv | wc -l == 470


… in the case of increased load the twisted model performs far more consistently under the same test conditions when compared against SimpleHTTPServer. I include the multi-mechanize scripts as part of the project on GitHub; as such, you can recreate these tests yourself and gauge the performance to see if twisted or SimpleHTTPServer suits your needs.

The post Benchmark: SimpleHTTPServer vs pyclustercheck (twisted implementation) appeared first on MySQL Performance Blog.

High Availability with MySQL Fabric: Part I

Latest MySQL Performance Blog posts - May 15, 2014 - 6:00am

In our previous post, we introduced the MySQL Fabric utility and said we would dig deeper into it. This post is the first part of our test of MySQL Fabric’s High Availability (HA) functionality.

Today, we’ll review MySQL Fabric’s HA concepts, and then walk you through the setup of a 3-node cluster with one Primary and two Secondaries, doing a few basic tests with it. In a second post, we will spend more time generating failure scenarios and documenting how Fabric handles them. (MySQL Fabric is an extensible framework to manage large farms of MySQL servers, with support for high-availability and sharding.)

Before we begin, we recommend you read this post by Oracle’s Mats Kindahl, which, among other things, addresses the issues we raised on our first post. Mats leads the MySQL Fabric team.

Our lab

All our tests will be using our test environment with Vagrant (https://github.com/martinarrieta/vagrant-fabric)

If you want to play with MySQL Fabric, you can have these VMs running in your desktop following the instructions in the README file. If you don’t want full VMs, our colleague Jervin Real created a set of wrapper scripts that let you test MySQL Fabric using sandboxes.

Here is a basic representation of our environment.

Set up

To set up MySQL Fabric without using our Vagrant environment, you can follow the official documentation, or check the ansible playbooks in our lab repo. If you follow the manual, the only caveat is that when creating the user, you should either disable binary logging for your session, or use a GRANT statement instead of CREATE USER. You can read here for more info on why this is the case.

A description of all the options in the configuration file can be found here. For HA tests, the one thing to mention is that, in our experience, the failure detector will only trigger an automatic failover if the value for failover_interval in the [failure_tracking] section is greater than 0. Otherwise, failures will be detected and written to the log, but no action will be taken.
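
For reference, here is roughly what that section of fabric.cfg could look like (the interval value is just an example; 0 disables automatic failover):

[failure_tracking]
failover_interval = 60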

MySQL configuration

In order to manage a mysqld instance with MySQL Fabric, the following options need to be set in the [mysqld] section of its my.cnf file:

log_bin
gtid-mode=ON
enforce-gtid-consistency
log_slave_updates

Additionally, as in any replication setup, you must make sure that all servers have a distinct server_id.
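
For example, in the [mysqld] section of node1's my.cnf (the numbering is just an illustration):

server_id = 1  # use e.g. 2 and 3 on node2 and node3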

When everything is in place, you can setup and start MySQL Fabric with the following commands:

[vagrant@store ~]$ mysqlfabric manage setup
[vagrant@store ~]$ mysqlfabric manage start --daemon

The setup command creates the database schema used by MySQL Fabric to store information about managed servers, and the start one, well, starts the daemon. The --daemon option makes Fabric start as a daemon, logging to a file instead of to standard output. Depending on the port and file name you configured in fabric.cfg, this may need to be run as root.

While testing, you can make MySQL Fabric reset its state at any time (though it won’t change existing node configurations such as replication) by running:

[vagrant@store ~]$ mysqlfabric manage teardown
[vagrant@store ~]$ mysqlfabric manage setup

If you’re using our Vagrant environment, you can run the reinit_cluster.sh script from your host OS (from the root of the vagrant-fabric repo) to do this for you, and also initialise the datadir of the three instances.

Creating a High Availability Cluster:

A High Availability Cluster is a set of servers using the standard Asynchronous MySQL Replication with GTID.

Creating a group

The first step is to create the group by running mysqlfabric with this syntax:

$ mysqlfabric group create <group_name>

In our example, to create the cluster “mycluster” you can run:

[vagrant@store ~]$ mysqlfabric group create mycluster
Procedure :
{ uuid = 605b02fb-a6a1-4a00-8e24-619cad8ec4c7,
  finished = True,
  success = True,
  return = True,
  activities =
}

Add the servers to the group

The second step is add the servers to the group. The syntax to add a server to a group is:

$ mysqlfabric group add <group_name> <host_name or IP>[:port]

The port number is optional and only required if distinct from 3306. It is important to mention that the clients that will use this cluster must be able to resolve this host or IP. This is because clients will connect directly both with MySQL Fabric’s XML-PRC server and with the managed mysqld servers. Let’s add the nodes to our group.

[vagrant@store ~]$ for i in 1 2 3; do mysqlfabric group add mycluster node$i; done
Procedure :
{ uuid = 9d65c81c-e28a-437f-b5de-1d47e746a318,
  finished = True,
  success = True,
  return = True,
  activities =
}
Procedure :
{ uuid = 235a7c34-52a6-40ad-8e30-418dcee28f1e,
  finished = True,
  success = True,
  return = True,
  activities =
}
Procedure :
{ uuid = 4da3b1c3-87cc-461f-9705-28a59a2a4f67,
  finished = True,
  success = True,
  return = True,
  activities =
}

Promote a node as a master

Now that we have all our nodes in the group, we have to promote one of them. You can promote one specific node or you can let MySQL Fabric to choose one for you.

The syntax to promote a specific node is:

$ mysqlfabric group promote <group_name> --slave_uuid='<node_uuid>'

or to let MySQL Fabric pick one:

$ mysqlfabric group promote <group_name>

Let’s do that:

[vagrant@store ~]$ mysqlfabric group promote mycluster
Procedure :
{ uuid = c4afd2e7-3864-4b53-84e9-04a40f403ba9,
  finished = True,
  success = True,
  return = True,
  activities =
}

You can then check the health of the group like this:

[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success = True
  return = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}},
            'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}},
            'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities =
}

One current limitation of the ‘health’ command is that it only identifies servers by their uuid. To get a list of the servers in a group, along with a quick status summary and their host names, use lookup_servers instead:

[vagrant@store ~]$ mysqlfabric group lookup_servers mycluster
Command :
{ success = True
  return = [{'status': 'SECONDARY', 'server_uuid': 'e245ec83-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node1'},
            {'status': 'SECONDARY', 'server_uuid': 'e826d4ab-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node2'},
            {'status': 'PRIMARY', 'server_uuid': 'edf2c45b-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node3'}]
  activities =
}

We sent a merge request to use a JSON string instead of the “print” of the object in the “return” field from the XML-RPC, in order to be able to use that information to display the results in a friendly way. In the same merge, we have added the address of the servers in the health command too.

Failure detection

Now we have the three lab machines set up in a replication topology of one master (the PRIMARY server) and two slaves (the SECONDARY ones). To make MySQL Fabric start monitoring the group for problems, you need to activate it:

[vagrant@store ~]$ mysqlfabric group activate mycluster
Procedure :
{ uuid = 230835fc-6ec4-4b35-b0a9-97944c18e21f,
  finished = True,
  success = True,
  return = True,
  activities =
}

Now MySQL Fabric will monitor the group’s servers, and depending on the configuration (remember the failover_interval we mentioned before) it may trigger an automatic failover. But let’s start testing a simpler case, by stopping mysql on one of the secondary nodes:

[vagrant@node2 ~]$ sudo service mysqld stop
Stopping mysqld:                                           [  OK  ]

And checking how MySQL Fabric reports the group’s health after this:

[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success = True
  return = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}},
            'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'FAULTY', 'is_alive': False, 'threads': {}},
            'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities =
}

We can see that MySQL Fabric successfully marks the server as faulty. In our next post we’ll show an example of this by using one of the supported connectors to handle failures in a group, but for now, let’s keep on the DBA/sysadmin side of things, and try to bring the server back online:

[vagrant@node2 ~]$ sudo service mysqld start
Starting mysqld:                                           [  OK  ]

[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success = True
  return = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}},
            'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'FAULTY', 'is_alive': True, 'threads': {}},
            'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities =
}

So the server is back online, but Fabric still considers it faulty. To add the server back into rotation, we need to look at the server commands:

[vagrant@store ~]$ mysqlfabric help server
Commands available in group 'server' are:
    server set_weight uuid weight  [--synchronous]
    server lookup_uuid address
    server set_mode uuid mode  [--synchronous]
    server set_status uuid status  [--update_only] [--synchronous]

The specific command we need is set_status, and in order to add the server back to the group, we need to change its status twice: first to SPARE and then back to SECONDARY. You can see what happens if we try to set it to SECONDARY directly:

[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SECONDARY
Procedure :
{ uuid = 9a6f2273-d206-4fa8-80fb-6bce1e5262c8,
  finished = True,
  success = False,
  return = ServerError: Cannot change server's (e826d4ab-d889-11e3-86df-0800274fb806) status from (FAULTY) to (SECONDARY).,
  activities =
}

So let’s try it the right way:

[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SPARE
Procedure :
{ uuid = c3a1c244-ea8f-4270-93ed-3f9dfbe879ea,
  finished = True,
  success = True,
  return = True,
  activities =
}
[vagrant@store ~]$ mysqlfabric server set_status e826d4ab-d889-11e3-86df-0800274fb806 SECONDARY
Procedure :
{ uuid = 556f59ec-5556-4225-93c9-b9b29b577061,
  finished = True,
  success = True,
  return = True,
  activities =
}

And check the group’s health again:

[vagrant@store ~]$ mysqlfabric group health mycluster
Command :
{ success = True
  return = {'e245ec83-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}},
            'e826d4ab-d889-11e3-86df-0800274fb806': {'status': 'SECONDARY', 'is_alive': True, 'threads': {}},
            'edf2c45b-d889-11e3-86df-0800274fb806': {'status': 'PRIMARY', 'is_alive': True, 'threads': {}}}
  activities =
}

In our next post, when we discuss how to use the Fabric aware connectors, we’ll also test other failure scenarios like hard VM shutdown and network errors, but for now, let’s try the same thing but on the PRIMARY node instead:

[vagrant@node3 ~]$ sudo service mysqld stop
Stopping mysqld:                                           [  OK  ]

And let’s check the servers again:

[vagrant@store ~]$ mysqlfabric group lookup_servers mycluster
Command :
{ success = True
  return = [{'status': 'SECONDARY', 'server_uuid': 'e245ec83-d889-11e3-86df-0800274fb806', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': 'node1'},
            {'status': 'PRIMARY', 'server_uuid': 'e826d4ab-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node2'},
            {'status': 'FAULTY', 'server_uuid': 'edf2c45b-d889-11e3-86df-0800274fb806', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': 'node3'}]
  activities =
}

We can see that MySQL Fabric successfully marked node3 as FAULTY and promoted node2 to PRIMARY to resolve this. Once we start mysqld again on node3, we can add it back as SECONDARY using the same process of setting its status to SPARE first, as we did for node2 above.

Remember that unless failover_interval is greater than 0, MySQL Fabric will detect problems in an active group, but it won’t take any automatic action. We think it’s a good thing that the value for this variable in the documentation is 0, so that automatic failover is not enabled by default (if people follow the manual, of course), as even in mature HA solutions like Pacemaker, automatic failover is something that’s tricky to get right. But even without this, we believe the main benefit of using MySQL Fabric for promotion is that it takes care of reconfiguring replication for you, which should reduce the risk of error in this process, especially once the project becomes GA.

What’s next

In this post we’ve presented a basic replication setup managed by MySQL Fabric and reviewed a couple of failure scenarios, but many questions are left unanswered, among them:

  • What happens to clients connected with a Fabric aware driver when there is a status change in the cluster?
  • What happens when the XML-RPC server goes down?
  • How can we improve its availability?

We’ll try to answer these and other questions in our next post. If you have some questions of your own, please leave them in the comments section and we’ll address them in the next or other posts, depending on the topic.

The post High Availability with MySQL Fabric: Part I appeared first on MySQL Performance Blog.

Why ALTER TABLE runs faster on Percona Server 5.5 vs. MySQL 5.5

Latest MySQL Performance Blog posts - May 15, 2014 - 3:00am

Some of us Perconians are at OpenStack summit this week in Atlanta. Matt Griffin, our director of product management, tweeted about the turbo-hipster CI talk and their experience of ALTER TABLEs running faster on Percona Server. Oracle’s Morgan Tocker then tweeted in response, asking why this was the case. I decided that the simplest way to answer that was here in this post.

The reason for this is the expand_fast_index_creation feature of Percona Server. I did a quick schema change on MySQL 5.5 and Percona Server 5.5 to demonstrate this (in the talk, the speaker mentioned that these versions were used).
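
If you want to try this yourself, the feature is controlled by a server variable of the same name; a minimal sketch (whether it needs to be enabled explicitly, and its default, may depend on your Percona Server version):

mysql> SHOW VARIABLES LIKE 'expand_fast_index_creation';
mysql> SET GLOBAL expand_fast_index_creation = ON;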

The schema modifications in the talk could fall into 2 categories: the ones that could use fast index creation and the ones that could not.

I did the following tests on my laptop, on a sysbench table with 300k records.
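
For reference, such a table can be prepared with sysbench along these lines (a sketch; the exact invocation depends on the sysbench version, and with the 0.5 lua tests the table is named sbtest1):

sysbench --test=oltp --oltp-table-size=300000 \
  --mysql-user=root --mysql-password=password --db-driver=mysql prepare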

Vanilla MySQL 5.5:

mysql> alter table sbtest1 add index idx_c(c);
Query OK, 0 rows affected (4.37 sec)

Percona Server 5.5:

mysql> alter table sbtest1 add index idx_c(c);
Query OK, 0 rows affected (3.90 sec)

We know that this used fast index creation from the 0 rows affected. In this case, there is no substantial difference between the 2 servers; also, my laptop with CPU frequency scaling probably doesn’t have the most consistent performance in the world.

For the second schema change, I added a column, an operation which copies the table.

Vanilla MySQL 5.5:

mysql> alter table sbtest1 add column d int default 0;
Query OK, 300000 rows affected (37.05 sec)
Records: 300000  Duplicates: 0  Warnings: 0

Percona Server 5.5:

mysql> alter table sbtest1 add column d int default 0;
Query OK, 300000 rows affected (9.51 sec)
Records: 300000  Duplicates: 0  Warnings: 0

The reason for this speed difference is that in the case of Percona Server, for the table copy, the table is created only with a primary key, and the secondary indexes are built at the end of the process (rather than on the fly). For more details, check Alexey’s blog post on this topic.

This can be tuned further, by tuning innodb_merge_sort_block_size (in Percona Server 5.6, this is replaced by innodb_sort_buffer_size).

mysql> select @@innodb_merge_sort_block_size/1024/1024;
+------------------------------------------+
| @@innodb_merge_sort_block_size/1024/1024 |
+------------------------------------------+
|                               1.00000000 |
+------------------------------------------+
1 row in set (0.00 sec)

mysql> set innodb_merge_sort_block_size=8*1024*1024;
Query OK, 0 rows affected (0.00 sec)

mysql> alter table sbtest1 add column d int default 0;
Query OK, 300000 rows affected (8.61 sec)
Records: 300000  Duplicates: 0  Warnings: 0

So, in order to be accurate, schema changes are faster in Percona Server if they are table copies and if the tables have secondary indexes.

The post Why ALTER TABLE runs faster on Percona Server 5.5 vs. MySQL 5.5 appeared first on MySQL Performance Blog.

Tips on benchmarking Go + MySQL

Latest MySQL Performance Blog posts - May 14, 2014 - 9:00am

We just released, as open source, our new percona-agent (https://github.com/percona/percona-agent), the agent to work with Percona Cloud Tools. This agent is written in Go.

I will give a webinar titled “Monitoring All MySQL Metrics with Percona Cloud Tools” on June 25 that will cover the new features in percona-agent and Percona Cloud Tools, where I will also explain how it works. You are welcome to register now and join me.

There will be more posts about percona-agent, but in the meantime I want to dedicate this one to Go, Go with MySQL and some performance topics.

I have had an interest in the Go programming language for a long time, but in initial versions I did not quite like the performance of the goroutine scheduler. See my report from more than two years ago on runtime: reduce scheduling contention for large $GOMAXPROCS.

Supposedly this performance issue was fixed in Go 1.1, so this is a good time to revisit my benchmark experiment.

A simple run of prime or Fibonacci number calculations in N threads is quite boring, so I am going to run queries against Percona Server. Of course it adds some complication as there are more moving parts (i.e. the Go scheduler, the Go sql driver, MySQL by itself), but it just makes the experiment more interesting.

Source code of my benchmark: Go-pk-bench.
This is probably not the best example of how to code in Go, but that was not the point of this exercise. This post is really about some tips to take into account when writing an application in Go using a MySQL (Percona Server) database.

So, first, we will need a MySQL driver for Go. The one I used two years ago (https://github.com/Philio/GoMySQL) is quite outdated. It seems the most popular choice today is Go-MySQL-Driver, and this is the one we use for internal development. This driver is based on the standard Go “database/sql” package, which provides a standard Go way to deal with SQL-like databases. “database/sql” seems to work out OK, with some design decisions that are questionable for my taste, so using “database/sql” and Go-MySQL-Driver you will need to deal with some quirks, like an almost unmanageable connection pool.
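As a minimal sketch of how this fits together (the DSN user, password and schema below are placeholders; adjust them for your setup):

package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql" // registers the "mysql" driver with database/sql
)

func main() {
	// Placeholder DSN -- adjust credentials and schema for your environment.
	db, err := sql.Open("mysql", "root:password@tcp(127.0.0.1:3306)/sbtest")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// sql.Open does not connect immediately; Ping forces a round trip so
	// connection problems surface here rather than at the first query.
	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}

Note the blank driver import: the driver registers itself with “database/sql”, and all further calls go through the standard package.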

The first thing you should take into account is a proper setting of runtime.GOMAXPROCS().

If you do not do that, the Go scheduler will use the default, which is 1, and the binary will use one and only one CPU (so much for a modern concurrent language).

The call runtime.GOMAXPROCS(runtime.NumCPU()) tells the runtime to use all available CPUs. Always remember to do this if you care about multi-threaded performance.
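A minimal demonstration (the print statement is only there to show the previous value):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS returns the previous setting, so we can verify that the
	// default (1, as of Go 1.1) was in effect before we raised it.
	prev := runtime.GOMAXPROCS(runtime.NumCPU())
	fmt.Printf("GOMAXPROCS raised from %d to %d\n", prev, runtime.NumCPU())
}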

The next problem I faced in the benchmark was that when I ran queries in a loop, i.e. repeating them as fast as possible…

rows, err := db.Query("select k from sbtest"+strconv.Itoa(tab+1)+" where id = "+strconv.Itoa(i))

… very soon we ran out of TCP ports. Apparently “database/sql” with Go-MySQL-Driver and its smart connection pool creates a NEW CONNECTION for each query. I can’t explain why this happens, but the following statement:

db.SetMaxIdleConns(10000)

helps (I hope somebody with “database/sql” knowledge will explain what it is doing).
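My best guess at what is going on (an assumption, not something verified against the “database/sql” internals): the pool keeps only a couple of idle connections by default, so in a tight loop most connections are closed after each query and their sockets linger in TIME_WAIT until the ephemeral ports run out. A sketch of the fix in context (placeholder DSN, sysbench table layout assumed):

package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "root:password@tcp(127.0.0.1:3306)/sbtest") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Raising the idle limit lets the pool reuse connections instead of
	// closing them after use, which avoids the TIME_WAIT pile-up.
	db.SetMaxIdleConns(10000)

	var k int
	for i := 1; i <= 100000; i++ {
		// QueryRow + Scan returns the connection to the pool automatically.
		if err := db.QueryRow("select k from sbtest1 where id = ?", i).Scan(&k); err != nil {
			log.Fatal(err)
		}
	}
}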

So after these adjustments we can now run the benchmark, which, as you can see from the query, is quite simple – run primary key lookups against Percona Server, which we know scales perfectly in this scenario (I used sysbench to create 64 tables with 1 million rows each; all of this fits into memory). I am going to run this benchmark with 1, 2, 4, 8, 16, 24, 32, 48 and 64 user threads.
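For illustration, the per-thread loop has roughly this shape (a hypothetical sketch, not the actual Go-pk-bench source; see the repository linked above):

package main

import (
	"database/sql"
	"fmt"
	"log"
	"math/rand"
	"runtime"
	"sync"

	_ "github.com/go-sql-driver/mysql"
)

// runWorkers fans out N goroutines, each hammering one of the 64 sysbench
// tables with primary key lookups until its query budget is spent.
func runWorkers(db *sql.DB, threads, queriesPerThread int) {
	var wg sync.WaitGroup
	for t := 0; t < threads; t++ {
		wg.Add(1)
		go func(tab int) {
			defer wg.Done()
			var k int
			for i := 0; i < queriesPerThread; i++ {
				id := rand.Intn(1000000) + 1 // each table holds 1 million rows
				q := fmt.Sprintf("select k from sbtest%d where id = ?", tab+1)
				if err := db.QueryRow(q, id).Scan(&k); err != nil {
					log.Fatal(err)
				}
			}
		}(t % 64)
	}
	wg.Wait()
}

func main() {
	runtime.GOMAXPROCS(runtime.NumCPU())
	db, err := sql.Open("mysql", "root:password@tcp(127.0.0.1:3306)/sbtest") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.SetMaxIdleConns(10000)

	for _, threads := range []int{1, 2, 4, 8, 16, 24, 32, 48, 64} {
		runWorkers(db, threads, 100000)
	}
}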

Below you can see graphs for MySQL Throughput and CPU Usage (both graphs are built using the new metrics graphing in Percona Cloud Tools).

MySQL Throughput (user threads are increasing from 1 to 64)

CPU Usage (user threads are increasing from 1 to 64)

I would say the result scales quite nicely; at least it is much better than it was two years ago. It is interesting to compare with something, so here is a graph from an identical run, but this time using sysbench + Lua as the main workload driver.

MySQL Throughput (sysbench, user threads are increasing from 1 to 64)

CPU Usage (sysbench, user threads are increasing from 1 to 64)

From the graphs (this is what I like them for), we can clearly see the increase in user CPU utilization (and in fact we are able to push the CPUs to 100% in user+system usage), which clearly corresponds to the increased throughput.

And if you are a fan of raw numbers:

MySQL Throughput, q/s (more is better)

Threads | Go-MySQL | sysbench
      1 |   13,189 |   16,765
      2 |   26,837 |   33,534
      4 |   52,629 |   65,943
      8 |   95,553 |  116,953
     16 |  146,979 |  182,781
     24 |  169,739 |  231,895
     32 |  181,334 |  245,939
     48 |  198,238 |  250,497
     64 |  207,732 |  251,972

(anyone familiar with the Universal Scalability Law can draw a prediction up to 1,000 threads; I leave it as homework)
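For reference, the Universal Scalability Law models throughput as X(N) = λN / (1 + σ(N−1) + κN(N−1)), where σ captures contention and κ coherency delay. Here is a sketch of the extrapolation; the coefficients below are placeholders, not a fit of the table above, since fitting λ, σ and κ to the nine measured points is exactly the homework:

package main

import "fmt"

// usl evaluates X(N) = lambda*N / (1 + sigma*(N-1) + kappa*N*(N-1)).
func usl(n, lambda, sigma, kappa float64) float64 {
	return lambda * n / (1 + sigma*(n-1) + kappa*n*(n-1))
}

func main() {
	// Illustrative placeholder coefficients -- not fitted to the data.
	lambda, sigma, kappa := 13189.0, 0.03, 0.00002
	for _, n := range []float64{64, 128, 256, 512, 1000} {
		fmt.Printf("N=%4.0f -> predicted %8.0f q/s\n", n, usl(n, lambda, sigma, kappa))
	}
}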

So, in conclusion, I can say that Go+MySQL is able to show decent results, but it is still not as efficient as plain raw C (sysbench), as it seems to spend some extra CPU time in system calls.

If you want to try these new graphs in Percona Cloud Tools and see how it works with your system – join the free beta!

The post Tips on benchmarking Go + MySQL appeared first on MySQL Performance Blog.
