Facebook’s Simon Martin on semi-synchronous replication

Latest MySQL Performance Blog posts - September 4, 2015 - 8:41am

Facebook, with 1.49 billion monthly active users, is one of the world’s top MySQL users. Simon Martin, a production engineer on Facebook’s MySQL Infrastructure team, has been working with MySQL for most of his career, starting with two servers built out of spare parts and moving on to one of the largest deployments in the world.

Simon will be sharing some of the challenges Facebook has tackled as it rolled out semi-synchronous replication across its different services at Percona Live Amsterdam on Sept. 22. His talk is aptly titled, “The highs and lows of semi-synchronous replication.” I sat down, virtually, with Simon the other day. Our conversation is below, but first, as a special reward to my readers, save €20 on your Percona Live registration by entering promo code “BlogInterview” at registration. Please feel free to share this offer!

Tom: On a scale from 1-10, how important is MySQL to Facebook? And how does Facebook use MySQL?

Simon: 10. We have a sophisticated in-memory caching layer that serves most requests, but MySQL is the persistent store for our graph. This means all your profile data, all your friends, likes and comments, and the same for pages, events, places and the rest, are stored permanently in MySQL.

We rely on MySQL in this role for three key features. First, as the final store it must not lose data, and InnoDB is well proven in this space. Second, it needs to be highly available; MySQL and InnoDB are both very stable, and we use replication as well to provide redundancy. Finally, even with extensive caching, it needs to be performant in both latency and throughput; MySQL is both, and we can use replication again to spread read traffic to slaves in remote regions.

Tom: What are some of the advantages of using Semi-Synchronous Replication at Facebook — and what are the challenges for deployments of that size when using it?

Simon: That’s a big question, I could probably talk for 50 minutes on it! We started looking at Semi-Synchronous as a solution to reduce downtime when a MySQL master, or the host it’s on, crashes. Historically, if you are running a replicated environment and the master crashes, you are faced with a choice. You could promote another slave right away to reduce downtime, but it’s impossible to be sure that any of your slaves got all the transactions off the master. At Facebook we cannot lose people’s data, so we always chose to recover the master and re-connect the slaves before promoting if required. The downside is recovering InnoDB on a busy host can be slow, and if the host is rebooted it will be even slower, giving us many minutes of downtime.

Now that we run semi-synchronous replication, a master will not commit a transaction until at least one slave has acknowledged receipt of the binary logs for that transaction. With this running, when a master crashes we can be sure our most up-to-date slave has all the data, so once it is applied by the SQL thread we can promote safely without waiting for crash recovery.
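Simon’s description maps onto the semi-sync support that ships with stock MySQL. As a minimal sketch only (these are the standard plugin and variable names from the MySQL documentation; Facebook runs its own patched builds, so this is not their exact setup):

```sql
-- On the master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;  -- ms to wait for a slave ACK
                                                 -- before falling back to async

-- On each acknowledging slave:
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;  -- reconnect so the slave registers as semi-sync
```

The `rpl_semi_sync_master_timeout` fallback is exactly the availability trade-off Simon discusses below: if no slave acknowledges in time, stock MySQL degrades to asynchronous replication rather than blocking writes forever.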

There are many challenges in this, though. First there is performance: we now need a network round trip for each transaction, so the acknowledging slaves need to be very close. Slaves in a different data hall, let alone a different region, will be too slow.

We also need to pay attention to slave availability. Previously, not having a slave connected to a master for a short time was not a problem; now it will cause writes to stop and connections to pile up, so we need to be much more careful about how we manage our replication topology. A target of 99.999% uptime for a service now requires the same SLA on slaves being available and connected locally to acknowledge the commits.

On top of this running at “webscale” adds a layer of requirements of its own. Like the rest of our environment everything needs to be automated, anything that requires a human is not going to scale. So our automation needs to respond to any failure and heal the system without intervention in any circumstance. An edge case that has even a tiny chance of occurring on a given day needs to be handled automatically, to keep our SLA and to stop our engineers constantly having to fix things.

Tom: What are you looking forward to the most at this year’s conference (besides your own talk)?

Simon: I always enjoy the keynotes. They don’t all seem to be announced yet, but they’re a great way to get a state-of-the-community update. I’ll certainly stop by “Binlog Servers at Booking.com,” as it sounds like they might be doing the same kind of things we are for semi-synchronous replication, so it’ll be great to compare ideas. I’ll also be looking at the talks on MySQL 5.7 to get the scoop on what cool new stuff is coming down the pipeline!

The post Facebook’s Simon Martin on semi-synchronous replication appeared first on MySQL Performance Blog.

Proper replication check techniques?

Latest Forum Posts - September 3, 2015 - 12:54pm
I would like to get my database replication check technique peer reviewed. Any suggestions for improvement? It's fairly simple, in truth.

We have had instances where the MySQL master and slave both report in-sync, but data on the master is not consistent with data on the slave. So to me that spells one thing: read the data, perform a checksum, compare.

So without further ado, here is what I do:

LOG=/var/log/POOH
{ mysqlrplsync --master=repli_chk:PASSWD@localhost:3306 \
    --slaves=repli_chk:PASSWD@db-replicant:3306 \
    | tee -a $LOG | grep -qF " [DIFF] "; } \
  && { echo "ERROR in mysql replication." 1>&2; exit 1; }

I created user "repli_chk" in accordance with the requirements at the bottom of this page: http://dev.mysql.com/doc/mysql-utili...qlrplsync.html

I run this job twice a week: on Sunday afternoon (the day before work starts) and on Friday at 7pm (a couple of hours after work ends and just before backups start).
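For reference, that schedule could be expressed as crontab entries like these (the script name, path and exact times are illustrative, not from the post):

```
# Sunday 2pm, the day before the work week starts
0 14 * * 0  /usr/local/bin/replication_check.sh
# Friday 7pm, after work ends and just before backups
0 19 * * 5  /usr/local/bin/replication_check.sh
```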

Proper Backup Techniques?

Latest Forum Posts - September 3, 2015 - 12:23pm
I would like to get a peer review of my backup script. Am I doing ok? Any recommendations?

We are backing up on the weekends, and I'm not concerned about a database outage at the time of backups. They take less than 5 minutes at this time so performance is quite acceptable.

Without further ado, here is the meaty part:

suffix="POOH"
BACKUP_DIR=/home/mysql.backup
LOG_DIR=/var/log/mysql.backup
BACKUP_NAME="full-$suffix"
FULL_BACKUP_DIR="$BACKUP_DIR/$BACKUP_NAME"
echo "flush tables with read lock; set global read_only = ON;" | mysql --defaults-file=/var/lib/mysql/defaults.cnf
innobackupex --defaults-file $TMPFILE --no-timestamp $FULL_BACKUP_DIR > $LOG_DIR/$suffix 2>&1
innobackupex --defaults-file $TMPFILE --apply-log --no-timestamp $FULL_BACKUP_DIR >> $LOG_DIR/$suffix 2>&1
pt-table-checksum --defaults-file=/var/lib/mysql/defaults.cnf --chunk-size=1000000 --chunk-size-limit=0 --recursion-method none >/dev/null
echo "set global read_only = OFF; unlock tables;" | mysql --defaults-file=/var/lib/mysql/defaults.cnf
cd $FULL_BACKUP_DIR
mkdir external_files
# Columns: db tbl this_crc this_cnt
echo "use percona; select db,tbl,this_crc,this_cnt from checksums;" | mysql --defaults-file=/var/lib/mysql/defaults.cnf > external_files/checksums.$suffix

Percona Toolkit and systemd

Latest MySQL Performance Blog posts - September 3, 2015 - 8:18am

After some recent work with systemd I’ve realized its power, and I can come clean: I am a fan. I realize there are multitudes of posts out there with arguments both for and against systemd, but let’s look at some nice ways to have systemd provide us with (among other things) pt-kill-as-a-service.

This brief post introduces you to systemd unit files and how we can leverage them to enable pt-kill at startup, have it start after mysqld, and ensure that MySQL is running by making the mysql service a dependency of pt-kill. By letting systemd handle this we don’t have to complicate matters by ‘monitoring the monitor’ using hacky shell scripts, cron or utilities like monit.

So then, a quick primer on systemd, because, let’s face it, we’ve all been avoiding it. Systemd is not new, but it made recent headlines in the Linux world when several major distros announced their intention to migrate upcoming releases to it.

What is it? Well, due to its depth it is best described as a suite of management daemons, libraries and tools that replaces the traditional init scripts. Essentially, remember how you start a service, mount a volume or read the system logs? Well, start forgetting all of that, because systemd is disrupting this space. With systemd come some really neat tricks for administering your machines, and I’m really only beginning to see the tip of this iceberg. There is admittedly a lot to learn with systemd, but this should serve as a pragmatic entrée.

Systemd what? When did this happen?

Linux distribution              Date released as default
Arch Linux                      October 2012
CoreOS                          October 2013 (v94.0.0)
Debian                          April 2015 (v8 aka jessie)
Fedora                          May 2011 (v15)
Gentoo Linux                    N/A
Mageia                          May 2012 (v2.0)
openSUSE                        September 2012 (v12.2)
Red Hat Enterprise Linux        June 2014 (v7.0)
Slackware                       N/A
SUSE Linux Enterprise Server    October 2014 (v12)
Ubuntu                          April 2015 (v15.04)

Lennart Poettering, the name most frequently attached to systemd, is seeking to modernize the most fundamental processes of the Linux startup system, bringing to it the paradigms of modern computing: concurrency, parallelism and efficiency. The dependency tree of processes and services is more intuitive, and the structure of the underlying startup scripts is unified. I feel that the direction proposed by systemd is an evolutionary one that promotes consistency within the startup scripts, enabling conventions that can be more easily understood by a broader audience.

Systemd and Percona Toolkit

This post aims to show that we can rely on systemd to handle processes such as pt-kill, pt-stalk and other daemonized scripts that we like to have running perpetually, fired at startup and reinstated after failure.

The scenario is this: I want pt-kill to drop all sleeping connections from a certain application user, let’s call it ‘caffeinebob’, because it never closes connections. For various reasons we can’t make changes in the application, so we’re employing Percona Toolkit favourite pt-kill to do this for us. For convenience we want this behaviour to persist across server restarts. In the olden days we might have had some cron job that fires a shell script, in combination with a sentinel file, to ensure it’s running. I’m pretty sure this kitty could be skinned many ways.
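For contrast, that “olden days” cron-plus-pidfile watchdog might have looked something like this minimal sketch (the helper name, paths and pt-kill options here are hypothetical, purely to illustrate what systemd now saves us from writing):

```shell
#!/bin/sh
# Legacy-style watchdog: cron would run this every minute.

# needs_restart PIDFILE -> exit 0 (true) when no live process owns the pidfile
needs_restart() {
    pidfile="$1"
    # If the pidfile exists AND the recorded pid is alive, no restart is needed.
    [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null && return 1
    return 0
}

# Cron entry would then do something like:
#   needs_restart /var/run/ptkill.pid && \
#       /usr/bin/pt-kill --daemonize --pid=/var/run/ptkill.pid \
#           --match-user caffeinebob --busy-time 10 --kill
```

Exactly the kind of fragile glue (stale pidfiles, races, no supervision between cron runs) that a Restart= directive replaces.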

The systemd Unit File

After some research and testing, the unit file below plays nicely on a CentOS 7 node with systemd at its core. In this example I am running Percona Server 5.6 installed from Percona’s yum repo, with the mysql.service unit file generated at installation. I suspect there could be some systemd deviation with other MySQL variants; however, this configuration is working for me.

[Unit]
Description = pt-kill caffeinebob
After = syslog.target mysql.service
Requires = mysql.service

[Service]
Type = simple
PIDFile = /var/run/ptkill.pid
ExecStart = /usr/bin/pt-kill --daemonize --pid=/var/run/ptkill.pid --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print
Restart = on-abort

[Install]
WantedBy = multi-user.target

Let’s examine the above and see what we’re working with. Systemd unit files come in various types; the example above is a simple service unit file, meaning we are enacting a process controlled and supervised by systemd. The significance of the After directive is that this service will not attempt startup until after syslog.target and mysql.service have been started. The Requires directive makes ptkill.service dependent on mysql.service starting successfully.

The next part, the [Service] grouping, details the actions to be taken by the service. The Type can be one of many, but as it’s a simple call to a script I’ve used the simple type. We are describing the command and the handling of it. The ExecStart is evidently the pt-kill command that we would usually run from the shell prompt or from within a shell script. This is a very coarse example, because we could opt to parameterize the command with the assistance of an environment file. Note the use of the Restart directive, so that systemd can react should a failure interrupt the process.
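As a sketch of that environment-file parameterization (the file path and variable name below are assumptions, not part of the example above):

```ini
# /etc/sysconfig/ptkill -- hypothetical environment file
PTKILL_OPTS=--interval=5 --match-user caffeinebob --busy-time 10 --kill --print
```

The unit would then gain `EnvironmentFile = /etc/sysconfig/ptkill` in its [Service] section and end its ExecStart with `$PTKILL_OPTS` in place of the hard-coded options; systemd splits the unquoted `$PTKILL_OPTS` on whitespace into separate arguments, so the match criteria can be changed without editing the unit file itself.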

Finally under the [Install] grouping we’re telling systemd that this service should startup on a multi user system, and could be thought of as runlevel 2 or 3 (Multiuser mode).

So, providing we’ve got all the relevant paths, users and dependencies in place, once you reboot your host mysql.service will initiate mysqld and, when that dependency is met, systemd will initiate pt-kill with our desired parameters to cull connections that meet the criteria stipulated in our configuration. This means you rely on systemd to manage pt-kill for you, and you don’t need to remember to start this or similar processes when you restart your node.

Start up & enable

Now, to invoke our service manually and enable it at startup, we run the following systemctl commands:

[moore@localhost ~]$ sudo systemctl start ptkill.service
[moore@localhost ~]$ sudo systemctl enable ptkill.service

No feedback, but no errors either, so we can check the status of the service:

[moore@localhost ~]$ sudo systemctl status ptkill -l
ptkill.service - keep pt-kill persistent across restarts
   Loaded: loaded (/etc/systemd/system/ptkill.service; enabled)
   Active: active (running) since Wed 2015-08-12 02:39:13 BST; 1h 19min ago
 Main PID: 2628 (perl)
   CGroup: /system.slice/ptkill.service
           └─2628 perl /usr/bin/pt-kill --daemonize --pid=/var/run/ptkill.pid --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print

Perfect. We can also instruct systemd to disable and/or stop our service when the application is changed and caffeinebob closes all those open connections.

[moore@localhost ~]$ sudo systemctl stop ptkill.service
[moore@localhost ~]$ sudo systemctl disable ptkill.service

Now, after successful implementation, we see that our process is running delightfully:

[moore@localhost ~]$ ps -ef | grep pt-kill
root  2547  1  0 02:37 ?  00:00:00 perl /usr/bin/pt-kill --daemonize --pid=/var/run/ptkill.pid --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print

Catch me if I fall

Let’s issue a kill signal to the process and observe its behaviour using journalctl:

[moore@localhost ~]$ sudo kill -SEGV 2547

This writes entries like the following into the system log:

[moore@localhost ~]$ sudo journalctl -xn -f
Aug 12 02:39:13 localhost.localdomain sudo[2624]: moore : TTY=pts/1 ; PWD=/home/moore ; USER=root ; COMMAND=/bin/kill -SEGV 2547
Aug 12 02:39:13 localhost.localdomain systemd[1]: ptkill.service: main process exited, code=killed, status=11/SEGV
Aug 12 02:39:13 localhost.localdomain systemd[1]: Unit ptkill.service entered failed state.
Aug 12 02:39:13 localhost.localdomain systemd[1]: ptkill.service holdoff time over, scheduling restart.
Aug 12 02:39:13 localhost.localdomain systemd[1]: Stopping keep pt-kill persistent across restarts...
-- Subject: Unit ptkill.service has begun shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ptkill.service has begun shutting down.
Aug 12 02:39:13 localhost.localdomain systemd[1]: Starting keep pt-kill persistent across restarts...
-- Subject: Unit ptkill.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ptkill.service has begun starting up.
Aug 12 02:39:13 localhost.localdomain systemd[1]: Started keep pt-kill persistent across restarts.
-- Subject: Unit ptkill.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ptkill.service has finished starting up.
--
-- The start-up result is done.

pt-kill flaps after the kill signal, but systemd has been instructed to restart it on failure, so we don’t see caffeinebob saturate our processlist with sleeping connections.

Another bonus of this workflow is its use within orchestration: any standardized unit files can be propagated to your fleet of hosts with tools such as Ansible, Chef, Puppet or SaltStack.

Closing note

I’d love to hear from the pragmatists from the systemd world to understand if this approach can be improved or whether there are any flaws in this example unit file that would require addressing. This is very much a new-school of thought for me and feedback is both welcome and encouraged.

Thank you for your time, happy systemd-ing.

The post Percona Toolkit and systemd appeared first on MySQL Performance Blog.

different behaviour for --stop between pt-kill and pt-heartbeat?

Latest Forum Posts - September 3, 2015 - 8:03am

it seems that pt-kill and pt-heartbeat behave differently in regards to the --stop and --sentinel options.

I can easily put something like this into a cron for pt-heartbeat to restart it regularly:
30 * * * * root /usr/bin/pt-heartbeat --utc --update --interval 1 --host XXX --database percona --table heartbeat --stop --sentinel /tmp/pt-heartbeat-hourly >/dev/null

I couldn't find a way to easily achieve the same for pt-kill.

Am I missing something or is there a difference between these two tools?


Percona Server 5.5.45-37.4 is now available

Latest MySQL Performance Blog posts - September 2, 2015 - 7:36am

Percona is glad to announce the release of Percona Server 5.5.45-37.4 on September 2, 2015. Based on MySQL 5.5.45, including all the bug fixes in it, Percona Server 5.5.45-37.4 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. Details of the release can be found in the 5.5.45-37.5 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:

  • Querying the INFORMATION_SCHEMA GLOBAL_TEMPORARY_TABLES table would crash threads working with internal temporary tables used by ALTER TABLE. Bug fixed #1113388.
  • The FLUSH INDEX_STATISTICS/FLUSH CHANGED_PAGE_BITMAPS and FLUSH USER_STATISTICS/RESET CHANGE_PAGE_BITMAPS pairs of commands were inadvertently joined, i.e. issuing either command had the effect of both. The first pair, besides flushing both index statistics and changed page bitmaps, had the effect of FLUSH INDEX_STATISTICS requiring the SUPER instead of the RELOAD privilege. The second pair resulted in FLUSH USER_STATISTICS destroying changed page bitmaps. Bug fixed #1472251.
  • If a new connection thread was created while a SHOW PROCESSLIST command or an INFORMATION_SCHEMA.PROCESSLIST query was in progress, it could have a negative TIME_MS value returned in the PROCESSLIST output. Bug fixed #1379582.

Other bugs fixed: #768038 and #1472256.

Release notes for Percona Server 5.5.45-37.4 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.5.45-37.4 is now available appeared first on MySQL Performance Blog.

Percona Toolkit fan

Latest Forum Posts - September 2, 2015 - 4:25am
I just wanted to say I really like this set of tools.

Cannot install Percona Server on Debian Jessie: packages have unmet dependencies

Latest Forum Posts - September 1, 2015 - 6:27am
Is it me, or is the repo broken?
I'm using these instructions: https://www.percona.com/doc/percona-.../apt_repo.html

Distributor ID: Debian
Description: Debian GNU/Linux 8.1 (jessie)
Release: 8.1
Codename: jessie

soporte@ayuda:~$ su -
root@ayuda:~# apt-get install percona-server-server-5.6
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
percona-server-server-5.6 : Depends: percona-server-client-5.6 (>= 5.6.25-73.1-1.jessie) but it is not going to be installed
Depends: libdbi-perl but it is not installable
Depends: libdbd-mysql-perl but it is not installable
Depends: libaio1 (>= 0.3.93) but it is not installable
E: Unable to correct problems, you have held broken packages.

Booking dot yeah! Booking.com’s Jean-François Gagné on Percona Live Amsterdam

Latest MySQL Performance Blog posts - September 1, 2015 - 12:00am

Booking.com, one of the world’s leading e-commerce companies, helps travelers book nearly 1 million rooms per night. Established in 1996, Booking.com B.V. guarantees the best prices for any type of property, from small, family-run bed and breakfasts to executive apartments and five-star luxury suites.

The travel website is also a dedicated contributor to the MySQL and Perl communities. Other open source technologies it uses include CentOS Linux, Nginx, Python, Puppet, Git and more.

Booking.com is a Diamond sponsor of Percona Live Amsterdam, Sept. 21-23; you can meet the people who power the site at booth 205. Enter promo code “BlogInterview” at registration to save €20!

In the meantime, meet Jean-François Gagné, a system engineer at Booking.com. He’ll be presenting a couple of talks: “Riding the Binlog: an in Deep Dissection of the Replication Stream” and “Binlog Servers at Booking.com.”

Tom: Hi Jean-François, in your session, “Riding the Binlog: an in Deep Dissection of the Replication Stream“, you talk about how we can think of the binary logs as a transport for a “Stream of Transactions”. What will be the top 3 things attendees will come away with following this 50-minute talk?

Booking.com’s Jean-François Gagné

Jean-François: Hi Tom, thanks for this opportunity to give a sneak peek of my talk. The most important subject that will be discussed is that the binary logs evolve: through the use of “log-slave-updates”, the stream can grow, shrink or morph. Put another way: the binary logs of a slave can be very different from the binary logs of the master, and this should be taken into account when relying on them (including when replicating via an intermediate master and when promoting a slave as a new master using GTIDs). We will also explore how the binary logs can be decomposed into sub-streams, or viewed as the multiplexing of many streams, then look at de-multiplexing functions and the new possibilities they open up.


Tom: Percona Live, starting with this conference, has a new venue and a broader theme – now encompassing, in addition to MySQL, MongoDB, NoSQL and data in the cloud. Your thoughts? And what do think is missing – what would you change (if anything)?

Jean-François: I think you forgot the best change of all: going from a two-day conference last year in London to a three-day conference this year. This will allow better knowledge exchange, and I am very happy about that. I think this event will be a success, with a good balance of sessions focused on technologies and presentations about specific use-cases of those technologies. If I had one wish, I would like to see more sessions about specific use-cases of NoSQL technologies, with in-depth discussion of why they are a better choice than more traditional solutions: maybe more of those sessions will be submitted next year.


Tom: Which other session(s) are you most looking forward to besides your own?

Jean-François: I will definitely attend the Facebook session about semi-synchronous replication: it is very close to my interests, especially as Booking.com is thinking about using lossless semi-sync replication in the future, and I look forward to hearing war stories about this feature. All sessions dissecting the internals of a technology (InnoDB, TokuDB, RocksDB, …) will also have my attention. Finally, it is always interesting to hear how large companies are using databases, so I plan to attend the MySQL@Wikimedia session.


Tom: As a resident of Amsterdam, what are some of the must-do activities/sightseeing for those visiting for Percona Live from out of town?

Jean-François: Seeing the city from a high point is impressive, and you will have the opportunity to enjoy that view from the Booking.com office at the Community Dinner. Also, I recommend finding a bike and discovering the city by pedaling (there are many rental shops; just ask Google). From the conference venue, you can do a 70-minute ride crossing three nice parks: the Westerpark, the Rembrandtpark and the Vondelpark – https://goo.gl/P13Mc7 – and you can discover the first or third park in a shorter ride (45 minutes). If you feel a little more adventurous, I recommend a 90-minute ride south following the Amstel: once out of Amsterdam, you will have the water on one side at the level of the road, and the fields (polder) 3 meters below on the other side (https://goo.gl/OPDv5z). This will let you see for yourself why this place is called the “Low Countries”.

The post Booking dot yeah! Booking.com’s Jean-François Gagné on Percona Live Amsterdam appeared first on MySQL Performance Blog.

pt table checksum and pt tablesync compatibility with ROW binlog format

Latest Forum Posts - August 31, 2015 - 9:43pm

We are using MySQL 5.6.21 enterprise version with master slave configuration. Binlog format is ROW.

It was mentioned in one of the post that there are some compatibility issues with pt table checksum and pt tablesync tool with respect to ROW based binlog format. Can anyone please clarify regarding the same.

Issue applying logs to full backup

Latest Forum Posts - August 31, 2015 - 6:03pm
Hi - I'm trying to apply the logs to a full back up I just took. However, I'm receiving the following error:

innobackupex: got a fatal error with the following stacktrace: at /usr/bin/innobackupex line 3910
        main::check_args() called at /usr/bin/innobackupex line 1531
innobackupex: Error: Too many command line arguments

To take my backup, I ran the following command:
innobackupex -user=root --no-timestamp /backup/prod-db/percona/2015-08-30

After that completed successfully, I attempted to apply the logs to give myself a stable backup to restore from:
innobackupex —apply-log /backup/prod-db/percona/2015-08-30

Running this command resulted in the error message above.

Additional information:
Using server version 5.5.41-0+wheezy1
xtrabackup version 2.2.12 based on MySQL server 5.6.24 Linux (x86_64) (revision id: 8726828)

Any help is appreciated!


pt-online-schema change and replication

Latest Forum Posts - August 31, 2015 - 5:13am

Sorry if this is a repeat issue. Over the weekend I used pt-online-schema-change to alter a 2.5-billion-row table on one of our slaves. I did not switch the tables or drop the original table. The triggers are still in place to change the new table when the original table is updated. There is only replication traffic to this server. pt-osc was chosen because a straight ALTER TABLE would take 72 hours to run. When replication was turned back on, new rows were seen in the old table but they are not making it to the new table. The plan is to promote this slave to a new master so that the outage window can be shortened significantly. Then the old master will become a slave and be synced from the new master. So, there are no triggers on the master, only on the slave. Replication is mixed mode on the master. The slave does not have binary logging enabled.

We are running Percona 5.6.21-70 and Toolkit is 2.2.15-1.

How can I get the triggers on the slave to fire when the original table is updated by replication?

Thank you,


High-load clusters and desynchronized nodes on Percona XtraDB Cluster

Latest MySQL Performance Blog posts - August 31, 2015 - 3:00am

There can be a lot of confusion and lack of planning in Percona XtraDB Clusters in regards to nodes becoming desynchronized for various reasons. This can happen in a few ways.

When I say “desynchronized” I mean a node that is permitted to build up a potentially large wsrep_local_recv_queue while some operation is happening. For example, a node taking a backup would set wsrep_desync=ON during the backup and potentially fall behind replication by some amount.

Some of these operations may completely block Galera from applying transactions, while others may simply increase load on the server enough that it falls behind and applies at a reduced rate.

In all the cases above, flow control is NOT used while the node cannot apply transactions, but it MAY be used while the node is recovering from the operation.  For an example of this, see my last blog about IST.

If a cluster is fairly busy, then the flow control that CAN happen when the above operations catch up MAY be detrimental to performance.

Example setup

Let us take my typical 3-node cluster with workload on node1. We are taking a blocking backup of some kind on node3, so we execute the following steps:

  1. node3> set global wsrep_desync=ON;
  2. Node3’s “backup” starts, this starts with FLUSH TABLES WITH READ LOCK;
  3. Galera is paused on node3 and the wsrep_local_recv_queue grows some amount
  4. Node3’s “backup” finishes, finishing with UNLOCK TABLES;
  5. node3> set global wsrep_desync=OFF;
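Scripted, the sequence above might look like this minimal sketch (the credentials path, backup directory and the choice of innobackupex as the “blocking backup” are assumptions for illustration; `RUN` is a hypothetical wrapper so the sequence can be previewed with `RUN=echo`):

```shell
#!/bin/sh
# Sketch of the desync-backup sequence on node3.
RUN="${RUN:-}"   # set RUN=echo to print the commands instead of executing them

desync_backup() {
    # step 1: stop flow control from firing while we fall behind
    $RUN mysql --defaults-file=/root/.my.cnf -e "SET GLOBAL wsrep_desync=ON"
    # steps 2-4: the backup itself takes and releases FLUSH TABLES WITH READ LOCK
    $RUN innobackupex --no-timestamp "/backups/$(date +%F)"
    # step 5: rejoin normal flow-control behaviour
    $RUN mysql --defaults-file=/root/.my.cnf -e "SET GLOBAL wsrep_desync=OFF"
}
```

As the rest of the post shows, step 5 is the interesting one: turning wsrep_desync back off is what re-arms flow control and lets node3 catch up at the cluster’s expense.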
During the backup

This includes everything up through step 3 above. My node1 is unaffected by the backup on node3; I can see it averaging 5-6k writesets (transactions) per second, as it did before we began:


node2 is also unaffected:

but node3 is not applying and its queue is building up:

Unlock tables, still wsrep_desync=ON

Let’s examine briefly what happens when node3 is permitted to start applying, but wsrep_desync stays enabled:

node1’s performance is pretty much the same, node3 is not using flow control yet. However, there is a problem:

It’s hard to notice, but node3 is NOT catching up; instead, it is falling further behind! We have potentially created a situation where node3 may never catch up.

The PXC nodes were close enough to the red-line of performance that node3 can only apply just about as fast (and somewhat slower until it heats up a bit) as new transactions are coming into node1.

This represents a serious concern in PXC capacity planning:

Nodes do not only need to be fast enough to handle normal workload, but also to catch up after maintenance operations or failures cause them to fall behind.

Experienced MySQL DBAs will realize this isn’t all that different from master/slave replication.

Flow Control as a way to recovery

So here’s the trick:  if we turn off wsrep_desync on node3 now, node3 will use flow control if and only if the incoming replication exceeds node3’s apply rate.  This gives node3 a good chance of catching up, but the tradeoff is reducing write throughput of the cluster.  Let’s see what this looks like in context with all of our steps.  wsrep_desync is turned off at the peak of the replication queue size on node3, around 12:20PM:

So at the moment node3 starts utilizing flow control to prevent falling further behind, our write throughput (in this specific environment and workload) is reduced by approximately 1/3rd (YMMV).   The cluster will remain in this state until node3 catches up and returns to the ‘Synced’ state.  This catchup is still happening as I write this post, almost 4 hours after it started and will likely take another hour or two to complete.
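You can watch this recovery from SQL as well. A quick sketch using standard Galera status counters (run on the catching-up node; the values shown in this post’s graphs come from these same counters):

```sql
-- Apply queue depth: should shrink toward 0 as node3 catches up
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';

-- Fraction of time replication was paused by flow control
-- since the last FLUSH STATUS (0.33 would match the ~1/3rd hit seen here)
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';

-- Node state: Donor/Desynced while wsrep_desync=ON or donating,
-- Joined while catching up, Synced when fully caught up
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
```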

I can see a more real-time representation of this by using myq_status on node1, summarizing every minute:

[root@node1 ~]# myq_status -i 1m wsrep
mycluster / node1 (idx: 1) / Galera 3.11(ra0189ab)
         Cluster  Node            Outbound       Inbound      FlowC        Conflct  Gcache    Appl
    time P cnf  # stat  laten  msgs data que   msgs data que  pause snt  lcf bfa    ist  idx  %ef
19:58:47 P   5  3 Sync  0.9ms  3128 2.0M   0    27 213b   0  25.4s   0    0   0  3003k  16k  62%
19:59:47 P   5  3 Sync  1.1ms  3200 2.1M   0    31 248b   0  18.8s   0    0   0  3003k  16k  62%
20:00:47 P   5  3 Sync  0.9ms  3378 2.2M  32    27 217b   0  26.0s   0    0   0  3003k  16k  62%
20:01:47 P   5  3 Sync  0.9ms  3662 2.4M  32    33 266b   0  18.9s   0    0   0  3003k  16k  62%
20:02:47 P   5  3 Sync  0.9ms  3340 2.2M  32    27 215b   0  27.2s   0    0   0  3003k  16k  62%
20:03:47 P   5  3 Sync  0.9ms  3193 2.1M   0    27 215b   0  25.6s   0    0   0  3003k  16k  62%
20:04:47 P   5  3 Sync  0.9ms  3009 1.9M  12    28 224b   0  22.8s   0    0   0  3003k  16k  62%
20:05:47 P   5  3 Sync  0.9ms  3437 2.2M   0    27 218b   0  23.9s   0    0   0  3003k  16k  62%
20:06:47 P   5  3 Sync  0.9ms  3319 2.1M   7    28 220b   0  24.2s   0    0   0  3003k  16k  62%
20:07:47 P   5  3 Sync  1.0ms  3388 2.2M  16    31 251b   0  22.6s   0    0   0  3003k  16k  62%
20:08:47 P   5  3 Sync  1.1ms  3695 2.4M  19    39 312b   0  13.9s   0    0   0  3003k  16k  62%
20:09:47 P   5  3 Sync  0.9ms  3293 2.1M   0    26 211b   0  26.2s   0    0   0  3003k  16k  62%

This reports around 20-25 seconds of flow control every minute, which is consistent with that ~1/3rd of performance reduction we see in the graphs above.
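As a sanity check on that figure (my arithmetic, not the post’s): 20-25 seconds of pause per 60-second window is a 33-42% loss of write time, which brackets the ~1/3rd reduction seen in the graphs:

```python
interval = 60.0                      # myq_status summary window, seconds
for pause in (20.0, 25.0):           # FlowC pause seconds observed per window
    loss = pause / interval          # fraction of write time lost
    print(f"{pause:.0f}s paused -> {loss:.0%} throughput reduction")
# 20s paused -> 33% throughput reduction
# 25s paused -> 42% throughput reduction
```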

Watching node3 the same way proves it is sending the flow control (FlowC snt):

mycluster / node3 (idx: 2) / Galera 3.11(ra0189ab)
         Cluster  Node            Outbound       Inbound       FlowC        Conflct  Gcache    Appl
    time P cnf  # stat  laten  msgs data que   msgs data que   pause snt  lcf bfa    ist  idx  %ef
17:38:09 P   5  3 Dono  0.8ms     0   0b   0  4434 2.8M 16m   25.2s  31    0   0  18634  16k  80%
17:39:09 P   5  3 Dono  1.3ms     0   0b   1  5040 3.2M 16m   22.1s  29    0   0  37497  16k  80%
17:40:09 P   5  3 Dono  1.4ms     0   0b   0  4506 2.9M 16m   21.0s  31    0   0  16674  16k  80%
17:41:09 P   5  3 Dono  0.9ms     0   0b   0  5274 3.4M 16m   16.4s  27    0   0  22134  16k  80%
17:42:09 P   5  3 Dono  0.9ms     0   0b   0  4826 3.1M 16m   19.8s  26    0   0  16386  16k  80%
17:43:09 P   5  3 Jned  0.9ms     0   0b   0  4957 3.2M 16m   18.7s  28    0   0  83677  16k  80%
17:44:09 P   5  3 Jned  0.9ms     0   0b   0  3693 2.4M 16m   27.2s  30    0   0   131k  16k  80%
17:45:09 P   5  3 Jned  0.9ms     0   0b   0  4151 2.7M 16m   26.3s  34    0   0   185k  16k  80%
17:46:09 P   5  3 Jned  1.5ms     0   0b   0  4420 2.8M 16m   25.0s  30    0   0   245k  16k  80%
17:47:09 P   5  3 Jned  1.3ms     0   0b   1  4806 3.1M 16m   21.0s  27    0   0   310k  16k  80%

There are a lot of flow control messages (around 30) per minute.  This means many ON/OFF toggles of flow control, where writes are briefly delayed rather than blocked by a steady “you can’t write” for 20 seconds straight.

It also interestingly spends a long time in the Donor/Desynced state (even though wsrep_desync was turned OFF hours before) and then moves to the Joined state (this has the same meaning as during an IST).

Does it matter?

As always, it depends.

If these are web requests and suddenly the database can only handle ~66% of the traffic, that’s likely a problem, but maybe it just slows down the website somewhat.  I want to emphasize that WRITES are what is affected here.  Reads on any and all nodes should be normal (though you probably don’t want to read from node3 since it is so far behind).

If this were some queue processing that had reduced throughput, I’d expect it to possibly catch up later.

This can only be answered for your application, but the takeaways for me are:

  • Don’t underestimate your capacity requirements
  • Being at the redline normally means you are well past the redline for abnormal events.
  • Plan for maintenance and failure recoveries
  • Where possible, build queuing into your workflows so diminished throughput in your architecture doesn’t generate failures.

Happy clustering!

Graphs in this post courtesy of VividCortex.

The post High-load clusters and desynchronized nodes on Percona XtraDB Cluster appeared first on MySQL Performance Blog.

Mysql help

Latest Forum Posts - August 31, 2015 - 1:08am
How to link a web inquiry form to my MySQL server

Linux Monitoring Plugin - "NaN value for Used Memory"

Latest Forum Posts - August 30, 2015 - 11:07pm
I have installed the Percona monitoring plugin on my Cacti server and it is running without any issues. The only problem is with the “Used Memory” value, which is shown as “NaN”.

My server’s memory is 72G. When I run the script manually, it shows a negative value for the hu item. I think that is the problem.

[root@Monitoring-Srv ~]# /usr/bin/php -q /usr/share/cacti/scripts/ss_get_by_ssh.php --host --type memory --items hq,hr,hs,ht,hu,hv
hq:2990836121 hr:2377703424 hs:7340761088 ht:1242996736 hu:-5231743385 hv:7601856921
[root@Monitoring-Srv ~]#

I ran the same script on one of my low-memory servers [8 GB], and it shows all the values correctly.

[root@Monitoring-Srv ~]#/usr/bin/php -q /usr/share/cacti/scripts/ss_get_by_ssh.php --host localhost --type memory --items hq,hr,hs,ht,hu,hv
hq:2255319040 hr:157446144 hs:139264 ht:153554944 hu:5654937600 hv:8221396992
[root@Monitoring-Srv ~]#

Is there any maximum limit enabled in the memory plugin for the displayed value?

mysql server crashes on archive table

Latest Forum Posts - August 30, 2015 - 4:45am
Server version: 5.5.44-0+deb7u1 (Debian) on a Raspberry Pi.
This server has an ARCHIVE table with 5,635,816 records.
After a recent addition I tried an OPTIMIZE TABLE step:

ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql> describe table p1_archive;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...

This server-side crash has happened consistently since then, on every OPTIMIZE TABLE, CHECK TABLE, SELECT ... INTO OUTFILE, and mysqldump.
Running mysqldump with a WHERE clause, I succeed in grabbing about 2,700,000 of the 5,635,816 records into the dump file before the server crash happens.
pt-archiver does not run: “Cannot find an ascendable index in table...”

Any suggestions as to how to recover the remainder?

Thanks in advance for any suggestion.

MySQL deadlock information incomplete

Latest Forum Posts - August 28, 2015 - 2:11pm
I am investigating a deadlock in our application and find that the MySQL log provides insufficient information. Specifically, the LATEST DETECTED DEADLOCK section from SHOW ENGINE INNODB STATUS\G is copied below.
Transaction 2 - we see both lock HOLDS and WAITING information.
Transaction 1 - only WAITING information.
What is Transaction 1 holding, and where is the cycle that causes the deadlock?
Let me reiterate that I am not asking for ways to resolve the deadlock, but rather why the conflicting lock information is incomplete and whether there is any way to obtain more complete information.

The relevant table definitions are as follows.


NetworkAddress varchar (255),
Port int NULL ,
PartID int NOT NULL,


2015-08-25 20:24:58 0x1378
TRANSACTION 5564, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 24, OS thread handle 2280, query id 6587 localhost CoreSvc updating
delete from SystemSequence where NextId < _nextId
RECORD LOCKS space id 50 page no 3 n bits 72 index `PRIMARY` of table `pangaea`.`systemsequence` trx id 5564 lock_mode X waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 80000001; asc ;;
1: len 6; hex 0000000015bb; asc ;;
2: len 7; hex 3500000177102e; asc 5 w .;;

TRANSACTION 5563, ACTIVE 0 sec inserting, thread declared inside InnoDB 5000
mysql tables in use 10, locked 10
17 lock struct(s), heap size 1136, 9 row lock(s), undo log entries 8
MySQL thread id 30, OS thread handle 4984, query id 6677 localhost CoreSvc update
insert CustomSystem (SystemID, NetworkAddress,Port, PartID )
values (_SystemID, _NetworkAddress, _Port, _PartID )
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 50 page no 3 n bits 72 index `PRIMARY` of table `pangaea`.`systemsequence` trx id 5563 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 32

0: len 4; hex 80000001; asc ;;
1: len 6; hex 0000000015bb; asc ;;
2: len 7; hex 3500000177102e; asc 5 w .;;

Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
0: len 4; hex 80000002; asc ;;
1: len 6; hex 0000000015bb; asc ;;
2: len 7; hex b5000001280110; asc ( ;;

RECORD LOCKS space id 68 page no 7 n bits 72 index `PartID` of table `pangaea`.`customsystem` trx id 5563 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;


Percona Toolkit 2.2.15 is now available

Latest MySQL Performance Blog posts - August 28, 2015 - 10:30am

Percona is pleased to announce the availability of Percona Toolkit 2.2.15.  Released August 28, 2015. Percona Toolkit is a collection of advanced command-line tools to perform a variety of MySQL server and system tasks that are too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

This release is the current GA (Generally Available) stable release in the 2.2 series. It includes multiple bug fixes as well as continued preparation for MySQL 5.7 compatibility. Full details are below. Downloads are available here and from the Percona Software Repositories.

New Features:

  • Added the --max-flow-ctl option to pt-online-schema-change and pt-archiver, with a value set in percent. When a Percona XtraDB Cluster node is heavily loaded, it sends flow control signals to the other nodes to stop sending transactions so it can catch up. When the average time spent in this state (in percent) exceeds the maximum provided in the option, the tool pauses until it falls below the threshold again. The default is no flow control checking.
  • Added the --sleep option for pt-online-schema-change to avoid performance problems. The option accepts float values in seconds.
  • Implemented ability to specify --check-slave-lag multiple times for pt-archiver. The following example enables lag checks for two slaves:
    pt-archiver --no-delete --where '1=1' --source h=oltp_server,D=test,t=tbl --dest h=olap_server --check-slave-lag h=slave1 --check-slave-lag h=slave2 --limit 1000 --commit-each
  • Added the --rds option to pt-kill, which makes the tool use Amazon RDS procedure calls instead of the standard MySQL kill command.

Bugs Fixed:

  • Fixed bug 1042727: pt-table-checksum doesn’t reconnect the slave $dbh
    Before, the tool would die if any slave connection was lost. Now the tool waits forever for slaves.
  • Fixed bug 1056507: pt-archiver --check-slave-lag aggressiveness
    The tool now checks replication lag every 100 rows instead of every row, which significantly improves efficiency.
  • Fixed bug 1215587: Adding underscores to constraints when using pt-online-schema-change can create issues with constraint name length
    Before, multiple schema changes led to underscores stacking up in the constraint name until it reached the 64-character limit. Now there is a limit of two underscores in the prefix; the tool then alternately removes or adds one underscore, attempting to make the name unique.
  • Fixed bug 1277049: pt-online-schema-change can’t connect with comma in password
    For all tools, documented that commas in passwords provided on the command line must be escaped.
  • Fixed bug 1441928: Unlimited chunk size when using pt-online-schema-change with --chunk-size-limit=0 inhibits checksumming of single-nibble tables
    When comparing table size with the slave table, the tool now ignores --chunk-size-limit if it is set to zero to avoid multiplying by zero.
  • Fixed bug 1443763: Update documentation and/or implementation of pt-archiver --check-interval
    Fixed the documentation for --check-interval to reflect its correct behavior.
  • Fixed bug 1449226: pt-archiver dies with “MySQL server has gone away” when --innodb_kill_idle_transaction is set to a low value and --check-slave-lag is enabled
    The tool now sends a dummy SQL query to avoid timing out.
  • Fixed bug 1446928: pt-online-schema-change not reporting meaningful errors
    The tool now produces meaningful errors based on text from MySQL errors.
  • Fixed bug 1450499: ReadKeyMini causes pt-online-schema-change session to lock under some circumstances
    Removed ReadKeyMini, because it is no longer necessary.
  • Fixed bug 1452914: --purge and --no-delete are mutually exclusive, but still allowed to be specified together by pt-archiver
    The tool now issues an error when --purge and --no-delete are specified together.
  • Fixed bug 1455486: pt-mysql-summary is missing the --ask-pass option
    Added the --ask-pass option to the tool.
  • Fixed bug 1457573: pt-mysql-summary fails to download pt-diskstats pt-pmp pt-mext pt-align
    Added the -L option to curl and changed download address to use HTTPS.
  • Fixed bug 1462904: pt-duplicate-key-checker doesn’t support triple quote in column name
    Updated TableParser module to handle literal backticks.
  • Fixed bug 1488600: pt-stalk doesn’t check TokuDB status
    Implemented status collection similar to how it is performed for InnoDB.
  • Fixed bug 1488611: various testing bugs related to newer Perl versions
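
The constraint-renaming scheme described in the fix for bug 1215587 can be sketched as follows (an illustrative reconstruction, not the tool’s actual code; the function and argument names are mine):

```python
def constraint_candidates(name, max_prefix=2):
    """Candidate names for a constraint copied by an online schema change.

    Instead of stacking one more '_' per schema change (which eventually
    hits MySQL's 64-character identifier limit), vary only the underscore
    prefix: at most `max_prefix` leading underscores, alternately adding
    and removing one until a name unique in the table is found.
    """
    base = name.lstrip('_')
    return ['_' + base, '__' + base, base][:max_prefix + 1]

constraint_candidates('__fk_orders_user')
# -> ['_fk_orders_user', '__fk_orders_user', 'fk_orders_user']
```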

Details of the release can be found in the release notes and the 2.2.15 milestone on Launchpad. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

The post Percona Toolkit 2.2.15 is now available appeared first on MySQL Performance Blog.

MySQL/Percona Audit Log Plugin Not Working

Latest Forum Posts - August 28, 2015 - 8:20am
I've been looking to implement the Percona Audit Log plugin. So far I have installed and configured it, and it seems to be working OK.
Except that I have installed it on a slave database, as I don't want to run it on my master database (because of the additional overhead, and the fact that it requires a DB restart if I want to reconfigure it), and it doesn't appear to be logging any queries.
I have set up a script to import the log file back into a standalone MySQL instance so I can query it, and from what I can tell it only records events that happen locally on the slave, but ignores all queries that are replicated from the master.
I implemented it because I need a way to audit the types of statements being run on our databases (ALTER, UPDATE, etc.), but if they're not going to be recorded it's not much use.
I can't see anything in the documentation about this (either Percona's or MySQL's). Does anyone else use this plugin? Is it just me that has missed something, or am I going to have to go back to the drawing board?
I'm running MySQL 5.5 with statement-based replication.

updating galera2 to galera3 in percona-xtradb-5.6 cluster

Latest Forum Posts - August 27, 2015 - 3:20am

I have a Percona cluster of three nodes running InnoDB 5.6 with Galera 2.12, and I wish to do a rolling upgrade from Galera 2 to Galera 3.
I saw the article about the upgrade process, which states that I should set wsrep_provider_options="socket.checksum=1" in order to maintain compatibility between the different Galera versions.

However, after doing so, when I restart the server I get the following error and the server won't start:

"Unrecognized parameter 'socket.checksum'"

Any ideas?