September 19, 2014

Check for MySQL slave lag with Percona Toolkit plugin for Tungsten Replicator

Continuent Tungsten ReplicatorA while back, I made some changes to the plugin interface for pt-online-schema-change which allows custom replication checks to be written. As I was adding this functionality, I also added the --plugin option to pt-table-checksum. This was released in Percona Toolkit 2.2.8.

With these additions, I spent some time writing a plugin that allows Percona Toolkit tools to use Tungsten Replicator to check for slave lag, you can find the code at https://github.com/grypyrg/percona-toolkit-plugin-tungsten-replicator

Requirements

The plugin uses the perl JSON::XS module (perl-JSON-XS rpm package, http://search.cpan.org/dist/JSON-XS/XS.pm), make sure it’s available or the plugin will not work.

Preparation

We need to use the --recursion-method=dsns as the Percona Toolkit tools are not able to automatically find the tungsten replicator slaves that are connected to the master database. (I did add a blueprint on launchpad to make this possible https://blueprints.launchpad.net/percona-toolkit/+spec/plugin-custom-recursion-method)

The dsns recursion-method gets the list of slaves from a database table you specify:

Here one slave node3 is replicating from the master:

Configuration

Currently, it is not possible to specify extra options for the plugin with Percona Toolkit, so some manual editing of the perl file is still necessary to configure it.

So before we can run a checksum, we need to configure the plugin:

Running A Checksum

Here I did a checksum of a table with pt-table-checksum. During the checksum process, I brought the slave node offline and brought it back online again:

I recommend to change the check-interval higher than the default 1 second as running trepctl takes a while. This could slow down the process quite a lot.

Making Schema Changes

The plugin also works with pt-online-schema-change:

As you can see, there was some slave lag during the schema changes.

Binlog Format & pt-online-schema-change

pt-online-schema-change uses triggers in order to do the schema changes. Tungsten Replicator has some limitations with different binary log formats and triggers (https://code.google.com/p/tungsten-replicator/wiki/TRCAdministration#Triggers_and_Row_Replication).

In Tungsten Replicator, ROW based binlog events will be converted to SQL statements, which causes triggers to be executed on the slave as well, this does not happen with traditional replication.

Different settings:

  • STATEMENT based binary logging works by default
  • ROW based binary logging works, the plugin recreates the triggers and uses the technique documented at https://code.google.com/p/tungsten-replicator/wiki/TRCAdministration#Triggers_and_Row_Replication
  • MIXED binary logging does not work, as there is currently no way to determine whether an event was written to the binary log in statement or row based format, so it’s not possible to know if triggers should be run or not. The tool will exit and and error will be returned:

Be Warned

The binlog_format can be overriden on a per session basis, make sure that this does NOT happen when using pt-online-schema-change.

Summary

The documentation on the Continuent website already mentions how you can compare data with pt-table-checksum.

I believe this plugin is a good addition to it. The features in Percona Toolkit that monitor replication lag can now be used with Tungsten Replicator and therefore gives you control on how much replication lag is tolerated while using those tools.

About Kenny Gryp

Kenny is a Principal Architect at Percona. Bitten somewhere in the late 90s by the open-source bug, most of his spare time was spent working with Linux & OpenBSD servers.
Before joining Percona, he worked at a large social networking company in Europe as DBA & Puppeteer.

Speak Your Mind

*