EmergencyEMERGENCY? Get 24/7 Help Now!

How to setup a PXC cluster with GTIDs (and have async slaves replicating from it!)

 | February 20, 2015 |  Posted In: GTID, MySQL, Percona XtraBackup, Percona XtraDB Cluster

PREVIOUS POST
NEXT POST

This past week was marked by a series of personal findings related to the use of Global Transaction IDs (GTIDs) on Galera-based clusters such as Percona XtraDB Cluster (PXC). The main one being the fact that transactions touching MyISAM tables (and FLUSH PRIVILEGES!) issued on a giving node of the cluster are recorded on a GTID set bearing the node’s server_uuid as “source id” and added to the binary log (if the node has binlog enabled), thus being replicated to any async replicas connected to it. However, they won’t be replicated across the cluster (that is, all of this is by design, if wsrep_replicate_myisam is disabled, which it is by default).

My colleague Stéphane covered this story in one of his recent blog posts titled, “Percona XtraDB Cluster 5.6: a tale of 2 GTIDs,” explaining how those local (in reference to the node) transactions are saved in a different GTID set to the cluster’s main one and the impact this may cause when re-pointing an async slave to replicate from a different node.

GTIDs is a feature introduced in MySQL 5.6 that made replication management much easier and considering there’s a series of advantages in having an async replica attached to a PXC cluster, why hasn’t this popped out earlier to either of us? I guess there aren’t so many people using GTIDs with Galera-based clusters around yet so here’s a post to show you how to do it.

Initializing a PXC cluster configured with GTIDs

My testing environment for a 3-node cluster is composed of node1 (192.168.70.2), node2 (.3) and node3 (.4). All of them have the same PXC binaries installed:

and are configured with an almost identical /etc/my.cnf (apart from wsrep_node_address and the prompt line):

server_id, log_bin, log_slave_updates and binlog_format are not needed for the cluster to operate but they are required to setup regular replication later on so I’ve added those to let the nodes ready to operate as masters.

We start with an empty, inexisting datadir on all nodes so I’ll use the mysql_install_db script to create a base datadir with all that is needed for MySQL to work on node1, which will be the reference node of the cluster:

We’re now ready to bootstrap the cluster from this reference node:

With that, we have an operational reference node:


Note from above my.cnf settings that I’ve chosen xtrabackup-v2 as the State Snapshot Transfer (SST) method, which requires authentication (wsrep_sst_auth). For this reason, if we now try to start MySQL on node2 it will fail with its error log showing:

The problem here is that the SST method requires MySQL authentication: even though the credentials are shared on the wsrep_sst_auth variable of all node’s my.cnf configuration file this only tells xtrabackup-v2 to require it, it doesn’t actually configure MySQL with it. That’s a step left for us:

And this constitutes our very first commited transaction, which goes into the cluster’s GTID set:

This GRANT only needs to be issued once, in the reference node. Now you start MySQL on node2 and node3: they will use XtraBackup to make a backup of node1’s dataset, which will contain this GRANT, and restore it in their own datadir:

and:

OK, that’s done. But how do I attach an async replica to the cluster?

Just to make it clear, you cannot attach an async replica to the cluster: you need to choose a node that is member of the cluster and make it the master for the async replica. Considering all nodes should have the same data replicated it should be easy to change the async replica’s configuration and make it replicate from any other node from the cluster – and it is, though you may need to do some adjustments (more on this later).

The initial procedure is exactly the same one used with regular replication: you start by taking a backup of the master and restoring it on the replica. We’ll use XtraBackup again to perform the backup and we’ll start by having node2 as the master of our async replica (192.168.70.7). We could stream the backup from node2 directly to the async replica and later “prepare” it (by applying the logs, which needs to be done using the same version of Percona XtraBackup that you’ve used to take the backup), but to make things simple we’ll first take the backup on node2:

then “prepare” it:

and from our async replica we’ll copy the backup from node2 using rsync over ssh (you could use scp or any other mathod to copy the files from one server to the other):

We now need to change the ownership of those files to the ‘mysql‘ user:

and take note of the “replication coordinates” – but related to GTID. We have those in the xtrabackup_info file:

In our case we took the backup very early in this database’s lifecycle (seeing by the GTID # in there) but in practice that’s seldom the case. We can now start MySQL:

Before we can point our async replica to replicate from node2 we need to create a replication user with the right set of privileges there:

Now we can use CHANGE MASTER TO on our async replica to point it to node2 using the above credentials:

If we START SLAVE now we’ll run into error 1236: we need first to set the replication coordinates from when the backup was taken. With GTID replication, this is done in a different manner: instead of providing replicate coordinates in the CHANGE MASTER TO command (where we’ve used MASTER_AUTO_POSITION=1 instead) we do it by redefining the gtid_purged global variable with the GTID sets we got from the xtrabackup_info file (in this example there’s only one set):

We can now START SLAVE and check its status:

Note that Executed_Gtid above is showing a second transaction in the cluster’s GTID set: this is related to the GRANT statement we issued on node2 to setup the replication account and it means it is now configured on all nodes members of the cluster (and this async replica as well). With that in place we can easily point our async replica to replicate from a different node, such as node3:

Let’s verify this is indeed the case:

Nice! What about the caveats you were talking about in the other blog post?

The problem arises when you start to have other GTID sets on the gtid_executed variable of the nodes. This happens when a DML transaction involving a MyISAM table is issued on the node while having wsrep_replicate_myisam disabled: instead of having that transaction going to the cluster’s main GTID set it will go to a new one, bearing the node’s server_uuid as source id. In fact, that’s the behavior you find on regular MySQL replication configured with GTIDs. Here’s an example:

Creating a MyISAM table per si is not an issue, because this is a DDL, so the transaction will go to the cluster’s GTID set and replicated to the other nodes:

But doing an INSERT in such a table is a DML so the transaction will go to a different GTID set:

Now if we move our async replica to node1 it might just works:

and that is because transaction ‘834bca7f-b45e-11e4-a7b5-0800272e951c:1could still be found on the binary logs of node1. Let’s do something slightly different now, but on node2:

Transaction ‘8a157d9c-b465-11e4-aafa-0800272e951c:1‘ that contains the INSERT statement we issued from node2 was recorded into binary log file percona-bin.000008 but the following PURGE BINARY LOGS TO command deleted that file. Now, if we point our async replica back to node2 we’ll run into a problem:

The master can no longer provide transaction ‘834bca7f-b45e-11e4-a7b5-0800272e951c:1‘ to the replica as requested thus breaking replication.

Ouch! Is there a fix for this?

Well, kind of. You can trick MySQL into believing it has processed this transaction by injecting an empty transaction bearing the same GTID of the missing one:

The problem is that you’ll end up with an inconsistent dataset – injecting an empty transaction is the same as using SQL_SLAVE_SKIP_COUNTER on regular MySQL replication.

Take-home lesson

It is not complicated to configure a PXC cluster to use GTIDs, in fact you may prefer to do so once you get used to it. And it makes that all much easier to have an asynchronous slave replicating from one of the nodes, even moving it around to replicate from a different node. It should all go well while the nodes record their transactions on the cluster’s main GTID set, which is shared by all nodes: this implies all of them will have the same transactions in their binlogs, as designed. The problem appears when we start seeing a deviation on this pattern, with a few nodes recording local transactions on their “own” GTID set. This won’t bother the cluster operation per si, as these other GTID sets are simply ignored and won’t be replicated to other nodes, but they may complicate things for attached async replicas if you need to point them to a different node.

In such environments, strive to maintain a unified GTID set around the cluster’s main one and if you find out that one of the nodes has started adding transactions to a different set, investigate it. You shouldn’t be using MyISAM tables on an XtraDB Cluster as those aren’t officially supported but if you must do them you should probably consider using wsrep_replicate_myisam. Operations on mysql system tables should be done through the use of DDLs instead of DMLs (like GRANT instead of INSERT) and, above all, keep distance from this bug.

PREVIOUS POST
NEXT POST
Fernando Laudares

Fernando Laudares joined Percona in early 2013 after working 8 years for a Canadian company specialized in offering services based in open source technologies. Fernando's work experience includes the architecture, deployment and maintenance of IT infrastructures based on Linux, open source software and a layer of server virtualization. From the basic services such as DHCP & DNS to identity management systems, but also including backup routines, configuration management tools and thin-clients. He's now focusing on the universe of MySQL with a particular interest in understanding the intricacies of database systems.

One Comment

  • Hi,

    This is nice to read, but is it not possible at all to have the slave being updated from all nodes ?

    Let’s say node2 dies, your “offsite” sync is gone and what happens to the slave when it comes back online ?

    Thanks!

    Matt

Leave a Reply