EmergencyEMERGENCY? Get 24/7 Help Now!

Multi-Valued INSERTs, AUTO_INCREMENT & Percona XtraDB Cluster

 | September 9, 2014 |  Posted In: Insight for DBAs, MySQL, Percona XtraDB Cluster

PREVIOUS POST
NEXT POST

A common migration path from standalone MySQL/Percona Server to a Percona XtraDB Cluster (PXC) environment involves some measure of time where one node in the new cluster has been configured as a slave of the production master that the cluster is slated to replace. In this way, the new cluster acts as a slave of the production environment – traditional replication takes care of getting the data into the cluster, and then Galera replication handles the intra-cluster traffic. This often works without issue, although there is one case that I’ve encountered recently where special care must be taken to properly configure the stream to ensure that replication does not break. If you use multi-valued inserts with auto-increment columns, then this post is for you.

For purposes of our discussion, assume that we have a basic 3-node PXC cluster that we’ve set up using the PXC Reference Architecture document, and that we’re replicating from an asynchronous master (call it “server A”) into one of the PXC nodes. Without loss of generality, we’ll pick PXC03. Also, for purposes of our discussion, we’ll be working with the following table definition:

If we insert rows into this table one at a time, we have no issues.

But if we start doing multi-valued inserts, we can run into a problem.

Uh oh. Replication is broken and our data is now inconsistent. So why does this happen and how can we prevent it?

binlog_format

The astute observer will note that I have not yet said anything about the binary log format on the master. If the binary log format on the master is already set to ROW, then the above error will not occur. RBR will properly replicate multi-valued INSERTs to the PXC cluster without issue, and the data will be consistent. Problem solved. However, there may be reasons that the master is not using or cannot use RBR, such as disk space or IOPS limitations, and thus it’s running in MIXED or STATEMENT mode. In that case, we need to look elsewhere….

wsrep_auto_increment_control

When set to ON (the default), this variable has the effect of automatically specifying values for auto_increment_increment and auto_increment_offset based on the cluster size. The idea behind it is to help prevent auto-increment value conflicts when writing to multiple nodes. However, what it also means is that in a multi-node cluster, the auto-increment values generated by any given node will never be consecutive and the “next” auto-increment value on the slave cluster node will always be higher than what the master believes it should be. For example:

Hmm, that’s not good.

The above scenario can be avoided by setting wsrep_auto_increment_control to OFF on the node that’s acting as a slave while the cluster is still receiving asynchronous replication traffic. This can be configured in /etc/my.cnf or set dynamically from the MySQL command line with:

In the testing that I’ve done, this appears to be sufficient to deal with the problem, even if the original master has non-standard values specified for the auto_increment_* variables (e.g., because it’s part of a master-master pair where the odd ID numbers are generated by one server and the even ID numbers from the other one).

In fact, if the cluster is always going to be used in single-writer mode, there’s a compelling argument to be made for setting this variable to OFF on all of the cluster nodes even when going into production: it will prevent the cluster from burning through N auto-increment IDs every time a single row is inserted.

Fixing it when it’s broken

There’s no secret magic here. Percona Toolkit’s pt-table-checksum and pt-table-sync can be used to check and repair the data divergence between the master and the slave cluster node, just as if PXC were not involved, although you may find that it’s just faster to rebuild the cluster, depending upon how many diffs are found. Sometimes this issue happens right away when the master-to-cluster replication is started, and fixing it involves just one or two rows; other times I have seen it not occur for days with lots of rows out of sync once it finally breaks.

The tl;dr version

In sum, if you’re doing traditional replication into a PXC cluster (or any other flavor of MySQL/MariaDB + Galera), you may have issues with multi-valued INSERT statements, and this can be prevented with a configuration change on either side of the replication stream:

  • On the master, set binlog_format=ROW
  • On the PXC slave node, set wsrep_auto_increment_control=OFF
PREVIOUS POST
NEXT POST
Ernie Souhrada

Ernie joined Percona in April 2012 as a Senior Consultant. In his previous lives, he has been everything from a Perl/Java developer to a Linux sysadmin, a MySQL DBA to a Cisco network engineer, and a security auditor to an IT engineering manager, many of these things all at the same time. When not working on MySQL, he might be found on the ski slope, at a psytrance festival, or at the nearest sushi bar.

4 Comments

  • How do you manage to switch to another node in a system where you only write to one node?
    I think then you have to set wsrep_auto_increment_control=on/off on a different machine?
    Or do you prevent your system from switching state or hide node via haproxy?

  • @R.Rusch–

    Typically you wouldn’t have to worry about that – generally the use case for replicating into a cluster is part of a migration project, and you just keep the replication running from the standalone node into the cluster for as long as you need to run your migration tests. However, if you were going to run async replication into the cluster for some extended period of time and potentially need to change nodes, you would either have to adjust wsrep_auto_increment_control on your target node or replicate into the cluster with RBR. Once you remove async replication from the equation or you switch your entire system to RBR, this becomes a non-issue.

Leave a Reply