September 22, 2014

Doing a rolling upgrade of Percona XtraDB Cluster from 5.5 to 5.6

Overview

Percona XtraDB Cluster 5.6 has been GA for several months now and people are thinking more and more about moving from 5.5 to 5.6. Most people don’t want to upgrade all at once, but would prefer a rolling upgrade to avoid downtime and ensure 5.6 is behaving in a stable fashion before putting all of production on it. The official guide to a rolling upgrade can be found in the PXC 5.6 manual. This blog post will attempt to summarize the basic process.

However, there are a few caveats to trying to do a rolling 5.6 upgrade from 5.5:

  1. If you mix Galera 2 and Galera 3 nodes, you must set wsrep_provider_options="socket.checksum=1" on the Galera 3 nodes for backwards compatibility between Galera versions.
  2. You must set some 5.6 settings for 5.5 compatibility with respect to replication:
    1. You can’t enable GTID async replication on the 5.6 nodes
    2. You should use --log-bin-use-v1-row-events on the 5.6 nodes
    3. You should set binlog_checksum=NONE on the 5.6 nodes
  3. You must not SST from a 5.5 donor to a 5.6 joiner, because the SST script does not run mysql_upgrade
  4. You should set the 5.6 nodes read_only and not write to them!
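The settings above can be sanity-checked before a 5.6 node is put into service. The sketch below is hypothetical and is not a Percona tool. It assumes three things: you have already read `SHOW GLOBAL VARIABLES` from the node into a Python dict of strings using whatever client you normally use, `wsrep_provider_options` is reported in its usual `key = value; key = value` form, and the function names are invented for illustration. It covers only the caveats listed here and is not an exhaustive compatibility test.

```python
# Hypothetical pre-flight check for a 5.6 node in a mixed 5.5/5.6 PXC cluster.
# Input: SHOW GLOBAL VARIABLES output already collected into a dict of strings.


def parse_provider_options(raw: str) -> dict:
    """Parse 'key = value; key = value' text from wsrep_provider_options."""
    options = {}
    for item in raw.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            options[key.strip()] = value.strip()
    return options


# Values each 5.6 node should report while 5.5 nodes remain in the cluster.
EXPECTED = {
    "gtid_mode": "OFF",                 # no GTID async replication
    "log_bin_use_v1_row_events": "ON",  # emit 5.5-readable row events
    "binlog_checksum": "NONE",          # 5.5 does not expect event checksums
    "read_only": "ON",                  # do not write to the 5.6 nodes
}


def compat_problems(variables: dict) -> list:
    """Return human-readable problems; an empty list means the node looks compliant."""
    problems = []
    # Galera 3 nodes talking to Galera 2 peers need socket.checksum=1.
    options = parse_provider_options(variables.get("wsrep_provider_options", ""))
    if options.get("socket.checksum") != "1":
        problems.append("wsrep_provider_options lacks socket.checksum=1")
    for name, wanted in EXPECTED.items():
        actual = variables.get(name, "<missing>")
        if actual.upper() != wanted:
            problems.append(f"{name} is {actual}, expected {wanted}")
    return problems
```

Treat an empty result as "nothing obviously wrong" rather than proof of compatibility. Any non-empty result names the setting to fix in my.cnf (or at runtime) before the node joins the read_only pool.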

The basic upgrade flow

The basic upgrade flow is:

  1. For some node(s): upgrade to 5.6, apply all of the settings above, and put them into a read_only pool
  2. Repeat step 1 as desired
  3. Once your 5.6 pool is sufficiently large, cut over writes to the 5.6 pool (turn off read_only, etc.) and upgrade the rest.

This is, in essence, exactly like upgrading a 5.5 master/slave cluster to 5.6 — you upgrade the slaves first, promote a slave and upgrade the master; we just have more masters to think about.

Once every node has been upgraded to 5.6, you can go back and remove all of the 5.5 backwards-compatibility settings.
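The flow above can be modelled as a small planning function. This is a hypothetical sketch and is not part of any Percona tool. The node names, `batch_size`, and `cutover_at` (how large the read_only 5.6 pool must be before writes move) are illustrative parameters, and each yielded string stands in for the real work described in this post.

```python
from typing import Iterator, List


def rolling_upgrade_plan(nodes: List[str], batch_size: int, cutover_at: int) -> Iterator[str]:
    """Yield the steps of a rolling 5.5 -> 5.6 upgrade in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not 1 <= cutover_at <= len(nodes):
        raise ValueError("cutover_at must be between 1 and the number of nodes")
    pool = []                # upgraded 5.6 nodes, still read_only
    remaining = list(nodes)  # nodes still running 5.5
    # Steps 1 and 2: grow the read_only 5.6 pool one batch at a time.
    while len(pool) < cutover_at:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for node in batch:
            yield f"upgrade {node} to 5.6, apply the 5.5 compatibility settings, set read_only"
            pool.append(node)
    # Step 3: move writes to the 5.6 pool, then upgrade everything else.
    yield f"cut writes over to {', '.join(pool)} and turn off read_only"
    for node in remaining:
        yield f"upgrade {node} to 5.6"
    # Finally, drop the backwards-compatibility settings everywhere.
    yield "remove the 5.5 backwards-compatibility settings from all nodes"
```

For example, `rolling_upgrade_plan(["node1", "node2", "node3"], batch_size=1, cutover_at=2)` upgrades node1 and node2 into the read_only pool, cuts writes over to them, and only then upgrades node3. This mirrors the master/slave analogy: upgrade the "slaves" first, then promote them.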

Why can’t I write to the 5.6 nodes?

The heaviest caveat is probably that in a mixed 5.5 / 5.6 cluster, you are not supposed to write to the 5.6 nodes. Why is that? Well, the reason goes back to MySQL itself. PXC/Galera uses standard row-based replication (RBR) binlog events from MySQL for replication. Replication between major MySQL versions is only ever officially supported:

  • across one major version (e.g., 5.5 to 5.6, though multiple version hops often do work)
  • from a lower-version master to a higher-version slave (e.g., 5.5 as your master and 5.6 as your slave, but not the other way around)

This compatibility requirement (which has existed in MySQL for a very long time) works well for a single-master replication topology, but true multi-master (multi-writer) replication has never been considered.
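To make the two rules concrete, here is a hypothetical sketch that encodes them for `(major, minor)` version tuples and then checks every writer-to-receiver path in a cluster. As a simplification, it assumes that "one major version" means a step of at most one in the minor number within the same 5.x series. In Galera, every node applies every writer's changes, so each writer is effectively a master of all the other nodes. A single 5.6 writer therefore creates 5.6-to-5.5 paths that the rules do not cover.

```python
from itertools import permutations


def officially_supported(master: tuple, slave: tuple) -> bool:
    """True if master -> slave replication fits both rules above."""
    same_series = master[0] == slave[0]
    step = slave[1] - master[1]  # 0 = same version, 1 = slave one version newer
    return same_series and 0 <= step <= 1


def unsupported_paths(cluster: dict, writers=None) -> list:
    """List (writer, receiver) pairs that fall outside official support.

    cluster maps node name -> (major, minor); writers defaults to all nodes,
    i.e. a true multi-writer setup.
    """
    writers = set(cluster) if writers is None else set(writers)
    return [
        (writer, receiver)
        for writer, receiver in permutations(cluster, 2)
        if writer in writers
        and not officially_supported(cluster[writer], cluster[receiver])
    ]
```

With `cluster = {"a": (5, 5), "b": (5, 5), "c": (5, 6)}`, `unsupported_paths(cluster)` returns `[("c", "a"), ("c", "b")]`. Restricting writes to the 5.5 nodes with `writers=["a", "b"]` returns an empty list, which is exactly why the 5.6 nodes are kept read_only.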

Some alternatives

Does writing to the 5.6 nodes REALLY break things?

The restriction on 5.6 masters of 5.5 slaves is probably too strict in many cases. Technically, only older-to-newer replication is ever truly supported, but in practice you may be able to run a mixed cluster with writes to all nodes as long as you are careful. At a minimum, this means that column types whose storage format changed in the newer version (for example, the TIME, DATETIME, and TIMESTAMP format introduced in MySQL 5.6.4) must NOT be upgraded to the new format while the old version remains active in the cluster. There may be other issues as well; I cannot say I’ve tested every possible circumstance.

So, can I truly say I recommend this? I cannot say that officially, but you may find it works fine. As long as you acknowledge that something unforeseen may break your cluster and your migration plan, it may be reasonable. If you decide to explore this option, please test it thoroughly, and be willing to accept the consequences of it not working, before trying it in production!

Using Async replication to upgrade

Another alternative: rather than mixing versions in one cluster and keeping the 5.6 nodes read_only, why not just set up the 5.6 cluster as an async slave of your 5.5 cluster and migrate your application to the new cluster when you are ready? This is practically the same as maintaining a split 5.5/5.6 read_write/read_only cluster, but with less risk and a shorter list of don’ts. Cutover in this case is effectively like promoting a 5.6 slave to master, except that you promote the whole 5.6 cluster.
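Setting up the async link itself is ordinary MySQL replication. The helper below is a hypothetical sketch that only builds the statements you would run on one node of the new 5.6 cluster. The host, user, password, and binlog coordinates are placeholders. It assumes that binary logging is enabled on the 5.5 source node and that the 5.6 cluster was seeded from a copy of the 5.5 data taken at those coordinates (for example, the position XtraBackup records in `xtrabackup_binlog_info`). It also assumes that Galera then replicates the applied changes to the other 5.6 nodes.

```python
def sql_string(value: str) -> str:
    """Quote a value as a MySQL string literal (backslashes and quotes escaped)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def async_slave_statements(host: str, port: int, user: str, password: str,
                           log_file: str, log_pos: int) -> list:
    """Statements for ONE node of the 5.6 cluster to replicate from a 5.5 node.

    All arguments are placeholders supplied by the caller.
    """
    change_master = (
        "CHANGE MASTER TO "
        f"MASTER_HOST={sql_string(host)}, MASTER_PORT={int(port)}, "
        f"MASTER_USER={sql_string(user)}, MASTER_PASSWORD={sql_string(password)}, "
        f"MASTER_LOG_FILE={sql_string(log_file)}, MASTER_LOG_POS={int(log_pos)}"
    )
    return [change_master + ";", "START SLAVE;"]
```

Generating the statements, rather than executing them, keeps the sketch independent of any particular MySQL client library. Review the output and run it with the client you already use.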

One caveat with this approach is replication throughput: async replication may not be able to keep up with your 5.5 cluster’s writes when applying them to a separate 5.6 cluster. Definitely check out wsrep_preordered to speed things up; it may help. But realize that some busy workloads may never be able to use async replication into another cluster.
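One way to judge whether async can keep up is to watch the trend of `Seconds_Behind_Master` on the 5.6 slave rather than any single reading. The sketch below is a hypothetical illustration and is not a Percona tool. It fits a least-squares slope to `(timestamp, lag)` samples you have collected yourself, for example by polling `SHOW SLAVE STATUS`. It skips samples where the lag was NULL. The `tolerance` threshold is a hypothetical value that you would tune for your workload.

```python
from typing import List, Optional, Tuple


def lag_slope(samples: List[Tuple[float, Optional[float]]]) -> float:
    """Least-squares slope of replication lag (seconds of lag per second).

    samples are (unix_time, Seconds_Behind_Master) pairs; None values
    (NULL, e.g. while replication is stopped) are skipped.
    """
    points = [(t, lag) for t, lag in samples if lag is not None]
    if len(points) < 2:
        raise ValueError("need at least two non-NULL samples")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_lag = sum(lag for _, lag in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (lag - mean_lag) for t, lag in points)
    return cov / var_t


def keeps_up(samples, tolerance: float = 0.01) -> bool:
    """True if lag is not growing faster than `tolerance` (hypothetical threshold)."""
    return lag_slope(samples) <= tolerance
```

A slope of 0.5 means the slave falls half a second further behind for every second of wall-clock time. If a clearly positive slope persists at peak load after you have tried wsrep_preordered, your workload may be one that cannot use async replication into another cluster.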

Just take the outage

A final alternative for this post is simply upgrading the entire cluster to 5.6 all at once during a maintenance window. I grant that this defeats the point of a rolling upgrade, but it may offer a lot of simplicity in the long run.

Conclusion

A rolling PXC / Galera upgrade across major MySQL versions is limited by the fact that replication from a newer master to an older slave is not officially supported, and Oracle has no reason to support it. In practice, it may work much of the time, but these situations should be considered carefully and the risk weighed against all other options.

About Jay Janssen

Jay joined Percona in 2011 after 7 years at Yahoo working in a variety of fields including High Availability architectures, MySQL training, tool building, global server load balancing, multi-datacenter environments, operationalization, and monitoring. He holds a B.S. in Computer Science from Rochester Institute of Technology.
