Galera/Percona XtraDB Cluster (PXC) for MySQL is a hot thing right now and some users jump right in without enough testing. Consequently, they’re more likely to either suffer failure or issues that prevent them from moving forward. If you are thinking of migrating your workload to Percona XtraDB Cluster, make sure to go through these basics.
log_slave_updates is REQUIRED
You need to have log_slave_updates enabled on the cluster node acting as async slave for replicated events from the async master to be applied to the other nodes, that is if you have more than one PXC node. This is because before Galera can create writesets for the replicated events, binlog events must be generated for the transactions first. Under normal async replication, an event will not be written to the slave’s binary unless log_slave_updates is enabled, this is similar to Percona XtraDB Cluster in that if you want an async event replicated to the whole cluster you have to have the same enabled.
MyISAM in PXC May Lead to Inconsistencies and May Not Even Work!
MyISAM tables are supported within Percona XtraDB Cluster, however, MyISAM has only basic support, primarily because the storage engine is non-transactional and so PXC cannot guarantee the data will remain consistent within the cluster. Also, at the time of this writing, from async stream, MyISAM is not being replicated at all which I reported on this bug. This would be a showstopper for anyone who wants to, but still have MyISAM tables. You can still try by filtering MyISAM tables though if you can leave them behind. Lastly, once that bug above is fixed, and you still have MyISAM tables you wish to keep running under PXC, wsrep_replicate_myisam allows you to do so. However, if you can, you should consider moving to InnoDB altogether. There are very few reasons to stay with MyISAM nowadays i.e. if you have FULLTEXT you simply cannot replace in short term.
Control Your Auto-Incrementing Columns
PXC/Galera controls auto-incrementing values internally within the cluster, this is to avoid collisions when INSERTs are happening on not only a single node. However, this may work differently when replicating from an async master, for example like the one described on these two bugs. Galera use writesets to replicate cluster events to the other nodes, in essence these are RBR events, plus a few additional structures used for certification. Having said that, it would be good if your async master can use ROW based binlog format as well to achieve better consistency, if you have an async master <= 5.0 though, you can workaround this by turning off wsrep_auto_increment_control from the Percona XtraDB Cluster nodes as workaround. Note that with the latter, make sure to not forget turning the feature back on when you switch to the new cluster especially if you are planning to write on multiple nodes.
Have PRIMARY KEYS
If you still have tables without PRIMARY KEYs, then its time to make one for them. Galera does not work well with those and even if there is basic support when wsrep_certify_nonPK is enabled, you can still hit issues like when automatic creation of primary keys for use during certification becomes non-deterministic. Although the previous bug has been fixed on latest release (5.5.30-23.7.4), table without PK imposes an additional overhead, and because cluster performance is somewhat dependent on the slowest node – this overhead can easily become visible on the whole cluster and your async replication being affected.
Be Prepared for some Latency
PXC can take workloads, however not just any workload – it shines with small transactions but not with big ones. If you are consistently running overnight reporting jobs and getting them through the replication stream expect some replication lag. This is because because synchronous replication inside PXC has an additional overhead, this means the SQL_THREAD will not be able to execute events as fast, on top of that, other factors affecting async replication like if your workload is CPU or IO bound. Peter wrote some good details about it here.
If you have encountered any other issues replicating to Percona XtraDB Cluster, I’d like to hear your thoughts and experience on the comments 🙂