EmergencyEMERGENCY? Get 24/7 Help Now!

Diagnosing SST errors with Percona XtraDB Cluster for MySQL

 | December 30, 2014 |  Posted In: MySQL, Percona XtraDB Cluster

PREVIOUS POST
NEXT POST

State Snapshot Transfer (SST) is used in Percona XtraDB Cluster (PXC) when a new node joins the cluster or to resync a failed node if Incremental State Transfer (IST) is no longer available. SST is triggered automatically but there is no magic: If it is not configured properly, it will not work and new nodes will never be able to join the cluster. Let’s have a look at a few classic issues.

Port for SST is not open

The donor and the joiner communicate on port 4444, and if the port is closed on one side, SST will always fail.

You will see in the error log of the donor that SST is started:

But then nothing happens, and some time later you will see a bunch of errors:

On the joiner side, you will see a similar sequence: SST is started, then hangs and is finally aborted:

The solution is of course to make sure that the ports are open on both sides.

SST is not correctly configured

Sometimes you will see an error like this on the donor:

And if you look at innobackup.backup.log:

What happened?

The default SST method is xtrabackup-v2 and for it to work, you need to specify a username/password in the my.cnf file:

And you also need to create the corresponding MySQL user:

So you should check that the user has been correctly created in MySQL and that wsrep_sst_auth is correctly set.

Galera versions do not match

Here is another set of errors you may see in the error log of the donor:

Here the issue is that you try to connect a node using Galera 2.x and a node running Galera 3.x. This can happen if you try to use a PXC 5.5 node and a PXC 5.6 node.

The right solution is probably to understand why you ended up with such inconsistent versions and make sure all nodes are using the same Percona XtraDB Cluster version and Galera version.

But if you know what you are doing, you can also instruct the node using Galera 3.x that it will communicate with Galera 2.x nodes by specifying in the my.cnf file:

Conclusion

SST errors can have multiple reasons for occurring, and the best way to diagnose the issue is to have a look at the error log of the donor and the joiner. Galera is in general quite verbose so you can follow the progress of SST on both nodes and see where it fails. Then it is mostly about being able to interpret the error messages.

PREVIOUS POST
NEXT POST
Stephane Combaudon

Stéphane joined Percona in July 2012, after working as a MySQL DBA for leading French companies such as Dailymotion and France Telecom. In real life, he lives in Paris with his wife and their twin daughters. When not in front of a computer or not spending time with his family, he likes playing chess and hiking.

Leave a Reply