MySQL Test Framework for Percona XtraDB Cluster

At my latest webinar, “MySQL Test Framework (MTR) for Troubleshooting”, I received an interesting question about MTR test cases for Percona XtraDB Cluster (PXC), particularly about testing State Snapshot Transfer (SST) and Incremental State Transfer (IST).

This post is intended to answer this question. It assumes you are familiar with MTR and can write tests for MySQL servers. If you are not, please watch the webinar recording first.

You can find example tests in any PXC tarball package. They are located in the directories mysql-test/suite/galera, mysql-test/suite/galera_3nodes and mysql-test/suite/wsrep, though the last one only contains a configuration file.

If you simply try to run the tests in the galera suite, you will find that they are all disabled because the environment variable WSREP_PROVIDER is not set.

In order to run these tests you need to set this variable first.

I use the rather outdated PXC 5.7.19 package (the version does not matter for the purpose of this post) and run the tests with mtr.
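Here is a minimal shell sketch of that setup. It assumes the Galera provider library ships as lib/libgalera_smm.so inside the unpacked tarball, so verify the path in your package. The helper name set_wsrep_provider and the tarball location are hypothetical.

```shell
#!/bin/sh
# Sketch: export WSREP_PROVIDER so that mtr enables the galera suites.
# Assumption: the Galera provider sits at lib/libgalera_smm.so inside the
# unpacked PXC tarball; set_wsrep_provider is a hypothetical helper name.
set_wsrep_provider() {
    basedir=$1
    provider="$basedir/lib/libgalera_smm.so"
    if [ ! -f "$provider" ]; then
        echo "Galera provider not found: $provider" >&2
        return 1
    fi
    WSREP_PROVIDER=$provider
    export WSREP_PROVIDER
}

# Example (hypothetical tarball location):
#   set_wsrep_provider "$HOME/Percona-XtraDB-Cluster-5.7.19"
#   cd "$HOME/Percona-XtraDB-Cluster-5.7.19/mysql-test"
#   ./mtr --suite=galera                 # the whole suite
#   ./mtr galera.galera_ist_progress     # a single test
```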

Once WSREP_PROVIDER is set, mtr runs the tests successfully.


Now we are ready to write our first PXC test. The easiest way to get started is to open any existing test and check how it is written. Then modify it so that it replays our own scenario.

Since the question was about testing IST and SST, I will use the test galera_ist_progress as an example. First, let’s check that it runs successfully and that it has no requirements, such as a debug build, that would prevent it from running on regular production binaries.
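One way to do that check from the shell is to list the include/have_*.inc files that a test sources, then flag the ones that only pass on debug builds. This is a minimal sketch. The function names are hypothetical, while have_debug.inc and have_debug_sync.inc are the standard MTR includes for debug-only tests. Requirements expressed through other includes are not caught by this filter.

```shell
#!/bin/sh
# Sketch: show which include/have_*.inc requirements a test file sources.
# list_requirements and needs_debug_build are hypothetical helper names.
list_requirements() {
    # Print each required include once, sorted.
    grep -o 'include/have_[A-Za-z0-9_]*\.inc' "$1" | sort -u
}

needs_debug_build() {
    # Succeeds if the test sources a debug-only requirement.
    grep -Eq 'include/have_debug(_sync)?\.inc' "$1"
}

# Example, from the mysql-test directory:
#   list_requirements suite/galera/t/galera_ist_progress.test
#   needs_debug_build suite/galera/t/galera_ist_progress.test && echo "debug build only"
```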

Everything is fine. Now let’s look into the test itself.

First, this test has its own configuration file. Let’s check what’s in there.

galera_2nodes.cnf is one of the standard configuration files in the galera suite. If we look into it, we notice that it already defines wsrep_provider_options, so not every test needs to override this option.
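A test can carry such an option file next to its .test file. The sketch below writes one for a hypothetical test, my_ist_test, and rests on three assumptions:

- galera_2nodes.cnf sits in the suite directory.
- It sets base_port through the base_port=@mysqld.N.#galera_port pattern used by Galera-derived suites. Check your copy first, because setting wsrep_provider_options replaces the whole string.
- The provider options behave as described here. pc.ignore_sb=true on node 1 only is meant to keep node 1 writable after node 2 is isolated in a two-node cluster. gcache.size on node 1, the donor, limits how many writesets it can still serve through IST.

```shell
#!/bin/sh
# Sketch: write a test-specific option file for a hypothetical test
# my_ist_test. Assumption: galera_2nodes.cnf lives in the suite directory and
# sets base_port as shown; because this replaces wsrep_provider_options as a
# whole, the port part is repeated here.
SUITE_DIR=${SUITE_DIR:-suite/galera}
mkdir -p "$SUITE_DIR/t"

cat > "$SUITE_DIR/t/my_ist_test.cnf" <<'EOF'
!include ../galera_2nodes.cnf

[mysqld.1]
# Keep node 1 writable when node 2 is isolated (two-node split brain).
wsrep_provider_options='base_port=@mysqld.1.#galera_port;pc.ignore_sb=true;gcache.size=128M'

[mysqld.2]
wsrep_provider_options='base_port=@mysqld.2.#galera_port;gcache.size=128M'
EOF
```

Only node 1 ignores split brain. If node 2 did too, the isolated node could keep accepting writes on its own.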

Let’s continue the review. The test script includes a helper file.

This file is located outside of the galera suite and contains two lines. The file it sources, in turn, creates as many nodes as defined by the galera_cluster_size variable and additionally creates a default connection for each of them.

Now let’s step out from galera_ist_progress  and check if this knowledge is enough to create our first PXC test.

I created a simple test based on a two-node setup. It checks a few status and system variables, creates a table, inserts data into it, and ensures that the content is accessible on both nodes.
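A test along those lines might look like the sketch below, which writes it into the galera suite from the mysql-test directory. It makes three assumptions:

- The test name pxc_basic is hypothetical.
- The helper include is include/galera_cluster.inc, as in PXC's own galera tests.
- The per-node default connections are named node_1 and node_2.

Setting wsrep_sync_wait = 1 makes each SELECT wait until writes from the other node have been applied locally.

```shell
#!/bin/sh
# Sketch: create a simple two-node PXC test in the galera suite.
# Run from mysql-test/. Assumptions: pxc_basic is a hypothetical test name,
# include/galera_cluster.inc is assumed to be the helper include that starts
# the nodes, and the connections are assumed to be named node_1/node_2.
SUITE_DIR=${SUITE_DIR:-suite/galera}
mkdir -p "$SUITE_DIR/t"

cat > "$SUITE_DIR/t/pxc_basic.test" <<'EOF'
--source include/galera_cluster.inc

# Ports change between runs, so compare them instead of printing them.
--connection node_1
--let $port_1 = `SELECT @@port`
--connection node_2
--let $port_2 = `SELECT @@port`
--let $ports_differ = `SELECT $port_1 <> $port_2`
--echo Nodes listen on different ports: $ports_differ

# wsrep is enabled and both nodes are members of the cluster.
--connection node_1
SELECT @@wsrep_on;
SHOW STATUS LIKE 'wsrep_cluster_size';

# Write on node_1.
CREATE TABLE t1 (id INT PRIMARY KEY, origin VARCHAR(10)) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1, 'node_1');

# Read on node_2 (waiting for node_1's changes), then write there.
--connection node_2
SET SESSION wsrep_sync_wait = 1;
SELECT * FROM t1 ORDER BY id;
INSERT INTO t1 VALUES (2, 'node_2');

# Check that node_1 sees node_2's row, then clean up.
--connection node_1
SET SESSION wsrep_sync_wait = 1;
SELECT * FROM t1 ORDER BY id;
DROP TABLE t1;
EOF
```

With WSREP_PROVIDER set, `./mtr --record galera.pxc_basic` runs the test once and records its result file for later comparisons.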

However, if I run this test in the main suite, it fails.

The reason for this failure is that the galera suite has default option files that set the necessary variables. Let’s set those option files aside for a while and simply run our test in the galera suite.

You will see that the test reports that the two nodes run on different ports and that PXC started.

We can also clearly see that each node sees the changes that the other node made to our test table.

Now let’s get back to the IST test, defined in galera_ist_progress.test.

To test IST, it first stops writes to the cluster. Then it connects to node 1 and waits until wsrep_cluster_size becomes 1. Finally, it turns wsrep_on OFF on node 2.

Now node 2 is completely isolated and node 1 can be updated, so we can test IST  when we bring node 2 back online.
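The update step only has to produce enough writesets on node 1 for node 2 to fall behind. This minimal sketch emits the mysqltest code for it. generate_writes and the table t_ist are hypothetical names, and each autocommitted INSERT replicates as one writeset.

```shell
#!/bin/sh
# Sketch: emit mysqltest code that commits ROWS single-row transactions on
# node_NODE while the other node is isolated. generate_writes and t_ist are
# hypothetical names; append the output to your .test file.
generate_writes() {
    node=$1
    rows=$2
    sed -e "s/@NODE@/$node/g" -e "s/@ROWS@/$rows/g" <<'EOF'
--connection node_@NODE@
CREATE TABLE t_ist (id INT PRIMARY KEY) ENGINE=InnoDB;
--let $rows = @ROWS@
--disable_query_log
while ($rows)
{
  # Each autocommitted INSERT becomes a separate writeset.
  --eval INSERT INTO t_ist VALUES ($rows)
  --dec $rows
}
--enable_query_log
EOF
}

# Example: generate_writes 1 100 >> suite/galera/t/my_ist_test.test
```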

After the update is done, node 2 is brought back online.

Once node 2 is online, the test checks IST progress. To do so, it greps the error log of node 2, where the messages about IST progress are printed.
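The same kind of check is easy to reproduce from the shell after a run. This minimal sketch assumes mtr's usual layout, in which node N logs to $MYSQLTEST_VARDIR/log/mysqld.N.err (var under mysql-test by default). The helper names are hypothetical. The wording of the IST messages depends on the Galera version, so the caller supplies the pattern.

```shell
#!/bin/sh
# Sketch: count the error-log lines of one node that match a pattern.
# Assumption: node N writes to $MYSQLTEST_VARDIR/log/mysqld.N.err
# (default vardir: var under mysql-test). Helper names are hypothetical.
node_error_log() {
    printf '%s/log/mysqld.%s.err\n' "${MYSQLTEST_VARDIR:-var}" "$1"
}

count_log_matches() {
    # grep -c prints 0 (and exits with status 1) when nothing matches.
    grep -c -- "$2" "$(node_error_log "$1")"
}

# Example, from mysql-test/ after a run: count_log_matches 2 'IST'
```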

Here is the error log snippet from node 2 when it rejoined the cluster and initiated the state transfer.

If you want to write your own tests for IST and SST operations, you can use the existing test cases as a baseline. You do not have to use grep, and you can explore your own scenarios. The important points are:

  • The variable WSREP_PROVIDER must be set before the test run
  • The test should be either in the galera suite or, if you choose to use your own suite, you must copy the definitions from the galera suite's default configuration file
  • The test should include the file include/
  • To isolate a node from the cluster, repeat the steps that galera_ist_progress performs on node 2, as described above
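Those steps can be generated for any pair of nodes with a small shell helper, which also takes care of substituting the node numbers. This is a minimal sketch with three assumptions:

- isolate_node is a hypothetical name.
- Connections are named node_N and gmcast.isolate can be toggled at runtime, as in the galera suite.
- In a two-node cluster, node 1 stays Primary after the split only if its provider options allow it, for example with pc.ignore_sb=true.

```shell
#!/bin/sh
# Sketch: emit the mysqltest fragment that isolates node_ISOLATED.
# Usage: isolate_node ISOLATED OTHER [CLUSTER_SIZE] >> suite/galera/t/my_test.test
# Assumptions: isolate_node is a hypothetical helper, connections are named
# node_N, gmcast.isolate is dynamic, and CLUSTER_SIZE defaults to 2.
isolate_node() {
    isolated=$1
    other=$2
    remaining=$(( ${3:-2} - 1 ))
    sed -e "s/@ISOLATED@/$isolated/g" \
        -e "s/@OTHER@/$other/g" \
        -e "s/@REMAINING@/$remaining/g" <<'EOF'
# Cut node_@ISOLATED@ off from group communication.
--connection node_@ISOLATED@
SET GLOBAL wsrep_provider_options = 'gmcast.isolate = 1';

# Wait until the rest of the cluster no longer counts it.
--connection node_@OTHER@
--let $wait_condition = SELECT VARIABLE_VALUE = @REMAINING@ FROM performance_schema.global_status WHERE VARIABLE_NAME = 'wsrep_cluster_size'
--source include/wait_condition.inc

# With wsrep_on OFF for this session, wait until the node is non-Primary.
--connection node_@ISOLATED@
SET SESSION wsrep_on = OFF;
--let $wait_condition = SELECT VARIABLE_VALUE = 'non-Primary' FROM performance_schema.global_status WHERE VARIABLE_NAME = 'wsrep_cluster_status'
--source include/wait_condition.inc
SET SESSION wsrep_on = ON;
EOF
}
```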

Replace the node numbers if needed.

To bring the node back to the cluster, undo the isolation and wait until the node has rejoined and finished its state transfer.
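The rejoin step can also be generated from the shell. This is a minimal sketch. rejoin_node is a hypothetical name, and setting gmcast.isolate = 0 is assumed to undo the isolation at runtime. include/wait_condition.inc gives up after about 30 seconds unless $wait_timeout is set, which may be too short for an SST.

```shell
#!/bin/sh
# Sketch: emit the mysqltest fragment that brings node_NODE back and waits
# until it has finished its state transfer (IST or SST).
# Usage: rejoin_node NODE OTHER [CLUSTER_SIZE] >> suite/galera/t/my_test.test
# Assumptions: rejoin_node is a hypothetical helper, connections are named
# node_N, and CLUSTER_SIZE defaults to 2.
rejoin_node() {
    node=$1
    other=$2
    size=${3:-2}
    sed -e "s/@NODE@/$node/g" \
        -e "s/@OTHER@/$other/g" \
        -e "s/@SIZE@/$size/g" <<'EOF'
# Let node_@NODE@ reach the cluster again; it then requests a state transfer.
--connection node_@NODE@
SET GLOBAL wsrep_provider_options = 'gmcast.isolate = 0';

# Wait until the rest of the cluster counts the node again.
--connection node_@OTHER@
--let $wait_condition = SELECT VARIABLE_VALUE = @SIZE@ FROM performance_schema.global_status WHERE VARIABLE_NAME = 'wsrep_cluster_size'
--source include/wait_condition.inc

# Wait until the state transfer is over and the node reports Synced.
--connection node_@NODE@
SET SESSION wsrep_on = OFF;
--let $wait_condition = SELECT VARIABLE_VALUE = 'Synced' FROM performance_schema.global_status WHERE VARIABLE_NAME = 'wsrep_local_state_comment'
--source include/wait_condition.inc
SET SESSION wsrep_on = ON;
EOF
}
```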

Depending on the size of the updates and of the gcache, you can test either IST or SST this way.
