
Tracking IST Progress in Percona XtraDB Cluster

April 20, 2017 | Posted In: MySQL, Percona XtraDB Cluster, XtraDB Cluster


In this blog post, we’ll look at how Percona XtraDB Cluster uses IST.

Introduction

Percona XtraDB Cluster uses the concept of an Incremental State Transfer (IST). When a node leaves the cluster for a short period of time, it can rejoin by getting the delta set of missing changes from any active node in the cluster.

This process of getting the delta set of changes is called IST in Percona XtraDB Cluster.
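
To make the “delta set” concrete: every write-set replicated in the cluster carries a global sequence number (seqno), and each node caches recent write-sets in its gcache. A minimal sketch of how to inspect the positions involved (these are standard Galera status variables, nothing specific to this release):

    -- On any active node: the cluster state UUID and the latest committed seqno.
    SHOW GLOBAL STATUS LIKE 'wsrep_local_state_uuid';
    SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';
    -- The rejoining node remembers the seqno it stopped at (in grastate.dat);
    -- the gap between that seqno and the cluster's current seqno is the delta
    -- set that IST pulls from the donor's gcache.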

Tracking IST Progress

The number of write-sets/changes that the joining node needs to catch up on when rejoining the cluster is dictated by:

  1. The duration the node was not present in the cluster
  2. The workload of the cluster during that time frame

This catch-up process can be time-consuming. Until this process is complete, the rejoining node is not ready to process any active workloads.

We believe that any process that is time-consuming should have a progress monitor attached to it. This is exactly what we have done.

In the latest release of Percona XtraDB Cluster 5.7.17-29.20, we added an IST progress monitor that is exposed through SHOW STATUS. This helps you monitor the percentage of write-sets that have been applied by the rejoining node.
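
The status variable can be polled on the joining node like any other status variable (a minimal sketch; it reports an empty value when no IST is in progress):

    -- Run on the node that is receiving the IST:
    SHOW GLOBAL STATUS LIKE 'wsrep_ist_receive_status';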

Let’s see this in a working example:

  • Start a two-node cluster
  • Process some basic workloads, allow cluster replication
  • Shut down node-2
  • Node-1 then continues to process more workloads (the workload must still fit in the allocated gcache, otherwise the rejoining node would need a full SST instead of an IST)
  • Restart Node-2, causing it to trigger an IST
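
Polling the new status variable on node-2 while the IST is running then returns something along these lines (the exact values and string format here are illustrative):

    -- On node-2 (the joining node), while the IST is being received:
    SHOW GLOBAL STATUS LIKE 'wsrep_ist_receive_status';
    -- Illustrative output:
    --   wsrep_ist_receive_status | 34% complete, received seqno 1474660 of 1415410-1589676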

As you can see, the wsrep_ist_receive_status monitoring string indicates the percentage completed, the currently received write-set, and the range of write-sets applicable to the IST.

Once the IST activity is complete, the variable shows an empty string.

Closing Comments

I hope you enjoy this newly added feature. Percona Engineering would be happy to hear from you about other features that would help you make effective use of Percona XtraDB Cluster. We will try our best to include them in our future plans (based on feasibility).

Note: Special thanks to Kenn Takara and Roel Van de Paar for helping me edit this post.

Krunal Bauskar

Krunal is the PXC lead at Percona. He is responsible for day-to-day PXC development: what goes into PXC, bug fixes, releases, etc. Before joining Percona, he worked as part of the InnoDB team at MySQL/Oracle, where he authored most of the temporary table revamp work, undo log truncation, atomic truncate, and a lot of other features. In the past he was associated with Yahoo! Labs, researching big data problems, and with a database startup that is now part of Teradata. His interests mainly include data management at any scale, and he has been practicing it for more than a decade.

2 Comments

  • I like the idea! Is it reasonable to believe that the end number (1589676) will not change throughout the process (outside of a lab)? Let’s assume this is a production environment with a 3+ node cluster, as a “two-node” cluster is not really reasonable with WSREP since the “donor” is desync’d as well. A proper health check should probably not be sending database connections to the “donor” node, right? So the rest of the nodes, besides the donor and the “down” node, are still receiving transactions. Does this increment the ending number? Or do they have another incremental sync when they get done with their lengthy sync?

    I would also really like to see SST progress, which we inevitably end up doing every time the cluster crashes. 🙁

    ~tommy

    • Other cluster nodes continue to receive the traffic, which is replicated on the group channel and consumed by the DONOR and JOINER nodes. Once the IST apply action is complete, the JOINER node proceeds with applying this traffic.
