
Tracking IST Progress in Percona XtraDB Cluster

April 20, 2017 | Posted In: MySQL, Percona XtraDB Cluster, XtraDB Cluster


In this blog post, we’ll look at how Percona XtraDB Cluster uses IST.

Introduction

Percona XtraDB Cluster uses the concept of an Incremental State Transfer (IST). When a node leaves the cluster for a short period of time, it can rejoin by getting the delta set of missing changes from any active node in the cluster.

This process of getting the delta set of changes is called IST in Percona XtraDB Cluster.
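
To make the “delta set” concrete: every write-set replicated in the cluster carries a global sequence number (seqno), and each node caches recent write-sets in its gcache. A minimal sketch of how to inspect the positions involved (these are standard Galera status variables, nothing specific to this release):

    -- On any active node: the cluster state UUID and the latest committed seqno.
    SHOW GLOBAL STATUS LIKE 'wsrep_local_state_uuid';
    SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';
    -- The rejoining node remembers the seqno it stopped at (in grastate.dat);
    -- the gap between that seqno and the cluster's current seqno is the delta
    -- set that IST pulls from the donor's gcache.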

Tracking IST Progress

The number of write-sets/changes that the joining node needs to catch up on when rejoining the cluster is dictated by:

  1. The duration the node was not present in the cluster
  2. The workload of the cluster during that time frame

This catch-up process can be time-consuming. Until this process is complete, the rejoining node is not ready to process any active workloads.

We believe that any process that is time-consuming should have a progress monitor attached to it. This is exactly what we have done.

In the latest release of Percona XtraDB Cluster 5.7.17-29.20, we added an IST progress monitor that is exposed through SHOW STATUS. This helps you monitor the percentage of write-sets that have been applied by the rejoining node.
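
The status variable can be polled on the joining node like any other status variable (a minimal sketch; it reports an empty value when no IST is in progress):

    -- Run on the node that is receiving the IST:
    SHOW GLOBAL STATUS LIKE 'wsrep_ist_receive_status';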

Let’s see this in a working example:

  • Start a two-node cluster
  • Process some basic workloads, allow cluster replication
  • Shut down node-2
  • Node-1 then continues to process more workloads (the workload must still fit in the allocated gcache, otherwise the rejoining node would need a full SST instead of an IST)
  • Restart Node-2, causing it to trigger an IST
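
Polling the new status variable on node-2 while the IST is running then returns something along these lines (the exact values and string format here are illustrative):

    -- On node-2 (the joining node), while the IST is being received:
    SHOW GLOBAL STATUS LIKE 'wsrep_ist_receive_status';
    -- Illustrative output:
    --   wsrep_ist_receive_status | 34% complete, received seqno 1474660 of 1415410-1589676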

As you can see, the wsrep_ist_receive_status monitoring string indicates the percentage completed, the currently received write-set, and the range of write-sets applicable to the IST.

Once the IST activity is complete, the variable shows an empty string.

Closing Comments

I hope you enjoy this newly added feature. Percona Engineering would be happy to hear from you about other features that would help you make effective use of Percona XtraDB Cluster. We will try our best to include them in our future plans (based on feasibility).

Note: Special thanks to Kenn Takara and Roel Van de Paar for helping me edit this post.

Krunal Bauskar

Krunal is the PXC lead at Percona. He is responsible for day-to-day PXC development: what goes into PXC, bug fixes, releases, etc. Before joining Percona, he worked as part of the InnoDB team at MySQL/Oracle, where he authored most of the temporary table revamp work, undo log truncation, atomic truncate, and a lot of other features. In the past he was associated with Yahoo! Labs, researching big data problems, and with a database startup that is now part of Teradata. His interests mainly include data management at any scale, and he has been practicing it for more than a decade.

2 Comments

  • I like the idea! Is it reasonable to believe that the end number (1589676) will not change throughout the process (outside of a lab)? Let’s assume this is a production environment with a 3+ node cluster, as a “two-node” cluster is not really reasonable with WSREP since the “donor” is desync’d as well. A proper health check should probably not be sending database connections to the “donor” node, right? So the rest of the nodes, besides the donor and the “down” node, are still receiving transactions. Does this increment the ending number? Or do they have another incremental sync when they get done with their lengthy sync?

    I would also really like to see SST progress, which we inevitably end up doing every time the cluster crashes. 🙁

    ~tommy

    • Other cluster nodes continue to receive the traffic, which is replicated on the group channel and consumed by the DONOR and JOINER nodes. Once the IST apply action is complete, the JOINER node proceeds with applying this traffic.
