Galera Cache (gcache) is finally recoverable on restart

November 30, 2016

Author

Krunal Bauskar

This post describes how to recover Galera Cache (or gcache) on restart.

Codership recently introduced a very important and long-awaited feature in Galera 3.19: users can now recover the Galera cache on restart.

Need

If you gracefully shut down cluster nodes one after another, with some lag between them, the last node to shut down holds the latest data. The next time you restart the cluster, that last node must be the first one to boot. Any follow-up nodes that join the cluster after the first node will demand an SST.
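The restart rule above comes down to ordering nodes by their last committed seqno. The Python sketch below assumes you have already collected each node's seqno (for example, from the seqno field Galera writes to grastate.dat on a clean shutdown). The function name `restart_order` and the n3 value are hypothetical. The n1 and n2 values mirror the logs in the example later in this post.

```python
from typing import Dict, List


def restart_order(last_seqno: Dict[str, int]) -> List[str]:
    """Return node names in the order they should be restarted.

    The node with the highest seqno (the last one to shut down) goes first
    and bootstraps the new cluster; the others follow and rejoin it.
    Ties are broken by node name so the order is deterministic.
    """
    return sorted(last_seqno, key=lambda node: (-last_seqno[node], node))


# Hypothetical seqnos after n2 and n3 were shut down before n1.
nodes = {"n1": 4680, "n2": 3893, "n3": 3893}
print(restart_order(nodes))  # ['n1', 'n2', 'n3']
```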

Why SST, when these nodes already have data and only a few write-sets are missing? The DONOR node caches write-sets in the Galera cache, but on restart this cache is wiped clean and started fresh. As a result, the DONOR node has no cached write-sets to donate to the joining nodes.

This painful setup forced users to think and plan before gracefully taking down the cluster. With this new feature, the user can retain the Galera cache across a restart.

How does this help?

On restart, the node revives the Galera cache. This means the node can act as a DONOR and service missing write-sets (facilitating IST instead of SST). Retaining the Galera cache is controlled by an option named gcache.recover=yes/no. The default is no (the Galera cache is not retained). The user can set this option on all nodes or only on selected nodes, based on disk usage.
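The difference between gcache.recover=no and gcache.recover=yes can be modeled in a few lines of Python. This is a toy sketch, not Galera code. It makes three assumptions: the gcache is just a list of cached seqnos, a restart without recovery empties it, and the DONOR can serve IST only if every seqno the JOINER is missing is still cached. The class and function names are hypothetical. The seqnos (n2 at 3893, group at 4680) come from the logs in the example that follows.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GCacheModel:
    # Toy stand-in for a node's Galera cache: only the seqnos it holds.
    seqnos: List[int] = field(default_factory=list)

    def restart(self, recover: bool) -> None:
        # gcache.recover=no: the cache is wiped and started fresh.
        # gcache.recover=yes: cached write-sets survive the restart.
        if not recover:
            self.seqnos.clear()

    def can_serve_ist(self, first_seqno: int, last_seqno: int) -> bool:
        # IST requires every write-set from first_seqno to last_seqno.
        cached = set(self.seqnos)
        return all(s in cached for s in range(first_seqno, last_seqno + 1))


def transfer_method(donor: GCacheModel, joiner_seqno: int, group_seqno: int) -> str:
    # The JOINER is missing seqnos joiner_seqno + 1 .. group_seqno.
    if donor.can_serve_ist(joiner_seqno + 1, group_seqno):
        return "IST"
    return "SST"  # DONOR falls back to a full state snapshot transfer


for recover in (False, True):
    n1 = GCacheModel(seqnos=list(range(1, 4681)))  # n1 holds seqnos 1..4680
    n1.restart(recover=recover)
    print(f"gcache.recover={'yes' if recover else 'no'}:", transfer_method(n1, 3893, 4680))
# gcache.recover=no: SST
# gcache.recover=yes: IST
```

With recovery disabled, the first missing seqno (3894) is absent from the model's cache. This matches the "IST first seqno 3894 not found from cache, falling back to SST" message in the DONOR log.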

gcache.recover in action

The example below demonstrates how to use this option:

- Let’s say the user has a three-node cluster (n1, n2, n3), all in sync.

- The user gracefully shuts down n2 and n3.

- n1 is still up and running and processes some workload, so n1 now has the latest data.

- n1 is eventually shut down.

- Now the user decides to restart the cluster. Obviously, the user needs to start n1 first, followed by n2/n3.

- n1 boots up, forming a new cluster.

- n2 boots up, joins the cluster, finds that write-sets are missing, and demands IST. Because n1’s gcache was wiped clean on restart, the missing write-sets cannot be found and the transfer falls back to SST.

n2 (JOINER node log):

2016-11-18 13:11:06 3277 [Note] WSREP: State transfer required: 
 Group state: 839028c7-ad61-11e6-9055-fe766a1886c3:4680
 Local state: 839028c7-ad61-11e6-9055-fe766a1886c3:3893

n1 (DONOR node log), gcache.recover=no:

2016-11-18 13:11:06 3245 [Note] WSREP: IST request: 839028c7-ad61-11e6-9055-fe766a1886c3:3893-4680|tcp://192.168.1.3:5031
2016-11-18 13:11:06 3245 [Note] WSREP: IST first seqno 3894 not found from cache, falling back to SST

Now let’s re-execute this scenario with gcache.recover=yes.

n2 (JOINER node log):

2016-11-18 13:24:38 4603 [Note] WSREP: State transfer required: 
 Group state: ee8ef398-ad63-11e6-92ed-d6c0646c9f13:1495
 Local state: ee8ef398-ad63-11e6-92ed-d6c0646c9f13:769
....
2016-11-18 13:24:41 4603 [Note] WSREP: Receiving IST: 726 writesets, seqnos 769-1495
....
2016-11-18 13:24:49 4603 [Note] WSREP: IST received: ee8ef398-ad63-11e6-92ed-d6c0646c9f13:1495

n1 (DONOR node log), gcache.recover=yes:

2016-11-18 13:24:38 4573 [Note] WSREP: IST request: ee8ef398-ad63-11e6-92ed-d6c0646c9f13:769-1495|tcp://192.168.1.3:5031
2016-11-18 13:24:38 4573 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

You can also validate this by checking the lowest write-set seqno still available in the gcache on the DONOR node.

mysql> show status like 'wsrep_local_cached_downto';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| wsrep_local_cached_downto | 1     |
+---------------------------+-------+
1 row in set (0.00 sec)
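The status value can be read as a simple range check. This assumes the DONOR's cache is non-empty and contiguous from wsrep_local_cached_downto up to its latest seqno. Under that assumption, IST is possible when the first missing seqno (the JOINER's local seqno + 1) is not below that value. The number of write-sets sent is then the group seqno minus the local seqno. A minimal Python sketch follows; the function names are hypothetical.

```python
def ist_possible(cached_downto: int, joiner_seqno: int) -> bool:
    """True if the DONOR still caches the first seqno the JOINER needs.

    Assumes a non-empty cache holding every seqno from cached_downto to
    the DONOR's latest seqno without gaps.
    """
    return cached_downto <= joiner_seqno + 1


def writesets_to_send(joiner_seqno: int, group_seqno: int) -> int:
    # The JOINER needs seqnos joiner_seqno + 1 .. group_seqno inclusive.
    return group_seqno - joiner_seqno


# Values from the gcache.recover=yes run above.
print(ist_possible(cached_downto=1, joiner_seqno=769))        # True
print(writesets_to_send(joiner_seqno=769, group_seqno=1495))  # 726
```

The 726 computed here matches the "Receiving IST: 726 writesets, seqnos 769-1495" line in the JOINER log.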

As you can see, gcache.recover restored the cache on restart and let the DONOR service IST instead of SST. This is a major resource saver for most graceful shutdowns.

gcache revive doesn’t work if . . .

If gcache pages are involved. Gcache pages are still removed on shutdown, and all write-sets cached up to that point are cleared as well.

Again, let’s look at an example:

- Let’s assume the same configuration and workflow as above; only the workload pattern changes.

- n1, n2 and n3 are in sync, and an average-size workload is executed such that its write-sets fit in the gcache (seqno=1-x).

- n2 and n3 are shut down.

- n1 continues to operate and executes some average-size workload, followed by a huge transaction that results in the creation of a gcache page (1-x-a-b-c-h, where h represents the seqno of the huge transaction).

- Now n1 is shut down. During shutdown, gcache pages are purged (irrespective of the gcache.keep_pages_size setting).

- The purge also removes every write-set with a seqno smaller than the page-resident write-set. This effectively means everything (1-h) is removed, including a, b and c.

- On restart, even though n1 can revive the gcache, there is nothing left to revive, because all the write-sets were purged.

- When n2 boots up, it requests IST, but n1 can’t service the missing write-sets (a, b, c, h). This causes an SST to take place.
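The purge rule described in this list can also be sketched in Python. This is a simplified model under stated assumptions. Each cached write-set is tagged with whether it lives in a gcache page. On shutdown, every page is dropped, along with every write-set whose seqno is below the highest page-resident one. The function name and the concrete seqnos (x = 100, a/b/c = 101 to 103, h = 104) are hypothetical.

```python
from typing import List, Tuple

# (seqno, lives_in_page): True if the write-set was stored in a gcache page
# rather than in the regular cache.
WriteSet = Tuple[int, bool]


def recoverable_after_shutdown(cache: List[WriteSet]) -> List[int]:
    """Seqnos that could still be revived on restart in this toy model.

    Pages are purged on shutdown, and so is every write-set with a seqno
    below the highest page-resident one. Simplified for illustration only.
    """
    page_seqnos = [s for s, in_page in cache if in_page]
    cutoff = max(page_seqnos, default=0)  # 0 means no page was created
    return [s for s, in_page in cache if not in_page and s > cutoff]


# Seqnos 1..103 (1-x plus a, b, c) fit in the cache; h = 104 needed a page.
with_page = [(s, False) for s in range(1, 104)] + [(104, True)]
print(recoverable_after_shutdown(with_page))  # []

# Without the huge transaction, everything survives for IST.
without_page = [(s, False) for s in range(1, 104)]
print(len(recoverable_after_shutdown(without_page)))  # 103
```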

Summing it up

Needless to say, gcache.recover is a much-needed feature, given that it saves SST pain (thanks, Codership). It would be good to see the feature optimized to work with gcache pages.

And yes, Percona XtraDB Cluster will inherit this feature in its upcoming release.

5 Comments

Shrinivasa

Hi Krunal,

According to the documentation, a majority of nodes should be up for the cluster to function. So how is node 1 able to serve requests when the other 2 nodes are down?

Author

Krunal Bauskar

You can have a cluster with a single node, too. When you first boot a node, you have a single-node cluster.
If you have a 2-node cluster and node-2 leaves the cluster gracefully (user shutdown), that is not treated as split-brain, because before going away node-2 communicates its graceful shutdown to node-1.