How to calculate the correct size of Percona XtraDB Cluster’s gcache

September 8, 2014


When a write query is sent to Percona XtraDB Cluster, every node stores the writeset in its gcache. By default the gcache file is named galera.cache and lives in the MySQL datadir. This is a very important file and, as usual with the most important variables in MySQL, the default size (128 MB) is not suitable for heavily loaded servers. Let’s see why it matters and how we can calculate a correct value for our cluster’s workload.

What’s the gcache?
When a node leaves the cluster (because of a crash or maintenance), it stops receiving changes. When it reconnects, its data is outdated, so the joining node (the joiner) must ask another node (the donor) to send the changes that happened during the downtime.

The donor first tries an Incremental State Transfer (IST), that is, sending only the writesets the cluster received while the node was down. The donor checks the last writeset the joiner received and then looks in its local gcache. If all the needed writesets are in that cache, the donor sends them to the joiner. The joiner applies them and is then up to date and ready to join the cluster. Therefore, IST is possible only if every change the departed node missed is still in the donor's gcache.
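The donor's decision can be pictured as a range check on writeset sequence numbers (seqnos). The Python sketch below is a simplified model, not Galera's actual code. It assumes that the joiner reports the last seqno it applied and that the donor knows the oldest seqno still in its gcache. choose_transfer is a hypothetical name.

```python
def choose_transfer(joiner_last_seqno: int, cache_oldest_seqno: int) -> str:
    """Hypothetical model of the donor's IST/SST decision.

    The joiner needs every writeset from joiner_last_seqno + 1 onwards.
    IST is possible only if the oldest writeset still in the donor's gcache
    is not newer than the first writeset the joiner is missing.
    """
    first_missing = joiner_last_seqno + 1
    return "IST" if cache_oldest_seqno <= first_missing else "SST"


if __name__ == "__main__":
    # Joiner applied up to seqno 1000; donor's gcache still starts at 950.
    print(choose_transfer(1000, 950))   # IST
    # Donor's gcache has already wrapped around past seqno 1001.
    print(choose_transfer(1000, 1100))  # SST
```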

If the writesets are not there, a full State Snapshot Transfer (SST) is needed, using one of the supported methods: XtraBackup, rsync or mysqldump.

In summary, the practical difference between IST and SST is how long a node needs to rejoin the cluster. That can range from seconds with IST to hours with SST, and possibly days over WAN connections with large datasets.

That’s why a correctly sized gcache is important. It works as a circular log: when it is full, it starts overwriting the oldest writesets at the beginning. With a larger gcache, a node can stay out of the cluster longer without requiring an SST. My colleague Jay Janssen explains in more detail how IST works and how to find the right server to use as a donor.
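To see why a bigger circular log lets a node stay away longer, here is a toy model in Python. It assumes a fixed byte capacity, equally sized writesets, and eviction of the oldest writesets first. ToyGCache and oldest_after are hypothetical names, and the real galera.cache is more involved than this.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class ToyGCache:
    """Hypothetical, simplified model of a byte-bounded circular writeset cache.

    Only the eviction behaviour is modelled: when a new writeset does not fit,
    the oldest cached writesets are discarded first.
    """

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.items: Deque[Tuple[int, int]] = deque()  # (seqno, size in bytes)

    def append(self, seqno: int, size_bytes: int) -> None:
        if size_bytes > self.capacity:
            raise ValueError("writeset is larger than the whole cache")
        # Overwrite (evict) the oldest writesets until the new one fits.
        while self.used + size_bytes > self.capacity:
            _, old_size = self.items.popleft()
            self.used -= old_size
        self.items.append((seqno, size_bytes))
        self.used += size_bytes

    def oldest_seqno(self) -> Optional[int]:
        return self.items[0][0] if self.items else None


def oldest_after(capacity_bytes: int, n_writesets: int, size_bytes: int = 1024) -> Optional[int]:
    """Fill a cache with n equally sized writesets and report the oldest one kept."""
    cache = ToyGCache(capacity_bytes)
    for seqno in range(1, n_writesets + 1):
        cache.append(seqno, size_bytes)
    return cache.oldest_seqno()


if __name__ == "__main__":
    # 500 writesets of 1 KiB each: a 10 KiB cache keeps only the last 10,
    # a 100 KiB cache keeps the last 100.
    print(oldest_after(10 * 1024, 500))   # 491
    print(oldest_after(100 * 1024, 500))  # 401
```

Consider a node that stopped at seqno 450 and needs writesets from 451 onwards. The larger cache still holds them, so IST is possible. The smaller cache has already overwritten them, so an SST would be needed.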

Calculating the correct size
The trick is pretty similar to the one used to calculate the correct InnoDB log file size: we need to check how many bytes are written every minute. The variables to check are:

wsrep_replicated_bytes: Total size (in bytes) of writesets sent to other nodes.

wsrep_received_bytes: Total size (in bytes) of writesets received from other nodes.

mysql> show global status like 'wsrep_received_bytes'; 
show global status like 'wsrep_replicated_bytes'; 
select sleep(60); 
show global status like 'wsrep_received_bytes'; 
show global status like 'wsrep_replicated_bytes';
+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| wsrep_received_bytes | 83976571 |
+----------------------+----------+
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| wsrep_replicated_bytes | 0     |
+------------------------+-------+

[...]

+----------------------+----------+
| Variable_name        | Value    |
+----------------------+----------+
| wsrep_received_bytes | 90576957 |
+----------------------+----------+
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| wsrep_replicated_bytes | 800   |
+------------------------+-------+


Therefore:

Bytes per minute:

(second wsrep_received_bytes – first wsrep_received_bytes) + (second wsrep_replicated_bytes – first wsrep_replicated_bytes)

(90576957 – 83976571) + (800 – 0) = 6601186 bytes or 6 MB per minute.
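Here is the same arithmetic as a small Python sketch. It assumes you have copied the two status samples by hand into dictionaries. bytes_per_interval is a hypothetical helper, and it does not query the server.

```python
from typing import Mapping


def bytes_per_interval(first: Mapping[str, int], second: Mapping[str, int]) -> int:
    """Writeset bytes received plus replicated between two status samples."""
    received = second["wsrep_received_bytes"] - first["wsrep_received_bytes"]
    replicated = second["wsrep_replicated_bytes"] - first["wsrep_replicated_bytes"]
    return received + replicated


if __name__ == "__main__":
    # Values from the two samples taken 60 seconds apart above.
    first = {"wsrep_received_bytes": 83976571, "wsrep_replicated_bytes": 0}
    second = {"wsrep_received_bytes": 90576957, "wsrep_replicated_bytes": 800}
    per_minute = bytes_per_interval(first, second)
    print(per_minute)                            # 6601186
    print(round(per_minute / (1024 * 1024), 1))  # 6.3 (MiB per minute)
```

If the samples are taken more than 60 seconds apart, divide the result by the number of minutes between them to get a per-minute rate.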

Bytes per hour:

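6601186 bytes × 60 minutes = 396071160 bytes, roughly 378 MiB per hour of writesets received by the cluster (or 360 MB if you first round the per-minute figure down to 6 MB).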

If you want to allow one hour of maintenance (or downtime) on a node, the gcache must be at least that size; it is set with the gcache.size option inside wsrep_provider_options. If you want more time, make it proportionally bigger.
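To turn the measured rate into a configuration value, the sketch below multiplies it by the downtime window you want to tolerate and rounds up to whole MiB. It assumes the rate measured above holds for the whole window. It uses the unrounded 6601186 bytes per minute, which is why one hour comes out slightly above 360 MB. gcache_size_for is a hypothetical helper. The result would go into my.cnf, for example wsrep_provider_options="gcache.size=378M".

```python
def gcache_size_for(bytes_per_minute: int, window_minutes: int) -> str:
    """Smallest whole-MiB gcache.size value that can hold window_minutes of writesets.

    Assumes the write rate stays at bytes_per_minute for the whole window.
    """
    mib = 1024 * 1024
    needed_bytes = bytes_per_minute * window_minutes
    # Integer ceiling division: round up so the window is never cut short.
    needed_mib = -(-needed_bytes // mib)
    return f"{needed_mib}M"


if __name__ == "__main__":
    rate = 6601186  # bytes per minute, measured in the example above
    print(gcache_size_for(rate, 60))   # 378M (one hour)
    print(gcache_size_for(rate, 120))  # 756M (two hours)
```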


5 Comments


Mrten


Is there a downside to setting galera.cache to “ridiculously” high values, besides of course the space it takes on disk? I’ve set it to 10G and do not see problems.
