How to calculate the correct size of Percona XtraDB Cluster’s gcache


When a write query is sent to Percona XtraDB Cluster, all the nodes store the writeset in a file called the gcache. By default that file is named galera.cache and is stored in the MySQL datadir. This is a very important file and, as usual with the most important variables in MySQL, the default value is not good for heavily loaded servers. Let’s see why it’s important and how we can calculate a correct value for the workload of our cluster.

What’s the gcache?
When a node leaves the cluster (crash or maintenance) it obviously stops receiving changes. When you reconnect the node to the cluster, its data will be outdated. The joiner node needs to ask a donor to send the changes that happened during the downtime.

The donor will first try an incremental state transfer (IST), that is, sending only the writesets the cluster received while the node was down. The donor checks the last writeset received by the joiner and then checks its local gcache file. If all the needed writesets are in that cache, the donor sends them to the joiner. The joiner applies them and that’s all: it is up to date and ready to join the cluster. Therefore, IST can only be achieved if all the changes missed by the node that went away are still in the donor’s gcache file.

On the other hand, if the writesets are not there, a full state transfer (SST) is needed, using one of the supported methods: XtraBackup, rsync or mysqldump.
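The IST-or-SST decision described above can be illustrated with a toy model. This is only a sketch, not how Galera is implemented: the class name `GCacheModel`, its methods and the fixed 100-byte writesets are hypothetical, and the real gcache is more elaborate than a byte-bounded queue.

```python
from collections import deque


class GCacheModel:
    """Toy model of a donor's gcache: a byte-bounded FIFO of writesets.

    Hypothetical illustration only; names and behaviour are simplified.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.last_seqno = 0
        self.writesets = deque()  # (seqno, size_bytes), oldest first

    def add(self, seqno, size_bytes):
        """Store a writeset, evicting the oldest ones when the cache is full."""
        self.writesets.append((seqno, size_bytes))
        self.used_bytes += size_bytes
        self.last_seqno = seqno
        while self.used_bytes > self.capacity_bytes and self.writesets:
            _, old_size = self.writesets.popleft()
            self.used_bytes -= old_size

    def transfer_type(self, joiner_last_seqno):
        """Return "IST" if every writeset the joiner missed is still cached."""
        if joiner_last_seqno >= self.last_seqno:
            return "IST"  # nothing was missed
        if not self.writesets:
            return "SST"
        oldest_cached = self.writesets[0][0]
        first_needed = joiner_last_seqno + 1
        return "IST" if oldest_cached <= first_needed else "SST"


donor = GCacheModel(capacity_bytes=300)
for seqno in range(1, 6):
    donor.add(seqno, size_bytes=100)  # only writesets 3, 4 and 5 still fit

short_outage = donor.transfer_type(joiner_last_seqno=2)  # needs 3..5
long_outage = donor.transfer_type(joiner_last_seqno=1)   # needs 2..5
print(short_outage, long_outage)
```

In this model the joiner that stopped at writeset 2 can be served from the cache, while the one that stopped at writeset 1 cannot, because writeset 2 has already been overwritten.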

In summary, the difference between an IST and an SST is the time that a node needs to join the cluster. The difference can range from seconds to hours, or even days in the case of WAN connections and large datasets.

That’s why having a correctly sized gcache is important. It works as a circular log, so when it is full it starts overwriting the writesets at the beginning. With a larger gcache, a node can stay out of the cluster for longer without requiring an SST. My colleague Jay Janssen explains in more detail how IST works and how to find the right server to use as donor.

Calculating the correct size
The trick is pretty similar to the one used to calculate the correct InnoDB log file size. We need to check how many bytes are written every minute, so we sample two status variables twice, one minute apart. The variables to check are:

wsrep_replicated_bytes: Total size (in bytes) of writesets sent to other nodes.

wsrep_received_bytes: Total size (in bytes) of writesets received from other nodes.

Therefore:

Bytes per minute:

(second wsrep_received_bytes – first wsrep_received_bytes) + (second wsrep_replicated_bytes – first wsrep_replicated_bytes)

(90576957 – 83976571) + (800 – 0) = 6601186 bytes, or roughly 6 MB per minute.

Bytes per hour:

6 MB * 60 minutes = 360 MB per hour of writesets received by the cluster.
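The same arithmetic can be scripted. The sketch below assumes the two samples were taken with something like `SHOW GLOBAL STATUS LIKE 'wsrep_%_bytes'` and copied into dictionaries by hand; the function name `writeset_bytes_per_minute` and the `interval_seconds` parameter are hypothetical helpers, not part of any tool.

```python
def writeset_bytes_per_minute(first, second, interval_seconds=60):
    """Writeset traffic per minute, computed from two status samples.

    `first` and `second` map status variable names to their values;
    `interval_seconds` is the time that passed between the two samples.
    """
    received = second["wsrep_received_bytes"] - first["wsrep_received_bytes"]
    replicated = second["wsrep_replicated_bytes"] - first["wsrep_replicated_bytes"]
    return (received + replicated) * 60 / interval_seconds


# Values from the example above, sampled one minute apart.
first = {"wsrep_received_bytes": 83976571, "wsrep_replicated_bytes": 0}
second = {"wsrep_received_bytes": 90576957, "wsrep_replicated_bytes": 800}

per_minute = writeset_bytes_per_minute(first, second)
per_hour = per_minute * 60

print(f"{per_minute / 1e6:.1f} MB per minute")
print(f"{per_hour / 1e6:.1f} MB per hour")
```

Without rounding the per-minute figure down to 6 MB, the hourly total comes out closer to 396 MB than 360 MB, so it is safer to round up when choosing a size. A single one-minute sample may also not be representative, so it is worth measuring during peak load.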

If you want to allow one hour of maintenance (or downtime) of a node, you need to increase the gcache to that size, which is controlled by the gcache.size option in wsrep_provider_options. If you want more time, just make it bigger.
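As a rough sketch, the size for a given downtime window can be turned into a configuration line. The helper `gcache_size_mb` and its `headroom` parameter are hypothetical, and the code assumes the `M` suffix is read as mebibytes; `gcache.size` itself is set through `wsrep_provider_options`.

```python
import math


def gcache_size_mb(bytes_per_minute, downtime_minutes, headroom=1.0):
    """Size in MiB needed to hold `downtime_minutes` of writesets.

    `headroom` is a hypothetical safety factor (1.0 means no extra margin).
    """
    total_bytes = bytes_per_minute * downtime_minutes * headroom
    return math.ceil(total_bytes / (1024 * 1024))


# 6601186 bytes per minute is the rate measured in the example above.
size_mb = gcache_size_mb(6601186, downtime_minutes=60)
option = f'wsrep_provider_options="gcache.size={size_mb}M"'
print(option)
```

The printed line is meant for the `[mysqld]` section of my.cnf. If other provider options are already set, they belong in the same string, separated by semicolons.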


Comments (5)

  • Mrten

    Is there a downside to setting galera.cache to “ridiculously” high values, besides of course the space it takes on disk? I’ve set it to 10G and do not see problems.

    September 11, 2014 at 5:00 am
  • Miguel Angel Nieto

    I’m not aware of downsides of using a large gcache.

    September 12, 2014 at 5:26 am
  • micu

    Does anyone know if the pattern gcache file access are sequential or random writes and reads?

    November 15, 2014 at 12:01 pm
  • Neo

    Micu,

    I am pretty sure it should be sequential writes just like it is in standard Innodb log files.

    June 9, 2015 at 3:27 am
  • Senn

    is galera.cache the same like MongoDB’s oplog? See https://docs.mongodb.org/manual/core/replica-set-oplog/

    March 5, 2016 at 9:26 am
