Beware the SST
In Percona XtraDB Cluster (PXC) I often run across users who are fearful of SSTs on their clusters. I’ve always maintained that if you can’t cope with an SST, PXC may not be right for you, but that doesn’t change the fact that SSTs with multiple terabytes of data can be quite costly.
SST, by current definition, is a full backup taken on a Donor and sent to a Joiner. The most popular method is Percona XtraBackup, so we’re talking about a donor node that must:
- Run a full XtraBackup that reads its entire datadir
- Keep up with Galera replication to it as much as possible (though laggy donors don’t send flow control)
- Possibly still be serving application traffic if you don’t remove Donors from rotation.
So, I’ve been interested in alternative ways to work around state transfers and I want to present one way I’ve found that may be useful to someone out there.
Percona XtraBackup and Incrementals
It is possible to use Percona XtraBackup full and incremental backups to build a datadir that might qualify for IST. First we’ll focus on the mechanics of taking the backups, preparing them, and extracting the Galera GTID; later we’ll discuss when the result may actually be viable for IST.
Suppose I have a fairly recent full XtraBackup and one or more incremental backups that I can apply on top of it to get VERY close to realtime on my cluster (more on that ‘VERY’ later).
# innobackupex --no-timestamp /backups/full
... sometime later ...
# innobackupex --incremental /backups/inc1 --no-timestamp --incremental-basedir /backups/full
... sometime later ...
# innobackupex --incremental /backups/inc2 --no-timestamp --incremental-basedir /backups/inc1
In my proof of concept test, I now have a full and two incrementals:
# du -shc /backups/*
To recover this data, I follow the normal Xtrabackup incremental apply process:
# cp -av /backups/full /backups/restore
# innobackupex --apply-log --redo-only --use-memory=1G /backups/restore
xtrabackup: Recovered WSREP position: 1663c027-2a29-11e5-85da-aa5ca45f600f:35694784
# innobackupex --apply-log --redo-only /backups/restore --incremental-dir /backups/inc1 --use-memory=1G
# innobackupex --apply-log --redo-only /backups/restore --incremental-dir /backups/inc2 --use-memory=1G
xtrabackup: Recovered WSREP position: 1663c027-2a29-11e5-85da-aa5ca45f600f:46469942
# innobackupex --apply-log /backups/restore --use-memory=1G
I can see that as I roll forward through my incrementals, I get a higher and higher GTID. Galera’s GTID is stored in the InnoDB recovery information, so XtraBackup extracts it after every batch it applies to the datadir we’re restoring.
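The same position is also written to a small xtrabackup_galera_info file in the backup directory, provided the backup was taken with innobackupex’s --galera-info option. A minimal sketch of pulling the UUID and seqno out of it (the file path and contents below are illustrative, using the values from the restore above):

```shell
#!/bin/sh
# Illustrative copy of an xtrabackup_galera_info file; the real one is
# written into the backup directory (one line, in uuid:seqno form) when
# innobackupex is run with --galera-info.
printf '1663c027-2a29-11e5-85da-aa5ca45f600f:46469942\n' > /tmp/xtrabackup_galera_info

# Split the line on ':' into the cluster UUID and the Galera seqno
IFS=':' read -r uuid seqno < /tmp/xtrabackup_galera_info
echo "uuid=$uuid"
echo "seqno=$seqno"
```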
We now have a datadir that is ready to go. Next we need to copy it into the datadir of our joiner node and set up a grastate.dat. Without a grastate, starting the node would force an SST no matter what.
# innobackupex --copy-back /backups/restore
# ... copy a grastate.dat from another running node ...
# cat /var/lib/mysql/grastate.dat
# GALERA saved state
# chown -R mysql.mysql /var/lib/mysql/
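For reference, the grastate.dat format is simple. An illustrative example for this cluster’s UUID follows (the version line and exact fields can vary between Galera releases); the -1 seqno tells mysqld to recover the real position from InnoDB:

```
# GALERA saved state
version: 2.1
uuid:    1663c027-2a29-11e5-85da-aa5ca45f600f
seqno:   -1
cert_index:
```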
If I start the node now, it should see the grastate.dat with the -1 seqno and run --wsrep_recover to extract the GTID from InnoDB (I could have also just put that directly into my grastate.dat).
This will allow the node to startup from merged Xtrabackup incrementals with a known Galera GTID.
But will it IST?
That’s the question. IST happens when the selected donor has all the transactions the joiner needs to get it fully caught up inside of the donor’s gcache. There are several implications of this:
- A gcache is mmap allocated and does not persist across restarts on the donor. A restart essentially purges the mmap.
- You can query the oldest GTID seqno on a donor by checking the status variable ‘wsrep_local_cached_downto’. This variable is not available on 5.5, so you are forced to guess if you can IST or not.
- Most PXC 5.6 releases will auto-select a donor based on IST candidacy. Prior to that (i.e., 5.5), donor selection did not consider IST candidacy at all, meaning you had to be much more careful and do donor selection manually.
- There’s no direct mapping from the earliest GTID in a gcache to a specific time, so knowing at a glance if a given incremental will be enough to IST is difficult.
- It’s also difficult to know how big to make your gcache (set in MB/GB/etc.) with respect to your backups (which are scheduled by the day/hour/etc.).
All that being said, we’re still talking about backups here. The above method will work if and only if:
- You do frequent incremental backups
- You have a large gcache (hopefully more on this in a future blog post)
- You can restore a backup faster than it takes for your gcache to overflow
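Putting the earlier points together, a rough feasibility check is to compare the seqno recovered from the backup against the donor’s wsrep_local_cached_downto. Here is a minimal sketch with hard-coded illustrative values; in practice backup_seqno would come from the “Recovered WSREP position” line printed during --apply-log, and cached_downto from `SHOW GLOBAL STATUS LIKE 'wsrep_local_cached_downto'` on the donor (PXC 5.6+ only):

```shell
#!/bin/sh
# Illustrative values only; pull the real numbers from the restore log
# and from the donor's wsrep_local_cached_downto status variable.
backup_seqno=46469942
cached_downto=46000000

# The joiner will ask for everything after backup_seqno, so IST is
# possible only if the donor's gcache still holds backup_seqno + 1.
if [ $((backup_seqno + 1)) -ge "$cached_downto" ]; then
    echo "IST looks possible"
else
    echo "gcache has rotated past our position: full SST"
fi
```

Remember that the gcache does not survive a donor restart, so a check like this is only valid at the moment you run it.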