When using the Percona Operator for MySQL based on Percona XtraDB Cluster (PXC), it’s common to encounter scenarios where cluster nodes request a full State Snapshot Transfer (SST) when rejoining the cluster.
One typical scenario where a State Snapshot Transfer (SST) is required is when a node has been offline long enough that the GCache no longer contains the necessary write sets for an Incremental State Transfer (IST). Unlike SST, which involves a full data copy from another node, IST is a much lighter process that replays the missing write sets from the donor’s GCache, avoiding the need for a complete data transfer.
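Whether IST is still possible can be checked on the prospective donor: the `wsrep_local_cached_downto` status variable reports the oldest write-set sequence number still held in GCache. A sketch, using the pod and secret naming from the examples later in this post; it falls back to a message if no cluster is reachable:

```shell
# Show the lowest write-set seqno the donor still caches; a joiner whose last
# applied seqno is at or above this value can be served by IST rather than SST.
kubectl exec cluster1-pxc-0 -c pxc -- sh -c \
  'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like \"wsrep_local_cached_downto\";"' \
  2>/dev/null || echo "no cluster reachable"
```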
Another situation that triggers SST is scaling up the cluster by adding new nodes. Each joiner node will require a full SST to synchronize with the cluster.
Additionally, when adding multiple nodes at once, the cluster must perform a separate backup for each joiner. This results in repeated reads from the donor and multiple data transfers over the network, which can quickly become a bottleneck.
In PXC, SST is performed by default using Percona XtraBackup, a physical backup tool. The process involves reading the donor’s data files and streaming them to the joiner node. While the backup operation can be optimized by increasing parallelism and enabling compression, the data must still be read and transferred over the network.
This process can be time-consuming in environments with large database sizes, as it involves transferring a full backup from an existing node to a new one.
These scenarios are ideal candidates for Kubernetes volume snapshots, which operate at the storage layer via the Container Storage Interface (CSI). Taking a snapshot is near-instantaneous and involves no compression, no network transfer, and no reading of the full dataset.
The PXC Operator supports creating a new cluster from a volume snapshot, a useful feature for cloning or disaster recovery scenarios. In this blog post, however, we’ll explore how volume snapshots can also be used to add new nodes to an existing cluster, significantly reducing the time and resource cost, especially when dealing with large datasets.
The procedure described in this post involves directly manipulating PersistentVolumeClaims (PVCs), including deletion and restoration operations. These actions can lead to data loss or cluster instability if not performed carefully.
Ensure you have proper backups and fully understand the implications before proceeding in a production environment. Always test in a staging setup first.
For this test, I used Google Kubernetes Engine (GKE) with the Percona XtraDB Cluster Operator v1.16.1, running Percona XtraDB Cluster 8.0.39 images. The PersistentVolumeClaims (PVCs) were 1 TiB in size, hosting a database dataset of approximately 500 GiB.
K8s relies on the CSI (Container Storage Interface) to manage volume operations, including snapshots. To use snapshots, your StorageClass must be associated with a CSI driver that supports the VolumeSnapshot feature.
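As a quick sanity check before proceeding, you can confirm that the external-snapshotter CRDs are installed in the cluster. This is a sketch that assumes `kubectl` is configured against the target cluster; it simply reports each CRD as found or missing:

```shell
# Check that the snapshot CRDs exist; without them, VolumeSnapshot objects
# cannot be created. Prints one "found:"/"missing:" line per CRD.
for crd in volumesnapshots.snapshot.storage.k8s.io \
           volumesnapshotclasses.snapshot.storage.k8s.io \
           volumesnapshotcontents.snapshot.storage.k8s.io; do
  if kubectl get crd "$crd" >/dev/null 2>&1; then
    echo "found: $crd"
  else
    echo "missing: $crd"
  fi
done
```

If any CRD is missing, your cluster (or CSI driver installation) does not support volume snapshots yet.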
The VolumeSnapshotClass should be created first, as below:
```shell
$ cat snapshot-class.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: snapshot-class
driver: pd.csi.storage.gke.io
deletionPolicy: Delete

$ kubectl apply -f snapshot-class.yaml
volumesnapshotclass.snapshot.storage.k8s.io/snapshot-class created
```
There are two approaches to the PVC restore procedure: online, which adds nodes while the cluster is still running, and offline, which scales the cluster down and performs the restore while all PXC pods are stopped.
In this example, we're using volume snapshots to re-join existing nodes that would otherwise request SST. The online process, which joins cluster members without downtime, involves the following steps:

1. Enforce full durability on the donor (innodb_flush_log_at_trx_commit=1).
2. Freeze GCache purging on the donor so it can later serve IST to the Joiners.
3. Create a sleep-forever file in the donor's data directory.
4. Take a volume snapshot of the donor (pxc-0) PVC.
5. Scale the cluster down, delete the Joiner PVCs, and restore them from the snapshot.
6. Scale back up, remove the inherited identity files on the Joiners, and let them join via IST.
7. Revert all the changes.
One important caveat: the healthy pod used for the snapshot must be PXC member number zero, pxc-0. This ensures that when the cluster is scaled down, the Joiner nodes (e.g., pxc-1, pxc-2) are the ones terminated, allowing their PVCs to be safely deleted and recreated from the snapshot. This is also the most common scenario: pxc-0 is usually the most up-to-date node in the cluster, since by default the HAProxy service routes traffic to this member when it is available, making it a reliable source for the snapshot.
Snapshots are crash-consistent by nature: they capture the state of the filesystem at a specific point in time without coordinating with the database to flush in-memory data to disk. When restored, InnoDB performs crash recovery to bring the database to a consistent state. To minimize the risk of data corruption or recovery failure, it’s critical to ensure that the instance used for the snapshot is fully ACID-compliant at the moment of capture.
This behavior is controlled by the innodb_flush_log_at_trx_commit variable. When set to 1, InnoDB writes and flushes the redo log to disk at every transaction commit, ensuring durability and reducing the chance of data loss during recovery.
By default, the PXC Operator sets innodb_flush_log_at_trx_commit to 2 to optimize performance. In terms of durability, a transaction on a PXC node is only considered committed after it has been replicated and certified by the cluster. While it may not yet be applied on the remote nodes, it has already been safely propagated, ensuring consistency across the cluster. This makes it generally safe to use the value 2, as you would need to lose all nodes simultaneously to lose up to one second of transactions.
We confirm the innodb_flush_log_at_trx_commit variable value by running the following command:
```shell
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "select @@innodb_flush_log_at_trx_commit;"'
+----------------------------------+
| @@innodb_flush_log_at_trx_commit |
+----------------------------------+
|                                2 |
+----------------------------------+
```
We’ll need to enforce stricter ACID compliance to take the snapshot, which may impact database performance due to an increased number of fsync operations:
```shell
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "set global innodb_flush_log_at_trx_commit=1; select @@innodb_flush_log_at_trx_commit;"'
+----------------------------------+
| @@innodb_flush_log_at_trx_commit |
+----------------------------------+
|                                1 |
+----------------------------------+
```
Next, ensure the Donor instance retains the required write-sets to serve an Incremental State Transfer (IST) to the Joiners after the snapshot is restored. This is done by freezing GCache purging with the following command:
```shell
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "set global wsrep_provider_options=\"gcache.freeze_purge_at_seqno = now\";"'
```
In write-intensive workloads, you may want to increase the pxc.livenessProbes.initialDelaySeconds from its default value of 300 seconds. This allows the instance more time to apply IST write sets before the liveness probe checks kick in, reducing the risk of premature pod restarts during recovery. Please note that this change will trigger a restart of the PXC pods, which is not the intended outcome of this procedure. This applies to both snapshot and regular XtraBackup SST. So, if you’ve previously handled SST under a heavy workload on this cluster, the pxc.livenessProbes.initialDelaySeconds setting should already be adjusted accordingly.
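If you do need to raise it, the change is a single merge patch on the custom resource. A minimal sketch; the field path `spec.pxc.livenessProbes.initialDelaySeconds` is assumed from the operator's CRD (verify it against your operator version before applying, and remember that applying the patch restarts the PXC pods):

```shell
# Print a merge patch raising the PXC liveness probe initial delay.
# The field path is an assumption based on the PXC CR layout; check yours.
liveness_patch() {
  printf '{"spec":{"pxc":{"livenessProbes":{"initialDelaySeconds":%d}}}}\n' "$1"
}

liveness_patch 600
# apply with:
#   kubectl patch pxc cluster1 --type=merge -p "$(liveness_patch 600)"
```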
The next step is to create the sleep-forever file inside the pxc-0 data directory. This ensures the file is included in the snapshot and will be present on the Joiner nodes after restore, where it prevents the MySQL process from starting automatically, giving us the chance to adjust each node before it joins the cluster.
```shell
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'touch /var/lib/mysql/sleep-forever; sync;'
```
The next step is taking the snapshot of the pxc-0 pod's PVC:
```shell
$ cat snapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pxc-pvc-snapshot
spec:
  volumeSnapshotClassName: snapshot-class
  source:
    persistentVolumeClaimName: datadir-cluster1-pxc-0

$ kubectl apply -f snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/pxc-pvc-snapshot created
```
You can check the snapshot’s status; it is ready once READYTOUSE changes to true.
```shell
$ kubectl get VolumeSnapshot -w
NAME               READYTOUSE   SOURCEPVC                SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS    SNAPSHOTCONTENT                                    CREATIONTIME   AGE
pxc-pvc-snapshot   false        datadir-cluster1-pxc-0                           1Ti           snapshot-class   snapcontent-ded52414-2974-445f-b8a9-69610ee30da9   15s            16s
pxc-pvc-snapshot   true         datadir-cluster1-pxc-0                           1Ti           snapshot-class   snapcontent-ded52414-2974-445f-b8a9-69610ee30da9   29s            30s
```
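In a script, you can block on this condition instead of watching interactively. A sketch, assuming a kubectl version (v1.23+) that supports jsonpath waits; the timeout makes a stuck snapshot fail the script rather than hang it:

```shell
# Wait until the snapshot reports readyToUse=true, or give up after 10 minutes.
if kubectl wait --for=jsonpath='{.status.readyToUse}'=true \
     volumesnapshot/pxc-pvc-snapshot --timeout=10m >/dev/null 2>&1; then
  echo "snapshot ready"
else
  echo "snapshot not ready (timeout, or no cluster reachable)"
fi
```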
Then, we must scale down the cluster to restore the snapshot. We will first need to set the spec.unsafeFlags.pxcSize to true to allow the cluster to scale down.
```shell
$ kubectl patch pxc cluster1 --type=merge -p '{"spec":{"unsafeFlags":{"pxcSize": true}}}'
perconaxtradbcluster.pxc.percona.com/cluster1 patched
```
Once done, we can set only one replica for the PXC cluster:
```shell
$ kubectl scale --replicas=1 pxc/cluster1
perconaxtradbcluster.pxc.percona.com/cluster1 scaled
```
We’ll see that only the pxc-0 pod is running:
```shell
$ kubectl get pods
NAME                 READY   STATUS    RESTARTS        AGE
cluster1-haproxy-0   2/2     Running   6 (3h26m ago)   3h44m
cluster1-haproxy-1   2/2     Running   0               3h23m
cluster1-haproxy-2   2/2     Running   0               3h22m
cluster1-pxc-0       3/3     Running   1 (3h24m ago)   3h44m
```
Now we can delete the cluster1-pxc-1 PVC:
```shell
$ kubectl delete pvc datadir-cluster1-pxc-1
persistentvolumeclaim "datadir-cluster1-pxc-1" deleted
```
Next, we restore the snapshot to a new PVC with the same name. The target PVC size must be at least the same as the original:
```shell
$ cat restore.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-cluster1-pxc-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1024Gi
  dataSource:
    name: pxc-pvc-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io

$ kubectl apply -f restore.yaml
persistentvolumeclaim/datadir-cluster1-pxc-1 created
```
In this case, the restored PVC shows as Pending because the StorageClass volumeBindingMode is WaitForFirstConsumer: the PVC will not bind until a pod that consumes it is scheduled:
```shell
$ kubectl get pvc
NAME                     STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
datadir-cluster1-pxc-0   Bound     pvc-1885ffe5-d99e-40c9-a05e-3c2f1443d504   1Ti        RWO            standard-rwo   <unset>                 9d
datadir-cluster1-pxc-1   Pending                                                                        standard-rwo   <unset>                 14s
datadir-cluster1-pxc-2   Bound     pvc-24d3aa1a-53fa-42b1-b075-dfc9af358979   1Ti        RWO            standard-rwo   <unset>                 3h49m
```
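You can inspect the binding mode of your StorageClass directly to confirm this behavior. A sketch, assuming the `standard-rwo` class name from this GKE setup; it prints a fallback when no cluster is reachable:

```shell
# Print the volumeBindingMode of the StorageClass backing the PVCs.
# WaitForFirstConsumer means a PVC binds only once a pod needs it.
mode=$(kubectl get storageclass standard-rwo \
         -o jsonpath='{.volumeBindingMode}' 2>/dev/null) \
  || mode="unknown (no cluster reachable)"
echo "volumeBindingMode: $mode"
```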
Now we can scale up the cluster to start the pod pxc-1:
```shell
$ kubectl scale --replicas=2 pxc/cluster1
perconaxtradbcluster.pxc.percona.com/cluster1 scaled
```
We see the datadir-cluster1-pxc-1 is bound:
```shell
$ kubectl get pvc
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
datadir-cluster1-pxc-0   Bound    pvc-1885ffe5-d99e-40c9-a05e-3c2f1443d504   1Ti        RWO            standard-rwo   <unset>                 9d
datadir-cluster1-pxc-1   Bound    pvc-d1cf2ed5-6491-4c6e-a1a2-92d41d899808   1Ti        RWO            standard-rwo   <unset>                 5m27s
datadir-cluster1-pxc-2   Bound    pvc-24d3aa1a-53fa-42b1-b075-dfc9af358979   1Ti        RWO            standard-rwo   <unset>                 3h54m
```
And the pod state shows as running:
```shell
$ kubectl get pods cluster1-pxc-1
NAME             READY   STATUS    RESTARTS   AGE
cluster1-pxc-1   3/3     Running   0          2m2s
```
Note that since we added the sleep-forever file, the MySQL process is not running.
We’ll need to delete the auto.cnf file, as it contains the pxc-0 MySQL server_uuid. Additionally, we must remove the gvwstate.dat file, which stores the Galera Primary Component information and the Galera node’s UUID, also inherited from pxc-0. Finally, we delete the sleep-forever file to allow the container to start the MySQL process:
```shell
$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'rm /var/lib/mysql/auto.cnf; rm /var/lib/mysql/gvwstate.dat; rm /var/lib/mysql/sleep-forever;'
```
We check the pod's state again:
```shell
$ kubectl get pods cluster1-pxc-1
NAME             READY   STATUS    RESTARTS      AGE
cluster1-pxc-1   3/3     Running   1 (60s ago)   9m25s
```
You can check the wsrep_cluster_status status variable to confirm the node is now part of the Primary Component:
```shell
$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like \"wsrep_cluster_status\";"'
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+
```
When using XtraBackup for SST, the process took approximately 75 minutes per instance to complete. In contrast, using volume snapshots, a node with a 500 GiB database was fully synced to the cluster in just 10 minutes.
You can reuse the same snapshot to repeat the process and add more Joiner nodes if necessary. This allows for efficient scaling without the overhead of creating new backups for each node.
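To script this for several Joiners, you can generate one restore manifest per pod ordinal from a template. A minimal sketch using the names from this post (cluster1, pxc-pvc-snapshot, 1024Gi); pipe the output to `kubectl apply -f -` after deleting the corresponding PVCs:

```shell
# Emit a restore PVC manifest for joiner ordinal $1, with the snapshot as
# dataSource. Names and sizes match the example cluster in this post.
restore_pvc_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-cluster1-pxc-$1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1024Gi
  dataSource:
    name: pxc-pvc-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
EOF
}

# Generate manifests for joiners pxc-1 and pxc-2 as separate YAML documents:
for i in 1 2; do restore_pvc_manifest "$i"; echo "---"; done
```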
Once the procedure is complete, we need to revert all the changes made to the cluster and to the pod pxc-0:
```shell
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "set global innodb_flush_log_at_trx_commit=2; select @@innodb_flush_log_at_trx_commit;"'
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "set global wsrep_provider_options=\"gcache.freeze_purge_at_seqno = -1\";"'
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'rm /var/lib/mysql/sleep-forever;'
$ kubectl patch pxc cluster1 --type=merge -p '{"spec":{"unsafeFlags":{"pxcSize": false}}}'
perconaxtradbcluster.pxc.percona.com/cluster1 patched
```
Finally, we delete the volume snapshot:
```shell
$ kubectl delete VolumeSnapshot pxc-pvc-snapshot
volumesnapshot.snapshot.storage.k8s.io "pxc-pvc-snapshot" deleted
```
If the remaining healthy node is not pxc-0, or if we want a more conservative procedure that does not depend on durability settings or IST, we can use the offline method, which has the following steps:

1. Create the sleep-forever file in the healthy node's data directory.
2. Allow unsafe scaling and scale the PXC cluster down to zero replicas.
3. Take a snapshot of the healthy node's PVC (database-consistent, since MySQL is stopped).
4. Delete the Joiner PVCs and restore the snapshot into new PVCs with the same names.
5. Scale the cluster back up.
6. Mark one node as safe to bootstrap and remove the inherited identity files.
7. Remove the sleep-forever files and let the nodes start and rejoin.
In this scenario, let’s assume that pxc-2 is the only node currently part of the Primary Component, while pxc-0 and pxc-1 require SST to rejoin the cluster.
Similar to the online method, we’ll need to create the sleep-forever file inside the healthy node’s data directory. This file will be present in the restored PVCs, allowing us to pause the Joiner nodes on startup and perform any necessary adjustments before they attempt to join the cluster.
```shell
$ kubectl exec -it cluster1-pxc-2 -c pxc -- sh -c 'touch /var/lib/mysql/sleep-forever; sync;'
```
We will need to set the spec.unsafeFlags.pxcSize to true to allow the cluster to scale down.
```shell
$ kubectl patch pxc cluster1 --type=merge -p '{"spec":{"unsafeFlags":{"pxcSize": true}}}'
perconaxtradbcluster.pxc.percona.com/cluster1 patched
```
Once done, we scale the PXC replicas down to 0:
```shell
$ kubectl scale --replicas=0 pxc/cluster1
perconaxtradbcluster.pxc.percona.com/cluster1 scaled
```
We check that all PXC pods are stopped:
```shell
$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS       AGE
cluster1-haproxy-0                                 1/2     Running   22 (16m ago)   85m
cluster1-haproxy-1                                 1/2     Running   16 (16m ago)   70m
cluster1-haproxy-2                                 1/2     Running   12 (16m ago)   70m
percona-xtradb-cluster-operator-549d44ddf7-dqzd5   1/1     Running   0              3d2h
```
Then, we take a snapshot from the healthy pod PVC. Since the database is stopped, this will be a database-consistent snapshot:
```shell
$ cat snapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pxc-pvc-snapshot
spec:
  volumeSnapshotClassName: snapshot-class
  source:
    persistentVolumeClaimName: datadir-cluster1-pxc-2

$ kubectl apply -f snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/pxc-pvc-snapshot created
```
We’ll wait until the snapshot is ready to restore:
```shell
$ kubectl get VolumeSnapshot -w
NAME               READYTOUSE   SOURCEPVC                SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS    SNAPSHOTCONTENT                                    CREATIONTIME   AGE
pxc-pvc-snapshot   false        datadir-cluster1-pxc-2                           1Ti           snapshot-class   snapcontent-76c31d76-6845-4238-ae33-a86f3fb0a61b   7s             8s
pxc-pvc-snapshot   true         datadir-cluster1-pxc-2                           1Ti           snapshot-class   snapcontent-76c31d76-6845-4238-ae33-a86f3fb0a61b   3m34s          3m35s
```
We check the PVC status:
```shell
$ kubectl get pvc
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
datadir-cluster1-pxc-0   Bound    pvc-2ef7b661-0f19-47cf-ad32-23d2faa19da1   1Ti        RWO            standard-rwo   <unset>                 32m
datadir-cluster1-pxc-1   Bound    pvc-002bc256-4093-4888-aabb-93d31e9e3d52   1Ti        RWO            standard-rwo   <unset>                 32m
datadir-cluster1-pxc-2   Bound    pvc-b06c01ec-8fd6-4ab3-97da-d4990f3a75df   1Ti        RWO            standard-rwo   <unset>                 32m
```
We delete the PVCs from the Joiner pods, in this case, pxc-0 and pxc-1:
```shell
$ kubectl delete pvc datadir-cluster1-pxc-0 datadir-cluster1-pxc-1
persistentvolumeclaim "datadir-cluster1-pxc-0" deleted
persistentvolumeclaim "datadir-cluster1-pxc-1" deleted
```
We restore the snapshot into pxc-0 and pxc-1 PVCs:
```shell
$ cat restore0.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-cluster1-pxc-0
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1024Gi
  dataSource:
    name: pxc-pvc-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io

$ kubectl apply -f restore0.yaml
persistentvolumeclaim/datadir-cluster1-pxc-0 created

$ cat restore1.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: datadir-cluster1-pxc-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1024Gi
  dataSource:
    name: pxc-pvc-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io

$ kubectl apply -f restore1.yaml
persistentvolumeclaim/datadir-cluster1-pxc-1 created
```
We check that the PVCs are created but pending binding:
```shell
$ kubectl get pvc
NAME                     STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
datadir-cluster1-pxc-0   Pending                                                                        standard-rwo   <unset>                 11s
datadir-cluster1-pxc-1   Pending                                                                        standard-rwo   <unset>                 8s
datadir-cluster1-pxc-2   Bound     pvc-b06c01ec-8fd6-4ab3-97da-d4990f3a75df   1Ti        RWO            standard-rwo   <unset>                 32m
```
We scale up the cluster to start all PXC pods:
```shell
$ kubectl scale --replicas=3 pxc/cluster1
perconaxtradbcluster.pxc.percona.com/cluster1 scaled
```
We wait until all PVCs are bound:
```shell
$ kubectl get pvc
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
datadir-cluster1-pxc-0   Bound    pvc-63b25d46-983f-49b9-bffa-d10dff3ed861   1Ti        RWO            standard-rwo   <unset>                 5m50s
datadir-cluster1-pxc-1   Bound    pvc-5aeea987-3a25-44d8-8776-723414dc1ee3   1Ti        RWO            standard-rwo   <unset>                 5m47s
datadir-cluster1-pxc-2   Bound    pvc-b06c01ec-8fd6-4ab3-97da-d4990f3a75df   1Ti        RWO            standard-rwo   <unset>                 38m
```
And wait until all PXC pods are Running:
```shell
$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS         AGE
cluster1-haproxy-0                                 1/2     Running   26 (2m55s ago)   98m
cluster1-haproxy-1                                 1/2     Running   21 (109s ago)    83m
cluster1-haproxy-2                                 1/2     Running   16 (2m53s ago)   83m
cluster1-pxc-0                                     3/3     Running   0                7m5s
cluster1-pxc-1                                     3/3     Running   0                3m43s
cluster1-pxc-2                                     3/3     Running   0                60s
percona-xtradb-cluster-operator-549d44ddf7-dqzd5   1/1     Running   0                3d3h
```
Since we added the sleep-forever file, the pods did not start the MySQL process.
Next, we check grastate.dat to see whether the node is flagged as safe to bootstrap.
```shell
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'cat /var/lib/mysql/grastate.dat'
# GALERA saved state
version: 2.1
uuid: b28cd6f8-0f65-11f0-a4bc-aaaa4d86036e
seqno: 2422856
safe_to_bootstrap: 0
```
In this case, the safe_to_bootstrap value is set to 0, because when scaling down, the pxc-2 pod, the last active member of the Primary Component, was the first to be stopped. Meanwhile, the other pods (pxc-0 and pxc-1) were still connected and requesting SST, which prevents the cluster from marking any node as safe to bootstrap.
We’ll set safe_to_bootstrap to 1 on the pxc-0 node. This ensures that when the pod starts, it will bootstrap the cluster and become the primary member, allowing the cluster to initiate faster without waiting for other nodes. We’re also removing the auto.cnf file, since it contains the same server_uuid as the original pxc-2 node, from which the snapshot was taken.
```shell
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'rm /var/lib/mysql/auto.cnf; sed -i "s/safe_to_bootstrap: 0/safe_to_bootstrap: 1/g" /var/lib/mysql/grastate.dat;'
```
On pxc-1, we only remove the auto.cnf file:
```shell
$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'rm /var/lib/mysql/auto.cnf;'
```
As for pxc-2, we don’t need to modify anything.
We remove the sleep-forever file in all pods to restart them:
```shell
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'rm /var/lib/mysql/sleep-forever;'
$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'rm /var/lib/mysql/sleep-forever;'
$ kubectl exec -it cluster1-pxc-2 -c pxc -- sh -c 'rm /var/lib/mysql/sleep-forever;'
```
We check the pods until all are in Running state:
```shell
$ kubectl get pods
NAME                                               READY   STATUS    RESTARTS        AGE
cluster1-haproxy-0                                 2/2     Running   28 (4m9s ago)   105m
cluster1-haproxy-1                                 2/2     Running   22 (6m33s ago)  91m
cluster1-haproxy-2                                 2/2     Running   18 (4m7s ago)   90m
cluster1-pxc-0                                     3/3     Running   1 (2m42s ago)   14m
cluster1-pxc-1                                     3/3     Running   2 (65s ago)     10m
cluster1-pxc-2                                     3/3     Running   2 (84s ago)     8m14s
percona-xtradb-cluster-operator-549d44ddf7-dqzd5   1/1     Running   0               3d3h
```
We can also verify that each node has successfully joined the cluster by checking the wsrep_cluster_status status variable. A value of “Primary” confirms that the node is part of the Primary Component.
```shell
$ kubectl exec -it cluster1-pxc-0 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like \"wsrep_cluster_status\";"'
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+

$ kubectl exec -it cluster1-pxc-1 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like \"wsrep_cluster_status\";"'
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+

$ kubectl exec -it cluster1-pxc-2 -c pxc -- sh -c 'mysql -uroot -p${MYSQL_ROOT_PASSWORD} -e "show global status like \"wsrep_cluster_status\";"'
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+
```
Finally, once all Joiner nodes are up and part of the Primary Component, we can safely delete the VolumeSnapshot to free up resources and avoid unnecessary storage costs.
```shell
$ kubectl delete VolumeSnapshot pxc-pvc-snapshot
volumesnapshot.snapshot.storage.k8s.io "pxc-pvc-snapshot" deleted
```
In just 15 minutes, we successfully joined two nodes, each with a 500 GiB dataset, using the snapshot-based restore procedure. In contrast, performing the same operation with XtraBackup SST would take several hours, due to the time required to create, transfer, and apply a full physical backup.
This approach significantly reduces the time and resources required to scale the Percona Operator for MySQL based on Percona XtraDB Cluster in Kubernetes environments. By leveraging VolumeSnapshots, we eliminate the overhead of full backup and restore cycles, reduce network traffic, and accelerate adding nodes to the cluster. It’s a powerful alternative to traditional SST, especially in cloud-native deployments where time, cost, and efficiency matter.