There are various ways to back up and restore Percona Server for MongoDB clusters when you run them on Kubernetes. Percona Operator for MongoDB uses Percona Backup for MongoDB (PBM) to take physical and logical backups, continuously upload oplogs to object storage, and manage the backup lifecycle.
Cloud providers and various storage solutions offer the capability to create volume snapshots. Snapshots are useful for owners of large data sets with terabytes of data, as they let you rely on efficient storage-level copies to recover data faster. In this blog post, we are going to look at how you can back up and restore MongoDB clusters managed by the Operator with snapshots. This is a proof of concept that will be fully automated in future releases of the Operator.
The goal
- Take the snapshot
  - Prepare the cluster for backup with Percona Backup for MongoDB (PBM)
  - Leverage Kubernetes Volume Snapshots that, on the infrastructure level, trigger cloud volume snapshots (for example, AWS EBS snapshots)
- Recover to the new cluster using these snapshots
Consistency considerations
Snapshots alone don’t guarantee data consistency. On clusters that receive a lot of writes, you might find that not all data has been written to disk at the moment the snapshot is taken.
This is where Percona Backup for MongoDB, which we use for backups and restores in the Operator, steps in. It provides the interface for making snapshot-based physical backups and restores and ensures data consistency. As a result, database owners benefit from increased performance and reduced downtime while staying sure that their data remains consistent.
Set it up
All manifests and other configuration files used in this blog post are stored in the blog-data/mongo-k8s-volume-snapshots git repository.
Prepare Percona Server for MongoDB
Deploy the Percona Operator for MongoDB in your preferred way. I will use plain kubectl and version 1.17.0 (the latest at the time of writing):
kubectl apply -f https://raw.githubusercontent.com/percona/percona-server-mongodb-operator/refs/tags/v1.17.0/deploy/bundle.yaml
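Before moving on, you can check that the Operator is up. The deployment name below is the one created by the bundle manifest:

kubectl get deployment percona-server-mongodb-operator
kubectl wait --for=condition=Available deployment/percona-server-mongodb-operator --timeout=120s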
RPO considerations
Restoring from snapshots alone provides quite a poor Recovery Point Objective (RPO), as it depends on how often you take the snapshots. To improve the RPO, we are going to upload oplogs to the object storage.
There is a special flag, spec.backup.pitr.oplogOnly, that enables uploading only oplogs to the object storage. The backup section in the Custom Resource manifest would look like this:
backup:
  enabled: true
  image: percona/percona-backup-mongodb:2.5.0
  pitr:
    enabled: true
    oplogOnly: true
    compressionType: gzip
    compressionLevel: 6
  storages:
    sp-test:
      type: s3
      s3:
        bucket: BUCKET
        credentialsSecret: SECRET_WITH_KEYS
        endpointUrl: OBJECT_STORAGE_URL
Read more about backup configuration in our documentation.
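The credentialsSecret referenced in the storage configuration has to exist before PBM can reach the bucket. A minimal sketch of creating it, assuming the standard AWS-style key names described in the documentation (SECRET_WITH_KEYS and the placeholder values are yours to replace):

kubectl create secret generic SECRET_WITH_KEYS \
  --from-literal=AWS_ACCESS_KEY_ID=<your-access-key-id> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<your-secret-access-key>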
Apply the custom resource:
kubectl apply -f 00-cr.yaml
Take the snapshots
Volume Snapshot Class
To create snapshots, you need to have a Volume Snapshot Class. I’m running my experiments on GKE, and my snapshot class looks like this:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: gke-snapshot-class
driver: pd.csi.storage.gke.io
deletionPolicy: Delete
The driver depends on your cloud or storage provider. Create the snapshot class:
kubectl apply -f 02-snapshot-class.yaml
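You can verify that the class is registered and see which snapshot classes your cluster already provides:

kubectl get volumesnapshotclass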
Prepare the cluster for backup
To prepare the cluster for backup, we need to run a short Percona Backup for MongoDB command – pbm – as described in the documentation. To do that, we will exec into one of the PBM containers:
% kubectl get pods
NAME                                                READY   STATUS    RESTARTS   AGE
demo-cluster1-rs0-0                                 2/2     Running   0          117m
demo-cluster1-rs0-1                                 2/2     Running   0          116m
demo-cluster1-rs0-2                                 2/2     Running   0          115m
percona-server-mongodb-operator-8664c5b8fc-4mdkv    1/1     Running   0          155m

% kubectl exec -ti demo-cluster1-rs0-0 -c backup-agent bash
Now let’s run a pbm command that will prepare the cluster for backup – it opens the backup cursor and stores the metadata on the disk:
$ pbm backup -t external
Starting backup '2024-09-25T10:54:24Z'......Ready to copy data from:
  - demo-cluster1-rs0-1.demo-cluster1-rs0.default.svc.cluster.local:27017
After the copy is done, run:
  pbm backup-finish 2024-09-25T10:54:24Z
The output of this command tells you which node to use for a snapshot. In my case, it is demo-cluster1-rs0-1.demo-cluster1-rs0.default.svc.cluster.local:27017.
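If you want to double-check the backup state while the cursor is open, PBM reports it as an in-progress external backup. The commands below are standard PBM CLI calls; the exact output depends on your PBM version:

pbm status
pbm describe-backup 2024-09-25T10:54:24Z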
Take the snapshots
In Kubernetes, you can create a snapshot of a Persistent Volume through a VolumeSnapshot resource. It should reference both a Persistent Volume Claim (PVC) and a Volume Snapshot Class. For example (see 02-snapshot.yaml):
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: snapshot-demo-cluster1-rs0-1
spec:
  volumeSnapshotClassName: gke-snapshot-class
  source:
    persistentVolumeClaimName: mongod-data-demo-cluster1-rs0-1
You will create a single snapshot per replica set. Apply the manifest to create the snapshot:
kubectl apply -f 02-snapshot.yaml
Check if snapshots were created:
% kubectl get volumesnapshots
NAME                           READYTOUSE   SOURCEPVC                         SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS        SNAPSHOTCONTENT                                    CREATIONTIME   AGE
snapshot-demo-cluster1-rs0-0   true         mongod-data-demo-cluster1-rs0-1                           3Gi           gke-snapshot-class   snapcontent-34a398ed-4408-454f-bcad-8b2ba8f22a18   42s            43s
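Before proceeding, make sure the snapshot is ready to use. One way to block until that happens (this requires a kubectl version that supports jsonpath waits; the snapshot name matches the manifest above):

kubectl wait --for=jsonpath='{.status.readyToUse}'=true \
  volumesnapshot/snapshot-demo-cluster1-rs0-1 --timeout=5m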
Close the cursor
Now we need to go back to the PBM container and finish the backup, which closes any open backup cursors:
$ pbm backup-finish 2024-09-25T10:54:24Z
Command sent. Check `pbm describe-backup 2024-09-25T10:54:24Z` for the result.
Restore
There are a few caveats you need to know about before restoring:
- It is not possible to do an in-place restore with snapshots. The restore target can be:
  - A completely new cluster
  - The existing cluster, but you will need to pause it and delete the existing volumes
- You must also back up the Secrets (TLS keys and users). This is no different from any other backup method in the Operator. We recommend using a Kubernetes Secret storage solution, for example, Vault; a minimal kubectl-based sketch is shown after this list.
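As a quick sketch of that step, you can export the relevant Secrets with kubectl before deleting anything. The secret names below are assumptions for a cluster named demo-cluster1; check your Custom Resource and kubectl get secrets for the actual names:

# Secret names are assumptions - list your secrets first to confirm them
kubectl get secret demo-cluster1-secrets demo-cluster1-ssl demo-cluster1-ssl-internal \
  -o yaml > demo-cluster1-secrets-backup.yaml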
Let’s look at the use case where you want to recover the existing cluster. Here is how it can be done:
1. Delete the cluster
kubectl delete -f 00-cr.yaml
2. Delete the Persistent Volume Claims (PVCs) that belong to the cluster
kubectl delete pvc -l app.kubernetes.io/instance=demo-cluster1
3. Create PVCs from the snapshot (see 03-volumes-from-snapshots.yaml). You need to create the PVCs with the same names the Operator would use; in our case, these are the same names we had before. A sketch of such a PVC manifest follows the output below:
% kubectl apply -f 03-volumes-from-snapshots.yaml
persistentvolumeclaim/mongod-data-demo-cluster1-rs0-0 created
persistentvolumeclaim/mongod-data-demo-cluster1-rs0-1 created
persistentvolumeclaim/mongod-data-demo-cluster1-rs0-2 created
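For reference, each PVC in that manifest points at the snapshot through a dataSource reference. This is only a sketch of what 03-volumes-from-snapshots.yaml contains; the storage class and size are assumptions and must match what your Custom Resource expects:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongod-data-demo-cluster1-rs0-0   # must match the name the Operator expects
spec:
  storageClassName: standard-rwo          # assumption - use the storage class of your cluster
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: snapshot-demo-cluster1-rs0-1    # the snapshot taken earlier
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi                        # assumption - match the original volume size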
Now we can start the cluster:
kubectl apply -f 00-cr.yaml
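You can watch the custom resource until it reports a ready state, which means all replica set members have started from the restored volumes:

kubectl get psmdb demo-cluster1 -w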
The cluster is now restored and has the data that was captured when you took the snapshots.
Point-in-time recovery
As we explained above, the Recovery Point Objective (RPO) can be improved by storing oplogs in the object storage separately.
To recover the data from oplogs, you will need to exec into the backup-agent container again:
% kubectl exec -ti demo-cluster1-rs0-0 -c backup-agent bash
Check if oplog chunks are stored:
$ pbm status
...
PITR chunks [131.70KB]:
  2024-09-27T08:04:05Z - 2024-09-27T08:07:05Z (no base snapshot)
You can replay the oplogs with the following command (take the timestamps from the pbm status output and adjust them to your timezone):
pbm oplog-replay --start 2024-09-27T11:04:05 --end 2024-09-27T11:07:05
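If you need help with the conversion, GNU date (assuming it is available in your shell) can turn the UTC timestamps from pbm status into local time in the format the command expects:

date -d '2024-09-27T08:04:05Z' '+%Y-%m-%dT%H:%M:%S'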
Now you will have the latest data.
Conclusion
Even though this snapshot-based backup and restore solution is currently a proof of concept, it still demonstrates the Percona Operator for MongoDB’s flexibility and adaptability in managing large datasets, especially in scenarios where traditional logical or physical backups might not be ideal.
While the process may involve a few manual steps, it underscores the Operator’s commitment to providing comprehensive data protection options. Future releases will focus on streamlining and automating this process further, making snapshot-based backup and recovery even more seamless and user-friendly.
In the meantime, for those dealing with massive datasets where efficient storage and rapid recovery are paramount, this PoC offers a valuable tool for safeguarding critical data.