There are various ways to back up and restore Percona Server for MongoDB clusters when you run them on Kubernetes. Percona Operator for MongoDB uses Percona Backup for MongoDB (PBM) to take physical and logical backups, continuously upload oplogs to object storage, and maintain the backup lifecycle.

Cloud providers and various storage solutions provide the capability to create volume snapshots. Snapshots are useful for owners of large data sets with terabytes of data, as they let you rely on efficient storage to recover the data faster. In this blog post, we are going to look at how to back up and restore MongoDB clusters managed by the Operator with snapshots. This is a proof of concept that will be fully automated in the Operator’s future releases.

The goal

  1. Take the snapshot
    1. Prepare the cluster for backup with Percona Backup for MongoDB (PBM).
    2. Leverage Kubernetes Volume Snapshots that, on the infrastructure level, trigger cloud volume snapshots (for example, AWS EBS Snapshot).
  2. Recover to the new cluster using these snapshots.


Consistency considerations

Snapshots don’t guarantee data consistency. With clusters that receive a lot of writes, you might find that not all data has been written to disk at the moment the snapshot is taken.

This is where Percona Backup for MongoDB, which we use for backups and restores in the Operator, steps in. It provides the interface for making snapshot-based physical backups and restores and ensures data consistency. As a result, database owners get faster backups and restores with reduced downtime and can be sure that their data remains consistent.

Set it up

All manifests and other configuration files used in this blog post are stored in the blog-data/mongo-k8s-volume-snapshots git repository.

Prepare Percona Server for MongoDB

Deploy the Percona Operator for MongoDB using your preferred method. I will use regular kubectl and version 1.17.0 (the latest at the time of writing):
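Something along these lines should work (the bundle URL follows the Operator’s release layout; adjust the version and namespace to your environment):

```bash
# Deploy the Operator CRDs, RBAC, and Deployment from the v1.17.0 release bundle.
# Server-side apply avoids issues with the large CRD manifests.
kubectl apply --server-side -f https://raw.githubusercontent.com/percona/percona-server-mongodb-operator/v1.17.0/deploy/bundle.yaml
```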

RPO considerations

Restoring from snapshots alone provides quite a poor Recovery Point Objective (RPO), as it depends on how frequently you take the snapshots. To improve RPO, we are going to upload oplogs to the object storage.

Oplogs to the object storage

There is a special flag, spec.backup.pitr.oplogOnly, that enables uploading only oplogs to the object storage. The backup section in the Custom Resource manifest would look like this:
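Here is a trimmed sketch of that section; the storage name, bucket, region, and credentials Secret are placeholders, and the PBM image tag should match the one shipped with your Operator release:

```yaml
backup:
  enabled: true
  image: percona/percona-backup-mongodb:2.5.0   # match the PBM version for your Operator release
  pitr:
    enabled: true
    oplogOnly: true        # upload only oplogs, no scheduled logical/physical backups
  storages:
    s3-us-west:            # placeholder storage name
      type: s3
      s3:
        bucket: my-oplog-bucket                      # placeholder bucket
        region: us-west-2
        credentialsSecret: demo-cluster1-backup-s3   # placeholder Secret with the S3 keys
```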

Read more about backup configuration in our documentation.

Apply the custom resource:
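Assuming the Custom Resource manifest lives in deploy/cr.yaml (use your own path):

```bash
kubectl apply -f deploy/cr.yaml
```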

Take the snapshots

Volume Snapshot Class

To create snapshots, you need to have a Volume Snapshot Class. I’m running my experiments on GKE, and my snapshot class looks like this:
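A minimal example for the GKE Persistent Disk CSI driver (the class name is a placeholder):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: gke-snapshotclass       # placeholder name, referenced later by VolumeSnapshot objects
driver: pd.csi.storage.gke.io   # GKE Persistent Disk CSI driver
deletionPolicy: Delete
```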

The driver depends on your cloud or storage provider. Create the snapshot class:
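For example (the file name here is a placeholder):

```bash
kubectl apply -f volume-snapshot-class.yaml
```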

Prepare the cluster for backup

To prepare the cluster for backup, we need to run a short Percona Backup for MongoDB command – pbm – as described in the documentation. To do that, we will exec into one of the PBM containers:
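In the Operator, the PBM agent runs in the backup-agent container of every replica set Pod; I exec into the one in my demo cluster:

```bash
kubectl exec -it demo-cluster1-rs0-0 -c backup-agent -- bash
```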

Now let’s run a pbm  command that will prepare the cluster for backup – it opens the backup cursor and stores the metadata on the disk:
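With PBM 2.x this is done through the external backup type; a sketch (note the backup name in the output, you will need it later to finish the backup):

```bash
pbm backup --type=external
```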

The output of this command tells you which node to use for a snapshot. In my case, it is demo-cluster1-rs0-1.demo-cluster1-rs0.default.svc.cluster.local:27017.

Take the snapshots

In Kubernetes, you can create a Persistent Volume snapshot through a VolumeSnapshot resource. It should reference both the Persistent Volume Claim (PVC) and the Volume Snapshot Class. For example (see 02-snapshot.yaml):
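A sketch of such a manifest; the snapshot name is a placeholder, and the PVC name follows the Operator’s mongod-data-&lt;cluster&gt;-&lt;replset&gt;-&lt;ordinal&gt; naming (check yours with kubectl get pvc):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: demo-cluster1-rs0-snapshot     # placeholder snapshot name
spec:
  volumeSnapshotClassName: gke-snapshotclass
  source:
    persistentVolumeClaimName: mongod-data-demo-cluster1-rs0-1   # PVC of the node PBM told us to use
```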

You will create a single snapshot per replica set. Apply the manifest to create the snapshot:
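```bash
kubectl apply -f 02-snapshot.yaml
```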

Check if snapshots were created:
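```bash
kubectl get volumesnapshot
# wait until READYTOUSE shows true for each snapshot
```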

Close the cursor

Now we need to go back to the PBM container and finish the backup, which closes any open backup cursors:
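A sketch, assuming PBM’s external backup mode was used to open the cursor (substitute the backup name printed by the earlier command):

```bash
pbm backup-finish <backup_name>
```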

Restore

There are a few caveats you need to know about the restoration:

  1. It is not possible to do an in-place restore with snapshots. The restore target can be:
    1. A completely new cluster
    2. The existing cluster, but you will need to pause it and delete the existing volumes
  2. You must also back up the Secrets (TLS keys and users). This is no different from any other way of backing up in the Operator. We recommend using a Kubernetes Secret store, for example, Vault.

Let’s look at the use case where you want to recover the existing cluster. It can be done as follows:

1. Delete the cluster
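A sketch, assuming the Custom Resource is named demo-cluster1 (psmdb is the short name for the PerconaServerMongoDB resource):

```bash
kubectl delete psmdb demo-cluster1
```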

2. Delete Persistent Volume Claims (PVC) that belong to the cluster
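The Operator names data volumes mongod-data-&lt;cluster&gt;-&lt;replset&gt;-&lt;ordinal&gt;, so for my three-node replica set:

```bash
kubectl delete pvc mongod-data-demo-cluster1-rs0-0 mongod-data-demo-cluster1-rs0-1 mongod-data-demo-cluster1-rs0-2
```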

3. Create PVCs from the snapshot (see 03-volumes-from-snapshot.yaml). You need to create the PVCs with the same names the Operator would use. In our case, we use the same names as we had before:
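A sketch of one such PVC; create one per replica set member, all pointing to the same snapshot, and adjust the storage class and size to your environment:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongod-data-demo-cluster1-rs0-0    # must match the name the Operator expects
spec:
  storageClassName: standard-rwo           # placeholder; match your cluster's storage class
  dataSource:
    name: demo-cluster1-rs0-snapshot       # the VolumeSnapshot created earlier
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi                         # at least the size of the original volume
```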

Now we can start the cluster:
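Apply the same Custom Resource manifest as before; the StatefulSets will reuse the pre-created PVCs instead of provisioning new volumes:

```bash
kubectl apply -f deploy/cr.yaml
```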

The cluster is now restored and has the data that was captured when you took the snapshots.

Point-in-time recovery

As we explained above, the Recovery Point Objective (RPO) can be improved by storing oplogs in the object storage separately. 

To recover the data from oplogs, you will need to exec into the backup-agent container again:
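```bash
kubectl exec -it demo-cluster1-rs0-0 -c backup-agent -- bash
```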

Check if oplog chunks are stored:
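```bash
pbm status
# the PITR chunks section lists the oplog ranges available in the object storage
```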

You can recover by using the following command (get the timestamps from the pbm status output, but adjust them to your timezone):
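A sketch with placeholder timestamps:

```bash
pbm oplog-replay --start="2024-09-18T13:00:00" --end="2024-09-18T13:54:00"
```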

Now you will have the latest data.

Conclusion

Even though this snapshot-based backup and restore solution is currently a proof of concept, it still demonstrates the Percona Operator for MongoDB’s flexibility and adaptability in managing large datasets, especially in scenarios where traditional logical or physical backups might not be ideal.

While the process may involve a few manual steps, it underscores the Operator’s commitment to providing comprehensive data protection options. Future releases will focus on streamlining and automating this process further, making snapshot-based backup and recovery even more seamless and user-friendly.

In the meantime, for those dealing with massive datasets where efficient storage and rapid recovery are paramount, this PoC offers a valuable tool for safeguarding critical data.
