Backup and Restore of MongoDB Deployment on Kubernetes

Every database environment requires a robust backup strategy; it is a fundamental part of running a successful and resilient database. No matter the size of the database, the function of the application, or how technologically advanced a company is, backups are now a requirement for everyone.

As Solutions Engineers, we speak to database users from all types of companies, ranging from startups to the most complex database environments in use today. Interestingly enough, while talking about backups, we hear several concerning statements, such as: “We never used backups in the past, so we don’t need them in this new environment”, “cloud services never fail” (hint – they do), and “this cluster is too big to fail.” It’s never an issue until something happens to your environment and you are unable to recover data, putting your entire company at risk.

The adoption of databases on Kubernetes (K8s) and other cloud-native platforms is definitely on the rise. There are multiple tools and approaches to deploying MongoDB on K8s. Assuming that you already have MongoDB up and running on K8s, how do you implement a backup strategy? It could be a hefty amount of error-prone, manual work: you’d need to configure K8s Jobs from scratch, configure mongodump and all its parameters, and set up a PersistentVolume with its accompanying Claims. Additionally, you would need to take care of your PersistentVolume’s durability. And what about streaming backups to remote storage, or dealing with the backup of a complex, sharded cluster?

There are free tools that help you streamline the entire MongoDB deployment process on K8s, including backups. Despite their critical nature, backups don’t have to cost you any money to implement, either. Percona offers the following free enterprise-grade solutions for MongoDB:

- Percona Server for MongoDB (PSMDB)
- Percona Kubernetes Operator for Percona Server for MongoDB (PSMDB Operator)
- Percona Backup for MongoDB (PBM)

Let’s explore how to take and restore backups using the PSMDB Operator deployed on AWS Elastic Kubernetes Service (AWS EKS).

Architecture

When deployed outside of K8s, Percona Backup for MongoDB (PBM) requires a running pbm-agent process on each node (next to the mongod instance) in the cluster/replica set. PBM uses its own ‘control collections’ in the admin database to store configuration and relay commands from the user (who uses the pbm CLI) to the pbm-agent processes. PBM’s backups are ‘logical’ style, the same as mongodump’s. This means the data is copied using a database driver connection rather than by copying the underlying data files on disk.

When you deploy a PSMDB cluster using the PSMDB Operator, pbm-agent is automatically deployed in each pod as a sidecar container next to the mongod container. The PSMDB Operator writes commands to the PBM control collections directly (in a way, replacing what the pbm CLI does outside of K8s deployments), controlling the entire backup process. The PSMDB Operator supports two types of backups, on-demand and scheduled, both controlled entirely by the Operator.

Backups taken by the PSMDB Operator can be stored in any S3 compatible storage, be it AWS S3, Google Cloud Storage, or locally deployed cloud-native MinIO storage. The backup contains a metadata file, a dump of all collections from your database, and an oplog dump covering the timespan of the backup.

Configuration

We have a running PSMDB replica set deployed with the PSMDB Operator in an AWS EKS K8s cluster. Please note that we use PSMDB Operator v1.4.0 (the newest release at the time of writing). To deploy it, we followed these instructions and used all default settings.

Check if your PSMDB cluster is running correctly:
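A quick way to verify this (a sketch, assuming you deployed into the current namespace with the default cluster name from the Operator’s example files) is to list the pods and check that they are all Running and Ready:

```shell
# All replica set pods and the operator pod should be in the Running state
kubectl get pods

# The custom resource status should report "ready"
kubectl get psmdb
```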

Access secrets: deploy/backup-s3.yaml

Let’s start by adding our AWS access key and secret access key. The operator will use these keys to access your S3 bucket (all cloud providers have different methods of distributing these keys). The keys that you put into K8s secrets must be base64 encoded. You can encode your keys by running echo -n 'YOUR_KEY' | base64 in a bash CLI:
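The deploy/backup-s3.yaml secret might look roughly like the following. The secret name here follows the Operator’s example file and is an assumption; replace the placeholders with your own base64-encoded values:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-cluster-name-backup-s3
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <base64-encoded access key>
  AWS_SECRET_ACCESS_KEY: <base64-encoded secret key>
```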

Create a secret in K8s cluster with the following command:
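Using the file referenced above, this is simply:

```shell
# Create the secret holding the S3 credentials
kubectl apply -f deploy/backup-s3.yaml
```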

S3 bucket: deploy/cr.yaml

Next, we need to edit the storages section in the deploy/cr.yaml file so we can send our backups to an S3 bucket.
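A sketch of that section, assuming the storage name s3-us-west, the us-west-2 region, and the secret name from the previous step (all of these are illustrative; substitute your own bucket and region):

```yaml
backup:
  enabled: true
  storages:
    s3-us-west:
      type: s3
      s3:
        bucket: YOUR-S3-BACKUP-BUCKET
        credentialsSecret: my-cluster-name-backup-s3
        region: us-west-2
```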

Apply changes:
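Since the storage configuration lives in the custom resource file, applying it looks like:

```shell
kubectl apply -f deploy/cr.yaml
```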

We are ready to take backups now!

On-Demand Backup

An on-demand backup can be taken at any point in time. The pbm-control tool is distributed together with the operator code. Based on the requested details, it will use the pre-configured storage to store the on-demand backup.

Let’s start by editing deploy/backup/backup.yaml to ensure we are all set: psmdbCluster should match our cluster name, and storageName should match the storage defined in the previous steps.
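A sketch of what that file might contain, assuming the backup name backup1, the default cluster name my-cluster-name, and the s3-us-west storage name used for illustration above:

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: backup1
spec:
  psmdbCluster: my-cluster-name
  storageName: s3-us-west
```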

Run the backup with the following command:
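Applying the backup custom resource triggers the backup:

```shell
kubectl apply -f deploy/backup/backup.yaml
```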

If we set everything up correctly, our backup should be uploaded to the S3 bucket. To check its status, run:
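The Operator exposes backups as custom resources, so (assuming the psmdb-backup short name registered by the Operator’s CRDs) the status can be checked with:

```shell
# The STATUS column should eventually show "ready"
kubectl get psmdb-backup
```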

If you look at your S3 AWS console, you should see backup files there:

Automated Backup

The second type of backup is a scheduled backup. The backup:tasks section of the deploy/cr.yaml file can be edited to schedule fully automated backups. We can configure backups using the UNIX cron string format. Let’s say we want to take a backup every day at midnight. Let’s edit the deploy/cr.yaml file:
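A sketch of the tasks entry for a daily midnight backup (the task name is illustrative, and the storageName assumes the s3-us-west storage sketched earlier):

```yaml
backup:
  enabled: true
  tasks:
    - name: daily-s3-backup
      enabled: true
      schedule: "0 0 * * *"   # cron format: every day at midnight
      storageName: s3-us-west
```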

Apply changes:
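As before, the schedule lives in the custom resource file, so applying it is:

```shell
kubectl apply -f deploy/cr.yaml
```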

Restoring Backups

To restore a backup, we need to find the backup name. We can obtain a list of all backups by using the following command:
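Assuming the psmdb-backup short name from the Operator’s CRDs, the list of backups (with their names in the NAME column) can be fetched with:

```shell
kubectl get psmdb-backup
```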

Backup restoration is configured in deploy/backup/restore.yaml. Let’s ensure the appropriate backup name is specified:
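A sketch of what that file might look like, assuming the restore name restore1, the default cluster name my-cluster-name, and the backup1 name used in the on-demand example:

```yaml
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBRestore
metadata:
  name: restore1
spec:
  clusterName: my-cluster-name
  backupName: backup1
```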

To restore the backup, we execute the following command:
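Applying the restore custom resource starts the restoration:

```shell
kubectl apply -f deploy/backup/restore.yaml
```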

To check the restore status, use:
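Assuming the psmdb-restore short name registered by the Operator’s CRDs:

```shell
# The STATUS column should eventually show "ready"
kubectl get psmdb-restore
```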


The Percona Kubernetes Operator for MongoDB, utilizing Percona Backup for MongoDB, is a Kubernetes-idiomatic way to run, back up, and restore a MongoDB replica set. If you’d like to learn more about these tools, check out our documentation.


