Many MongoDB clusters use storage-level snapshots to provide fast and reliable backups. In this blog post, you’ll learn how to restore such a snapshot from a traditional VM-based sharded MongoDB cluster to a freshly deployed Percona Operator for MongoDB cluster on Kubernetes.
I recently worked with a company running a large four-shard MongoDB Enterprise Server database cluster on VMs on premises that decided to migrate to Google Cloud Platform. After careful consideration and a pros-and-cons evaluation, the customer chose to go with Percona Distribution for MongoDB running on Kubernetes, specifically using Percona Operator for MongoDB. There are four main factors that contributed to this decision:
To start realistic compatibility and performance testing, we needed to restore the filesystem snapshot backup, stored on a NetApp volume in GCP, into a Google Kubernetes Engine (GKE) cluster running Percona Operator for MongoDB. This is not a trivial task, due to the special requirements for restoring a sharded cluster backup into a new cluster. I will show you why, and how we did it.
Let’s look at the overall requirements for the cluster where I want to restore the snapshot to:
This seems to be straightforward; however, there are a couple of special considerations to make when it comes to the environment controlled by Percona Operator for MongoDB, specifically:
All of the above makes it impossible to simply adjust the Operator configuration and copy the files from your snapshot backup into the corresponding volumes.
The high-level plan consists of the following steps:

1. Deploy a new Percona Server for MongoDB cluster on Kubernetes with a configuration matching the source cluster, and pause it once its PersistentVolumes have been created.
2. Attach the cluster's data volumes to a temporary VM and place the snapshot data on them.
3. Start each replica set primary in standalone mode and fix the replica set and sharding metadata as well as the system users.
4. Fix the file ownership, then unmount and detach the volumes.
5. Start the cluster back up in Kubernetes and re-initialize each replica set, starting with the Config RS.
The following approach is applicable regardless of what “flavor” of MongoDB your source environment uses. This can be MongoDB Enterprise Server, MongoDB Community Edition, or Percona Server for MongoDB.
To deploy Percona Server for MongoDB (PSMDB) on the K8s cluster, follow the documentation. Before you execute the last step (applying cr.yaml), however, make sure to adjust the following elements of the configuration. This will make the new cluster “fit” the one that we took the backup from.
Match the MongoDB major version of the source cluster:

```yaml
spec:
  image: percona/percona-server-mongodb:5.0.11-10
```
Match the number of shards and the replica set names:

```yaml
replsets:
  - name: shard1
    size: 3
    [...]
  - name: shard2
    size: 3
    [...]
```
Match the storage options of the source cluster, in this case per-database directories and separate directories for indexes:

```yaml
replsets:
  - name: shard1
    size: 3
    configuration: |
      storage:
        directoryPerDB: true
        wiredTiger:
          engineConfig:
            directoryForIndexes: true
```
Your cluster should be started at this point, and it is important that all the PersistentVolumes it requires have been created. You can check the cluster state with kubectl get psmdb, see all deployed pods with kubectl get pods, or check the PVs with kubectl get pv. In this step, you need to mount the volumes of all your database nodes on an independent VM, as we'll be making changes in MongoDB standalone mode. You only need this VM temporarily, to perform the required operations.
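A later step unpauses the cluster by setting spec.pause: false, which implies the cluster stays paused while its volumes are edited outside of Kubernetes. A minimal cr.yaml fragment for pausing it, applied with kubectl once the PersistentVolumes exist, would look like this:

```yaml
spec:
  # Stop all cluster pods while keeping the PersistentVolumes,
  # so the disks can be attached to a VM for offline editing.
  pause: true
```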
```shell
# attach the disk to an instance
gcloud compute instances attach-disk vm_name \
  --disk disk_name \
  --zone=us-central1-a

# check the volume on the vm
sudo lsblk

# create a mountpoint
sudo mkdir -p /dir_name   # e.g. /rs0-primary

# mount the volume
sudo mount -o discard,defaults /dev/sdb /dir_name
```
Now, you need to start PSMDB for each volume with data (the primary of each replica set, including the Config RS) separately. We will then log in with the mongo shell and edit the cluster configuration manually, so that when we bring the volumes with data back to Kubernetes, the cluster can start successfully.
Execute the steps below for each replica set, including Config RS:
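The exact standalone invocation is not shown above; as a sketch (the mountpoint, port, and log path below are assumptions, and the storage flags mirror the directoryPerDB/directoryForIndexes options from the cr.yaml), you would start mongod against the mounted volume without --replSet and without authentication, so the metadata can be edited freely:

```shell
# Start mongod in standalone mode against the mounted snapshot volume.
# No --replSet and no auth, so replica set and sharding metadata can be edited.
# /rs0-primary, the port, and the log path are example values -- adjust to yours.
sudo mongod --dbpath /rs0-primary \
  --port 27017 \
  --directoryperdb \
  --wiredTigerDirectoryForIndexes \
  --fork --logpath /tmp/mongod-standalone.log

# When done editing, shut it down cleanly before unmounting:
mongo --port 27017 --eval 'db.getSiblingDB("admin").shutdownServer()'
```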
Drop the local database, which holds the old replica set configuration and the oplog:

```javascript
use local;
db.dropDatabase();
```
In the config database, update the shard catalog so that each shard entry points at the new Kubernetes hostnames (one updateOne per shard):

```javascript
use config;

db.shards.updateOne(
  { "_id" : "shard_name" },
  { $set :
    { "host" : "shard_name/cluster_name-shard_name-0.cluster_name-shard_name.namespace_name.svc.cluster.local:27017,cluster_name-shard_name-1.cluster_name-shard_name.namespace_name.svc.cluster.local:27017,cluster_name-shard_name-2.cluster_name-shard_name.namespace_name.svc.cluster.local:27017" }
});
```
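The long hostnames above follow a predictable pattern: cluster_name-replset-ordinal.cluster_name-replset.namespace.svc.cluster.local. As a sketch (the cluster name, namespace, and member count below are placeholder values), a small shell loop can generate the "rs/host1,host2,host3" string instead of typing it by hand:

```shell
# Build the "rs_name/host,host,host" string the updateOne() call expects.
# cluster, rs, ns, and size are placeholders -- substitute your own values.
cluster=my-cluster-name
rs=shard1
ns=psmdb
size=3

hosts=""
for i in $(seq 0 $((size - 1))); do
  member="${cluster}-${rs}-${i}.${cluster}-${rs}.${ns}.svc.cluster.local:27017"
  hosts="${hosts:+${hosts},}${member}"
done
echo "${rs}/${hosts}"
```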
Remove the minOpTimeRecovery document:

```javascript
use admin
db.system.version.deleteOne( { _id: "minOpTimeRecovery" } )
```
Update the shardIdentity document so it points at the new Config RS connection string:

```javascript
use admin

db.system.version.updateOne(
  { "_id" : "shardIdentity" },
  { $set :
    { "configsvrConnectionString" : "cfg/cluster_name-cfg-0.cluster_name-cfg.namespace_name.svc.cluster.local:27017,cluster_name-cfg-1.cluster_name-cfg.namespace_name.svc.cluster.local:27017,cluster_name-cfg-2.cluster_name-cfg.namespace_name.svc.cluster.local:27017" }
  });
```
Recreate the system users and the role that the Operator expects:

```javascript
use admin;
// drop the users in case they already exist in your backup
db.dropUser("userAdmin");
db.dropUser("clusterAdmin");
db.dropUser("clusterMonitor");
db.dropUser("backup");
db.dropUser("databaseAdmin");

// create the missing role
db.createRole({ "role" : "explainRole",
  "privileges" : [ { "resource" : { "db" : "", "collection" : "system.profile" },
    "actions" : [ "collStats", "dbHash", "dbStats", "find", "listCollections", "listIndexes" ] } ],
  roles: [] });

// create the system users
db.createUser({ user: "userAdmin",
  pwd: "userAdmin123456",
  roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
});

db.createUser({ user: "clusterAdmin",
  pwd: "clusterAdmin123456",
  roles: [ { role: "clusterAdmin", db: "admin" } ]
});

db.createUser({ user: "clusterMonitor",
  pwd: "clusterMonitor123456",
  roles: [ { role: "explainRole", db: "admin" },
           { role: "read", db: "local" },
           { role: "clusterMonitor", db: "admin" } ]
});

db.createUser({ user: "databaseAdmin",
  pwd: "databaseAdmin123456",
  roles: [ { role: "readWriteAnyDatabase", db: "admin" },
           { role: "readAnyDatabase", db: "admin" },
           { role: "dbAdminAnyDatabase", db: "admin" },
           { role: "backup", db: "admin" },
           { role: "restore", db: "admin" },
           { role: "clusterMonitor", db: "admin" } ]
});

db.createUser({ user: "backup",
  pwd: "backup123456",
  roles: [ { role: "backup", db: "admin" },
           { role: "restore", db: "admin" },
           { role: "clusterMonitor", db: "admin" } ]
});
```
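The passwords used here must match the credentials the Operator keeps for its system users in the cluster Secret. As a sketch (the Secret name my-cluster-name-secrets and the key below are the Operator's defaults and may differ in your deployment), you can inspect them with:

```shell
# List the Operator-managed system user credentials (Secret name is deployment-specific).
kubectl get secret my-cluster-name-secrets -o yaml

# Decode a single entry, e.g. the clusterAdmin password:
kubectl get secret my-cluster-name-secrets \
  -o jsonpath='{.data.MONGODB_CLUSTER_ADMIN_PASSWORD}' | base64 -d
```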
In the Operator's containers, mongod runs as the mongodb user with UID 1001, as seen in /etc/passwd:

```
mongodb:x:1001:0:Default Application User:/home/mongodb:/sbin/nologin
```
Make that user the owner of all the data files on the mounted volume:

```shell
cd /your_mountpoint
sudo chown -R mongodb:1001 ./
```
Finally, unmount the volume and detach the disk from the VM:

```shell
sudo umount /dir_name

gcloud compute instances detach-disk vm_name \
  --disk disk_name \
  --zone=us-central1-a
```
You’re ready to get back to K8s. Start the cluster; it will start with the previously used volumes. It will remain in a pending state because we intentionally broke the replica sets, so only one pod per replica set will start. You must initialize the replica sets one by one.

To unpause the PSMDB cluster, set spec.pause: false in your cr.yaml file and apply it with kubectl. Then, repeat the steps below for all replica sets, starting with the Config RS.
Log in with the mongo shell and authenticate:

```javascript
use admin;
db.auth("clusterAdmin", "clusterAdmin123456");
```
Initiate the replica set with the new member hostnames:

```javascript
// version for the Config RS
rs.initiate(
  {
    _id: "cfg",
    members: [
      { _id: 0, host : "cluster_name-cfg-0.cluster_name-cfg.namespace_name.svc.cluster.local:27017" },
      { _id: 1, host : "cluster_name-cfg-1.cluster_name-cfg.namespace_name.svc.cluster.local:27017" },
      { _id: 2, host : "cluster_name-cfg-2.cluster_name-cfg.namespace_name.svc.cluster.local:27017" },
    ]
  }
);

// version for the other RS (shards)
rs.initiate(
  {
    _id: "shard_name",
    members: [
      { _id: 0, host : "cluster_name-shard_name-0.cluster_name-shard_name.namespace_name.svc.cluster.local:27017" },
      { _id: 1, host : "cluster_name-shard_name-1.cluster_name-shard_name.namespace_name.svc.cluster.local:27017" },
      { _id: 2, host : "cluster_name-shard_name-2.cluster_name-shard_name.namespace_name.svc.cluster.local:27017" },
    ]
  }
);
```
That’s it! You have now successfully restored a snapshot backup into Percona Server for MongoDB deployed on K8s with the Percona Operator. To verify, run kubectl get pods or kubectl get psmdb; the output should be similar to the one below.
```shell
$ kubectl get psmdb
NAME              ENDPOINT                                        STATUS   AGE
my-cluster-name   my-cluster-name-mongos.ns-0.svc.cluster.local   ready    156m
```