Percona Backup for MongoDB (PBM) supports snapshot-based physical backups. This is made possible by the backup cursor functionality present in Percona Server for MongoDB.
The flow of a snapshot-based physical backup consists of these stages:
- Preparing the database – done by PBM
- Taking the snapshots – done by the user/client app
- Completing the backup – done by PBM
A more detailed description of what happens at each step can be found in the PBM documentation.
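In terms of commands, the whole flow boils down to something like this (a condensed sketch; the full walkthrough with real output follows below):

# Stage 1: preparing the database (PBM opens a backup cursor and reports which nodes to copy)
pbm backup -t external

# Stage 2: taking the snapshots (done outside of PBM, e.g., cloud console or gcloud)
# Snapshot the data volumes of the nodes PBM listed in the previous step.

# Stage 3: completing the backup (tell PBM the copy is done)
pbm backup-finish <backup-name>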
Let’s walk through a practical example of performing a backup. For this demo, I have created a two-shard MongoDB cluster (each shard being a three-node replica set) deployed on Google Cloud Platform instances. Each instance has an extra persistent disk attached to it for the MongoDB data. The PBM agent is installed on each instance as per the documentation.
Manual example
First, we start an external backup. This can be done from any host running the PBM client.
# pbm backup -t external
Starting backup '2024-09-26T15:21:53Z'..........Ready to copy data from:
	- my-test-env-mongodb-shard00svr1:27018
	- my-test-env-mongodb-shard01svr1:27018
	- my-test-env-mongodb-cfg01:27019
After the copy is done, run:
	pbm backup-finish 2024-09-26T15:21:53Z
As we can see, PBM selects one node of each shard and one config server to be backed up.
At this point, we need to create snapshots of the hosts listed above. We could use the Google Cloud web console or run gcloud commands, as below.
Let’s list the attached disks of the first instance:
# gcloud compute instances describe "my-test-env-mongodb-shard00svr1" --zone northamerica-northeast1-b --format="get(disks[].source)"
https://www.googleapis.com/compute/v1/projects/my-gcp-project/zones/northamerica-northeast1-b/disks/my-test-env-mongodb-shard00svr1;https://www.googleapis.com/compute/v1/projects/my-gcp-project/zones/northamerica-northeast1-b/disks/my-test-env-mongodb-shard00svr1-data
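Note that gcloud returns the attached disks as a single semicolon-separated list of URLs. If you only need the disk names, for example to feed them into the snapshot command, a small helper along these lines works (an illustrative snippet; it simply splits the list and strips the URL prefix):

gcloud compute instances describe "my-test-env-mongodb-shard00svr1" \
    --zone="northamerica-northeast1-b" \
    --format="get(disks[].source)" | tr ';' '\n' | xargs -n1 basename

For the instance above, this prints my-test-env-mongodb-shard00svr1 and my-test-env-mongodb-shard00svr1-data.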
Once we know the attached disks, we can snapshot them. For example:
gcloud compute disks snapshot "my-test-env-mongodb-shard00svr1-data" --zone="northamerica-northeast1-b" --snapshot-names="my-test-env-mongodb-shard00svr1-data-snap"
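Before moving on, you can confirm that a snapshot finished successfully; a READY status means it is complete and usable (a quick optional check):

gcloud compute snapshots describe "my-test-env-mongodb-shard00svr1-data-snap" --format="get(status)"

This should print READY once the snapshot is done.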
After repeating the above commands for all the instances listed, we need to complete the backup:
# pbm backup-finish 2024-09-26T15:21:53Z
Command sent. Check `pbm describe-backup 2024-09-26T15:21:53Z` for the result.
Finally, let’s check the backup status:
# pbm describe-backup 2024-09-26T15:21:53Z
name: "2024-09-26T15:21:53Z"
opid: 66f57c110c75d10bcd29893f
type: external
last_write_time: "2024-09-26T15:21:56Z"
last_transition_time: "2024-09-26T15:23:48Z"
mongodb_version: 7.0.12-7
fcv: "7.0"
pbm_version: 2.6.0
status: done
size_h: 0 B
replsets:
- name: shard0
  status: done
  node: my-test-env-mongodb-shard00svr1:27018
  last_write_time: "2024-09-26T15:21:47Z"
  last_transition_time: "2024-09-26T15:23:47Z"
  security: {}
- name: shard1
  status: done
  node: my-test-env-mongodb-shard01svr1:27018
  last_write_time: "2024-09-26T15:21:53Z"
  last_transition_time: "2024-09-26T15:23:47Z"
  security: {}
- name: mongo-cfg
  status: done
  node: my-test-env-mongodb-cfg01:27019
  last_write_time: "2024-09-26T15:21:56Z"
  last_transition_time: "2024-09-26T15:23:47Z"
  configsvr: true
  security: {}
Automating the process
The manual approach works, but let’s see how we can automate it.
Creating a service account
I haven’t mentioned authentication yet. We are going to use a service account with the required permissions to take snapshots. Another possibility could be using an IAM user with an associated HMAC key pair. Here are the steps to create the service account:
1. Set your project ID
gcloud config set project my-gcp-project
2. Create the service account
gcloud iam service-accounts create snapshot-creator-sa \
    --description="Service account to create snapshots" \
    --display-name="Snapshot Creator Service Account"
3. Assign Compute Admin and Compute Storage Admin roles
gcloud projects add-iam-policy-binding my-gcp-project \
    --member="serviceAccount:snapshot-creator-sa@my-gcp-project.iam.gserviceaccount.com" \
    --role="roles/compute.admin"
gcloud projects add-iam-policy-binding my-gcp-project \
    --member="serviceAccount:snapshot-creator-sa@my-gcp-project.iam.gserviceaccount.com" \
    --role="roles/compute.storageAdmin"
4. Generate and download the service account key file
gcloud iam service-accounts keys create snapshot-creator-key.json \
    --iam-account=snapshot-creator-sa@my-gcp-project.iam.gserviceaccount.com
The service account file should look like this:
{
  "type": "service_account",
  "project_id": "my-gcp-project",
  "private_key_id": "*********************",
  "private_key": "-----BEGIN PRIVATE KEY-----\n************************************************************\n-----END PRIVATE KEY-----\n",
  "client_email": "snapshot-creator-sa@my-gcp-project.iam.gserviceaccount.com",
  "client_id": "1069678999695160482161",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/snapshot-creator-sa%40my-gcp-project.iam.gserviceaccount.com",
  "universe_domain": "googleapis.com"
}
With the service account ready, we can now start working on our automation.
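Before wiring the key into a script, it is worth a quick sanity check that it authenticates and that the roles were bound correctly (a simple smoke test; any read-only Compute Engine call would do for the last step):

gcloud auth activate-service-account --key-file=snapshot-creator-key.json
gcloud projects get-iam-policy my-gcp-project \
    --flatten="bindings[].members" \
    --filter="bindings.members:snapshot-creator-sa@my-gcp-project.iam.gserviceaccount.com" \
    --format="table(bindings.role)"
gcloud compute snapshots list --project=my-gcp-project --limit=5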
Sample script
To automate the process, we loop through the list of hosts selected by PBM, get all the attached volumes, and create snapshots of them with descriptive names.
The following script requires the gcloud CLI to be installed. Keep in mind that this is just a proof of concept, so don’t use it in production environments, as it only does basic error checking.
#!/bin/bash

# Set the path to the service account JSON key file
SERVICE_ACCOUNT_KEY="snapshot-creator-key.json"

# Authenticate using the service account key
export GOOGLE_APPLICATION_CREDENTIALS="$SERVICE_ACCOUNT_KEY"

# Set the GCP project
PROJECT_ID="my-gcp-project"

gcloud auth activate-service-account --key-file="$SERVICE_ACCOUNT_KEY"
gcloud config set project "$PROJECT_ID"

# Run the pbm backup command once and capture its output
pbm_output=$(pbm backup -t external | tee /dev/tty)

# Extract the backup timestamp from pbm_output
backup_timestamp=$(echo "$pbm_output" | grep -oP "'\K[0-9T:-]+" | head -1) # Get the first timestamp

# Format the timestamp for snapshot names (replace ':' with '-' and 'T' with 't')
formatted_timestamp=$(echo "$backup_timestamp" | sed 's/:/-/g; s/T/t/g')

# Extract hostnames to be snapshotted from the output
hostnames=$(echo "$pbm_output" | grep -oP '(?<=- ).*?(?=:)')

# Extract the pbm backup-finish command from the output
finish_command=$(echo "$pbm_output" | grep -oP 'pbm backup-finish \S+')

# Function to create snapshots of disks for a given instance
create_snapshots() {
    local instance_name=$1
    local zone=$2

    # Get attached disks for the instance (gcloud returns them as a single
    # semicolon-separated list, so convert the separators to spaces for the loop)
    disks=$(gcloud compute instances describe "$instance_name" --zone="$zone" --format="get(disks[].source)" | tr ';' ' ')

    # Loop through each disk and create a snapshot
    for disk in $disks; do
        disk_name=$(basename "$disk")
        snapshot_name="${disk_name}-${formatted_timestamp}"
        echo "Creating snapshot for disk: $disk_name"
        gcloud compute disks snapshot "$disk_name" --zone="$zone" --snapshot-names="$snapshot_name" | tee /dev/tty
        # Check gcloud's exit code (not tee's) so failures are actually detected
        if [ "${PIPESTATUS[0]}" -eq 0 ]; then
            echo "Snapshot $snapshot_name created for disk $disk_name"
            snapshot_names+=("$snapshot_name") # Add to snapshot_names array
        else
            echo "Failed to create snapshot for disk $disk_name"
        fi
    done
}

snapshot_names=()

# Loop through each hostname and find the corresponding GCE instance
for hostname in $hostnames; do
    instance_info=$(gcloud compute instances list --filter="name:$hostname" --format="get(name,zone)" | tee /dev/tty)
    instance_name=$(echo "$instance_info" | awk '{print $1}')
    zone=$(echo "$instance_info" | awk '{print $2}')
    create_snapshots "$instance_name" "$zone"
done

echo "Snapshot creation completed."

# Run the pbm backup-finish command
echo "Running: $finish_command"
finish_output=$(eval "$finish_command" | tee /dev/tty)

# Extract the backup timestamp from the pbm backup-finish output
backup_finish_timestamp=$(echo "$finish_output" | grep -oP '(?<=pbm describe-backup )\S+' | tr -d '`')

# Run the pbm describe-backup command to get the final backup status
describe_command="pbm describe-backup $backup_finish_timestamp"
echo "Running: $describe_command"
eval "$describe_command"
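After a run, you can list all the snapshots belonging to a given backup by the timestamp suffix the script appends to each disk name (shown here for the backup from the manual example; the script rewrites ':' to '-' and lowercases the 'T'):

gcloud compute snapshots list --filter="name~'2024-09-26t15-21-53'" --format="table(name,status,creationTimestamp)"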
Looking ahead
Percona Backup for MongoDB provides the interface for making snapshot-based physical backups and restores. Database owners benefit from faster and more cost-efficient backups while ensuring that their data remains consistent.
This is the first stage of snapshot-based backups, and automated snapshot-based backups are planned for the future. If you have any suggestions for feature requests or bug reports, make sure to let us know by creating a ticket in our public issue tracker. Pull requests are also more than welcome!
Continue on to part two of this series.