We recently covered the use case of Percona Backup for MongoDB and disk snapshots in Google Cloud Platform; now it’s time to do the same for AWS.
For this demo, I have created a 2-shard MongoDB cluster (where each shard is a 3-node PSA Replica Set) deployed on EC2 instances. Each instance has an extra EBS volume attached for storing the MongoDB data, and the PBM agent is installed as per the documentation.
Manual example
Let’s start an external backup. This can be done from any host running the pbm client.
    # pbm backup -t external
    Starting backup '2024-09-30T15:43:27Z'............Ready to copy data from:
        - aws-test-mongodb-shard01svr1:27018
        - aws-test-mongodb-shard00svr1:27018
        - aws-test-mongodb-cfg00:27019
    After the copy is done, run: pbm backup-finish 2024-09-30T15:43:27Z
As we can see, PBM selects one node of each shard and one config server to be backed up.
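To act on that node list from a shell, the output has to be parsed. The following is a minimal sketch, assuming the output keeps the "- host:port" layout shown above; the function names pbm_backup_hosts and pbm_backup_name are hypothetical helpers, not part of PBM.

```shell
#!/bin/bash
# Hypothetical helpers: both read the output of `pbm backup -t external` on stdin.

# Print one hostname per line, taken from the "- host:port" lines.
pbm_backup_hosts() {
    awk '/^[[:space:]]*- / {
        sub(/^[[:space:]]*- /, "")        # drop the list marker
        sub(/:[0-9]+[[:space:]]*$/, "")   # drop the trailing :port
        print
    }'
}

# Print the backup name (the quoted timestamp on the "Starting backup" line).
pbm_backup_name() {
    sed -n "s/.*Starting backup '\([^']*\)'.*/\1/p" | head -n 1
}
```

Saving the pbm output to a file first (for example with tee) lets both helpers read the same text without starting a second backup.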
At this point, we need to create snapshots of all the hosts listed above. We could use the AWS console or run aws commands as shown below.
Let’s list the attached disks of one of the instances:
    # INSTANCE_ID=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=aws-test-mongodb-cfg00" --query "Reservations[].Instances[].InstanceId" --output text)
    # aws ec2 describe-instances --instance-ids "$INSTANCE_ID" --query "Reservations[].Instances[].BlockDeviceMappings[].Ebs.VolumeId" --output text
    vol-0715a55d37fe260c6 vol-09bf698a54b681717
Once we know the volume IDs of the attached disks, we can snapshot them. For example:
    aws ec2 create-snapshot --volume-id "vol-0715a55d37fe260c6" --description "vol-0715a55d37fe260c6-2024-09-30-15-43-27" --tag-specifications "ResourceType=snapshot,Tags=[{Key=Name,Value=aws-test-mongodb-cfg00}]"
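Typing that command once per volume is error-prone, so it can help to generate the commands first and review them. The sketch below only prints the create-snapshot calls and does not run them; snapshot_suffix and print_snapshot_commands are hypothetical names, and the description format simply mirrors the example above.

```shell
#!/bin/bash
# Convert a PBM backup name such as 2024-09-30T15:43:27Z into a
# snapshot-friendly suffix such as 2024-09-30-15-43-27.
snapshot_suffix() {
    local name=${1%Z}     # drop the trailing Z, if any
    name=${name//:/-}     # ':' -> '-'
    printf '%s\n' "${name//T/-}"
}

# Print (do not run) one create-snapshot command per volume ID.
# Usage: print_snapshot_commands BACKUP_NAME INSTANCE_NAME VOLUME_ID...
print_snapshot_commands() {
    local suffix instance=$2 volume
    suffix=$(snapshot_suffix "$1")
    shift 2
    for volume in "$@"; do
        printf 'aws ec2 create-snapshot --volume-id "%s" --description "%s-%s" --tag-specifications "ResourceType=snapshot,Tags=[{Key=Name,Value=%s}]"\n' \
            "$volume" "$volume" "$suffix" "$instance"
    done
}
```

For instance, `print_snapshot_commands 2024-09-30T15:43:27Z aws-test-mongodb-cfg00 vol-0715a55d37fe260c6 vol-09bf698a54b681717` prints two commands that can be inspected before being pasted into a shell.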
After repeating the procedure for all the instances listed, we need to complete the backup:
    # pbm backup-finish 2024-09-30T15:43:27Z
    Command sent. Check `pbm describe-backup 2024-09-30T15:43:27Z` for the result.
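EBS snapshots are completed asynchronously after create-snapshot returns, so it may be worth confirming that they reached the completed state before relying on the backup. A minimal sketch using the AWS CLI waiter `aws ec2 wait snapshot-completed`; the wrapper name wait_for_snapshots is hypothetical, and the waiter has its own built-in timeout, which large volumes might exceed.

```shell
#!/bin/bash
# Block until every snapshot ID passed as an argument is completed.
# Usage: wait_for_snapshots SNAPSHOT_ID...
wait_for_snapshots() {
    if [ "$#" -eq 0 ]; then
        echo "no snapshot IDs given" >&2
        return 1
    fi
    # The waiter polls EBS and returns non-zero if it gives up.
    aws ec2 wait snapshot-completed --snapshot-ids "$@"
}
```

The snapshot IDs are the SnapshotId values returned by each create-snapshot call.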
Finally, let’s check the backup status:
    # pbm describe-backup 2024-09-30T15:43:27Z
    name: "2024-09-30T15:43:27Z"
    opid: 66fac71fae39bea3c314d15f
    type: external
    last_write_time: "2024-09-30T15:43:29Z"
    last_transition_time: "2024-09-30T15:43:44Z"
    mongodb_version: 7.0.14-8
    fcv: "7.0"
    pbm_version: 2.6.0
    status: done
    size_h: 0 B
    replsets:
    - name: shard1
      status: done
      node: aws-test-mongodb-shard01svr1:27018
      last_write_time: "2024-09-30T15:43:29Z"
      last_transition_time: "2024-09-30T15:43:44Z"
      security: {}
    - name: shard0
      status: done
      node: aws-test-mongodb-shard00svr1:27018
      last_write_time: "2024-09-30T15:43:28Z"
      last_transition_time: "2024-09-30T15:43:44Z"
      security: {}
    - name: mongo-cfg
      status: done
      node: aws-test-mongodb-cfg00:27019
      last_write_time: "2024-09-30T15:43:29Z"
      last_transition_time: "2024-09-30T15:43:43Z"
      configsvr: true
      security: {}
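Since backup-finish only sends the command, the status may not be done on the first check. One way to handle that is to poll describe-backup, as in the sketch below. It assumes the top-level status line is not indented (as in the output above) and that failed or cancelled backups report error or canceled; those two values and the helper names are assumptions, not taken from this article.

```shell
#!/bin/bash
# Print the top-level status from `pbm describe-backup` output read on stdin.
# Per-replset status lines are indented, so they do not match.
backup_status() {
    awk '/^status:/ { print $2; exit }'
}

# Poll until the backup is done; stop on failure or after MAX_TRIES polls.
# Usage: wait_for_backup BACKUP_NAME [MAX_TRIES] [SLEEP_SECONDS]
wait_for_backup() {
    local name=$1 tries=${2:-30} delay=${3:-5} status="" i
    for ((i = 0; i < tries; i++)); do
        status=$(pbm describe-backup "$name" | backup_status)
        case $status in
            "done")
                return 0 ;;
            "error" | "canceled")   # assumed failure values
                echo "backup $name ended with status: $status" >&2
                return 1 ;;
        esac
        sleep "$delay"
    done
    echo "timed out waiting for backup $name (last status: $status)" >&2
    return 2
}
```

A call such as `wait_for_backup 2024-09-30T15:43:27Z` returns 0 once the backup is done, which makes it easy to chain with other commands.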
Automating the backup
We covered the manual approach, but let’s see how we can automate the backup process.
Creating an IAM role
We are going to use an instance profile with the required permissions to handle backup/restore. Another possibility could be creating an IAM user with an associated access/secret key pair.
Here are the steps to create the role and its instance profile:
1. Create an IAM policy that grants the required permissions
    aws iam create-policy --policy-name EBS_Snapshot_Policy --policy-document '{
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "ec2:DescribeInstances",
            "ec2:DescribeVolumes",
            "ec2:DescribeSnapshots",
            "ec2:CreateSnapshot",
            "ec2:CreateVolume",
            "ec2:DescribeTags",
            "ec2:CreateTags",
            "ec2:DetachVolume",
            "ec2:AttachVolume"
          ],
          "Resource": "*"
        }
      ]
    }'
2. Create an IAM role that can be assumed by EC2 instances
    aws iam create-role --role-name EBS_Snapshot_Role --assume-role-policy-document '{
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "ec2.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }'
3. Attach the policy to the created role
    YOUR_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
    aws iam attach-role-policy --role-name EBS_Snapshot_Role --policy-arn "arn:aws:iam::${YOUR_ACCOUNT_ID}:policy/EBS_Snapshot_Policy"
4. Create an instance profile and add the role to it
    aws iam create-instance-profile --instance-profile-name EBS_Snapshot_Profile
    aws iam add-role-to-instance-profile --instance-profile-name EBS_Snapshot_Profile --role-name EBS_Snapshot_Role
5. Attach the instance profile to our EC2 instance. In this case, I am using aws-test-mongodb-cfg00 to run the backup.
    INSTANCE_ID=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=aws-test-mongodb-cfg00" --query "Reservations[].Instances[].InstanceId" --output text)
    aws ec2 associate-iam-instance-profile --instance-id "$INSTANCE_ID" --iam-instance-profile Name=EBS_Snapshot_Profile
We can verify the role is correctly attached to the EC2 instance by querying the metadata:
    curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
    EBS_Snapshot_Role
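If the instance is configured to require IMDSv2, the plain request above is rejected and a session token has to be fetched first. A minimal sketch of the token-based variant, to be run on the EC2 instance itself; the function name imds_role_name and the 60-second token lifetime are arbitrary choices for illustration.

```shell
#!/bin/bash
IMDS=http://169.254.169.254/latest

# Print the name of the IAM role attached to this instance, using IMDSv2.
imds_role_name() {
    local token
    # Step 1: request a short-lived session token.
    token=$(curl -sf -X PUT "$IMDS/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
    # Step 2: present the token when querying the metadata path.
    curl -sf -H "X-aws-ec2-metadata-token: $token" \
        "$IMDS/meta-data/iam/security-credentials/"
}
```

Calling `imds_role_name` should print EBS_Snapshot_Role once the instance profile is associated.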
With the role ready, we can now start working on automation.
Sample script
To automate the process, we basically need to loop through the list of hosts selected by PBM, get all the attached volumes, and create snapshots with descriptive names.
The following script requires the AWS CLI and GNU grep (for the -P option) to be installed. Keep in mind it is just a proof of concept with only basic error checking, so don’t use it in production environments.
    #!/bin/bash

    # Run the pbm backup command once and capture its output
    echo "Running 'pbm backup -t external' command..."
    pbm_output=$(pbm backup -t external | tee /dev/tty)

    # Extract the backup timestamp from pbm_output (the trailing 'Z' is not captured)
    backup_timestamp=$(echo "$pbm_output" | grep -oP "'\K[0-9T:-]+" | head -1)

    # Format the timestamp for snapshot names (replace ':' with '-' and 'T' with '-')
    formatted_timestamp=$(echo "$backup_timestamp" | sed 's/:/-/g; s/T/-/g')

    # Extract hostnames from the output
    hostnames=$(echo "$pbm_output" | grep -oP '(?<=- ).*?(?=:)')

    # Extract the pbm backup-finish command from the output
    finish_command=$(echo "$pbm_output" | grep -oP 'pbm backup-finish \S+')

    # Function to create snapshots of EBS volumes for a given instance
    create_snapshots() {
      local instance_id=$1
      local instance_name=$2

      # Get attached volumes for the instance
      volumes=$(aws ec2 describe-instances --instance-ids "$instance_id" --query "Reservations[].Instances[].BlockDeviceMappings[].Ebs.VolumeId" --output text)

      # Create snapshots for each attached volume
      for volume in $volumes; do
        volume_name=$(aws ec2 describe-volumes --volume-ids "$volume" --query 'Volumes[0].Tags[?Key==`Name`].Value' --output text)
        # Fall back to the instance name when the volume has no Name tag
        if [ -z "$volume_name" ] || [ "$volume_name" = "None" ]; then
          volume_name=$instance_name
        fi
        snapshot_name="${volume}-${formatted_timestamp}" # Use extracted timestamp

        echo "Creating snapshot for volume: $volume_name - $volume"

        # Capture the snapshot creation output and extract the SnapshotId
        if snapshot_output=$(aws ec2 create-snapshot --volume-id "$volume" --description "$snapshot_name" --tag-specifications "ResourceType=snapshot,Tags=[{Key=Name,Value=$volume_name}]" --query "SnapshotId" --output text); then
          snapshot_ids+=("$snapshot_output") # Add the SnapshotId to the snapshot_ids array
        else
          echo "Failed to create snapshot for volume $volume"
        fi
      done
    }

    snapshot_ids=()

    # Loop through each hostname and find the corresponding AWS instance
    for hostname in $hostnames; do
      instance_id=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=$hostname" --query "Reservations[].Instances[].InstanceId" --output text)
      create_snapshots "$instance_id" "$hostname"
    done

    echo "Snapshot creation completed."

    # Run the pbm backup-finish command
    echo "Running: $finish_command"
    finish_output=$(eval "$finish_command" | tee /dev/tty)

    # Extract the backup timestamp from the pbm backup-finish output
    backup_finish_timestamp=$(echo "$finish_output" | grep -oP '(?<=pbm describe-backup )\S+' | tr -d '`')

    # Run the pbm describe-backup command to get the backup status
    describe_command="pbm describe-backup $backup_finish_timestamp"
    echo "Running: $describe_command"
    eval "$describe_command"
Conclusion
Percona Backup for MongoDB provides the interface for making snapshot-based physical backups and restores. Database owners benefit from faster and more cost-efficient backups while ensuring that their data remains consistent.
We have seen how the process works in detail for an AWS environment. Check out part two on restore and point-in-time recovery.
If you have any suggestions for feature requests or bug reports, please let us know by creating a ticket in our public issue tracker. Pull requests are also welcome!