Ensuring the availability and recoverability of production data is a high priority today. That can be challenging with DBaaS solutions in the public cloud, such as AWS Aurora. Relying solely on a single cloud provider for database services can pose significant risks. Recent incidents, such as the outage experienced by UniSuper on Google Cloud, highlight the potential pitfalls of depending exclusively on one cloud platform for critical data services.
I have encountered multiple discussions surrounding the need for an external replica of a public cloud database instance. In this blog post, we’ll explain why an external replica can be a beneficial addition to your environment and improve your data recovery strategy beyond the provided point-in-time recovery options.
We will use AWS Aurora as the example, though the principles and scenarios discussed apply to other cloud database providers as well. First, let’s review the high-level options that Aurora provides natively for backup and recovery.
AWS RDS Aurora backup and recovery
- Aurora automatically backs up the cluster volume and retains the backups for the specified backup retention period (maximum retention: 35 days).
- Backups are stored in S3.
- Recovery is done by spinning up a new instance from the snapshot.
Aurora point-in-time recovery
You can restore to any specific time between the “EarliestRestorableTime” and the “LatestRestorableTime” reported for the cluster.
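As a minimal sketch of that constraint, the check below verifies that a requested restore time falls inside the restorable window before any restore is attempted. The function name and the sample timestamps are hypothetical; the restore itself (through the AWS console, CLI, or API) is not shown.

```python
from datetime import datetime, timezone


def validate_restore_time(target, earliest, latest):
    """Return target if it lies inside the restorable window, else raise ValueError."""
    if not (earliest <= target <= latest):
        raise ValueError(
            f"{target.isoformat()} is outside the restorable window "
            f"[{earliest.isoformat()}, {latest.isoformat()}]"
        )
    return target


# Hypothetical values standing in for EarliestRestorableTime / LatestRestorableTime.
earliest = datetime(2024, 6, 1, 0, 0, 0, tzinfo=timezone.utc)
latest = datetime(2024, 6, 8, 12, 30, 0, tzinfo=timezone.utc)

# A restore target one day before the latest restorable time is accepted.
target = validate_restore_time(
    datetime(2024, 6, 7, 9, 15, 0, tzinfo=timezone.utc), earliest, latest
)
print("restore target accepted:", target.isoformat())
```

Note that the window only bounds the time you can pick; it says nothing about which transactions were in flight at that moment.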
Backtracking an Aurora DB cluster
- Aurora allows you to backtrack a DB cluster to a specific time without restoring data from a backup.
- This feature is limited by the “target backtrack window,” the amount of time specified for backtracking, and the “actual backtrack window,” the amount of time available for backtracking depending on the available storage for the database change records.
- There are multiple limitations to backtracking, but I’ll mention two of them explicitly here:
  - It can’t selectively backtrack a single table or a single data update.
  - It causes a brief DB instance disruption.
Reference: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.Backtrack.html
All of the Aurora recovery options above are anchored to a point in time. None of them can restore to a specific transaction when multiple transactions are running simultaneously. Certain businesses may require more granular recovery options than that. These in-cloud solutions, although robust, might not address all recovery needs or offer the flexibility necessary for certain critical applications.
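To make the granularity difference concrete, here is a toy model rather than anything Aurora or MySQL actually runs. It assumes time-based recovery resolves to whole seconds, and the transaction list, positions, and function names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    position: int   # hypothetical binary log position where the transaction starts
    second: int     # commit time, truncated to whole seconds
    statement: str


# Three transactions commit within the same second; the third entry is the mistake.
LOG = [
    Txn(100, 10, "INSERT INTO orders VALUES (1)"),
    Txn(200, 11, "INSERT INTO orders VALUES (2)"),
    Txn(300, 11, "DELETE FROM customers"),  # the bad statement
    Txn(400, 11, "INSERT INTO orders VALUES (3)"),
]


def restore_to_time(log, second):
    """Time-based recovery: keep everything committed before the given second."""
    return [t for t in log if t.second < second]


def replay_until_position(log, position):
    """Position-based recovery: keep everything that starts before the position."""
    return [t for t in log if t.position < position]


by_time = restore_to_time(LOG, 11)
by_position = replay_until_position(LOG, 300)
print("time-based keeps:", [t.statement for t in by_time])
print("position-based keeps:", [t.statement for t in by_position])
```

In this model the time-based restore keeps only the first transaction, because it has to stop before the second in which the bad statement ran. The position-based replay also keeps the good insert that committed in that same second.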
External replica in Aurora architecture
Let’s add an external delayed replica to our Aurora cluster and set up the following backup configuration on the replica server.
- Daily full backup using Percona XtraBackup (pushed to S3)
- Daily logical backup using mydumper
- Binary logs backup
A daily full physical backup allows a quick restore. Mydumper is a logical backup tool that takes a consistent backup with separate files per table, so it can also be used to restore a single table. Binary logs can be used for point-in-time recovery up to a specific transaction.
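The three backup jobs could be sketched roughly as follows. This only builds and prints the command lines; it does not run them. The directories, host name, thread count, and binary log file name are placeholders. It also assumes that credentials come from an option file such as ~/.my.cnf and that binary logging is enabled on the replica. Pushing the results to S3 and scheduling the jobs are left out.

```python
import shlex


def xtrabackup_cmd(target_dir):
    """Full physical backup with Percona XtraBackup into target_dir."""
    return ["xtrabackup", "--backup", f"--target-dir={target_dir}"]


def mydumper_cmd(output_dir, threads=4):
    """Parallel logical backup with mydumper."""
    return ["mydumper", f"--outputdir={output_dir}", f"--threads={threads}"]


def binlog_backup_cmd(host, first_binlog):
    """Continuously copy raw binary logs from the given server, starting at first_binlog."""
    return [
        "mysqlbinlog",
        "--read-from-remote-server",
        f"--host={host}",
        "--raw",
        "--stop-never",
        first_binlog,
    ]


# Placeholder paths and names, for illustration only.
for cmd in (
    xtrabackup_cmd("/backups/full/2024-06-08"),
    mydumper_cmd("/backups/logical/2024-06-08"),
    binlog_backup_cmd("replica.example.internal", "mysql-bin.000001"),
):
    print(shlex.join(cmd))
```

Returning each command as an argument list rather than a single string keeps quoting out of the picture if the commands are later passed to something like subprocess.run.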
In a hypothetical situation where you want to restore a single table, either due to a bad DML statement or an accidental drop, the recovery options are as follows:
- With only an Aurora cluster, restore a full backup (snapshot) from the Aurora cluster, extract the table, and then replay the binary logs up to the time of the disaster. This takes time, depending on how large the dataset is and how busy the system is, which determines the volume of binary logs to replay.
- If we have Aurora backtracking enabled, Aurora will be able to backtrack to that particular disaster time. But in the case of a busy system with multiple writes in parallel, the backtrack may also roll back other transactional changes since the granularity is on time and not per transaction.
- Finally, if the architecture includes an external delayed replica, recovery means letting replication catch up to the point just before the disaster, stopping replication, using mysqldump to back up the desired table (or the relevant data), and restoring it on the Aurora instance. With a regular (non-delayed) replica, however, the bad change has likely already been applied, so you might need to do a full backup restore and a point-in-time restore before taking a mysqldump of the table.
Pro-Tip: Use mydumper / myloader for parallel logical backup and restore instead of single-threaded mysqldump.
Thus, with a delayed external replica, recovery is possible without a full restore plus PITR: you stop replication or advance it to the point of failure and then back up and restore the affected data. The second advantage is transaction-level PITR. If you need higher granularity than restoring to a particular second, this is really your only option.
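One way the "advance replication to the point of failure" step might look on the replica is sketched below. It assumes a self-managed MySQL 8.0.23 or later replica; older versions use the equivalent CHANGE MASTER TO ... MASTER_DELAY and START SLAVE UNTIL MASTER_LOG_FILE / MASTER_LOG_POS statements. The helper names, the binary log file, the position, and the schema and table names are hypothetical. The position is assumed to be where the offending transaction begins on the source, which you would find by inspecting the binary logs.

```python
def catch_up_statements(log_file, log_pos):
    """SQL to run on the delayed replica so it applies changes up to log_pos and stops."""
    if "'" in log_file or log_pos < 0:
        raise ValueError("invalid binary log coordinates")
    return [
        "STOP REPLICA;",
        # Remove the configured delay so the replica can catch up immediately.
        "CHANGE REPLICATION SOURCE TO SOURCE_DELAY = 0;",
        # The applier thread stops once it reaches this source position.
        f"START REPLICA UNTIL SOURCE_LOG_FILE = '{log_file}', "
        f"SOURCE_LOG_POS = {log_pos};",
    ]


def dump_table_cmd(schema, table):
    """mysqldump command line for a single table, taken from the stopped replica."""
    return ["mysqldump", "--single-transaction", schema, table]


# Hypothetical coordinates of the first event of the bad transaction.
for stmt in catch_up_statements("mysql-bin-changelog.000042", 1234):
    print(stmt)
print(" ".join(dump_table_cmd("shop", "customers")))
```

Once the replica has stopped at that position, the table can be dumped from it and loaded back into Aurora. Before replication is resumed, the delay would need to be set again.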
Also, if we ever need to migrate away from Aurora, an external replica eases the pain of restoring the full database to a new instance from a logical backup.
Summary table
Scenario | Aurora Recovery | External Delayed Replica Recovery |
Single Table Recovery | Restore a full backup (snapshot), extract the table, and then apply binary logs. This process can be time-consuming. | Stop replication, use mysqldump to back up the desired table, and restore it to the Aurora instance. |
Granular PITR | Backtracking is limited to time-based recovery, potentially affecting other concurrent transactions. | Provides transaction-level PITR by syncing replication until the disaster point, stopping replication, and restoring. |
Migration Away from Aurora | May require full restore and logical backup for migration. | Simplifies migration with an external replica, easing the process of restoring to a new instance using logical backups. |
Conclusion
It is important to consider implementing an external (delayed) replica in your architecture. This provides more control and flexibility for your data recovery strategy. Whether for disaster recovery, reporting, or simplifying cloud migrations for multi-cloud strategies, this approach ensures that your data management is both resilient and adaptable to changing requirements.
P.S.:
- Do you use AWS RDS / Aurora? Are you considering an external replica? Let me know in the comments.
- In case you’re wondering about choosing between RDS and Aurora, my friend Ananias has an excellent post to help you.
- Support for RDS Services: Percona provides support for RDS services, and you might be interested in these case studies:
Lookout Uses Percona’s Cloud Expertise to Reduce Footprint and Maintain Uptime
When a more fully customized solution is required, most of our customers usually prefer the use of AWS EC2 instances supported by our managed services offering.