The CTO has issued the decree: “We are moving to the cloud!” Great, so now what do we do? When it comes to migrating from on-premise to the cloud, there are many factors to consider and decisions that need to be made. First (and probably most important) on that list: managed DBaaS or setup and manage our own infrastructure?
Type of Cloud
Managed DBaaS options are great but come with some limitations. The main two questions that should be considered here are around staff/experience and current architecture/database design. In cases where there is a very limited database (or operations) team, a DBaaS is a great choice. Much of the operational infrastructure is already in place with general best practices in place. However, a big tradeoff comes around flexibility. In cases where the current infrastructure is complex (for reasons right or wrong), a DBaaS is definitely the wrong choice as you will be extremely limited in what you can do.
Here are some high-level considerations for determining if DBaaS is a good fit:
- Simple schema
- Standard existing architecture (master/slave, VIP failover)
- Standard CRUD workload (no elaborate procedures or business logic in the database)
- Fairly easy to shard/partition data (vertical and/or horizontal)
- Management considerations (DBaaS is “managed”, but still requires manual work that is different from on-premise installations)
- Minor upgrades/maintenance windows
- Query analysis
- Some standard operations (such as point-in-time recovery) are different in DBaaS
Naturally, this list isn’t all inclusive and the best fit will vary based on your specific use case, but it can serve as a good starting point. Also, the complexity of managing a production-grade infrastructure in the cloud manually isn’t trivial and many current on-premise techniques will need to be modified.
Once you have landed on a destination, the most important phase begins. Testing! I can’t emphasize enough how important testing is for migration into the cloud. But what should be tested and how? Here are the main things that need to be tested thoroughly before the migration:
This aspect of the testing should be fairly straightforward. Regardless of the specifics of your migration (DBaaS or self-managed), you’ll want to take a current snapshot of your database. Ideally, this would be done using Percona’s Xtrabackup tool (procedure here). However, mysqldump (or mydumper/myloader) is also a viable option if a binary snapshot isn’t supported by your target DBaaS. The biggest downside to a logical dump is the time it takes to restore, but it does allow you to make tweaks to the schema prior to the reload.
One of the biggest issues when doing a restore into a DBaaS is having triggers or procedures defined by the root user. In many cases (such as Amazon RDS), the root user isn’t available and this will cause the import to fail. As a best practice, you shouldn’t be creating procedures or triggers with the root user, but unfortunately, it isn’t uncommon. Once you have verified that the snapshot is successfully restored in your target cloud environment, you can move onto the workload and performance testing.
In many cases, these can be tested in parallel and both are equally important. In preparation to test query compatibility and performance, you will want to capture live query traffic immediately following the snapshot used to test schema compatibility. It is important to make sure the capture is done right after a snapshot to ensure an accurate representation of the schema in time that matches the traffic. Testing concurrent write performance on a traffic set that will never lock or modify any rows won’t be realistic when compared with traffic that is actually modifying production data.
The best way to capture traffic is by using the slow query log (as it is most compatible with other tools and shares the most information). Ideally, you can set the long_query_time = 0 to capture all query traffic, but your production traffic may make that challenging and adjustments may need to be made.
From Percona Toolkit, pt-upgrade is the primary tool to test that there are no differences in the workload. The tool checks to make sure that the following are the same on both the current server and the new server:
- Row Count
- Row Data
- Query Time (difference should be within an order a magnitude)
Assuming this tool shows no errors, you’ll want to ensure similar performance. If your organization has a way to generate a realistic load (note that being realistic is important here), an extra step would be to compare performance metrics between the current environment and the target environment. The easiest way to compare is to use a visualization tool such as Percona Monitoring and Management to ensure no detrimental metrics (lower throughput, major differences in IO/CPU, etc) are displayed.
Now that all the testing has happened, results have been signed off on, and the stakeholders are happy, it is time to actually move your data to the cloud. The first step is to restore a snapshot of the current production. Again, this can be done using a logical snapshot (mysqldump) or a binary snapshot (xtrabackup if possible). Just like setting up any other slave server, capturing the binlog coordinates is also needed.
With the snapshot loaded in the target cloud server, it is time to set up replication from the current production environment to the new cloud server. This is one very important step in the migration process – there needs to be a secure connection set up between the cloud server and production. This can be achieved in a few ways:
- VPN tunnel to the cloud provider
- This is the easiest method, assuming you have a VPN device available onsite
- SSH tunnel
- In this method, you create an SSH tunnel locally on the source server and connect to that port from the cloud server
In either scenario, you will use the replication coordinates from your snapshot and start replication from that point. Again, various cloud deployments may be slightly different in terms of how this is set up (RDS, for example, uses the CALL mysql.rds_set_external_master procedure), but the concept is generally the same.
It is very important that this initial replication link between the cloud server and the current production database is encrypted. Even with “non-critical” data, it is a best practice to ensure that your database traffic is encrypted if it needs to be sent over the WAN!
Finally, a replica is in the cloud, is in sync, and you are ready to pull the trigger. Through all of the testing, this should hopefully be a “trivial non-event”. In the case where the entire environment is moving to the cloud (application servers included), the new instances should be already be configured to use the new cloud server for a data source. For the migration to go live, traffic simply needs to hit the new application servers in the cloud.
In the other scenario of keeping the application servers local while just moving the database tier to the cloud, you would simply need to update your application servers to point to the new database server. Like the replication piece, if the application servers will live outside of the cloud, SSL connectivity is still a best practice.
Thankfully, this process isn’t too different from a standard migration to a new server. The biggest differences could potentially be found when migrating from on-premise to a DBaaS solution, but even then the process for executing the migration is similar. Also, planning for cloud component failure (which again is a standard best practice, but equally important in the cloud) is something that should be considered in the plan.