Google Cloud Platform: MySQL at Scale with Reliable HA Webinar Q&A

Earlier in November, we had a chance to present the “Google Cloud Platform: MySQL at Scale with Reliable HA.” We discussed different approaches to hosting MySQL in Google Cloud Platform with the available options’ pros and cons. This webinar was recorded and can be viewed here at any time. We had several great questions, which we would like to address and elaborate on the answers given during the webinar.

MySQL at Scale with Reliable HA

Q: What is your view on Cloud SQL High Availability in Google Cloud?

A: Google Cloud SQL provides High Availability through regional instances. If your Cloud SQL database is regional, it means that there’s a standby instance in another zone within the same region. Both instances (primary and standby) are kept synced through synchronous replication on the persistent disk level. Thanks to this approach, in case of an unexpected failover, no data is lost. The biggest disadvantage of this approach is that you have to pay for standby resources even though you can’t use the standby instance for any traffic, which means you double your costs with no performance benefits. Failover typically takes more than 30 seconds.

To sum up, High Availability in Google Cloud SQL is reliable but can be expensive, and failover time is not always enough for critical applications.

 

Q: How would one migrate from Google Cloud SQL to AWS RDS?

A: The easiest way to migrate if you can afford downtime is stopping the write workload to the Cloud SQL instance, taking a logical backup (mysql or mydumper), restoring it on AWS RDS, and then moving the entire workload to AWS RDS. In most cases, it’s not enough. The situation is more complex when you want to make it with no (or minimal) downtime.

To avoid downtime, you need to establish replication between your Cloud SQL (source) and RDS instances (replica). Cloud SQL can be used as a source instance for external replicas, as described in this documentation. You can take a logical backup from running a Cloud SQL instance (e.g., using mydumper), restore it to RDS and establish the replication between Cloud SQL and RDS. Using an external source for RDS is described here. It’s typically a good idea to use a VPN connection between both cloud regions to ensure your connection is secure and the database is not exposed to the public internet. Once replication is established, the steps are as follows:

  • Stop write traffic on Google Cloud SQL instance
  • Wait for the replication to catch up (synch all binlogs)
  • Make RDS instance writable and stop Cloud SQL -> RDS replication
  • Move write traffic to the RDS instance
  • Decommission your Cloud SQL instance

AWS DMS service can also be used as an intermediary in this operation.

 

Q: Is replication possible cross-cloud, e.g., Google Cloud SQL to AWS RDS, AWS RDS to Google Cloud SQL? If GCP is down, will RDS act as a primary and vice versa?

A: In general, replication between clouds is possible (see the previous question). Both Google Cloud SQL and AWS RDS can act as source and replica, including external instances as a part of your replication topology. High-availability solutions, though, in both cases, are very specific for a cloud provider implementation, and they can’t cooperate. So it’s not possible to automatically failover from RDS to GCP and vice versa. For such setups, we would recommend custom installation on Google Compute Instance and AWS EC2 with Percona Managed Database Services – if you don’t want to manage such a complex setup on your own.



Q: How did you calculate IOPS and throughput for the storage options?

A: We did not calculate the presented values in any way. Those are taken directly from Google Cloud Platform Documentation.


Q: How does GCP achieve synchronous replication?

A: Synchronous replication is possible only between the source and respective standby instance; it’s impossible to have synchronous replication between the primary and your read replicas. Each instance has its own persistent disk. Those disks are kept in sync – so replication happens on the storage layer, not the database layer. There are no implementation details about how it works available.


Q: Could you explain how to keep the primary instance available and writable during the maintenance window?

A: It’s not possible to guarantee the primary instance availability. Remember that even if you choose your maintenance window when you can accept downtime, it may or may not be followed (it’s just a preference). Maintenance events can happen at any point in time if they’re critical and may not be finished during the assigned window. If that’s not possible to accept by your application, we recommend designing a highly-available solution, e.g., with Percona XtraDB Cluster on Google Compute Engine instances instead. Such a solution won’t have such maintenance window problems.

Share this post

Comments (2)

  • Art van Scheppingen Reply

    A few words from our experience with CloudSQL HA.
    > Q: What is your view on Cloud SQL High Availability in Google Cloud?
    > A: […] Failover typically takes more than 30 seconds.
    As the whole instance dies/crashes, the database process dies and the failover will then start up in crash recovery mode. The failover time is then determined by the amount of incomplete transactions in the redo log and the unmerged commited transactions from the double write buffer. If your CloudSQL instance is large and has many writes going to it, the crash recovery will be longer and it will take longer to fail over. Also keep in mind that in this mode there is no loading of the bufferpool dump done yet, so everything will have to go to disk. We have seen failover happening in minutes rather than seconds.

    > Q: Could you explain how to keep the primary instance available and writable during the maintenance window?
    > A: It’s not possible to guarantee the primary instance availability.
    During maintenance both instances will be updated. The failover instance will be updated first and only after a successful update the primary will be updated but no failover will happen. This means there will be unavailability of the primary during maintenance on CloudSQL.

    January 7, 2021 at 3:38 am
    • Michal Nosek Reply

      Thank you, Art, for providing more insights!

      January 7, 2021 at 2:05 pm

Leave a Reply