Using gh-ost with Amazon Aurora for MySQL


In this day and age of agile methodologies and fast-paced development, database schema changes are as inevitable as the growth of data. We have been thoroughly familiar with the effectiveness of pt-online-schema-change for the last couple of years, but it has one limitation: if the table is too hot to acquire the metadata locks needed to add or remove the necessary TRIGGERs, an online schema change becomes almost impossible.

However, gh-ost overcomes this limitation by simply reading the binary log stream, acting as a regular replication client to capture changes. It applies those changes to a shadow table while copying the as-yet-untouched data in the background. On a regular Aurora cluster, we do not have the option of running the schema change from a read replica, simply because Aurora read replicas are, well, exclusively read-only for us mortals. We can still use gh-ost to accomplish the task, with the absolute requirement that the cluster's binlog_format is set to ROW. Aside from that, just follow along with the procedure below and customize the commands for your use case.

Using gh-ost for online schema change

First, the environment we tested with for this post:

  • Aurora for MySQL 5.6, r4.4xlarge, one read-replica
  • Single sysbench table, 20 million rows running oltp_update_index test, 2 threads
  • EC2 instance to serve as controller and bastion to execute the sysbench and gh-ost processes

Our sysbench command:
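Something along these lines drives the workload described above; treat it as a sketch, with the Aurora endpoint, credentials and runtime as placeholders:

sysbench oltp_update_index \
    --mysql-host=<aurora-cluster-endpoint> --mysql-user=sbtest --mysql-password=<password> --mysql-db=sbtest \
    --tables=1 --table-size=20000000 \
    --threads=2 --time=600 --report-interval=10 \
    run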

Our gh-ost command:
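Again, a representative sketch rather than the literal command; the endpoint, credentials, table name and ALTER statement are placeholders, and -assume-rbr is included because the cluster already runs with binlog_format=ROW and gh-ost cannot toggle it itself on a managed Aurora instance:

gh-ost \
    -host=<aurora-cluster-endpoint> -user=ghost -ask-pass \
    -database=sbtest -table=sbtest1 \
    -alter="ADD COLUMN notes VARCHAR(64) NOT NULL DEFAULT ''" \
    -allow-on-master \
    -assume-rbr \
    -max-load=Threads_running=30 \
    -critical-load=Threads_running=80 \
    -nice-ratio=0 \
    -postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
    -throttle-additional-flag-file=/tmp/ghost.throttle.flag \
    -verbose \
    -execute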

Let’s look at some of the interesting bits in this command:

  • -allow-on-master simply tells the tool to run everything directly against the master.
  • -database, -host, -user, -ask-pass and -table identify the target instance, the credentials, and the table to modify.
  • -critical-load and -max-load tell gh-ost to watch these MySQL status counters and either abort the process when the critical thresholds are reached, or throttle itself when the max values are reached.
  • -postpone-cut-over-flag-file is my favorite: it prevents the tool from switching to the new table as soon as the copy process is complete. This gives the operator control over when the switch occurs, e.g. during off-peak hours. The tool creates this file itself when the option is specified.
  • -throttle-additional-flag-file pauses gh-ost, both the binlog applier and the row copy, for as long as this file exists; be careful if you have short binary log retention periods.
  • -nice-ratio makes the tool artificially throttle after every chunk of rows copied. You can interpret it as a ratio relative to the last chunk copy time: with a nice-ratio of 0.1, if the last chunk took 10 seconds to copy, the tool sleeps for about 1 second before the next chunk. Depending on the traffic on the table, you can be as aggressive as you see fit.

On the test dataset above, with the gh-ost command shown, the process took about 6.5 minutes.

There was a noticeable impact on the sysbench test as soon as gh-ost started. The 390-second mark is when the copy process completed and only the applier process (binlog stream) was still running. The 469-second mark is after the process fully completed (gh-ost killed).

With a -nice-ratio setting of 0.4, the performance impact was reduced, but the overall completion time increased, as expected.

One interesting gh-ost feature is that, aside from throttling on load, replica lag and custom queries/hooks, it will also automatically throttle based on the queue of binlog events it has not yet applied (1,000 write events). When we increased the number of sysbench threads from 2 to 4 in our tests, this effectively froze the copy process altogether: the high churn rate on the table makes the process very slow, since the changes also need to be captured from the replication stream.

This is one of the reasons why doing schema changes from an async replica makes sense: there you can let the change and copy threads run as fast as possible without affecting the production workload.

Increasing the -dml-batch-size  from the default 10 to 100 helped speed up processing of the binlog event queue.

gh-ost supports interactive commands

My favorite part of the process is that gh-ost accepts interactive commands. You can change a number of configuration options dynamically. For example, changing -dml-batch-size on the fly lets you tune for variable workloads without losing progress.
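gh-ost listens on a unix socket file (by default /tmp/gh-ost.<schema>.<table>.sock, or wherever -serve-socket-file points) and you can pipe commands into it. A quick sketch, assuming the schema and table names from the example command above:

echo status | nc -U /tmp/gh-ost.sbtest.sbtest1.sock
echo "dml-batch-size=100" | nc -U /tmp/gh-ost.sbtest.sbtest1.sock
echo "nice-ratio=0.4" | nc -U /tmp/gh-ost.sbtest.sbtest1.sock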

Another nice thing is that when -postpone-cut-over-flag-file is specified, gh-ost automatically creates the file. This is intuitive: specifying the option implies you do not want to cut over as soon as the copy/apply processes are complete, so creating the file for you achieves the intended behavior.
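When you are ready to cut over, e.g. during an off-peak window, either delete the flag file or tell gh-ost to stop postponing over its socket; the paths below match the earlier sketch:

rm /tmp/ghost.postpone.flag
# or, equivalently, via the interactive socket:
echo unpostpone | nc -U /tmp/gh-ost.sbtest.sbtest1.sock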
