Using gh-ost with Amazon Aurora for MySQL

gh-ost online schema changeIn this day and age of agile methodologies and fast-paced development methods, database schema changes are as inevitable as the growth of data. We have been thoroughly familiar with the effectivity of pt-online-schema-change for the last couple of years but it has one limitation. If the table is too hot to acquire metadata locks to add or remove the necessary TRIGGERs, online schema change will be almost impossible.

However, gh-ost overcomes this limitation by simply using the binary log stream and acting as a regular replication client to capture changes. It reflects those changes into a shadow table, whilst also copying currently untouched data in the background. On a regular Aurora cluster, we do not have the option of making certain schema changes from a read replica simply because Aurora read replicas are, well, exclusively read-only for us mortals. We can take advantage though of the fact that we can still use gh-ost to accomplish this task with an absolute requirement that the cluster binlog_format is set to ROW. Aside from that, just follow along with the procedures below and customize the commands to your use case.

Using gh-ost for online schema change

First, the environment we tested with for this post:

  • Aurora for MySQL 5.6, r4.4xlarge, one read-replica
  • Single sysbench table, 20 million rows running oltp_update_index test, 2 threads
  • EC2 instance to serve as controller and bastion to execute the sysbench and gh-ost processes

Our sysbench command:

Our gh-ost command:

Let’s look at some of the interesting bits on this command:

  • -allow-on-master  simply indicates to the tool to run everything against the master.
  • -database , -host , -user , -ask-pass  and -table  are the target instance, credentials and table to make the modifications on.
  • -critical-load , -max-load  tell gh-ost to watch for these statuses and either abort the process when critical thresholds are reached, or throttle itself when max values are reached.
  • -postpone-cutover-flag  is my favorite, it prevents the tool from switching to the new table as soon as the copy process is complete. This gives the operator better control of when the switch would occur e.g. off-peak hours. The tool will create this file when the option is specified.
  • -throttle-additional-flag-file  when this file exists, gh-ost pauses both applying binary logs and copying rows, be careful if you have short binary log retention periods.
  • -nice-ratio  allows the tool to artificially throttle per every rows copied. You can interpret this as percentage relative to last chunk copy time. If the last chunk copy time took 10 seconds, the tool will sleep about 1 second out of 10 seconds. Depending on the traffic on the table, you can be as aggressive as you see fit.

On the test dataset above, with the gh-ost command, the process took about 6.5mins.

There was a noticeable impact on the sysbench test as soon as we started. The 390 second mark is when the copy process completed and only the applier process is running (binlog stream).  The 469 second mark is after the process fully completed (gh-ost killed).

With a -nice-ratio  setting of 0.4 the performance impact is reduced, but the overall completion time increased as expected: