In the age of agile methodologies and fast-paced development, database schema changes are as inevitable as the growth of data. We have long been familiar with the effectiveness of pt-online-schema-change, but it has one limitation: if the table is too hot to acquire the metadata locks needed to add or remove the necessary TRIGGERs, an online schema change becomes almost impossible.
gh-ost overcomes this limitation by acting as a regular replication client: it reads the binary log stream to capture changes and applies them to a shadow table, whilst also copying the untouched rows in the background. On a regular Aurora cluster, we do not have the option of making certain schema changes from a read replica, simply because Aurora read replicas are, well, exclusively read-only for us mortals. We can still use gh-ost to accomplish this task, though, with one absolute requirement: the cluster's binlog_format must be set to ROW. Aside from that, just follow along with the procedures below and customize the commands to your use case.
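It is worth verifying that requirement before starting. On Aurora, binlog_format is set through the cluster parameter group rather than with SET GLOBAL; the sketch below checks the current value and shows one way to change it (the parameter group name is hypothetical, substitute your own):

# Check the effective binlog format on the cluster endpoint; it must return ROW:
mysql -h revin-aurora.can0nprz8rtd.us-east-1.rds.amazonaws.com -u percona -p \
  -e "SELECT @@global.binlog_format;"

# If it is not ROW, change it in the cluster parameter group and reboot the
# writer instance for the static parameter to take effect:
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name my-aurora-cluster-params \
  --parameters "ParameterName=binlog_format,ParameterValue=ROW,ApplyMethod=pending-reboot"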
Using gh-ost for online schema change
First, the environment we tested with for this post:
- Aurora for MySQL 5.6, r4.4xlarge, one read-replica
- Single sysbench table, 10 million rows, running the oltp_update_index test with 2 threads
- EC2 instance to serve as controller and bastion to execute the sysbench and gh-ost processes
Our sysbench command:
sysbench --db-driver=mysql --mysql-user=percona --mysql-password=p3rc0na_ \
  --mysql-db=ghost --tables=1 --table-size=10000000 --auto-inc=off \
  --threads=2 --time=0 --rand-type=pareto --report-interval=1 \
  --mysql-host=revin-aurora.can0nprz8rtd.us-east-1.rds.amazonaws.com \
  oltp_update_index run 2>&1 | tee sysbench.log
Our gh-ost command:
gh-ost -allow-on-master -assume-rbr -exact-rowcount \
  -critical-load Threads_running=400 -critical-load-hibernate-seconds 60 \
  -database ghost -max-load Threads_running=100 -nice-ratio 0.1 \
  -chunk-size 5000 -ask-pass -table sbtest1 -user percona \
  -host revin-aurora.can0nprz8rtd.us-east-1.rds.amazonaws.com \
  -postpone-cut-over-flag-file /home/ubuntu/gh-ost-sentinel \
  -throttle-additional-flag-file /home/ubuntu/gh-ost-throttle_ghost.sbtest1 \
  -alter 'ADD COLUMN ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP' \
  -verbose -execute 2>&1 | tee gh-ost.log
Let’s look at some of the interesting bits of this command:
- -allow-on-master simply tells the tool to run everything against the master.
- -database, -host, -user, -ask-pass and -table identify the target instance, the credentials, and the table to make the modifications on.
- -critical-load and -max-load tell gh-ost to watch these status variables and either abort the process when the critical thresholds are reached, or throttle itself when the max values are reached.
- -postpone-cut-over-flag-file is my favorite: it prevents the tool from switching to the new table as soon as the copy process is complete. This gives the operator better control of when the switch occurs, e.g. during off-peak hours. The tool will create this file when the option is specified.
- -throttle-additional-flag-file: while this file exists, gh-ost pauses both applying binary logs and copying rows (see the sketch after this list), so be careful if you have short binary log retention periods.
- -nice-ratio makes the tool artificially throttle after every chunk of rows copied. You can interpret it as a ratio relative to the last chunk copy time: with a nice-ratio of 0.1, if the last chunk took 10 seconds to copy, the tool will sleep about 1 second before copying the next chunk. Depending on the traffic on the table, you can be as aggressive as you see fit.
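As a quick illustration of the throttle flag file, pausing and resuming a running migration is just file manipulation on the host running gh-ost, using the path from our command above:

# Pause gh-ost (stops both the row copy and the binlog apply):
touch /home/ubuntu/gh-ost-throttle_ghost.sbtest1
# gh-ost reports a throttled state while the file exists; resume with:
rm /home/ubuntu/gh-ost-throttle_ghost.sbtest1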
On the test dataset above, with this gh-ost command, the process took about 6.5 minutes.
2018-05-30 14:19:40 INFO Row copy complete
Copy: 10000000/10000000 100.0%; Applied: 110170; Backlog: 0/1000; Time: 6m20s(total), 6m20s(copy); streamer: mysql-bin-changelog.000043:26623659; State: migrating; ETA: due
Copy: 10000000/10000000 100.0%; Applied: 111250; Backlog: 526/1000; Time: 6m25s(total), 6m20s(copy); streamer: mysql-bin-changelog.000043:49727477; State: postponing cut-over; ETA: due
There was a noticeable impact on the sysbench test as soon as gh-ost started. The 390-second mark is when the copy process completed and only the applier process (the binlog stream) was still running. The 469-second mark is after the process fully completed (gh-ost killed).
[ 9s ] thds: 2 tps: 599.04 qps: 599.04 (r/w/o: 0.00/599.04/0.00) lat (ms,95%): 4.41 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 2 tps: 269.00 qps: 269.00 (r/w/o: 0.00/269.00/0.00) lat (ms,95%): 13.46 err/s: 0.00 reconn/s: 0.00
[ 11s ] thds: 2 tps: 229.93 qps: 229.93 (r/w/o: 0.00/229.93/0.00) lat (ms,95%): 19.65 err/s: 0.00 reconn/s: 0.00
...
[ 389s ] thds: 2 tps: 245.99 qps: 245.99 (r/w/o: 0.00/245.99/0.00) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00
[ 390s ] thds: 2 tps: 358.01 qps: 358.01 (r/w/o: 0.00/358.01/0.00) lat (ms,95%): 10.27 err/s: 0.00 reconn/s: 0.00
[ 391s ] thds: 2 tps: 434.89 qps: 434.89 (r/w/o: 0.00/434.89/0.00) lat (ms,95%): 7.70 err/s: 0.00 reconn/s: 0.00
[ 392s ] thds: 2 tps: 470.10 qps: 470.10 (r/w/o: 0.00/470.10/0.00) lat (ms,95%): 6.21 err/s: 0.00 reconn/s: 0.00
...
[ 468s ] thds: 2 tps: 459.93 qps: 459.93 (r/w/o: 0.00/459.93/0.00) lat (ms,95%): 7.30 err/s: 0.00 reconn/s: 0.00
[ 469s ] thds: 2 tps: 553.09 qps: 553.09 (r/w/o: 0.00/553.09/0.00) lat (ms,95%): 4.91 err/s: 0.00 reconn/s: 0.00
[ 470s ] thds: 2 tps: 526.84 qps: 526.84 (r/w/o: 0.00/526.84/0.00) lat (ms,95%): 4.74 err/s: 0.00 reconn/s: 0.00
[ 471s ] thds: 2 tps: 592.18 qps: 592.18 (r/w/o: 0.00/592.18/0.00) lat (ms,95%): 4.57 err/s: 0.00 reconn/s: 0.00
With a -nice-ratio setting of 0.4, the performance impact was reduced, but the overall completion time increased, as expected:
[ 12s ] thds: 2 tps: 595.85 qps: 595.85 (r/w/o: 0.00/595.85/0.00) lat (ms,95%): 4.57 err/s: 0.00 reconn/s: 0.00
[ 13s ] thds: 2 tps: 586.14 qps: 586.14 (r/w/o: 0.00/586.14/0.00) lat (ms,95%): 4.82 err/s: 0.00 reconn/s: 0.00
[ 14s ] thds: 2 tps: 560.85 qps: 560.85 (r/w/o: 0.00/560.85/0.00) lat (ms,95%): 5.28 err/s: 0.00 reconn/s: 0.00
[ 15s ] thds: 2 tps: 271.09 qps: 271.09 (r/w/o: 0.00/271.09/0.00) lat (ms,95%): 16.12 err/s: 0.00 reconn/s: 0.00
[ 16s ] thds: 2 tps: 337.00 qps: 337.00 (r/w/o: 0.00/337.00/0.00) lat (ms,95%): 11.24 err/s: 0.00 reconn/s: 0.00
[ 17s ] thds: 2 tps: 285.02 qps: 285.02 (r/w/o: 0.00/285.02/0.00) lat (ms,95%): 17.32 err/s: 0.00 reconn/s: 0.00
[ 18s ] thds: 2 tps: 346.95 qps: 346.95 (r/w/o: 0.00/346.95/0.00) lat (ms,95%): 11.45 err/s: 0.00 reconn/s: 0.00
[ 19s ] thds: 2 tps: 305.01 qps: 305.01 (r/w/o: 0.00/305.01/0.00) lat (ms,95%): 15.27 err/s: 0.00 reconn/s: 0.00
Copy: 10000000/10000000 100.0%; Applied: 201060; Backlog: 0/1000; Time: 9m36s(total), 9m36s(copy); streamer: mysql-bin-changelog.000060:36359774; State: migrating; ETA: due
One interesting gh-ost feature is that, aside from load, replica lag and custom queries/hooks for throttling writes, it will also automatically throttle based on the queue of binlog events it has not yet applied (1000 write events). When we increased the number of sysbench threads from 2 to 4 in our tests, this effectively froze the copy process altogether. A high churn rate on the table makes the migration very slow, since every change must first be captured from the replication stream and applied.
Copy: 430000/9872189 4.4%; Applied: 420423; Backlog: 1000/1000; Time: 9m0s(total), 9m0s(copy); streamer: mysql-bin-changelog.000020:95528700; State: migrating; ETA: 3h17m37s
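A backlog pinned at 1000/1000 in those status lines is the telltale sign that the copy is stalled behind the applier. A minimal way to watch for it, assuming output was redirected to gh-ost.log as in our command:

# Surface saturated binlog-event backlog readings as they appear in the log:
tail -f gh-ost.log | grep --line-buffered 'Backlog: 1000/1000'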
This is one of the reasons why doing schema changes from an async replica makes sense: there, you can let the change and copy threads run as fast as possible without affecting the production workload.
Increasing the -dml-batch-size from the default 10 to 100 helped speed up processing of the binlog event queue.
gh-ost supports interactive commands
My favorite part of the process is that gh-ost accepts interactive commands. You can change a number of configuration options dynamically; for example, changing -dml-batch-size on the fly allows you to tune for variable workloads without losing progress.
ubuntu@ip-10-1-2-95:~$ echo 'dml-batch-size=100' | nc -U /tmp/gh-ost.ghost.sbtest1.sock
# Migrating `ghost`.`sbtest1`; Ghost table is `ghost`.`_sbtest1_gho`
# Migrating ip-172-17-3-77:3306; inspecting ip-172-17-3-77:3306; executing on ip-10-1-2-95
# Migration started at Wed May 30 14:48:05 +0000 2018
# chunk-size: 5000; max-lag-millis: 1500ms; dml-batch-size: 100; max-load: Threads_running=100; critical-load: Threads_running=400; nice-ratio: 0.100000
# throttle-additional-flag-file: /home/ubuntu/gh-ost-throttle_ghost.sbtest1
# postpone-cut-over-flag-file: /home/ubuntu/gh-ost-sentinel [set]
# Serving on unix socket: /tmp/gh-ost.ghost.sbtest1.sock
Copy: 2975000/10000000 29.8%; Applied: 72353; Backlog: 1000/1000; Time: 2m15s(total), 2m15s(copy); streamer: mysql-bin-changelog.000067:30693074; State: migrating; ETA: 5m19s
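The same socket accepts other documented interactive commands; a few that are handy during a long migration:

echo status | nc -U /tmp/gh-ost.ghost.sbtest1.sock        # full status report
echo throttle | nc -U /tmp/gh-ost.ghost.sbtest1.sock      # force throttling on
echo no-throttle | nc -U /tmp/gh-ost.ghost.sbtest1.sock   # cancel forced throttling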
Another nice thing is that when -postpone-cut-over-flag-file is set, gh-ost automatically creates the file. This is quite intuitive: specifying the option implies you do not want to cut over as soon as the copy/apply processes are complete, so the tool creates the file for you and the postponement takes effect immediately.
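When you are ready to cut over, either remove the sentinel file or send gh-ost's documented unpostpone interactive command, using the paths from our run:

# Trigger the cut-over by removing the sentinel file...
rm /home/ubuntu/gh-ost-sentinel
# ...or through the interactive socket:
echo unpostpone | nc -U /tmp/gh-ost.ghost.sbtest1.sock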