Creating an External Replica of AWS Aurora MySQL with Mydumper

Oftentimes, we need to replicate between Amazon Aurora and an external MySQL server. The idea is to start by taking a point-in-time copy of the dataset. Next, we can configure MySQL replication to roll it forward and keep the data up-to-date.

This process is documented by Amazon, however, it relies on the mysqldump method to create the initial copy of the data. If the dataset is in the high GB/TB range, this single-threaded method could take a very long time. Similarly, there are ways to improve the import phase (which can easily take 2x the time of the export).

Let’s explore some tricks to significantly improve the speed of this process.

Preparation Steps

The first step is to enable binary logs in Aurora. Go to the Cluster-level parameter group and make sure binlog_format is set to ROW. There is no log_bin option in Aurora (in case you are wondering), simply setting binlog_format is enough. The change requires a restart of the writer instance, so it, unfortunately, means a few minutes of downtime.

We can check if a server is generating binary logs as follows:

Otherwise, you will get an error:

We also need to ensure a proper binary log retention period. For example, if we expect the initial data export/import to take one day, we can set the retention period to something like three days to be on the safe side. This will help ensure we can roll forward the restored data.

The next step is creating a temporary cluster to take the export. We need to do this for a number of reasons: first to avoid overloading the actual production cluster by our export process, also because mydumper relies on FLUSH TABLES WITH READ LOCK to get a consistent backup, which in Aurora is not possible (due to the lack of SUPER privilege).

Go to the RDS console and restore a snapshot that was created AFTER the date/time where you enabled the binary logs. The restored cluster should also have binlog_format set, so select the correct Cluster parameter group.

Next, capture the binary log position for replication. This is done by inspecting the Recent events section in the console. After highlighting your new temporary writer instance in the console, you should see something like this:

So now we have the information to prepare the CHANGE MASTER command to use at the end of the process.

Exporting the Data

To get the data out of the temporary instance, follow these steps:

  1. Backup the schema
  2. Save the user privileges
  3. Backup the data

This gives us added flexibility; we can do some schema changes, add indexes, or extract only a subset of the data.

Let’s create a configuration file with the login details, for example:

For the schema backup, use mydumper to do a no-rows export:

To get the user privileges I normally like to use pt-show-grants. Aurora is, however, hiding the password hashes when you run SHOW GRANTS statement, so pt-show-grants will print incomplete statements e.g.:

We can still gather the hashes and replace them manually in the pt-show-grants output if there is a small-ish number of users.

Finally, run mydumper to export the data:

The number of threads should match the number of CPUs of the instance running mydumper. In the skip.txt file, you can include any tables that you don’t want to copy. The –rows argument will give you the ability to split tables in chunks of X number of rows. Each chunk can run in parallel, so it is a huge speed bump for big tables.

Importing the Data

We need to stand up a MySQL instance to do the data import. In order to speed up the process as much as possible, I suggest doing a number of optimizations to my.cnf as follows: