Creating an External Replica of AWS Aurora MySQL with Mydumper

Oftentimes, we need to replicate between Amazon Aurora and an external MySQL server. The idea is to start by taking a point-in-time copy of the dataset. Next, we can configure MySQL replication to roll it forward and keep the data up-to-date.

Amazon documents this process, but its procedure relies on mysqldump to create the initial copy of the data. If the dataset is in the high GB/TB range, this single-threaded method could take a very long time. The import phase, which can easily take twice as long as the export, can be sped up as well.

Let’s explore some tricks to significantly improve the speed of this process.

Preparation Steps

The first step is to enable binary logs in Aurora. Go to the cluster-level parameter group and make sure binlog_format is set to ROW. In case you are wondering, there is no log_bin option in Aurora; simply setting binlog_format is enough. The change requires a restart of the writer instance, so unfortunately it means a few minutes of downtime.

We can check if a server is generating binary logs as follows:
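One way to run this check from a shell is sketched below. The endpoint and user name are placeholders, and wrapping the call in a function is only a convenience of this sketch.

```shell
# Placeholder writer endpoint and user; replace with your own values.
AURORA_HOST="aurora-writer.example.com"
AURORA_USER="admin"

# Lists the binary log files when binary logging is enabled.
# -p makes the client prompt for the password.
check_binlogs() {
  mysql -h "$AURORA_HOST" -u "$AURORA_USER" -p -e "SHOW BINARY LOGS;"
}

# Usage: check_binlogs
```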

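Otherwise, you will get an error. With binary logging disabled, SHOW BINARY LOGS fails with ERROR 1381 (HY000): You are not using binary logging.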

We also need to ensure a proper binary log retention period. For example, if we expect the initial data export/import to take one day, we can set the retention period to something like three days to be on the safe side. This will help ensure we can roll forward the restored data.
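On RDS and Aurora MySQL the retention period is set through a stored procedure rather than a server variable. A minimal sketch, assuming a placeholder endpoint and user and the three-day (72-hour) value from the example above:

```shell
# Placeholder writer endpoint and user; replace with your own values.
AURORA_HOST="aurora-writer.example.com"
AURORA_USER="admin"

# Keep binary logs for 72 hours, then print the current configuration.
set_binlog_retention() {
  mysql -h "$AURORA_HOST" -u "$AURORA_USER" -p <<'SQL'
CALL mysql.rds_set_configuration('binlog retention hours', 72);
CALL mysql.rds_show_configuration;
SQL
}

# Usage: set_binlog_retention
```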

The next step is creating a temporary cluster to take the export. There are two reasons for this. First, it avoids overloading the production cluster with the export process. Second, mydumper relies on FLUSH TABLES WITH READ LOCK to get a consistent backup, which is not possible in Aurora (due to the lack of the SUPER privilege); a restored cluster that receives no writes can be dumped consistently without that lock.

Go to the RDS console and restore a snapshot that was created AFTER the date/time when you enabled the binary logs. The restored cluster should also have binlog_format set, so select the correct cluster parameter group.

Next, capture the binary log position for replication. This is done by inspecting the Recent events section in the console. After highlighting your new temporary writer instance in the console, you should see an event that includes the binary log file name and position.
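If you prefer the command line, the same events can be listed with the AWS CLI. This is only a sketch: the instance identifier is hypothetical, and it assumes the AWS CLI is already configured with credentials and a region.

```shell
# Hypothetical identifier of the temporary writer instance.
TMP_INSTANCE="aurora-tmp-writer"

# Print the event messages of the last 24 hours (1440 minutes) for that
# instance; look for the one that mentions the binary log file and position.
show_recent_events() {
  aws rds describe-events \
    --source-type db-instance \
    --source-identifier "$TMP_INSTANCE" \
    --duration 1440 \
    --query 'Events[].Message' \
    --output text
}

# Usage: show_recent_events
```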

So now we have the information to prepare the CHANGE MASTER command to use at the end of the process.

Exporting the Data

To get the data out of the temporary instance, follow these steps:

  1. Back up the schema
  2. Save the user privileges
  3. Back up the data

This gives us added flexibility; we can do some schema changes, add indexes, or extract only a subset of the data.

Let’s create a configuration file with the login details, for example:
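A sketch of such a file follows. The directory, user, password, and endpoint are all placeholders; the [client] section is used on the assumption that both the mysql client and mydumper pick it up through the MySQL client library.

```shell
# Assumed working directory for the export; override as needed.
BACKUP_DIR="${BACKUP_DIR:-/backup}"

write_client_cnf() {
  (
    umask 077   # keep the file private: it contains a password
    cat > "$BACKUP_DIR/aurora.cnf" <<'EOF'
[client]
user=dump_user
password=change-me
host=aurora-tmp.example.com
EOF
  )
}

# Usage: write_client_cnf
```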

For the schema backup, use mydumper to do a no-rows export:
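The command could look roughly like this. Paths are placeholders, the regex that skips system schemas is an assumption, and --defaults-file requires a mydumper build that supports it. --no-locks is used on the assumption that nothing writes to the temporary cluster.

```shell
# Assumed working directory; aurora.cnf holds the login details.
BACKUP_DIR="${BACKUP_DIR:-/backup}"

# Export table definitions, triggers, routines and events, but no rows.
dump_schema() {
  mydumper \
    --defaults-file "$BACKUP_DIR/aurora.cnf" \
    --no-data \
    --triggers --routines --events \
    --no-locks \
    --regex '^(?!(mysql|test|performance_schema|information_schema|sys))' \
    --outputdir "$BACKUP_DIR/schema" \
    --logfile "$BACKUP_DIR/mydumper-schema.log" \
    --verbose 3
}

# Usage: dump_schema
```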

To get the user privileges, I normally like to use pt-show-grants. Aurora, however, hides the password hashes when you run a SHOW GRANTS statement, so pt-show-grants will print incomplete statements that lack the actual hashes.

We can still gather the hashes and replace them manually in the pt-show-grants output if there is a small-ish number of users.
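The two steps might be sketched as follows. The paths are placeholders, and the authentication_string column is an assumption that holds for MySQL 5.7-compatible versions; older versions store the hash in a different column.

```shell
# Assumed working directory; aurora.cnf holds the login details.
BACKUP_DIR="${BACKUP_DIR:-/backup}"

# Dump the GRANT statements (without usable password hashes on Aurora).
export_grants() {
  pt-show-grants --defaults-file "$BACKUP_DIR/aurora.cnf" > "$BACKUP_DIR/grants.sql"
}

# List the hashes so they can be pasted into grants.sql by hand.
list_password_hashes() {
  mysql --defaults-file="$BACKUP_DIR/aurora.cnf" \
    -e "SELECT user, host, authentication_string FROM mysql.user;"
}

# Usage: export_grants && list_password_hashes
```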

Finally, run mydumper to export the data:
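A possible invocation is sketched below. The thread count, chunk size, paths, and the regex that skips system schemas are assumptions to adapt, and --no-locks again assumes the temporary cluster receives no writes.

```shell
# Assumed working directory; aurora.cnf holds the login details.
BACKUP_DIR="${BACKUP_DIR:-/backup}"
THREADS=8                 # assumed: CPUs on the host running mydumper
ROWS_PER_CHUNK=10000000   # assumed chunk size for splitting big tables

# Export the data in parallel, compressed, skipping tables listed in skip.txt.
dump_data() {
  mydumper \
    --defaults-file "$BACKUP_DIR/aurora.cnf" \
    --threads "$THREADS" \
    --rows "$ROWS_PER_CHUNK" \
    --compress \
    --triggers --routines --events \
    --no-locks \
    --regex '^(?!(mysql|test|performance_schema|information_schema|sys))' \
    --omit-from-file "$BACKUP_DIR/skip.txt" \
    --outputdir "$BACKUP_DIR/export" \
    --logfile "$BACKUP_DIR/mydumper-data.log" \
    --verbose 3
}

# Usage: dump_data
```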

The number of threads should match the number of CPUs of the instance running mydumper. In the skip.txt file, you can include any tables that you don’t want to copy. The --rows argument lets you split tables into chunks of X rows. Chunks can be exported in parallel, so this is a huge speed boost for big tables.

Importing the Data

We need to stand up a MySQL instance to do the data import. In order to speed up the process as much as possible, I suggest doing a number of optimizations to my.cnf as follows:
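As an illustration, the import-time overrides could be written to a separate file so they are easy to remove afterwards. The file path and the sizing values are assumptions; the four durability and instrumentation settings are the ones that must be reverted once the import is done.

```shell
# Hypothetical location for the temporary overrides.
IMPORT_CNF="${IMPORT_CNF:-/etc/mysql/conf.d/import-tuning.cnf}"

write_import_cnf() {
  cat > "$IMPORT_CNF" <<'EOF'
[mysqld]
# Relaxed durability and no instrumentation: for the import only.
innodb_flush_log_at_trx_commit=0
sync_binlog=0
innodb_doublewrite=0
performance_schema=OFF
# Sizing values are assumptions; adjust them to the host.
innodb_buffer_pool_size=16G
innodb_log_file_size=1G
max_allowed_packet=1G
EOF
}

# Usage: write_import_cnf, then restart mysqld so the settings take effect.
```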

Note that myloader is smart enough to turn off the binary log for the importer threads.

After the import is complete, it is important to revert these settings to “safer” values: innodb_doublewrite, innodb_flush_log_at_trx_commit, and sync_binlog. Also enable performance_schema again.

The next step is to create an empty schema by running myloader:
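Assuming the no-rows export was written to a schema directory under a placeholder backup path, the call might look like this. The local root login is also an assumption.

```shell
# Assumed working directory holding the no-rows export.
BACKUP_DIR="${BACKUP_DIR:-/backup}"

# Create the empty databases and tables on the local MySQL instance.
load_schema() {
  myloader \
    --directory "$BACKUP_DIR/schema" \
    --host localhost \
    --user root \
    --ask-password \
    --verbose 3
}

# Usage: load_schema
```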

At this point, we can easily introduce modifications like adding indexes, since the tables are empty. We can also restore the users at this time:
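One way to load the grants, sketched under the assumption that they were saved to a grants.sql file in the backup directory: disabling sql_log_bin for the session keeps the statements out of the new server's binary log.

```shell
# Assumed working directory holding the edited grants file.
BACKUP_DIR="${BACKUP_DIR:-/backup}"

# Prepend SET SQL_LOG_BIN=0 and feed the grants to the local instance.
restore_grants() {
  { echo "SET SQL_LOG_BIN=0;"; cat "$BACKUP_DIR/grants.sql"; } | mysql -u root -p
}

# Usage: restore_grants
```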

Now we are ready to restore the actual data using myloader. It is recommended to run this inside a screen session:
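A sketch of the data load follows. The thread count assumes an 8-vCPU host, the transaction size of 100 is just one possible value below the default, and the paths and login are placeholders.

```shell
# Assumed working directory holding the data export.
BACKUP_DIR="${BACKUP_DIR:-/backup}"
LOADER_THREADS=4   # assumed: half the vCPUs of an 8-vCPU host

# Load the data in parallel with small transactions.
load_data() {
  myloader \
    --directory "$BACKUP_DIR/export" \
    --threads "$LOADER_THREADS" \
    --queries-per-transaction 100 \
    --host localhost \
    --user root \
    --ask-password \
    --verbose 3
}

# Usage (inside a screen session): load_data
```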

The rule of thumb here is to use half the number of vCPU threads. I also normally like to reduce myloader’s default transaction size (1000 queries per transaction) to avoid long transactions, but your mileage may vary.

After the import process is done, we can leverage faster methods (like snapshots or Percona XtraBackup) to seed any remaining external replicas.

Setting Up Replication

The final step is setting up replication from the actual production cluster (not the temporary one!) to your external instance.

It is a good idea to create a dedicated user for this process in the source instance, as follows:
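For instance, with a placeholder endpoint, user name, and password (the repl account name is only an illustration):

```shell
# Placeholder production writer endpoint and admin user.
PROD_HOST="aurora-writer.example.com"
PROD_USER="admin"

# Create a user that is only allowed to read the binary log stream.
create_repl_user() {
  mysql -h "$PROD_HOST" -u "$PROD_USER" -p <<'SQL'
CREATE USER 'repl'@'%' IDENTIFIED BY 'change-me';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
SQL
}

# Usage: create_repl_user
```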

Now we can start replication, using the binary log coordinates that we captured before:
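The statements could be issued on the external instance as sketched here. The endpoint, credentials, binary log file name, and position are placeholders; use the values captured from the temporary instance's events.

```shell
# Placeholders: production endpoint and the coordinates captured earlier.
PROD_HOST="aurora-writer.example.com"
BINLOG_FILE="mysql-bin-changelog.000001"
BINLOG_POS=4

# Point the external instance at the production cluster and start replicating.
start_replication() {
  mysql -u root -p <<SQL
CHANGE MASTER TO
  MASTER_HOST='$PROD_HOST',
  MASTER_USER='repl',
  MASTER_PASSWORD='change-me',
  MASTER_LOG_FILE='$BINLOG_FILE',
  MASTER_LOG_POS=$BINLOG_POS;
START SLAVE;
SHOW SLAVE STATUS\G
SQL
}

# Usage: start_replication
```

In the SHOW SLAVE STATUS output, Slave_IO_Running and Slave_SQL_Running should both report Yes, and Seconds_Behind_Master should shrink as the replica catches up.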

Final Words

Unfortunately, there is no quick and easy method to get a large dataset out of an Aurora cluster. We have seen how mydumper and myloader can save a lot of time when creating external replicas, by introducing parallel operations. We also reviewed some good practices and configuration tricks for speeding up the data loading phase as much as possible.

