MySQL 8.0.22: Asynchronous Replication Automatic Connection (IO Thread) Failover

MySQL 8.0.22 Asynchronous Replication Automatic ConnectionMySQL 8.0.22 was released on Oct 19, 2020, and came with nice features and a lot of bug fixes. Now, you can configure your async replica to choose the new source in case the existing source connection (IO thread) fails. In this blog, I am going to explain the entire process involved in this configuration with a use case.

Overview

This feature is very helpful to keep your replica server in sync in case of current source fails. 

To activate asynchronous connection failover, we need to set the “SOURCE_CONNECTION_AUTO_FAILOVER=1” on the “CHANGE MASTER” statement.

Once the IO connection fails, it will try to connect the existing source based on the “MASTER_RETRY_COUNT, MASTER_CONNECT_RETRY”. Then only it will do the failover. 

The feature will only work when the IO connection is failed, maybe the source crashed or stopped, or any network failures. This will not work if the replica is manually stopped using the “STOP REPLICA”. 

We have two new functions, which will help to add and delete the server entries from the source list. 

  • asynchronous_connection_failover_add_source → Arguments (‘channel’,’host’,port,’network_namespace’,weight)
  • asynchronous_connection_failover_delete_source — Arguments (‘channel’,’host’,port,’network_namespace)

The source servers need to be configured in the table “mysql.replication_asynchronous_connection_failover”. We can also use the table “performance_schema.replication_asynchronous_connection_failover” to view the available servers in source list.

Requirements

  • GTID should be enabled on all the servers.
  • Regarding auto-positioning purpose, MASTER_AUTO_POSITION should be enabled on the replica ( CHANGE MASTER ).
  • The user and password should be the same on all the source servers.
  • Replication user and password must be set for the channel using the CHANGE MASTER .. FOR CHANNEL statement. 

Use Case

I have two data centers and three servers (dc1, dc2, report).

  • “dc1” and “report” servers are in the same data center.
  • “dc2” is in a different data center.
  • “dc1” and “dc2” are in active-passive async replication setup (dc1 – active, dc2 – passive)
  • “report” is configured as an async replica under “dc1” for reporting purposes.

Here, my requirement is, if the active node “dc1” is failed, I need to configure the “report” server under “dc2” to get the live data without manual work after a failure happens.

Configuration for Automatic Connection Failover

I have installed MySQL 8.0.22 on all three servers and configured the active – passive replication between “dc1” and “dc2”.

At dc1,

At dc2,

Now, I need to configure the “report” server as an async replica under “dc1” with automatic failover options.

At report,

 

source_connection_auto_failover : To activate the automatic failover feature.

master_retry_count, master_connect_retry : The default setting is huge ( master_retry_count = 86400, master_connect_retry = 60 ), with that we need to wait 60 days ( 86400 * 60 /60/60/24 ) for the failover. So, i reduced the settings to 30 seconds ( 10 *3 )

You can see the replication is started and the failover settings are applied. The current primary source is “dc1”. Now, I am going to use the function to add the server details into the source list for the failover to dc2.

At “report”,

It shows the source list is updated with dc2 details. We are good to perform the failover now. 

I am going to shut down the MySQL service on dc1. 

At dc1, 

At report server,

The IO thread is in “connecting” state. This means it is trying to establish the connection to the existing source (dc1) based on the “master_retry_count” and “master_connect_retry” settings. 

 After 30 seconds,

You can see the source_host was changed to “dc2”. So, the server “report” performed the auto failover and connected to “dc2”. 

From the error log,

The first three lines say it tried to connect the existing primary source “dc1” in a 10 seconds interval. There was no response from “dc1”, so it does the failover to “dc2” (connected to master ‘Autofailover@dc2:3306’).

It works perfectly!

Is Failback Possible?

Let’s experiment with the below two scenarios,

  • What happens if the primary node comes back online?
  • Does it perform a failback in case the server with higher weight comes back online?

What happens if the primary node comes back online?

I am going to start the “dc1”, which was shut down earlier to test the failover.

At “dc1”,

Let’s see the replication on the “report” server.

No changes. It is still connected to “dc2”. Failback has not happened.

Does it perform a failback in case the server with higher weight comes back online?

To test this, again I shut down the MySQL on “dc1” and updated the source list on the “report” server (dc1 weight > dc2 weight).

You can see the server “dc1” is configured with a higher weight (70). Now I am going to start the MySQL service on “dc1”.

At “dc1”,

At “report” server,

No changes, so once the failover is done to the new source, the automatic failback will not happen until the new source goes down. 

From MySQL doc:

Once the replica has succeeded in making a connection, it does not change the connection unless the new source stops or there is a network failure. This is the case even if the source that became unavailable and triggered the connection change becomes available again and has a higher priority setting.

This solution is also very helpful in (cluster + async replica) environments. You can automatically switch the connection to another cluster node, in case the existing source cluster node fails. If your network is not stable, you need to consider to set the proper retry settings, because you may face the frequent failover with low thresholds.

Share this post

Comment (1)

  • hslakhan Reply

    Very interesting and useful as I have this need for a cluster at work.

    October 27, 2020 at 5:24 am

Leave a Reply