MySQL 8.0.22 was released on Oct 19, 2020, and came with nice features and a lot of bug fixes. Now, you can configure your async replica to choose the new source in case the existing source connection (IO thread) fails. In this blog, I am going to explain the entire process involved in this configuration with a use case.
Overview
This feature is very helpful to keep your replica server in sync in case of current source fails.
To activate asynchronous connection failover, we need to set the “SOURCE_CONNECTION_AUTO_FAILOVER=1” on the “CHANGE MASTER” statement.
Once the IO connection fails, it will try to connect the existing source based on the “MASTER_RETRY_COUNT, MASTER_CONNECT_RETRY”. Then only it will do the failover.
The feature will only work when the IO connection is failed, maybe the source crashed or stopped, or any network failures. This will not work if the replica is manually stopped using the “STOP REPLICA”.
We have two new functions, which will help to add and delete the server entries from the source list.
- asynchronous_connection_failover_add_source → Arguments (‘channel’,’host’,port,’network_namespace’,weight)
- asynchronous_connection_failover_delete_source — Arguments (‘channel’,’host’,port,’network_namespace)
The source servers need to be configured in the table “mysql.replication_asynchronous_connection_failover”. We can also use the table “performance_schema.replication_asynchronous_connection_failover” to view the available servers in source list.
Requirements
- GTID should be enabled on all the servers.
- Regarding auto-positioning purpose, MASTER_AUTO_POSITION should be enabled on the replica ( CHANGE MASTER ).
- The user and password should be the same on all the source servers.
- Replication user and password must be set for the channel using the CHANGE MASTER .. FOR CHANNEL statement.
Use Case
I have two data centers and three servers (dc1, dc2, report).
- “dc1” and “report” servers are in the same data center.
- “dc2” is in a different data center.
- “dc1” and “dc2” are in active-passive async replication setup (dc1 – active, dc2 – passive)
- “report” is configured as an async replica under “dc1” for reporting purposes.
Here, my requirement is, if the active node “dc1” is failed, I need to configure the “report” server under “dc2” to get the live data without manual work after a failure happens.
Configuration for Automatic Connection Failover
I have installed MySQL 8.0.22 on all three servers and configured the active – passive replication between “dc1” and “dc2”.
1 2 3 4 |
[root@dc1 ~]# mysql -e "select @@version, @@version_comment\G" *************************** 1. row *************************** @@version: 8.0.22 @@version_comment: MySQL Community Server - GPL |
At dc1,
1 2 3 4 5 |
mysql> show replica status\G Source_Host: dc2 Replica_IO_Running: Yes Replica_SQL_Running: Yes 1 row in set (0.00 sec) |
At dc2,
1 2 3 4 5 |
mysql> show replica status\G Source_Host: dc1 Replica_IO_Running: Yes Replica_SQL_Running: Yes 1 row in set (0.00 sec) |
Now, I need to configure the “report” server as an async replica under “dc1” with automatic failover options.
At report,
1 2 3 4 5 6 7 8 9 10 11 |
mysql> change master to -> master_user='Autofailover', -> master_password='Autofailover@321', -> master_host='dc1', -> master_auto_position=1, -> get_master_public_key=1, -> source_connection_auto_failover=1, -> master_retry_count=3, -> master_connect_retry=10 -> for channel "herc7"; Query OK, 0 rows affected, 2 warnings (0.03 sec) |
source_connection_auto_failover : To activate the automatic failover feature.
master_retry_count, master_connect_retry : The default setting is huge ( master_retry_count = 86400, master_connect_retry = 60 ), with that we need to wait 60 days ( 86400 * 60 /60/60/24 ) for the failover. So, i reduced the settings to 30 seconds ( 10 *3 )
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
mysql> start replica for channel "herc7"; Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> show replica status\G Source_Host: dc1 Connect_Retry: 10 Replica_IO_Running: Yes Replica_SQL_Running: Yes Seconds_Behind_Source: 0 Last_IO_Error: Replica_SQL_Running_State: Slave has read all relay log; waiting for more updates Source_Retry_Count: 3 Last_IO_Error_Timestamp: Auto_Position: 1 Channel_Name: herc7 1 row in set (0.00 sec) |
You can see the replication is started and the failover settings are applied. The current primary source is “dc1”. Now, I am going to use the function to add the server details into the source list for the failover to dc2.
At “report”,
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
mysql> select asynchronous_connection_failover_add_source('herc7', 'dc2', 3306, '', 50); +------------------------------------------------------------------------------+ | asynchronous_connection_failover_add_source('herc7', 'dc2', 3306, '', 50) | +------------------------------------------------------------------------------+ | The UDF asynchronous_connection_failover_add_source() executed successfully. | +------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> select * from mysql.replication_asynchronous_connection_failover\G *************************** 1. row *************************** Channel_name: herc7 Host: dc2 Port: 3306 Network_namespace: Weight: 50 1 row in set (0.00 sec) |
It shows the source list is updated with dc2 details. We are good to perform the failover now.
I am going to shut down the MySQL service on dc1.
At dc1,
1 2 3 |
[root@dc1 ~]# service mysqld stop Redirecting to /bin/systemctl stop mysqld.service [root@dc1 ~]# |
At report server,
1 2 3 4 5 6 7 8 9 10 11 12 13 |
mysql> show replica status\G Source_Host: dc1 Connect_Retry: 10 Replica_IO_Running: Connecting Replica_SQL_Running: Yes Seconds_Behind_Source: NULL Last_IO_Error: error reconnecting to master 'Autofailover@dc1:3306' - retry-time: 10 retries: 2 message: Can't connect to MySQL server on 'dc1' (111) Replica_SQL_Running_State: Slave has read all relay log; waiting for more updates Source_Retry_Count: 3 Last_IO_Error_Timestamp: 201019 21:32:26 Auto_Position: 1 Channel_Name: herc7 1 row in set (0.00 sec) |
The IO thread is in “connecting” state. This means it is trying to establish the connection to the existing source (dc1) based on the “master_retry_count” and “master_connect_retry” settings.
After 30 seconds,