In this blog post, we will discuss the PXC Replication Manager, a script that facilitates both source and replica failover when working with multiple PXC clusters across different data centers/networks connected via asynchronous replication.
Such topologies emerge from requirements like database version upgrades, reporting or streaming for applications, separate disaster recovery or backup solutions, and multi-source data needs.
We will walk through the usage with the help of a demo, so let’s dive into the practical scenario below.
Topology
DC1:
172.31.78.120    DC1-1
172.31.70.222    DC1-2

DC2:
172.31.75.60     DC2-1
172.31.71.144    DC2-2
Above, we have two separate PXC clusters that will be connected via asynchronous replication. Each cluster will act as both source and replica for the other.
Minimal configuration for PXC/Galera and async replication
DC1-1:
[mysqld]
server-id=1
log-bin=mysql-bin
log_replica_updates
gtid_mode=ON
enforce_gtid_consistency=ON

### Galera specific
wsrep_cluster_address=gcomm://172.31.78.120,172.31.70.222
wsrep_node_address=172.31.78.120
wsrep_cluster_name=DC1
wsrep_node_name=DC1-1
wsrep_sst_method=xtrabackup-v2
DC1-2:
[mysqld]
server-id=1
log-bin=mysql-bin
log_replica_updates
gtid_mode=ON
enforce_gtid_consistency=ON

### Galera specific
wsrep_cluster_address=gcomm://172.31.78.120,172.31.70.222
wsrep_node_address=172.31.70.222
wsrep_cluster_name=DC1
wsrep_node_name=DC1-2
wsrep_sst_method=xtrabackup-v2
DC2-1:
[mysqld]
server-id=2
log-bin=mysql-bin
log_replica_updates
gtid_mode=ON
enforce_gtid_consistency=ON

### Galera specific
wsrep_cluster_address=gcomm://172.31.75.60,172.31.71.144
wsrep_node_address=172.31.75.60
wsrep_cluster_name=DC2
wsrep_node_name=DC2-1
wsrep_sst_method=xtrabackup-v2
DC2-2:
[mysqld]
server-id=2
log-bin=mysql-bin
log_replica_updates
gtid_mode=ON
enforce_gtid_consistency=ON

### Galera specific
wsrep_cluster_address=gcomm://172.31.75.60,172.31.71.144
wsrep_node_address=172.31.71.144
wsrep_cluster_name=DC2
wsrep_node_name=DC2-2
wsrep_sst_method=xtrabackup-v2
Bootstrap/PXC Ready
Both clusters should be bootstrapped separately, and the other PXC nodes started normally to sync via SST.
First Node:
shell> systemctl start mysql@bootstrap.service
Second/other Nodes:
shell> systemctl start mysql
At this stage, both clusters should be active and running, but not connected to each other in any way.
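Before proceeding, a quick sanity check on any node confirms each cluster formed correctly. For this two-node demo, wsrep_cluster_size should report 2 and wsrep_local_state_comment should report Synced:

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';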
Manual Asynchronous Replication setup
We need to set up asynchronous replication manually once so that the automation via the Replication Manager script can take over later on.
- We will take a dump from the DC1 [DC1-1] node.

shell> mysqldump -u root -p --source-data=2 --single-transaction -R -A -E > dump.sql
- Then, restore the dump on the DC2 [DC2-1] node.

shell> mysql -u root -p < dump.sql
- Once restoration is finished, we can establish replication.

mysql-DC2-1> CHANGE REPLICATION SOURCE TO GET_SOURCE_PUBLIC_KEY=1, SOURCE_HOST='172.31.78.120', SOURCE_USER='repl', SOURCE_PASSWORD='replpass', SOURCE_AUTO_POSITION=1 FOR CHANNEL 'DC2-DC1';
mysql-DC2-1> START REPLICA FOR CHANNEL 'DC2-DC1';
Note: Ensure the user for replication purposes is available on the source node.

mysql-DC1-1> CREATE USER 'repl'@'%' IDENTIFIED BY 'replpass';
mysql-DC1-1> GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl'@'%';
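With the replication user in place and the channel started, it is worth confirming the replica threads are healthy before moving on:

mysql-DC2-1> SHOW REPLICA STATUS FOR CHANNEL 'DC2-DC1'\G

Both Replica_IO_Running and Replica_SQL_Running should report Yes.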
Similarly, to set up circular/multi-source asynchronous replication, we can establish the async replication channel on DC1 as well.
- Async replication is established on [DC1-2], connecting to the source [DC2-2].

mysql-DC1-2> CHANGE REPLICATION SOURCE TO GET_SOURCE_PUBLIC_KEY=1, SOURCE_HOST='172.31.71.144', SOURCE_USER='repl', SOURCE_PASSWORD='replpass', SOURCE_AUTO_POSITION=1 FOR CHANNEL 'DC1-DC2';
mysql-DC1-2> START REPLICA FOR CHANNEL 'DC1-DC2';
At this stage, both clusters [DC1 and DC2] act as both source and replica for each other. Any write on any cluster node will sync to all the others.
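As a quick smoke test (the smoke_test database and table below are illustrative names only, not part of the replication manager setup), we can write on one cluster and read the row back on the other:

mysql-DC1-1> CREATE DATABASE smoke_test;
mysql-DC1-1> CREATE TABLE smoke_test.t1 (id INT PRIMARY KEY);
mysql-DC1-1> INSERT INTO smoke_test.t1 VALUES (1);

mysql-DC2-2> SELECT * FROM smoke_test.t1;

The write reaches DC1-2 via Galera, DC2-1 via the async channel, and DC2-2 via Galera inside DC2, so after a moment the row should be visible on every node. Dropping the test database replicates the same way.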
Now, we will set up the PXC replication manager to control the failover events inside both clusters.
PXC Replication Manager Setup/Configuration
1) First, we need to manually create some tables on DC1 [DC1-1], which will capture the cluster information and the other associated metadata needed for failover decision-making.
mysql-DC1-1> create database if not exists percona;
mysql-DC1-1> use percona;

CREATE TABLE if not exists `replication` (
  `host` varchar(40) NOT NULL,
  `weight` int(11) NOT NULL DEFAULT 0,
  `localIndex` int(11) DEFAULT NULL,
  `isReplica` enum('No','Yes','Proposed','Failed') DEFAULT 'No',
  `lastUpdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `lastHeartbeat` timestamp NOT NULL DEFAULT '1971-01-01 00:00:00',
  `connectionName` varchar(64) NOT NULL,
  `currentSource` varchar(64),
  PRIMARY KEY (`connectionName`,`host`),
  KEY `idx_host` (`host`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `cluster` (
  `cluster` varchar(31) NOT NULL,
  `masterCandidates` varchar(255) NOT NULL,
  `replCreds` varchar(255) NOT NULL,
  PRIMARY KEY (`cluster`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `link` (
  `clusterSlave` varchar(31) NOT NULL,
  `clusterMaster` varchar(31) NOT NULL,
  PRIMARY KEY (`clusterSlave`,`clusterMaster`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `weight` (
  `cluster` varchar(31) NOT NULL,
  `nodename` varchar(255) NOT NULL,
  `weight` int NOT NULL DEFAULT 0,
  PRIMARY KEY (`cluster`,`nodename`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
2) Next, we need to insert the desired information into these tables. Please note that we will not perform any manual writes on the [replication] table; it is managed by the replication manager script.
- The table below contains the PXC cluster and replication credential-related information.

mysql-DC1-1> INSERT INTO `cluster` VALUES ('DC1','172.31.78.120 172.31.70.222 ',"source_user='repl', source_password='replpass'");
mysql-DC1-1> INSERT INTO `cluster` VALUES ('DC2','172.31.75.60 172.31.71.144',"source_user='repl', source_password='replpass'");
mysql-DC1-1> select * from cluster;
+---------+------------------------------+------------------------------------------------+
| cluster | masterCandidates             | replCreds                                      |
+---------+------------------------------+------------------------------------------------+
| DC1     | 172.31.78.120 172.31.70.222  | source_user='repl', source_password='replpass' |
| DC2     | 172.31.75.60 172.31.71.144   | source_user='repl', source_password='replpass' |
+---------+------------------------------+------------------------------------------------+
2 rows in set (0.00 sec)
- Below is the mapping table, which decides which cluster participates as source/replica.

mysql-DC1-1> INSERT INTO `link` VALUES ('DC2','DC1');
mysql-DC1-1> INSERT INTO `link` VALUES ('DC1','DC2');
mysql-DC1-1> select * from link;
+--------------+---------------+
| clusterSlave | clusterMaster |
+--------------+---------------+
| DC1          | DC2           |
| DC2          | DC1           |
+--------------+---------------+
2 rows in set (0.00 sec)
- The table below assigns a weight to each node, which determines its eligibility during source/replica failover.

mysql-DC1-1> INSERT INTO `weight` VALUES('DC1','DC1-1',10);
mysql-DC1-1> INSERT INTO `weight` VALUES('DC1','DC1-2',11);
mysql-DC1-1> INSERT INTO `weight` VALUES('DC2','DC2-1',10);
mysql-DC1-1> INSERT INTO `weight` VALUES('DC2','DC2-2',11);
mysql-DC1-1> select * from weight;
+---------+----------+--------+
| cluster | nodename | weight |
+---------+----------+--------+
| DC1     | DC1-1    |     10 |
| DC1     | DC1-2    |     11 |
| DC2     | DC2-1    |     10 |
| DC2     | DC2-2    |     11 |
+---------+----------+--------+
4 rows in set (0.00 sec)
Note – The above details will sync to all connected cluster nodes and async replicas through the replication we set up earlier.
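Since the percona schema replicates like any other data, we can confirm the metadata reached the other cluster by querying any DC2 node:

mysql-DC2-1> SELECT * FROM percona.link;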
3) Finally, we have to download the replication-manager script and set it up via cron so that it continuously monitors the topology and performs failover when required.
shell> cd /usr/local/bin
shell> git clone https://github.com/percona/replication-manager.git
shell> cp replication-manager/replication_manager.sh .
- The script by default takes its credentials from [/root/.my.cnf], so we must ensure they are added there.
[client]
user=root
password=Root@1234
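To confirm the script will be able to log in with these defaults, we can invoke the client as root without explicit credentials, so it reads /root/.my.cnf the same way the script does:

shell> mysql -e "SELECT CURRENT_USER();"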
- It’s not required to use the super/root user; we can also use a customized user with the minimum grants below.

E.g.,

mysql> CREATE USER 'rep_manager'@'localhost' IDENTIFIED BY 'Repmgr@1234';
mysql> GRANT REPLICATION_SLAVE_ADMIN, SUPER, REPLICATION CLIENT, RELOAD ON *.* TO 'rep_manager'@'localhost';
mysql> GRANT ALL ON percona.* TO 'rep_manager'@'localhost';
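If we go this route, /root/.my.cnf should carry this user's credentials instead of root's. The grants can be double-checked with the standard statement:

mysql> SHOW GRANTS FOR 'rep_manager'@'localhost';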
- Once the above steps are done, we can test the script. It outputs verbose information about the events to the log file /tmp/replication_manager.log. This file does not exist by default, so we must create it first.
shell> sh -x /usr/local/bin/replication_manager.sh
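For example, we can create the log file and watch the script's activity from a second terminal while testing:

shell> touch /tmp/replication_manager.log
shell> tail -f /tmp/replication_manager.log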
- Lastly, we can put the script inside a cron as per the requirement. The script should be activated on all participating nodes, as the roles can switch at each failover.
shell> crontab -l
* * * * * /usr/local/bin/replication_manager.sh
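Since cron fires the script every minute, one optional safeguard (our suggestion, not something the script itself requires) is to wrap the entry in flock so that a slow run is never overlapped by the next one:

* * * * * flock -n /tmp/replication_manager.lock /usr/local/bin/replication_manager.sh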
Testing Replica Failover
- Below, the table “replication” shows which nodes are replicas at the moment.
mysql> select * from percona.replication;
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| host  | weight | localIndex | isReplica | lastUpdate          | lastHeartbeat       | connectionName | currentSource |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| DC1-1 |     10 |          0 | No        | 2025-12-19 13:15:02 | 2025-12-19 13:15:02 | DC1-DC2        |               |
| DC1-2 |     11 |          1 | Yes       | 2025-12-19 13:15:02 | 2025-12-19 13:15:02 | DC1-DC2        | 172.31.71.144 |
| DC2-1 |     10 |          0 | Yes       | 2025-12-19 13:15:02 | 2025-12-19 13:15:02 | DC2-DC1        | 172.31.78.120 |
| DC2-2 |     11 |          1 | No        | 2025-12-19 13:00:02 | 2025-12-19 13:00:02 | DC2-DC1        | 172.31.78.120 |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
4 rows in set (0.00 sec)
DC2-1 (Replica):
mysql> show replica status\G
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 172.31.78.120
                  Source_User: repl
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000021
          Read_Source_Log_Pos: 40240
               Relay_Log_File: ip-172-31-75-60-relay-bin-dc2@002ddc1.000012
                Relay_Log_Pos: 40451
        Relay_Source_Log_File: binlog.000021
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
DC1-1 (Source):
mysql> show replicas;
+-----------+------+------+-----------+--------------------------------------+
| Server_Id | Host | Port | Source_Id | Replica_UUID                         |
+-----------+------+------+-----------+--------------------------------------+
|       101 |      | 3306 |         1 | 10342271-d01f-11f0-8814-16ffc2ab7fdd |
+-----------+------+------+-----------+--------------------------------------+
1 row in set (0.00 sec)
- Now, we will stop the database service on one of the replicas [DC2-1] and observe the behaviour.

shell> systemctl stop mysql
- The other remaining PXC node [DC2-2] now shows in the “Proposed” state, and the previous replica [DC2-1] is shown as stopped, with the “No” status, as below.
mysql-DC2-2> select * from percona.replication;
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| host  | weight | localIndex | isReplica | lastUpdate          | lastHeartbeat       | connectionName | currentSource |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| DC2-1 |     10 |          0 | No        | 2025-12-19 14:06:02 | 2025-12-19 13:00:02 | DC2-DC1        | 172.31.78.120 |
| DC2-2 |     11 |          0 | Proposed  | 2025-12-19 15:16:01 | 2025-12-19 15:16:01 | DC2-DC1        |               |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
2 rows in set (0.00 sec)
- Finally, after some time, we can see that node [DC2-2] becomes the new replica.

mysql-DC2-2> select * from percona.replication;
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| host  | weight | localIndex | isReplica | lastUpdate          | lastHeartbeat       | connectionName | currentSource |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| DC2-1 |     10 |          0 | No        | 2025-12-19 14:06:02 | 2025-12-19 13:00:02 | DC2-DC1        | 172.31.78.120 |
| DC2-2 |     11 |          0 | Yes       | 2025-12-19 16:45:02 | 2025-12-19 16:45:02 | DC2-DC1        | 172.31.78.120 |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
2 rows in set (0.00 sec)
mysql-DC2-2> show replica status\G
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 172.31.78.120
                  Source_User: repl
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000001
          Read_Source_Log_Pos: 69068
               Relay_Log_File: ip-172-31-71-144-relay-bin-dc2@002ddc1.000002
                Relay_Log_Pos: 68553
        Relay_Source_Log_File: binlog.000001
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
DC1-1 (Source):
mysql-DC1-1> show replicas;
+-----------+------+------+-----------+--------------------------------------+
| Server_Id | Host | Port | Source_Id | Replica_UUID                         |
+-----------+------+------+-----------+--------------------------------------+
|        10 |      | 3306 |         1 | 2cdc0f10-cc93-11f0-9fc9-16ffee718857 |
+-----------+------+------+-----------+--------------------------------------+
1 row in set (0.00 sec)
The replication manager log file /tmp/replication_manager.log also shows information about the replica node change and the updates to the replication table.
+ grep -c ERROR /tmp/mysql_error
+ '[' 0 -gt 0 ']'
+ '[' a172.31.78.120 == a ']'
+ '[' aYes == a ']'
+ '[' Yes == Yes ']'
+ [[ Yes == Yes ]]
+ [[ Yes == Yes ]]
+ /usr/bin/mysql --connect_timeout=10 -B -e '
update percona.replication set isReplica='''Yes''', localIndex=0, lastHeartbeat=now(),currentSource='''172.31.78.120'''
where connectionName = '''DC2-DC1''' and host = '''DC2-2''''
++ /usr/bin/mysql --connect_timeout=10 -B -BN -e 'select count(*) from percona.replication where isReplica = '''Yes''' and connectionName = '''DC2-DC1''' and unix_timestamp(lastHeartbeat) > unix_timestamp() - 179'
+ slaveCount=1
+ '[' 1 -gt 1 ']'
Testing Source Failover
- Now, we will test what happens when the source [DC1-1] itself is down or unavailable.
Before source failover:
mysql-DC2-2> select * from percona.replication;
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| host  | weight | localIndex | isReplica | lastUpdate          | lastHeartbeat       | connectionName | currentSource |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| DC2-1 |     10 |          0 | No        | 2025-12-19 14:06:02 | 2025-12-19 13:00:02 | DC2-DC1        | 172.31.78.120 |
| DC2-2 |     11 |          0 | Yes       | 2025-12-19 17:07:02 | 2025-12-19 17:07:02 | DC2-DC1        | 172.31.78.120 |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
2 rows in set (0.00 sec)
mysql-DC2-2> show replica status\G
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 172.31.78.120
                  Source_User: repl
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000001
          Read_Source_Log_Pos: 88214
               Relay_Log_File: ip-172-31-71-144-relay-bin-dc2@002ddc1.000002
                Relay_Log_Pos: 87699
        Relay_Source_Log_File: binlog.000001
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
Stopping the source:
shell> systemctl stop mysql
After source failover:
- Once the source [DC1-1] is down, the replica [DC2-2] is connected via another source, [DC1-2].
mysql-DC2-2> select * from percona.replication;
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| host  | weight | localIndex | isReplica | lastUpdate          | lastHeartbeat       | connectionName | currentSource |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| DC2-1 |     10 |          0 | No        | 2025-12-19 14:06:02 | 2025-12-19 13:00:02 | DC2-DC1        | 172.31.78.120 |
| DC2-2 |     11 |          0 | Yes       | 2025-12-19 17:17:02 | 2025-12-19 17:17:02 | DC2-DC1        | 172.31.70.222 |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
2 rows in set (0.00 sec)
mysql-DC2-2> show replica status\G
*************************** 1. row ***************************
             Replica_IO_State: Waiting for source to send event
                  Source_Host: 172.31.70.222
                  Source_User: repl
                  Source_Port: 3306
                Connect_Retry: 60
              Source_Log_File: binlog.000002
          Read_Source_Log_Pos: 83664
               Relay_Log_File: ip-172-31-71-144-relay-bin-dc2@002ddc1.000004
                Relay_Log_Pos: 83875
        Relay_Source_Log_File: binlog.000002
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
Important considerations
- As the replication manager script is still in technical preview, it is advisable to test it thoroughly before using it in production. The topology we used above is for demo purposes only; we suggest using a minimum of three PXC nodes per cluster in production. Also, avoid writing from multiple nodes in the cluster to lessen the risk of inconsistency issues.
- It is supported only in PXC/MariaDB-based Galera environments. It won't work for isolated async nodes; for those we have to rely on async replication failover mechanisms.
- The script depends on GTID-based replication. Auto failover won't work when using binary log file/position-based replication. A quick way to confirm GTID replication is in effect is shown below.
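On any node, the following standard checks confirm GTID replication; gtid_mode should report ON, and in the replica status, Auto_Position should report 1:

mysql> SHOW GLOBAL VARIABLES LIKE 'gtid_mode';
mysql> SELECT @@GLOBAL.gtid_executed;
mysql> SHOW REPLICA STATUS\G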
References
- https://github.com/percona/replication-manager
- https://docs.percona.com/percona-distribution-for-mysql/8.0/replication-manager/replication-manager-for-pxc.html
Final Thought
We discussed some use cases of the PXC replication manager script, which is capable of handling both source and replica failovers within PXC/MariaDB-based Galera clusters. The script has certain limitations: it does not work with standard asynchronous replicas and is applicable only when the nodes are part of a PXC cluster.
The script is also capable of handling more complex topologies, such as multi-source replication, where a single cluster synchronizes data from multiple sources using multiple replication channels. We may explore these scenarios in separate blog posts.