In this blog post, we will discuss the PXC Replication Manager, a script that automates both source and replica failover when multiple PXC clusters, located in different data centers or networks, are connected via asynchronous replication.
Such topologies emerge from requirements like database version upgrades, reporting or streaming applications, separate disaster recovery or backup solutions, and multi-source data needs.
To better understand how this works, let's dive into a practical scenario.
```
DC1:
172.31.78.120  DC1-1
172.31.70.222  DC1-2

DC2:
172.31.75.60   DC2-1
172.31.71.144  DC2-2
```
Above, we have two separate PXC clusters that will be connected via asynchronous replication. Each cluster will act as both source and replica for the other.
Node DC1-1:

```ini
[mysqld]
server-id=1
log-bin=mysql-bin
log_replica_updates
gtid_mode=ON
enforce_gtid_consistency=ON

### Galera specific
wsrep_cluster_address=gcomm://172.31.78.120,172.31.70.222
wsrep_node_address=172.31.78.120
wsrep_cluster_name=DC1
wsrep_node_name=DC1-1
wsrep_sst_method=xtrabackup-v2
```
Node DC1-2:

```ini
[mysqld]
server-id=1
log-bin=mysql-bin
log_replica_updates
gtid_mode=ON
enforce_gtid_consistency=ON

### Galera specific
wsrep_cluster_address=gcomm://172.31.78.120,172.31.70.222
wsrep_node_address=172.31.70.222
wsrep_cluster_name=DC1
wsrep_node_name=DC1-2
wsrep_sst_method=xtrabackup-v2
```
Node DC2-1:

```ini
[mysqld]
server-id=2
log-bin=mysql-bin
log_replica_updates
gtid_mode=ON
enforce_gtid_consistency=ON

### Galera specific
wsrep_cluster_address=gcomm://172.31.75.60,172.31.71.144
wsrep_node_address=172.31.75.60
wsrep_cluster_name=DC2
wsrep_node_name=DC2-1
wsrep_sst_method=xtrabackup-v2
```
Node DC2-2:

```ini
[mysqld]
server-id=2
log-bin=mysql-bin
log_replica_updates
gtid_mode=ON
enforce_gtid_consistency=ON

### Galera specific
wsrep_cluster_address=gcomm://172.31.75.60,172.31.71.144
wsrep_node_address=172.31.71.144
wsrep_cluster_name=DC2
wsrep_node_name=DC2-2
wsrep_sst_method=xtrabackup-v2
```
Both clusters should be bootstrapped separately, and the other PXC nodes started normally to sync via SST.
```shell
shell> systemctl start mysql@bootstrap.service
```
```shell
shell> systemctl start mysql
```
At this stage, both clusters should be up and running, but not yet connected to each other in any way.
We need to set up one-time manual async replication so that the Replication Manager script can automate failover later on.
```shell
shell-DC1-1> mysqldump -u root -p --source-data=2 --single-transaction -R -A -E > dump.sql
```
```shell
shell-DC2-1> mysql -u root -p < dump.sql
```
```sql
mysql-DC2-1> CHANGE REPLICATION SOURCE TO GET_SOURCE_PUBLIC_KEY=1, SOURCE_HOST='172.31.78.120', SOURCE_USER='repl', SOURCE_PASSWORD='replpass', SOURCE_AUTO_POSITION=1 FOR CHANNEL 'DC2-DC1';
```
```sql
mysql-DC2-1> START REPLICA FOR CHANNEL 'DC2-DC1';
```
Note: Ensure the user for replication purposes is available on the source node.
```sql
mysql-DC1-1> CREATE USER 'repl'@'%' IDENTIFIED BY 'replpass';
mysql-DC1-1> GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl'@'%';
```
Similarly, to set up circular/multi-source asynchronous replication, we establish an async replication channel on DC1 as well.
```sql
mysql-DC1-2> CHANGE REPLICATION SOURCE TO GET_SOURCE_PUBLIC_KEY=1, SOURCE_HOST='172.31.71.144', SOURCE_USER='repl', SOURCE_PASSWORD='replpass', SOURCE_AUTO_POSITION=1 FOR CHANNEL 'DC1-DC2';

mysql-DC1-2> START REPLICA FOR CHANNEL 'DC1-DC2';
```
At this stage, both clusters [DC1 and DC2] act as source and replica for each other. Writes on any cluster node will sync to the other cluster.
Now, we will set up the PXC replication manager to control the failover events in both clusters.
1) First, we need to manually create some tables on DC1 [DC1-1], which will capture cluster information and the other associated metadata needed for failover decision-making.
```sql
mysql-DC1-1> CREATE DATABASE IF NOT EXISTS percona;
```
```sql
mysql-DC1-1> use percona;

CREATE TABLE IF NOT EXISTS `replication` (
  `host` varchar(40) NOT NULL,
  `weight` int(11) NOT NULL DEFAULT 0,
  `localIndex` int(11) DEFAULT NULL,
  `isReplica` enum('No','Yes','Proposed','Failed') DEFAULT 'No',
  `lastUpdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `lastHeartbeat` timestamp NOT NULL DEFAULT '1971-01-01 00:00:00',
  `connectionName` varchar(64) NOT NULL,
  `currentSource` varchar(64),
  PRIMARY KEY (`connectionName`,`host`),
  KEY `idx_host` (`host`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `cluster` (
  `cluster` varchar(31) NOT NULL,
  `masterCandidates` varchar(255) NOT NULL,
  `replCreds` varchar(255) NOT NULL,
  PRIMARY KEY (`cluster`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `link` (
  `clusterSlave` varchar(31) NOT NULL,
  `clusterMaster` varchar(31) NOT NULL,
  PRIMARY KEY (`clusterSlave`,`clusterMaster`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `weight` (
  `cluster` varchar(31) NOT NULL,
  `nodename` varchar(255) NOT NULL,
  `weight` int NOT NULL DEFAULT 0,
  PRIMARY KEY (`cluster`,`nodename`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```
2) Next, we need to insert the desired information into these tables. Please note that we will not perform any manual writes on the `replication` table; it is managed by the replication manager script.
```sql
mysql-DC1-1> INSERT INTO `cluster` VALUES ('DC1','172.31.78.120 172.31.70.222',"source_user='repl', source_password='replpass'");
mysql-DC1-1> INSERT INTO `cluster` VALUES ('DC2','172.31.75.60 172.31.71.144',"source_user='repl', source_password='replpass'");
```
```
mysql-DC1-1> select * from cluster;
+---------+------------------------------+------------------------------------------------+
| cluster | masterCandidates             | replCreds                                      |
+---------+------------------------------+------------------------------------------------+
| DC1     | 172.31.78.120 172.31.70.222  | source_user='repl', source_password='replpass' |
| DC2     | 172.31.75.60 172.31.71.144   | source_user='repl', source_password='replpass' |
+---------+------------------------------+------------------------------------------------+
2 rows in set (0.00 sec)
```
```sql
mysql-DC1-1> INSERT INTO `link` VALUES ('DC2','DC1');
mysql-DC1-1> INSERT INTO `link` VALUES ('DC1','DC2');
```
```
mysql-DC1-1> select * from link;
+--------------+---------------+
| clusterSlave | clusterMaster |
+--------------+---------------+
| DC1          | DC2           |
| DC2          | DC1           |
+--------------+---------------+
2 rows in set (0.00 sec)
```
```sql
mysql-DC1-1> INSERT INTO `weight` VALUES ('DC1','DC1-1',10);
mysql-DC1-1> INSERT INTO `weight` VALUES ('DC1','DC1-2',11);
mysql-DC1-1> INSERT INTO `weight` VALUES ('DC2','DC2-1',10);
mysql-DC1-1> INSERT INTO `weight` VALUES ('DC2','DC2-2',11);
```
```
mysql-DC1-1> select * from weight;
+---------+----------+--------+
| cluster | nodename | weight |
+---------+----------+--------+
| DC1     | DC1-1    |     10 |
| DC1     | DC1-2    |     11 |
| DC2     | DC2-1    |     10 |
| DC2     | DC2-2    |     11 |
+---------+----------+--------+
4 rows in set (0.00 sec)
```
Note – The above details will sync to all connected cluster nodes and async replicas.
3) Finally, we have to download the replication-manager script and schedule it via cron so that it continuously monitors the topology and performs failover when required.
```shell
shell> cd /usr/local/bin
shell> git clone https://github.com/percona/replication-manager.git
shell> cp replication-manager/replication_manager.sh .
```
The script connects to the local node using the default MySQL client credentials, e.g., defined in the root user's .my.cnf:

```ini
[client]
user=root
password=Root@1234
```
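If a dedicated credentials file is used, it should be readable only by the user running the cron job. Below is a minimal sketch; the file path `~/.my.cnf.example` and the credentials are placeholders for this illustration (on a real node you would write `~/.my.cnf`):

```shell
# Create a client credentials file readable only by its owner.
# The path and credentials below are illustrative placeholders.
CNF="${HOME}/.my.cnf.example"   # on a real node this would be ~/.my.cnf

cat > "$CNF" <<'EOF'
[client]
user=rep_manager
password=Repmgr@1234
EOF

chmod 600 "$CNF"   # hide the password from other local users
ls -l "$CNF"
```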
Alternatively, a dedicated user with only the required privileges can be created for the script. For example:
```sql
mysql> CREATE USER 'rep_manager'@'localhost' IDENTIFIED BY 'Repmgr@1234';
mysql> GRANT REPLICATION_SLAVE_ADMIN, SUPER, REPLICATION CLIENT, RELOAD ON *.* TO 'rep_manager'@'localhost';
mysql> GRANT ALL ON percona.* TO 'rep_manager'@'localhost';
```
The script can first be run manually in debug mode to validate the setup:

```shell
shell> sh -x /usr/local/bin/replication_manager.sh
```
```shell
shell> crontab -l
* * * * * /usr/local/bin/replication_manager.sh
```
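Since cron fires every minute, a slow run (for example, while a remote node times out) can overlap with the next one. One defensive option, assuming a Linux host with the util-linux `flock` utility available, is to serialize runs with a lock file. The wrapper below is a sketch, not part of the upstream script, and the `echo` stands in for the real script call:

```shell
# Sketch: prevent overlapping replication_manager.sh runs using flock.
# The lock path is arbitrary; the echo stands in for the real script call.
LOCK=/tmp/replication_manager.lock

# -n makes flock give up immediately if a previous run still holds the lock
OUT=$(flock -n "$LOCK" -c 'echo "would run replication_manager.sh"')
echo "$OUT"
```

With this approach, the crontab entry would become `* * * * * flock -n /tmp/replication_manager.lock /usr/local/bin/replication_manager.sh`.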
Once the script is running, the `replication` table is populated automatically:

```
mysql> select * from percona.replication;
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| host  | weight | localIndex | isReplica | lastUpdate          | lastHeartbeat       | connectionName | currentSource |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| DC1-1 |     10 |          0 | No        | 2025-12-19 13:15:02 | 2025-12-19 13:15:02 | DC1-DC2        |               |
| DC1-2 |     11 |          1 | Yes       | 2025-12-19 13:15:02 | 2025-12-19 13:15:02 | DC1-DC2        | 172.31.71.144 |
| DC2-1 |     10 |          0 | Yes       | 2025-12-19 13:15:02 | 2025-12-19 13:15:02 | DC2-DC1        | 172.31.78.120 |
| DC2-2 |     11 |          1 | No        | 2025-12-19 13:00:02 | 2025-12-19 13:00:02 | DC2-DC1        | 172.31.78.120 |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
4 rows in set (0.00 sec)
```
```
mysql> show replica status\G
*************************** 1. row ***************************
         Replica_IO_State: Waiting for source to send event
              Source_Host: 172.31.78.120
              Source_User: repl
              Source_Port: 3306
            Connect_Retry: 60
          Source_Log_File: binlog.000021
      Read_Source_Log_Pos: 40240
           Relay_Log_File: ip-172-31-75-60-relay-bin-dc2@002ddc1.000012
            Relay_Log_Pos: 40451
    Relay_Source_Log_File: binlog.000021
       Replica_IO_Running: Yes
      Replica_SQL_Running: Yes
```
```
mysql> show replicas;
+-----------+------+------+-----------+--------------------------------------+
| Server_Id | Host | Port | Source_Id | Replica_UUID                         |
+-----------+------+------+-----------+--------------------------------------+
|       101 |      | 3306 |         1 | 10342271-d01f-11f0-8814-16ffc2ab7fdd |
+-----------+------+------+-----------+--------------------------------------+
1 row in set (0.00 sec)
```
To test a replica failover, stop MySQL on the current replica node [DC2-1]:

```shell
shell-DC2-1> systemctl stop mysql
```
On the next run of the script, the surviving node DC2-2 is proposed as the new replica:

```
mysql-DC2-2> select * from percona.replication;
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| host  | weight | localIndex | isReplica | lastUpdate          | lastHeartbeat       | connectionName | currentSource |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| DC2-1 |     10 |          0 | No        | 2025-12-19 14:06:02 | 2025-12-19 13:00:02 | DC2-DC1        | 172.31.78.120 |
| DC2-2 |     11 |          0 | Proposed  | 2025-12-19 15:16:01 | 2025-12-19 15:16:01 | DC2-DC1        |               |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
2 rows in set (0.00 sec)
```
Shortly after, DC2-2 is promoted to the active replica:

```
mysql-DC2-2> select * from percona.replication;
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| host  | weight | localIndex | isReplica | lastUpdate          | lastHeartbeat       | connectionName | currentSource |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| DC2-1 |     10 |          0 | No        | 2025-12-19 14:06:02 | 2025-12-19 13:00:02 | DC2-DC1        | 172.31.78.120 |
| DC2-2 |     11 |          0 | Yes       | 2025-12-19 16:45:02 | 2025-12-19 16:45:02 | DC2-DC1        | 172.31.78.120 |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
2 rows in set (0.00 sec)
```
```
mysql-DC2-2> show replica status\G
*************************** 1. row ***************************
         Replica_IO_State: Waiting for source to send event
              Source_Host: 172.31.78.120
              Source_User: repl
              Source_Port: 3306
            Connect_Retry: 60
          Source_Log_File: binlog.000001
      Read_Source_Log_Pos: 69068
           Relay_Log_File: ip-172-31-71-144-relay-bin-dc2@002ddc1.000002
            Relay_Log_Pos: 68553
    Relay_Source_Log_File: binlog.000001
       Replica_IO_Running: Yes
      Replica_SQL_Running: Yes
```
```
mysql-DC1-1> show replicas;
+-----------+------+------+-----------+--------------------------------------+
| Server_Id | Host | Port | Source_Id | Replica_UUID                         |
+-----------+------+------+-----------+--------------------------------------+
|        10 |      | 3306 |         1 | 2cdc0f10-cc93-11f0-9fc9-16ffee718857 |
+-----------+------+------+-----------+--------------------------------------+
1 row in set (0.00 sec)
```
The replication manager log file (/tmp/replication_manager.log) also records information about replica changes and updates to the replication tables.
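To keep an eye on the script's activity, the log can be inspected on any node. A small sketch (the path is the script's default from above; the existence check is just defensive for nodes where the script has not run yet):

```shell
# Show the latest replication manager activity, tolerating a missing log
LOG=/tmp/replication_manager.log

if [ -f "$LOG" ]; then
    STATUS="present"
    tail -n 50 "$LOG"
else
    STATUS="absent"
    echo "log not created yet: $LOG"
fi
```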
```
+ grep -c ERROR /tmp/mysql_error
+ '[' 0 -gt 0 ']'
+ '[' a172.31.78.120 == a ']'
+ '[' aYes == a ']'
+ '[' Yes == Yes ']'
+ [[ Yes == Yes ]]
+ [[ Yes == Yes ]]
+ /usr/bin/mysql --connect_timeout=10 -B -e '
    update percona.replication
    set isReplica='''Yes''', localIndex=0, lastHeartbeat=now(),currentSource='''172.31.78.120'''
    where connectionName = '''DC2-DC1'''
    and host = '''DC2-2''''
++ /usr/bin/mysql --connect_timeout=10 -B -BN -e 'select count(*) from percona.replication where isReplica = '''Yes''' and connectionName = '''DC2-DC1''' and unix_timestamp(lastHeartbeat) > unix_timestamp() - 179'
+ slaveCount=1
+ '[' 1 -gt 1 ']'
```
```
mysql-DC2-2> select * from percona.replication;
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| host  | weight | localIndex | isReplica | lastUpdate          | lastHeartbeat       | connectionName | currentSource |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| DC2-1 |     10 |          0 | No        | 2025-12-19 14:06:02 | 2025-12-19 13:00:02 | DC2-DC1        | 172.31.78.120 |
| DC2-2 |     11 |          0 | Yes       | 2025-12-19 17:07:02 | 2025-12-19 17:07:02 | DC2-DC1        | 172.31.78.120 |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
2 rows in set (0.00 sec)
```
```
mysql-DC2-2> show replica status\G
*************************** 1. row ***************************
         Replica_IO_State: Waiting for source to send event
              Source_Host: 172.31.78.120
              Source_User: repl
              Source_Port: 3306
            Connect_Retry: 60
          Source_Log_File: binlog.000001
      Read_Source_Log_Pos: 88214
           Relay_Log_File: ip-172-31-71-144-relay-bin-dc2@002ddc1.000002
            Relay_Log_Pos: 87699
    Relay_Source_Log_File: binlog.000001
       Replica_IO_Running: Yes
      Replica_SQL_Running: Yes
```
Next, to test a source failover, stop MySQL on the current source node [DC1-1]:

```shell
shell-DC1-1> systemctl stop mysql
```
On the next run, the replica automatically switches to the surviving DC1 node [172.31.70.222]:

```
mysql-DC2-2> select * from percona.replication;
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| host  | weight | localIndex | isReplica | lastUpdate          | lastHeartbeat       | connectionName | currentSource |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
| DC2-1 |     10 |          0 | No        | 2025-12-19 14:06:02 | 2025-12-19 13:00:02 | DC2-DC1        | 172.31.78.120 |
| DC2-2 |     11 |          0 | Yes       | 2025-12-19 17:17:02 | 2025-12-19 17:17:02 | DC2-DC1        | 172.31.70.222 |
+-------+--------+------------+-----------+---------------------+---------------------+----------------+---------------+
2 rows in set (0.00 sec)
```
```
mysql-DC2-2> show replica status\G
*************************** 1. row ***************************
         Replica_IO_State: Waiting for source to send event
              Source_Host: 172.31.70.222
              Source_User: repl
              Source_Port: 3306
            Connect_Retry: 60
          Source_Log_File: binlog.000002
      Read_Source_Log_Pos: 83664
           Relay_Log_File: ip-172-31-71-144-relay-bin-dc2@002ddc1.000004
            Relay_Log_Pos: 83875
    Relay_Source_Log_File: binlog.000002
       Replica_IO_Running: Yes
      Replica_SQL_Running: Yes
```
In this post, we discussed some use cases of the PXC replication manager script, which can handle both source and replica failovers across PXC/MariaDB Galera-based clusters. The script has limitations: it does not work with standard asynchronous replicas and applies only when the nodes are part of a Galera cluster.
The script can also handle more complex topologies, such as multi-source replication, where a single cluster synchronizes data from multiple sources over multiple replication channels. We may explore these scenarios in separate blog posts.