In this article, we will demonstrate how to achieve asynchronous replication automatic source failover when our replica is a Percona Server for MySQL (PS) and the source is a Percona XtraDB Cluster (PXC) cluster, using virtual IP (VIP) managed by Keepalived.
Let us consider our architecture below with async replication from PXC to Percona Server for MySQL:
|
1 |
PXC PS-MySQL<br>============== ==============<br>node1 +----> node4<br>node2 | <br>node3 -----+<br> |
Our goal is to set node3 as the async replication primary source. Then, upon failure, VIP can move to node2 and then node1 when node2 also fails. Once node3 becomes available again, the virtual IP should come back to node3.
|
1 |
| MySQL<br>Hostname | IP | Node | Remarks | server_id<br>---------+------------------+---------+----------|-----<br>el9-171 | 192.168.122.171 | node1 | PXC | 171<br>el9-172 | 192.168.122.172 | node2 | PXC | 172<br>el9-173 | 192.168.122.173 | node3 | PXC | 173<br>el9-174 | 192.168.122.174 | node4 | PS | 174<br> | 192.168.122.170 | | VIP-PXC | |
1) Edit then verify the priority with the highest to lowest value from node3, node2, node1:
|
1 |
node1> egrep 'priority' /etc/keepalived/keepalived.conf<br> priority 101<br><br>node2> egrep 'priority' /etc/keepalived/keepalived.conf<br> priority 102<br><br>node3> egrep 'priority' /etc/keepalived/keepalived.conf<br>priority 103<br> |
Sample configuration from node1:
|
1 |
node1> cat /etc/keepalived/keepalived.conf<br>vrrp_script chk_pxc {<br> script "/usr/bin/clustercheck"<br> interval 1<br> user mysql<br>}<br>vrrp_instance PXC_as_async_master {<br> state MASTER<br> interface ens2<br> virtual_router_id 51<br> priority 101<br> preempt<br> virtual_ipaddress {<br> 192.168.122.170<br> }<br> <br> track_script {<br> chk_pxc<br> }<br>} |
2) Restart Keepalived on all nodes and check the status.
|
1 |
shell> systemctl restart keepalived |
Check:
|
1 |
node1> journalctl -u keepalived.service -b | egrep ' systemd[1]: Starting | systemd[1]: Stopping | Entering | VRRP_Script' | tail -1<br>Nov 13 03:23:01 el9-171 Keepalived_vrrp[966]: (PXC_as_async_master) Entering BACKUP STATE<br><br>node2> journalctl -u keepalived.service -b | egrep ' systemd[1]: Starting | systemd[1]: Stopping | Entering | VRRP_Script' | tail -1<br>Nov 13 03:23:03 el9-172 Keepalived_vrrp[966]: (PXC_as_async_master) Entering BACKUP STATE<br><br>node3> journalctl -u keepalived.service -b | egrep ' systemd[1]: Starting | systemd[1]: Stopping | Entering | VRRP_Script' | tail -1<br>Nov 13 03:23:03 el9-173 Keepalived_vrrp[963]: (PXC_as_async_master) Entering MASTER STATE<br><br>node3> ip addr show dev ens2 | grep 'inet '<br>inet 192.168.122.173/24 brd 192.168.122.255 scope global noprefixroute ens2<br>inet 192.168.122.170/32 scope global ens2<br> |
We can see that node3 is the PRIMARY with VIP while the rest are in the BACKUP state.
3) Simulate a running write application.
|
1 |
node1> (set -e; while true; do mysql db1 -e"INSERT INTO t(s) VALUES('a');" ; sleep 5; done)<br> |
4) Check the replica’s current state using the below bash function:
|
1 |
node4> f_replica_status(){<br>mysql -e'SHOW REPLICA STATUSG' | sed '/,$/N;s/n/ /' | egrep 'Source_Host|Source_Log_File|Replica_IO_Running|Replica_SQL_Running|Last_IO_Error:|Last_SQL_Error:|Auto_Position|_Gtid_Set|Source_Server_Id|Source_UUID|Seconds_Behind_Source|Exec_Source_Log_Pos|Read_Source_Log_Pos'<br>}<br> |
Check replica status:
|
1 |
node4> f_replica_status<br> Source_Host: 192.168.122.173<br> Source_Log_File: s173.000036<br> Read_Source_Log_Pos: 102117<br> Relay_Source_Log_File: s173.000036<br> Replica_IO_Running: Yes<br> Replica_SQL_Running: Yes<br> Exec_Source_Log_Pos: 102117<br> Seconds_Behind_Source: 0<br> Last_IO_Error:<br> Last_SQL_Error:<br> Source_Server_Id: 173<br> Source_UUID: f81e1079-7f00-11ee-849e-525400146f98<br> Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates<br> Retrieved_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:53752-53859<br> Executed_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:1-53859, c2982ae6-882c-ee11-430b-70d799b54a94:1-2222<br> Auto_Position: 1<br> |
5) Reconfigure the replica to use the VIP(192.168.122.170):
|
1 |
node4-mysql> STOP REPLICA;<br>node4-mysql> CHANGE REPLICATION SOURCE TO SOURCE_HOST = '192.168.122.170';<br>node4-mysql> START REPLICA;<br><br> |
6) Verify the replica. Source_Host is now set to VIP:
|
1 |
node4> f_replica_status<br> Source_Host: 192.168.122.170<br> Source_Log_File: s173.000036<br> Read_Source_Log_Pos: 107494<br> Relay_Source_Log_File: s173.000036<br> Replica_IO_Running: Yes<br> Replica_SQL_Running: Yes<br> Exec_Source_Log_Pos: 107494<br> Seconds_Behind_Source: 0<br> Last_IO_Error:<br> Last_SQL_Error:<br> Source_Server_Id: 173<br> Source_UUID: f81e1079-7f00-11ee-849e-525400146f98<br> Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates<br> Retrieved_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:53875-53878<br> Executed_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:1-53878, c2982ae6-882c-ee11-430b-70d799b54a94:1-2222<br> Auto_Position: 1<br> |
7) Test failover
7.1) Stop mysqld
|
1 |
node3> mysqladmin shutdown |
7.2) Check the Keepalive state:
|
1 |
node1> journalctl -u keepalived.service -b | egrep ' systemd[1]: Starting | systemd[1]: Stopping | Entering | VRRP_Script' | tail -1<br>Nov 13 03:23:01 el9-171 Keepalived_vrrp[966]: (PXC_as_async_master) Entering BACKUP STATE<br><br>node2> journalctl -u keepalived.service -b | egrep ' systemd[1]: Starting | systemd[1]: Stopping | Entering | VRRP_Script' | tail -1<br>Nov 13 03:35:03 el9-172 Keepalived_vrrp[966]: (PXC_as_async_master) Entering MASTER STATE<br><br>node3> journalctl -u keepalived.service -b | egrep ' systemd[1]: Starting | systemd[1]: Stopping | Entering | VRRP_Script' | tail -1<br>Nov 13 03:35:02 el9-173 Keepalived_vrrp[963]: (PXC_as_async_master) Entering FAULT STATE<br> |
Node2 is the new PRIMARY while node3 changed to FAULT state.
7.3) Check the VIP transferred to node2:
|
1 |
node2> ip addr show dev ens2 | grep 'inet '<br>inet 192.168.122.172/24 brd 192.168.122.255 scope global noprefixroute ens2<br>inet 192.168.122.170/32 scope global ens2 |
7.4) Check the replica. You may have to wait for at least the value of SOURCE_CONNECT_RETRY, which is, by default, 60 seconds.
The Source_Server_Id, Source_Log_File, Relay_Source_Log_File, and Source_UUID have changed.
|
1 |
node4> f_replica_status<br> Source_Host: 192.168.122.170<br> Source_Log_File: s172.000033<br> Read_Source_Log_Pos: 198793<br> Relay_Source_Log_File: s172.000033<br> Replica_IO_Running: Yes<br> Replica_SQL_Running: Yes<br> Exec_Source_Log_Pos: 198793<br> Seconds_Behind_Source: 0<br> Last_IO_Error: <br> Last_SQL_Error: <br> Source_Server_Id: 172<br> Source_UUID: 6534a3eb-77d3-11ee-9870-52540028fd18<br> Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates<br> Retrieved_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:53875-53973<br> Executed_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:1-53973, c2982ae6-882c-ee11-430b-70d799b54a94:1-2222<br> Auto_Position: 1 |
The replica now has a successful failover to the new source (node2).
When node3 comes back online again, the VIP will also move back to node3 since it has the highest Keepalived priority, making it the asynchronous replication source again.
Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!
Resources
RELATED POSTS