In this article, we will demonstrate how to achieve asynchronous replication automatic source failover when our replica is a Percona Server for MySQL (PS) and the source is a Percona XtraDB Cluster (PXC) cluster, using virtual IP (VIP) managed by Keepalived.
Let us consider our architecture below with async replication from PXC to Percona Server for MySQL:
|
1 2 3 4 5 |
PXC PS-MySQL ============== ============== node1 +----> node4 node2 | node3 -----+ |
Our goal is to set node3 as the async replication primary source. Then, upon failure, VIP can move to node2 and then node1 when node2 also fails. Once node3 becomes available again, the virtual IP should come back to node3.
|
1 2 3 4 5 6 7 8 |
| MySQL Hostname | IP | Node | Remarks | server_id ---------+------------------+---------+----------|----- el9-171 | 192.168.122.171 | node1 | PXC | 171 el9-172 | 192.168.122.172 | node2 | PXC | 172 el9-173 | 192.168.122.173 | node3 | PXC | 173 el9-174 | 192.168.122.174 | node4 | PS | 174 | 192.168.122.170 | | VIP-PXC | |
1) Edit then verify the priority with the highest to lowest value from node3, node2, node1:
|
1 2 3 4 5 6 7 8 |
node1> egrep 'priority' /etc/keepalived/keepalived.conf priority 101 node2> egrep 'priority' /etc/keepalived/keepalived.conf priority 102 node3> egrep 'priority' /etc/keepalived/keepalived.conf priority 103 |
Sample configuration from node1:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
node1> cat /etc/keepalived/keepalived.conf vrrp_script chk_pxc { script "/usr/bin/clustercheck" interval 1 user mysql } vrrp_instance PXC_as_async_master { state MASTER interface ens2 virtual_router_id 51 priority 101 preempt virtual_ipaddress { 192.168.122.170 } track_script { chk_pxc } } |
2) Restart Keepalived on all nodes and check the status.
|
1 |
shell> systemctl restart keepalived |
Check:
|
1 2 3 4 5 6 7 8 9 10 11 12 |
node1> journalctl -u keepalived.service -b | egrep ' systemd[1]: Starting | systemd[1]: Stopping | Entering | VRRP_Script' | tail -1 Nov 13 03:23:01 el9-171 Keepalived_vrrp[966]: (PXC_as_async_master) Entering BACKUP STATE node2> journalctl -u keepalived.service -b | egrep ' systemd[1]: Starting | systemd[1]: Stopping | Entering | VRRP_Script' | tail -1 Nov 13 03:23:03 el9-172 Keepalived_vrrp[966]: (PXC_as_async_master) Entering BACKUP STATE node3> journalctl -u keepalived.service -b | egrep ' systemd[1]: Starting | systemd[1]: Stopping | Entering | VRRP_Script' | tail -1 Nov 13 03:23:03 el9-173 Keepalived_vrrp[963]: (PXC_as_async_master) Entering MASTER STATE node3> ip addr show dev ens2 | grep 'inet ' inet 192.168.122.173/24 brd 192.168.122.255 scope global noprefixroute ens2 inet 192.168.122.170/32 scope global ens2 |
We can see that node3 is the PRIMARY with VIP while the rest are in the BACKUP state.
3) Simulate a running write application.
|
1 |
node1> (set -e; while true; do mysql db1 -e"INSERT INTO t(s) VALUES('a');" ; sleep 5; done) |
4) Check the replica’s current state using the below bash function:
|
1 2 3 |
node4> f_replica_status(){ mysql -e'SHOW REPLICA STATUSG' | sed '/,$/N;s/n/ /' | egrep 'Source_Host|Source_Log_File|Replica_IO_Running|Replica_SQL_Running|Last_IO_Error:|Last_SQL_Error:|Auto_Position|_Gtid_Set|Source_Server_Id|Source_UUID|Seconds_Behind_Source|Exec_Source_Log_Pos|Read_Source_Log_Pos' } |
Check replica status:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
node4> f_replica_status Source_Host: 192.168.122.173 Source_Log_File: s173.000036 Read_Source_Log_Pos: 102117 Relay_Source_Log_File: s173.000036 Replica_IO_Running: Yes Replica_SQL_Running: Yes Exec_Source_Log_Pos: 102117 Seconds_Behind_Source: 0 Last_IO_Error: Last_SQL_Error: Source_Server_Id: 173 Source_UUID: f81e1079-7f00-11ee-849e-525400146f98 Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates Retrieved_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:53752-53859 Executed_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:1-53859, c2982ae6-882c-ee11-430b-70d799b54a94:1-2222 Auto_Position: 1 |
5) Reconfigure the replica to use the VIP(192.168.122.170):
|
1 2 3 |
node4-mysql> STOP REPLICA; node4-mysql> CHANGE REPLICATION SOURCE TO SOURCE_HOST = '192.168.122.170'; node4-mysql> START REPLICA; |
6) Verify the replica. Source_Host is now set to VIP:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
node4> f_replica_status Source_Host: 192.168.122.170 Source_Log_File: s173.000036 Read_Source_Log_Pos: 107494 Relay_Source_Log_File: s173.000036 Replica_IO_Running: Yes Replica_SQL_Running: Yes Exec_Source_Log_Pos: 107494 Seconds_Behind_Source: 0 Last_IO_Error: Last_SQL_Error: Source_Server_Id: 173 Source_UUID: f81e1079-7f00-11ee-849e-525400146f98 Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates Retrieved_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:53875-53878 Executed_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:1-53878, c2982ae6-882c-ee11-430b-70d799b54a94:1-2222 Auto_Position: 1 |
7) Test failover
7.1) Stop mysqld
|
1 |
node3> mysqladmin shutdown |
7.2) Check the Keepalive state:
|
1 2 3 4 5 6 7 8 |
node1> journalctl -u keepalived.service -b | egrep ' systemd[1]: Starting | systemd[1]: Stopping | Entering | VRRP_Script' | tail -1 Nov 13 03:23:01 el9-171 Keepalived_vrrp[966]: (PXC_as_async_master) Entering BACKUP STATE node2> journalctl -u keepalived.service -b | egrep ' systemd[1]: Starting | systemd[1]: Stopping | Entering | VRRP_Script' | tail -1 Nov 13 03:35:03 el9-172 Keepalived_vrrp[966]: (PXC_as_async_master) Entering MASTER STATE node3> journalctl -u keepalived.service -b | egrep ' systemd[1]: Starting | systemd[1]: Stopping | Entering | VRRP_Script' | tail -1 Nov 13 03:35:02 el9-173 Keepalived_vrrp[963]: (PXC_as_async_master) Entering FAULT STATE |
Node2 is the new PRIMARY while node3 changed to FAULT state.
7.3) Check the VIP transferred to node2:
|
1 2 3 |
node2> ip addr show dev ens2 | grep 'inet ' inet 192.168.122.172/24 brd 192.168.122.255 scope global noprefixroute ens2 inet 192.168.122.170/32 scope global ens2 |
7.4) Check the replica. You may have to wait for at least the value of SOURCE_CONNECT_RETRY, which is, by default, 60 seconds.
The Source_Server_Id, Source_Log_File, Relay_Source_Log_File, and Source_UUID have changed.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
node4> f_replica_status Source_Host: 192.168.122.170 Source_Log_File: s172.000033 Read_Source_Log_Pos: 198793 Relay_Source_Log_File: s172.000033 Replica_IO_Running: Yes Replica_SQL_Running: Yes Exec_Source_Log_Pos: 198793 Seconds_Behind_Source: 0 Last_IO_Error: Last_SQL_Error: Source_Server_Id: 172 Source_UUID: 6534a3eb-77d3-11ee-9870-52540028fd18 Replica_SQL_Running_State: Replica has read all relay log; waiting for more updates Retrieved_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:53875-53973 Executed_Gtid_Set: 3d67d519-77d3-11ee-bcf4-8f28664ab56b:1-53973, c2982ae6-882c-ee11-430b-70d799b54a94:1-2222 Auto_Position: 1 |
The replica now has a successful failover to the new source (node2).
When node3 comes back online again, the VIP will also move back to node3 since it has the highest Keepalived priority, making it the asynchronous replication source again.
Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!
&
Resources
RELATED POSTS
Clever! Great article, (haven’t test it yet). It never occurred to me to use VIP and Keepalived as a solution for this setup. Eager to give it a try soon!