Here’s a step-by-step guide on how to invert roles for master and slave so you can perform a master server upgrade, and then switch roles back to the original setup.
* While following this guide consider server-A as your original master and server-B as your original slave. We will assume server-B already produces binlogs and that both nodes have log-slave-updates=1 in my.cnf
Check this following link for more details on log-slave-updates: https://dev.mysql.com/doc/refman/5.6/en/replication-options-slave.html#option_mysqld_log-slave-updates
1. Prepare the original Slave to invert roles.
Make sure the slave is not in read-only mode. This should be set to 0, if not change it:
1 2 |
server-B> SELECT @@global.read_only; server-B> SET GLOBAL read_only=0 |
1 |
server-B> SET GLOBAL read_only=0 |
* For critical service, you might also want to make sure the slave is actually a perfect replica by running a checksum of the tables with pt-table-checksum.
Check SHOW SLAVE STATUS \G on server-B until you see Seconds_Behind_Master is zero
1 |
server-B> SHOW SLAVE STATUS \G |
When server-B is fully caught up, then issue FLUSH TABLES WITH READ LOCK; in a mysql client prompt in server-A
At this point your users will suffer a service request interruption and downtime starts ticking.
* DO NOT CLOSE this mysql client; otherwise the lock will be lost. We’ll use this same session to run the rest of the commands in server-A.
1 |
server-A> FLUSH TABLES WITH READ LOCK; |
Get master coordinates:
1 |
server-A> SHOW MASTER STATUS; |
Run
SHOW SLAVE STATUS \G in server-B
Repeat until
Relay_Master_Log_File and
Exec_Master_Log_Pos matches file and position from the previous step.
1 |
server-B> SHOW SLAVE STATUS \G |
Check if
SHOW MASTER STATUS; is not changing on server-B (to ensure that there are no queries local to server-B), then stop the slave
1 2 3 |
server-B> SHOW MASTER STATUS; server-B> STOP SLAVE; server-B> RESET SLAVE ALL; |
Run
SHOW MASTER STATUS \G in server-B and save this information in a safe place.
Also make sure binlogs from that position and onwards are kept until you bring server-A back online. Otherwise you’ll need to rebuild server-A.
1 |
server-B> SHOW MASTER STATUS \G |
2. Reverse roles for Master – Slave and upgrade original Master
Direct traffic to server-B (point VIP to server-B, or through DNS -not really recommended-, or by updating the configuration files of API services).
In a separate shell session in server-A, stop mysql
Now server-A can be shut down and serviced
* At this point you’re safe to upgrade server-A.
* After you’ve finished upgrading your server you can continue this guide to put server-A back as master.
Start MySQL in server-A (with skip-slave-start!) and run
RESET SLAVE ALL;
1 |
server-A> RESET SLAVE ALL; |
Reconfigure slave in A with
CHANGE MASTER TO ... and start it
Make sure to put the same
MASTER_LOG_FILE and
MASTER_LOG_POS values as you previously got from running
SHOW MASTER STATUS \G on server-B.
1 2 3 |
server-A> CHANGE MASTER TO MASTER_HOST='<IP-of-server-B>', MASTER_USER='<your-replication-user>', MASTER_PASSWORD='<replication-user-password>', MASTER_LOG_FILE='<file-from-serverB>', MASTER_LOG_POS=<pos-from-serverB->; server-A> SET GLOBAL read_only=1; server-A> START SLAVE; |
Make sure replication is running
Repeat until you see Seconds_Behind_Master is 0.
1 |
server-A> SHOW SLAVE STATUS \G |
* Here is the moment where the roles are fully inverted, and A is an up-to-date slave of B.
3. Prepare the original Master to set back the original roles.
Prepare
When server-A is fully caught up, then issue FLUSH TABLES WITH READ LOCK; in a mysql client prompt in server-B.
DO NOT CLOSE this mysql client on server-B; otherwise the lock will be lost. We’ll use this same session to run the rest of the commands in server-B.
1 |
server-B> FLUSH TABLES WITH READ LOCK; |
* DO NOT CLOSE THIS CLIENT SESSION! *
Check master status
1 |
server-B> SHOW MASTER STATUS; |
Run
SHOW SLAVE STATUS \G in server-A. Repeat until
Relay_Master_Log_File and
Exec_Master_Log_Pos matches the file and position from
SHOW MASTER STATUS;
1 |
server-A> SHOW SLAVE STATUS \G |
Get master coordinates:
1 2 |
server-A> SHOW MASTER STATUS \G server-A> SET GLOBAL read_only=0; |
4. Set back the original roles for Master – Slave
Direct traffic to server-A (point VIP to server-A, or through DNS -not really recommended-, or by updating the configuration files of API services).
In a separate shell session in server-B, restart mysql.
Make server-B again a slave of server-A:
1 2 3 4 |
server-B> CHANGE MASTER TO MASTER_HOST='<IP-of-server-A>', MASTER_USER='<your-replication-user>', MASTER_PASSWORD='<replication-user-password>', MASTER_LOG_FILE='<file-from-server-A>, MASTER_LOG_POS=<pos-from-server-A>; server-B> SET GLOBAL read_only=1; server-B> START SLAVE; server-B> SHOW SLAVE STATUS \G |
Stop the slave threads in server-A:
1 2 |
server-A> STOP SLAVE; server-A> RESET SLAVE ALL; |
* If you have a chained replication setup you should take into consideration the log_slave_updates variable, especially as it is used in this case and should be enabled.
i.e.: In a replication chain A > B > C, for C to receive updates from A, B will need to log them to the binary logs and that is what this option does.
For more information regarding this variable, please check the following link:
https://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_log_slave_updates
All those steps are being executed manually, which means long downtime (if no mistakes happened)!
and the process will be even more complicate if you have many slaves for that master.
MHA can simplify that process with less manual processes and less downtime (from 0.5 to 3 seconds in the online master switching).
Check this blog for more details on how to use MHA to perform the manual maintenance on the master:
http://mysqlstepbystep.com/2015/06/01/mysql-high-available-with-mha-2/
When you restart the MySQL server process on server-A, you should ensure “read-only = 1” from the very beginning.
OTOH, I don’t see the reason for “skip-slave-start”:
– If server-A was configured to replicate from server-B (aka “master-master” setup), that should still be valid, and there is no need to use “change master”.
– If server-A was not configured to replicate from anywhere, “skip-slave-start” will not change anything.
– If server-A was configured to replicate from somewhere else, all the description is incomplete anyways.
So IMO on server-A the “read-only = 1” is both necessary and sufficient at restart.