As you probably know MySQL Replication (statement based) works by fetching statements from MASTERs binary log and executing them on the SLAVE. Since MySQL 4.0 this process is a bit more involved having events passing via relay logs on the Slave which also means there are two replication threads “IO Thread” and “SQL Thread” used in the process but idea remains the same.
For replication to work properly you need to have proper “snapshot” – consistent state of the database as it was on master and corresponding position in masters binary logs. There are multiple ways to get the “snapshot” – shutting down MySQL Server and copying data, using LVM, Using Innodb Hot Backup Tool, Using another Slave, using backup image etc but in all the cases you have to be sure the snapshot you’re dealing with corresponds to correct binary log position.
There are two sources of information for binary log position SHOW MASTER STATUS – this shows position as master writes its own binary and is helpful when you for example take backup on the Master to bring up first slave. This command output is pretty simple and leaves no room for mistakes:
|
1 |
<br>mysql> show master status G<br>*************************** 1. row ***************************<br> File: db04-bin.000068<br> Position: 98<br> Binlog_Do_DB:<br>Binlog_Ignore_DB:<br>1 row in set (0.00 sec)<br> |
If you’re backing up from the Slave or you’re cloning the Slave to make another one you need to use SHOW SLAVE STATUS which provides much more information.
|
1 |
<br>mysql> show slave statusG<br>*************************** 1. row ***************************<br> Slave_IO_State: Waiting for master to send event<br> Master_Host: 192.168.10.171<br> Master_User: replicant<br> Master_Port: 3306<br> Connect_Retry: 60<br> Master_Log_File: db01-bin.003814<br> Read_Master_Log_Pos: 744615652<br> Relay_Log_File: mysqld-relay-bin.000953<br> Relay_Log_Pos: 744615788<br> Relay_Master_Log_File: db01-bin.003814<br> Slave_IO_Running: Yes<br> Slave_SQL_Running: Yes<br> Replicate_Do_DB:<br> Replicate_Ignore_DB:<br> Replicate_Do_Table:<br> Replicate_Ignore_Table:<br> Replicate_Wild_Do_Table:<br>Replicate_Wild_Ignore_Table:<br> Last_Errno: 0<br> Last_Error:<br> Skip_Counter: 0<br> Exec_Master_Log_Pos: 744615652<br> Relay_Log_Space: 744615788<br> Until_Condition: None<br> Until_Log_File:<br> Until_Log_Pos: 0<br> Master_SSL_Allowed: No<br> Master_SSL_CA_File:<br> Master_SSL_CA_Path:<br> Master_SSL_Cert:<br> Master_SSL_Cipher:<br> Master_SSL_Key:<br> Seconds_Behind_Master: 0<br>1 row in set (0.00 sec)<br> |
As you can see in this output there are 3 log file names and log file positions which can get confusing, and indeed I often see people using wrong values to restore the replication using CHANGE MASTER TO statement.
It is very important to understand the difference between IO Thread position in Master logs (Master_Log_File:Read_Master_Log_Pos) and SQL Thread position in Master logs (Relay_Master_Log_File:Exec_Master_Log_Pos).
When you’re using CHANGE MASTER TO to set start position for the slave you’re specifying position for SQL thread and so you should use Relay_Master_Log_File:Exec_Master_Log_Pos. Otherwise you’re going to ruin your replication.
It does not help SHOW MASTER STATUS has Master_Log_File value and CHANGE MASTER TO accepts Master_Log_File parameter, while you actually want to use Relay_Master_Log_File instead. This naming was left from pre MySQL 4.0 era and it still causes confusion so many years later.
The mistake of using Master_Log_File is in fact VERY typical. I’ve seen multiple “internal instructions” for cloning the slave or slave recovery from backup using this value.
Infact Master_Log_File will work in most cases as well – you can note in the output about Master_Log_File and Relay_Master_Log_File are the same, and it will be the case in vast majority of cases.
If your replication does not fall behind the SQL Thread and IO thread will be running close, quite often being at same master position at all. Even if replication gets a bit behind in most cases you would still see SQL thread and IO threads close enough to share same master log file. Only in rare cases of switching master binary log file or replication being delayed very significantly you will see different log files for these variables which makes it very easy to make mistake and very hard to catch.