Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Database Daily Ops Series: GTID Replication

November 10, 2016

Author

Share this Post:

This post discusses ways of fixing broken GTID replication.

This blog series is all about the daily stories we have in Managed Services, dealing with customers’ environments (mostly when we need to quickly restore a service level within the SLA time).

One of the issues we encounter daily is replication using the GTID protocol. While there are a lot of blogs written about this subject, I would like to just highlight GTID replication operations, and the way you can deal with broken replication.

Most of the time we face way more complex scenarios then the one I’m about to present as an example, but the main goal of this blog is to quickly highlight the tools that can be used to fix issues to resume replication.

After reading this blog, you might ask yourself “Now, we know how to fix replication, but what about consistency?” The next blog will be entirely focused on that matter, data consistency!

Little less talk, little more action…

Replication is broken, and the SHOW SLAVE STATUS command output looks like below:

mysql> show slave statusG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.12
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000005
          Read_Master_Log_Pos: 879
               Relay_Log_File: mysqld-relay-bin.000009
                Relay_Log_Pos: 736
        Relay_Master_Log_File: mysqld-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1062
                   Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 634
              Relay_Log_Space: 1155
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1062
               Last_SQL_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 3
                  Master_UUID: 46fdb7ad-5852-11e6-92c9-0800274fb806
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 161108 16:47:53
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-4,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-3,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1
                Auto_Position: 1
1 row in set (0.00 sec)

mysql> show slave statusG

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host: 192.168.0.12

Master_User: repl

Master_Port: 3306

Connect_Retry: 60

Master_Log_File: mysqld-bin.000005

Read_Master_Log_Pos: 879

Relay_Log_File: mysqld-relay-bin.000009

Relay_Log_Pos: 736

Relay_Master_Log_File: mysqld-bin.000005

Slave_IO_Running: Yes

Slave_SQL_Running: No

Replicate_Do_DB:

Replicate_Ignore_DB:

Replicate_Do_Table:

Replicate_Ignore_Table:

Replicate_Wild_Do_Table:

Replicate_Wild_Ignore_Table:

Last_Errno: 1062

Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'

Skip_Counter: 0

Exec_Master_Log_Pos: 634

Relay_Log_Space: 1155

Until_Condition: None

Until_Log_File:

Until_Log_Pos: 0

Master_SSL_Allowed: No

Master_SSL_CA_File:

Master_SSL_CA_Path:

Master_SSL_Cert:

Master_SSL_Cipher:

Master_SSL_Key:

Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 1062

Last_SQL_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'

Replicate_Ignore_Server_Ids:

Master_Server_Id: 3

Master_UUID: 46fdb7ad-5852-11e6-92c9-0800274fb806

Master_Info_File: mysql.slave_master_info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State:

Master_Retry_Count: 86400

Master_Bind:

Last_IO_Error_Timestamp:

Last_SQL_Error_Timestamp: 161108 16:47:53

Master_SSL_Crl:

Master_SSL_Crlpath:

Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-4,

4fbe2d57-5843-11e6-9268-0800274fb806:1-3

Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-3,

4fbe2d57-5843-11e6-9268-0800274fb806:1-3,

81a567a8-5852-11e6-92cb-0800274fb806:1

Auto_Position: 1

1 row in set (0.00 sec)

When a slave configured to replicate using the GTID protocol breaks, pay attention to the SHOW SLAVE STATUS command output. You will find the Retrieved_Gtid_Set and Executed_Gtid_Set in the listed columns. You can see that the last global transaction ID retrieved from the current master was not executed (it’s going to appear on the Retrieved_Gtid_Set but not on the Executed_Gtid_Set, following the GTID format).

That means that the slave has retrieved a transaction that, for some other reason, it couldn’t execute. That’s the global transaction ID you need if you want to inject a fake transaction and get replication resumed. The fake transaction you inject takes the place of the one that has an SQL that cannot be executed due to an error found in Last_Error from the SHOW SLAVE STATUS.

Let’s analyze it:

#: replication is broken due to error 1062, when the primary key of a particular table is violated
Last_Errno: 1062
Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'
 
#: you can identify what is the global transaction id with problems, so, getting the replication streaming broken
           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-4,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-3,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1

#: replication is broken due to error 1062, when the primary key of a particular table is violated

Last_Errno: 1062

Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: ''. Query: 'insert into wb.t1 set i=1'

#: you can identify what is the global transaction id with problems, so, getting the replication streaming broken

Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-4,

4fbe2d57-5843-11e6-9268-0800274fb806:1-3

Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-3,

4fbe2d57-5843-11e6-9268-0800274fb806:1-3,

81a567a8-5852-11e6-92cb-0800274fb806:1

As shown above, we can see clearly that the transaction causing issues is global transaction ID number 4, coming from master under UUID 46fdb7ad-5852-11e6-92c9-0800274fb806. You can use the SHOW RELAYLOG EVENTS to try and verify that is the transaction’s query that is causing problems:

mysql> show relaylog events in 'mysqld-relay-bin.000009' from 736G
*************************** 1. row ***************************
   Log_name: mysqld-relay-bin.000009
        Pos: 736
 Event_type: Gtid
  Server_id: 3
End_log_pos: 682
       Info: SET @@SESSION.GTID_NEXT= '46fdb7ad-5852-11e6-92c9-0800274fb806:4'
*************************** 2. row ***************************
   Log_name: mysqld-relay-bin.000009
        Pos: 784
 Event_type: Query
  Server_id: 3
End_log_pos: 755
       Info: BEGIN
*************************** 3. row ***************************
   Log_name: mysqld-relay-bin.000009
        Pos: 857
 Event_type: Query
  Server_id: 3
End_log_pos: 848
       Info: insert into wb.t1 set i=1
*************************** 4. row ***************************
   Log_name: mysqld-relay-bin.000009
        Pos: 950
 Event_type: Xid
  Server_id: 3
End_log_pos: 879
       Info: COMMIT /* xid=66 */
4 rows in set (0.00 sec)

mysql> show relaylog events in 'mysqld-relay-bin.000009' from 736G

*************************** 1. row ***************************

Log_name: mysqld-relay-bin.000009

Pos: 736

Event_type: Gtid

Server_id: 3

End_log_pos: 682

Info: SET @@SESSION.GTID_NEXT= '46fdb7ad-5852-11e6-92c9-0800274fb806:4'

*************************** 2. row ***************************

Log_name: mysqld-relay-bin.000009

Pos: 784

Event_type: Query

Server_id: 3

End_log_pos: 755

Info: BEGIN

*************************** 3. row ***************************

Log_name: mysqld-relay-bin.000009

Pos: 857

Event_type: Query

Server_id: 3

End_log_pos: 848

Info: insert into wb.t1 set i=1

*************************** 4. row ***************************

Log_name: mysqld-relay-bin.000009

Pos: 950

Event_type: Xid

Server_id: 3

End_log_pos: 879

Info: COMMIT /* xid=66 */

4 rows in set (0.00 sec)

Before fixing and resuming the replication stream, we need to check why that INSERT query breaks replication. Let’s SELECT data and check the structure of table wb.t1:

mysql> select * from wb.t1;
+---+
| i |
+---+
| 1 |
+---+
1 row in set (0.00 sec)
 
mysql> show create table wb.t1;
+-------+-----------------------------------------------------+
| Table | Create Table                                                                                                       |
+-------+-----------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `i` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`i`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-----------------------------------------------------+
1 row in set (0.01 sec)

mysql> select * from wb.t1;

+---+

| i |

+---+

| 1 |

+---+

1 row in set (0.00 sec)

mysql> show create table wb.t1;

+-------+-----------------------------------------------------+

| Table | Create Table |

+-------+-----------------------------------------------------+

| t1 | CREATE TABLE `t1` (

`i` int(11) NOT NULL DEFAULT '0',

PRIMARY KEY (`i`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

+-------+-----------------------------------------------------+

1 row in set (0.01 sec)

It’s clear that something is wrong somewhere else than just the database. It’s time to fix and resume replication, and catch up with the master data. GTID replication data uses the below tools:

- pt-slave-restart

- mysqlslavetrx

- inject a fake/empty transaction

pt-slave-restart

One of the easiest ways of resuming replication on slaves when replication is broken is using pt-slave-restart, which is part of Percona Toolkit. Once you find the above facts (mainly the master UUID of the problematic global transaction ID that broke slave replication), you can move forward using pt-slave-restartwith the GTID flag —-master-uuid. This passes the master’s UUID and it skips all global transactions breaking replication on a specific slave server, as you can see below:

[root@dbops02 ~]# pt-slave-restart --master-uuid 46fdb7ad-5852-11e6-92c9-0800274fb806 --host=localhost -u root
2016-11-08T17:24:09 h=localhost,u=root mysqld-relay-bin.000009         736 1062
2016-11-08T17:24:25 h=localhost,u=root mysqld-relay-bin.000010         491 1062
2016-11-08T17:24:34 h=localhost,u=root mysqld-relay-bin.000010         736 1062
2016-11-08T17:24:35 h=localhost,u=root mysqld-relay-bin.000010         981 1062
2016-11-08T17:24:36 h=localhost,u=root mysqld-relay-bin.000010        1226 1062

[root@dbops02 ~]# pt-slave-restart --master-uuid 46fdb7ad-5852-11e6-92c9-0800274fb806 --host=localhost -u root

2016-11-08T17:24:09 h=localhost,u=root mysqld-relay-bin.000009 736 1062

2016-11-08T17:24:25 h=localhost,u=root mysqld-relay-bin.000010 491 1062

2016-11-08T17:24:34 h=localhost,u=root mysqld-relay-bin.000010 736 1062

2016-11-08T17:24:35 h=localhost,u=root mysqld-relay-bin.000010 981 1062

2016-11-08T17:24:36 h=localhost,u=root mysqld-relay-bin.000010 1226 1062

With the resources provided by pt-slave-restart, together with the above info, replication should resume. If you don’t have the Percona Toolkit package setup on your servers, make sure you follow these steps. It’s easier if you add the Percona Repository to your servers (you can use the Package Manager to install it for Debian-based and for RedHat-based systems).

mysqlslavetrx

To use mysqlslavetrx (which is part of MySQL Utilities developer by Oracle), I recommend you read the article written by Daniel Guzman, and install MySQL Utilities on your database servers. Using it to skip problematic transactions and inject fake ones is pretty straightforward as well .

So, find the below on the slave side:

         Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-8,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1

Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13,

4fbe2d57-5843-11e6-9268-0800274fb806:1-3

Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-8,

4fbe2d57-5843-11e6-9268-0800274fb806:1-3,

81a567a8-5852-11e6-92cb-0800274fb806:1

Then use GTID_SUBTRACT as the first parameter, which you can find on Retrieved_Gtid_Set and the subset, which you can grab from the Executed_Gtid_Set. Use the UUID of the direct master in the function to find the global transaction IDs executed on slave:

#: this below function is pretty cool and will exactly shows you if the slave is lacking some
#: GTIDs master have and vice-versa - this is going to help out using mysqlslavetrx to put
#: master and slave consistently with the same binary logs contents
mysql> SELECT GTID_SUBTRACT('46fdb7ad-5852-11e6-92c9-0800274fb806:1-13','46fdb7ad-5852-11e6-92c9-0800274fb806:1-8') gap;
*************************** 1. row ***************************
gap: 46fdb7ad-5852-11e6-92c9-0800274fb806:9-13
1 row in set (0.00 sec)

#: this below function is pretty cool and will exactly shows you if the slave is lacking some

#: GTIDs master have and vice-versa - this is going to help out using mysqlslavetrx to put

#: master and slave consistently with the same binary logs contents

mysql> SELECT GTID_SUBTRACT('46fdb7ad-5852-11e6-92c9-0800274fb806:1-13','46fdb7ad-5852-11e6-92c9-0800274fb806:1-8') gap;

*************************** 1. row ***************************

gap: 46fdb7ad-5852-11e6-92c9-0800274fb806:9-13

1 row in set (0.00 sec)

Now we can use mysqlslavetrx to insert a fake transaction on the slave to resume replication, as below:

 [root@dbops02 mysql-utilities-1.6.2]# mysqlslavetrx --gtid-set=46fdb7ad-5852-11e6-92c9-0800274fb806:9-13 --verbose --slaves=wb@localhost:3306
WARNING: Using a password on the command line interface can be insecure.
#
# GTID set to be skipped for each server:
# - localhost@3306: 46fdb7ad-5852-11e6-92c9-0800274fb806:9-13
#
# Injecting empty transactions for 'localhost:3306'...
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:9
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:10
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:11
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:12
# - 46fdb7ad-5852-11e6-92c9-0800274fb806:13
#
#...done.
#

[root@dbops02 mysql-utilities-1.6.2]# mysqlslavetrx --gtid-set=46fdb7ad-5852-11e6-92c9-0800274fb806:9-13 --verbose --slaves=wb@localhost:3306

WARNING: Using a password on the command line interface can be insecure.

# GTID set to be skipped for each server:

# - localhost@3306: 46fdb7ad-5852-11e6-92c9-0800274fb806:9-13

# Injecting empty transactions for 'localhost:3306'...

# - 46fdb7ad-5852-11e6-92c9-0800274fb806:9

# - 46fdb7ad-5852-11e6-92c9-0800274fb806:10

# - 46fdb7ad-5852-11e6-92c9-0800274fb806:11

# - 46fdb7ad-5852-11e6-92c9-0800274fb806:12

# - 46fdb7ad-5852-11e6-92c9-0800274fb806:13

#...done.

When you get back to the MySQL client on the slave, you’ll see that the retrieved and executed out of SHOW SLAVE STATUS will point that they are in the same position:

           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1

Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13,

4fbe2d57-5843-11e6-9268-0800274fb806:1-3

Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-13,

4fbe2d57-5843-11e6-9268-0800274fb806:1-3,

81a567a8-5852-11e6-92cb-0800274fb806:1

After, make sure you start the slave (mysqlslavetrx won’t start the replication as previous tool does).

Inject a Fake Transaction

Fake transactions are called empty transactions as well, but the fact is if a global transaction is affecting a slave, you must fake empty transactions that won’t affect data to resume replication and carry on processing the data streaming from the master aka replication! We need to know that it’s not going to affect future slaves, especially if this server becomes a new master after a failover/switchover process. You can get more information about Errant Transactions here and here.

mysql> stop slave;
Query OK, 0 rows affected (0.01 sec)
 
mysql> set gtid_next='46fdb7ad-5852-11e6-92c9-0800274fb806:14';
Query OK, 0 rows affected (0.00 sec)
 
mysql> begin; commit;
Query OK, 0 rows affected (0.00 sec)
 
Query OK, 0 rows affected (0.00 sec)
 
mysql> set gtid_next=automatic;
Query OK, 0 rows affected (0.00 sec)

mysql> stop slave;

Query OK, 0 rows affected (0.01 sec)

mysql> set gtid_next='46fdb7ad-5852-11e6-92c9-0800274fb806:14';

Query OK, 0 rows affected (0.00 sec)

mysql> begin; commit;

Query OK, 0 rows affected (0.00 sec)

mysql> set gtid_next=automatic;

Query OK, 0 rows affected (0.00 sec)

Now, when you check retrieved and executed out of SHOW SLAVE STATUS, you can see the below:

           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1

Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,

4fbe2d57-5843-11e6-9268-0800274fb806:1-3

Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,

4fbe2d57-5843-11e6-9268-0800274fb806:1-3,

81a567a8-5852-11e6-92cb-0800274fb806:1

It’s time to start slave (and be happy)!

mysql> show slave statusG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.0.12
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysqld-bin.000005
          Read_Master_Log_Pos: 3329
               Relay_Log_File: mysqld-relay-bin.000011
                Relay_Log_Pos: 491
        Relay_Master_Log_File: mysqld-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 3329
              Relay_Log_Space: 3486
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 3
                  Master_UUID: 46fdb7ad-5852-11e6-92c9-0800274fb806
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3
            Executed_Gtid_Set: 46fdb7ad-5852-11e6-92c9-0800274fb806:1-14,
4fbe2d57-5843-11e6-9268-0800274fb806:1-3,
81a567a8-5852-11e6-92cb-0800274fb806:1
                Auto_Position: 1
1 row in set (0.00 sec)