GTID Replication and Binary Logs Purge

GTID replicationThis blog continues the ongoing series on daily operations and GTID replication and binary logs purge.

In this blog, I’m going to investigate why the error below has been appearing in a special environment I’ve been working with on the last few days:

The error provides the right message and explains what is going on. But sometimes, it can be a bit tricky to solve this issue: you need additional information discovered after some tests and readings. We try and keep Managed Services scripted, in the sense that our advice and best practices are repeatable and consistent. However, some additional features and practices can be added depending on the customer situation.

Some time ago one of our customer’s database servers presented the above message. At that point, we could see the binary log files in a compressed form on master (gzipped). Of course, MySQL can’t identify a compressed file with a .gz extension as a binary log. We uncompressed the file, but replication presented the same problem – even after uncompressing the file and making sure the UUID of the current master and the TRX_ID were there. Obviously, I needed to go and investigate the problem to see what was going on.

After some reading, I re-read the below:

When the server starts, the global value of gtid_purged, which was called before as gtid_lost, is initialized to the set of GTIDs contained by the Previous_gtid_log_event of the oldest binary log. When a binary log is purged, gtid_purged is re-read from the binary log that has now become the oldest one.

=> https://dev.mysql.com/doc/refman/5.6/en/replication-options-gtids.html#sysvar_gtid_purged

That makes me think: if something is compressing binlogs on the master without purging them as expected by the GTID mechanism, it’s not going to be able to re-read existing GTIDs on disk. When the slave replication threads restarts, or the DBA issues commands like reset slave and reset master (to clean out the increased GTID sets on Executed_Gtid_Set from the SHOW SLAVE STATUS command, for example), this error can occur. But if I compress the file:

  • Will the slave get lost and not find all the needed GTIDs on the master after a reset slave/reset master?
  • If I purge the logs correctly, using PURGE BINARY LOGS, will the slave be OK when restarting replication threads?

Test 1: Compressing the oldest binary log file on master, restarting slave threads

I would like to test this very methodically. We’ll create one GTID per binary log, and then I will compress the oldest binary log file in order to make it unavailable for the slaves. I’m working with three virtual machines, one master and two slaves. On the second slave, I’m going to run the following sequence: stop slave, reset slave, reset master, start slave, and then, check the results. Let’s see what happens.

On master (tool01):