Replication has been a core feature enabling high availability in MySQL for decades. However, you may still encounter replication errors that keep you awake at night. One of the most common and challenging to deal with starts with: “Got fatal error 1236 from source when reading data from binary log”. This blog post is a refreshed attempt to explain the typical causes of this error and ways to deal with the problem.
Errant GTIDs
Nowadays, a typical replication setup uses GTID mode, and the full error message can look like this:
mysql > show replica status\G
*************************** 1. row ***************************
...
           Replica_IO_Running: No
          Replica_SQL_Running: Yes
...
                Last_IO_Errno: 13114
                Last_IO_Error: Got fatal error 1236 from source when reading data from binary log: 'Cannot replicate because the source purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new replica from backup. Consider increasing the source's binary log expiration period. The GTID set sent by the replica is '00022738-1111-1111-1111-111111111111:1-370', and the missing transactions are '00022739-2222-2222-2222-222222222222:1-2''
So, we have additional information – errno 13114, which doesn’t add much though:
$ perror 13114
MySQL error code MY-013114 (ER_SERVER_SOURCE_FATAL_ERROR_READING_BINLOG): Got fatal error %d from source when reading data from binary log: '%-.512s'
There are, however, more details about the reason for the error. The message explains that the source no longer has the needed binary logs, and the GTID details give even more precise insight: “the missing transactions are '00022739-2222-2222-2222-222222222222:1-2'”.
Digging further, we can see that the source has two GTID sets in gtid_executed, while the replica has only one:
— source
mysql > select @@global.gtid_executed,@@global.gtid_purged\G
*************************** 1. row ***************************
@@global.gtid_executed: 00022738-1111-1111-1111-111111111111:1-370,
00022739-2222-2222-2222-222222222222:1-2
  @@global.gtid_purged: 00022738-1111-1111-1111-111111111111:1-357,
00022739-2222-2222-2222-222222222222:1-2
1 row in set (0.00 sec)
— replica
mysql > select @@global.gtid_executed,@@global.gtid_purged\G
*************************** 1. row ***************************
@@global.gtid_executed: 00022738-1111-1111-1111-111111111111:1-370
  @@global.gtid_purged:
1 row in set (0.00 sec)
Moreover, this extra set is marked as purged. Therefore, it cannot be provided to the replica. We call it an errant transaction.
As the binary logs are purged, we cannot investigate what these two extra transactions were about unless the source instance's binary logs were backed up somewhere and we can find the missing events there.
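If such a binlog backup does exist, the missing transactions can be extracted for inspection with mysqlbinlog; a minimal sketch, assuming the archived files live under a hypothetical /backup/binlogs/ directory:

# --include-gtids limits the output to the errant transactions only;
# -vv and --base64-output=decode-rows make row-based events human-readable.
$ mysqlbinlog --include-gtids='00022739-2222-2222-2222-222222222222:1-2' \
    -vv --base64-output=decode-rows /backup/binlogs/mysql-bin.000002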
Suppose there is no way to check what these transactions were about. In that case, the quick solution to resume replication is to insert empty transactions with the same GTIDs and later check the instances for inconsistencies (e.g., with pt-table-checksum, as shown after the example below). To accomplish that, on the replica, we can do this:
mysql > set gtid_next='00022739-2222-2222-2222-222222222222:1';
Query OK, 0 rows affected (0.00 sec)

mysql > begin; commit;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

mysql > set gtid_next='00022739-2222-2222-2222-222222222222:2';
Query OK, 0 rows affected (0.00 sec)

mysql > begin; commit;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

mysql > set gtid_next=automatic;
Query OK, 0 rows affected (0.00 sec)

mysql > select @@global.gtid_executed,@@global.gtid_purged\G
*************************** 1. row ***************************
@@global.gtid_executed: 00022738-1111-1111-1111-111111111111:1-370,
00022739-2222-2222-2222-222222222222:1-2
  @@global.gtid_purged:
1 row in set (0.00 sec)

mysql > start replica;
Query OK, 0 rows affected (0.00 sec)

mysql > show replica status\G
*************************** 1. row ***************************
...
   Replica_IO_Running: Yes
  Replica_SQL_Running: Yes
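Once replication is running again, the consistency check mentioned above can be done with pt-table-checksum, executed against the source; a minimal sketch, assuming a dedicated checksum user with the needed privileges already exists (host, port, credentials, and the database name are placeholders):

# pt-table-checksum computes checksums on the source and propagates the
# checksum queries through replication, so differences show up on the replicas.
$ pt-table-checksum h=127.0.0.1,P=22738,u=checksum_user,p=secret \
    --databases=db1 --no-check-binlog-format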
A typical reason for this scenario is when the errant transaction is introduced on a replica first, and sometime later, the same replica gets promoted to be the new source.
Why could this have happened if you have all replicas running with read-only mode? Well, here is the case with my test replica:
mysql > select @@super_read_only,@@read_only;
+-------------------+-------------+
| @@super_read_only | @@read_only |
+-------------------+-------------+
|                 1 |           1 |
+-------------------+-------------+
1 row in set (0.00 sec)

mysql > flush hosts;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql > show binlog events in 'mysql-bin.000002';
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| Log_name         | Pos | Event_type     | Server_id | End_log_pos | Info                                                              |
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
| mysql-bin.000002 |   4 | Format_desc    |     22739 |         126 | Server ver: 8.0.37, Binlog ver: 4                                 |
| mysql-bin.000002 | 126 | Previous_gtids |     22739 |         197 | 00022738-1111-1111-1111-111111111111:1-357                        |
| mysql-bin.000002 | 197 | Gtid           |     22739 |         274 | SET @@SESSION.GTID_NEXT= '00022739-2222-2222-2222-222222222222:1' |
| mysql-bin.000002 | 274 | Query          |     22739 |         351 | flush hosts                                                       |
| mysql-bin.000002 | 351 | Gtid           |     22739 |         428 | SET @@SESSION.GTID_NEXT= '00022739-2222-2222-2222-222222222222:2' |
| mysql-bin.000002 | 428 | Query          |     22739 |         505 | flush hosts                                                       |
+------------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+
6 rows in set (0.00 sec)
Even with super_read_only turned on, it is possible to add binlog events with the local server's UUID (here via FLUSH HOSTS, which gets written to the binary log). So, when this binlog later gets rotated and the instance gets promoted, other replicas will have no way to sync these events! This problem was reported years ago and is still valid to this day: https://bugs.mysql.com/bug.php?id=88720
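One way to catch errant transactions before a planned promotion is to subtract the current source's gtid_executed from the candidate replica's gtid_executed; a minimal sketch using the GTID sets from the example above:

-- A non-empty result means the candidate has transactions the source never
-- executed, i.e., errant transactions that must be dealt with before failover.
mysql > SELECT GTID_SUBTRACT(
          '00022738-1111-1111-1111-111111111111:1-370,00022739-2222-2222-2222-222222222222:1-2',
          '00022738-1111-1111-1111-111111111111:1-370') AS errant_gtids;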
The max_allowed_packet is too small?
Another possible case for error 1236 is when MySQL reports that the maximum allowed packet size is exceeded. An example error state from the replica’s status can look like this:
Last_IO_Errno: 13114
Last_IO_Error: Got fatal error 1236 from source when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on source; the first event '' at 4, the last event read from './mysql-bin.000002' at 7628, the last byte read from './mysql-bin.000002' at 7647.'
The corresponding error log entries on the replica side are:
2024-06-05T14:19:57.956581Z 10 [ERROR] [MY-010557] [Repl] Error reading packet from server for channel '': log event entry exceeded max_allowed_packet; Increase max_allowed_packet on source; the first event '' at 4, the last event read from './mysql-bin.000002' at 7628, the last byte read from './mysql-bin.000002' at 7647. (server_errno=1236)
2024-06-05T14:19:57.956622Z 10 [ERROR] [MY-013114] [Repl] Replica I/O for channel '': Got fatal error 1236 from source when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on source; the first event '' at 4, the last event read from './mysql-bin.000002' at 7628, the last byte read from './mysql-bin.000002' at 7647.', Error_code: MY-013114
Now, the advice provided together with the error, to increase the max_allowed_packet setting on the source, may be completely unreasonable. It is printed even if the source already has it set to the maximum possible value, which is 1GB:
mysql > select @@max_allowed_packet;
+----------------------+
| @@max_allowed_packet |
+----------------------+
|           1073741824 |
+----------------------+
1 row in set (0.00 sec)
On the replica side, the maximum packet size for replication traffic is controlled by replica_max_allowed_packet (also 1GB by default).
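A quick way to compare both limits on the source and the replica (values are shown in bytes; 1073741824 equals 1GB):

mysql > select @@global.max_allowed_packet, @@global.replica_max_allowed_packet;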
It is important to first check whether the culprit binary log is indeed bigger than 1GB, because if not, the error is most likely related to log corruption, for example, when the source was rebooted abruptly while running with sync_binlog<>1. In any case, the binlog file should be tested with the mysqlbinlog tool to verify whether it is parseable at all. Surprisingly, when the binlog file does not get written completely to the disk (due to abrupt power loss), the error message may look exactly the same.
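Such a sanity check could look like this (the file path is just an example):

# Check the size of the suspected binlog file first.
$ ls -lh /var/lib/mysql/mysql-bin.000002

# Then try to parse the whole file; a corrupted binlog makes mysqlbinlog
# fail with an error before reaching the end of the file.
$ mysqlbinlog /var/lib/mysql/mysql-bin.000002 > /dev/null
$ echo $?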
If the binlog is one GB or bigger and not corrupted, though, this may be a result of hitting the following bugs (still valid today):
https://bugs.mysql.com/bug.php?id=107113 – when a single row is big enough
https://bugs.mysql.com/bug.php?id=55231 – when binlog file size exceeds 4GB
To prevent this error variant, avoid very big transactions and keep sync_binlog=1 to minimize the risk of corruption, for example as sketched below.
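A minimal my.cnf fragment along those lines (the values are illustrative, not a one-size-fits-all recommendation):

[mysqld]
# Flush and sync the binary log to disk after every transaction, so an abrupt
# crash or power loss does not leave a partially written (corrupted) binlog.
sync_binlog = 1
# The usual companion setting for full durability of committed transactions.
innodb_flush_log_at_trx_commit = 1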
Missing binary log file
Another popular variant of the same error looks like this:
Last_IO_Errno: 13114
Last_IO_Error: Got fatal error 1236 from source when reading data from binary log: 'Could not find first log file name in binary log index file'
It can typically be expected in non-GTID mode, or when GTID mode is enabled but the auto-positioning feature is disabled; in both cases, the replication I/O thread requests a specific binary log file and position.
The reason for this one is simple – the source purged the needed binary log before the replica was able to download it. So, for example, on the source, there is:
mysql > show binary logs;
+------------------+-----------+-----------+
| Log_name         | File_size | Encrypted |
+------------------+-----------+-----------+
| mysql-bin.000005 |      1674 | No        |
+------------------+-----------+-----------+
1 row in set (0.00 sec)
But the replica needed the previous file to continue:
mysql > show replica status\G
*************************** 1. row ***************************
             Replica_IO_State:
                  Source_Host: 127.0.0.1
                  Source_User: rsandbox
                  Source_Port: 22738
                Connect_Retry: 60
              Source_Log_File: mysql-bin.000004
          Read_Source_Log_Pos: 716
               Relay_Log_File: mysql-relay.000004
                Relay_Log_Pos: 892
        Relay_Source_Log_File: mysql-bin.000004
           Replica_IO_Running: No
          Replica_SQL_Running: Yes
…
                Auto_Position: 0
A proper binary log retention policy should be implemented to avoid this problem. Often, MySQL administrators use relatively short retention settings (via binlog_expire_logs_seconds) because it is very hard to predict disk space usage, which depends on the actual write volume rather than on time. I would argue that it is much easier to fully utilize the disk space dedicated to binlogs with Percona's extension and its binlog_space_limit variable!
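As an illustration, the two approaches side by side in my.cnf (the values are examples only; binlog_space_limit is available in Percona Server for MySQL, not in upstream MySQL):

[mysqld]
# Time-based retention: keep binary logs for three days (259200 seconds).
binlog_expire_logs_seconds = 259200
# Size-based cap (Percona Server): purge the oldest binlogs once the total
# space they occupy reaches the limit.
binlog_space_limit = 50G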
Out of disk space
Disk space problems on the source can result in yet another variant of the error, for example:
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event '' at 4, the last event read from './mysqld-bin.000001' at 12826202, the last byte read from './mysqld-bin.000001' at 12826221.'
A quite common case is when the temporary space (tmpdir or innodb_tmpdir) is mounted on a separate, small partition. When that partition gets full, the binlog cache file can't be written to the disk, and as a result, the binary log entry is corrupted, causing the replica to fail with the same error.
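A quick way to check where those temporary files go and how much space is left there (a sketch; the /tmp path is just an example, use whatever the query returns):

$ mysql -e "select @@global.tmpdir, @@global.innodb_tmpdir"
$ df -h /tmp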
Reference:
https://bugs.mysql.com/bug.php?id=86991
https://bugs.mysql.com/bug.php?id=72457
Other examples of active bugs leading to binlog corruption while the same error message is printed:
https://bugs.mysql.com/bug.php?id=75746
https://bugs.mysql.com/bug.php?id=75507
Summary
In general, dealing with this replication error category may be challenging. In some cases, it will be best to re-create the replica's data from a backup of the source. Fast methods to achieve this include Percona XtraBackup or the clone plugin.
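For example, a minimal sketch of re-provisioning a broken replica with the clone plugin (host, port, and credentials are placeholders; the clone user needs BACKUP_ADMIN on the donor and CLONE_ADMIN on the recipient):

-- On both the donor (source) and the recipient (broken replica):
mysql > INSTALL PLUGIN clone SONAME 'mysql_clone.so';

-- On the recipient: allow the chosen donor and pull its data, replacing the
-- local datadir; the recipient restarts automatically when the clone finishes.
mysql > SET GLOBAL clone_valid_donor_list = '127.0.0.1:22738';
mysql > CLONE INSTANCE FROM 'clone_user'@'127.0.0.1':22738 IDENTIFIED BY 'secret';

After the restart, replication still has to be reconfigured with CHANGE REPLICATION SOURCE TO and started again.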
But in others, you may try our tools to identify and sync table differences in a more lightweight approach, as demonstrated here:
https://www.percona.com/blog/mysql-replication-primer-with-pt-table-checksum-and-pt-table-sync/