Emergency

Replication error 1236 "unknown error reading log event on the master"


    Hello,

    we are running two instances of Percona Server 5.7.20-18-log in a master-master replication setup (dbm1 / dbm2). The application uses only dbm1 for reads and writes, so dbm2 serves purely as a cold standby.
    Over the last few weeks, replication on dbm2 has stopped several times per week with the error message "unknown error reading log event on the master". Here is the full slave status from dbm2 after the latest incident:

    Code:
    mysql> show slave status \G
    *************************** 1. row ***************************
                   Slave_IO_State: 
                      Master_Host: 10.6.233.101
                      Master_User: replicator
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.007091
              Read_Master_Log_Pos: 71137
                   Relay_Log_File: mysqld-relay-bin.001603
                    Relay_Log_Pos: 71350
            Relay_Master_Log_File: mysql-bin.007091
                 Slave_IO_Running: No
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 71137
                  Relay_Log_Space: 71605
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 1236
                    Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'unknown error reading log event on the master; the first event 'mysql-bin.007003' at 43445, the last event read from './mysql-bin.007091' at 71137, the last byte read from './mysql-bin.007091' at 71137.'
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 1
                      Master_UUID: 7cb8eccb-e1a0-11e7-9d96-545edb2e572a
                 Master_Info_File: /var/lib/mysql/master.info
                        SQL_Delay: 0
              SQL_Remaining_Delay: NULL
          Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
               Master_Retry_Count: 86400
                      Master_Bind: 
          Last_IO_Error_Timestamp: 180308 06:41:25
         Last_SQL_Error_Timestamp: 
                   Master_SSL_Crl: 
               Master_SSL_Crlpath: 
               Retrieved_Gtid_Set: 
                Executed_Gtid_Set: 
                    Auto_Position: 0
             Replicate_Rewrite_DB: 
                     Channel_Name: 
               Master_TLS_Version: 
    1 row in set (0.00 sec)
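    One way to look closer at the position named in Last_IO_Error is to ask dbm1 for the binlog events at that offset. The following is only a minimal sketch: it pulls the file name and position out of the error text above and builds a SHOW BINLOG EVENTS statement to run by hand on dbm1. The helper names last_event and inspect_statement are made up for illustration and are not part of any MySQL or Percona tool.

```python
import re

# Error text copied from Last_IO_Error in the slave status above.
LAST_IO_ERROR = (
    "Got fatal error 1236 from master when reading data from binary log: "
    "'unknown error reading log event on the master; "
    "the first event 'mysql-bin.007003' at 43445, "
    "the last event read from './mysql-bin.007091' at 71137, "
    "the last byte read from './mysql-bin.007091' at 71137.'"
)

# Matches "the last event read from './<file>' at <pos>" (the "./" is optional).
LAST_EVENT_RE = re.compile(r"the last event read from '(?:\./)?([^']+)' at (\d+)")


def last_event(error_text):
    """Return (binlog_file, position) from the 'last event read' clause.

    Hypothetical helper: it only parses the message format shown in this post.
    """
    match = LAST_EVENT_RE.search(error_text)
    if match is None:
        raise ValueError("no 'last event read' clause found")
    return match.group(1), int(match.group(2))


def inspect_statement(binlog_file, position, limit=5):
    """Build a SHOW BINLOG EVENTS statement to run manually on the master."""
    return f"SHOW BINLOG EVENTS IN '{binlog_file}' FROM {position} LIMIT {limit};"


binlog_file, position = last_event(LAST_IO_ERROR)
print(inspect_statement(binlog_file, position))
```

    The script only prints the statement; it does not connect to any server. Whether the statement returns events or fails on dbm1 would show if the binlog is readable at that position.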
    We usually fix this with a full restore of dbm2 using XtraBackup, but after a few days the problem appears on dbm2 again.
    Our monitoring shows that both servers have enough resources left (and no I/O wait). What we do see is that the error often occurs while the application is running import jobs (writing to dbm1); see the attached query-type graphs:



    Both servers are located in the same datacenter, communication via the internal network.

    More details:
    Code:
    OS: Ubuntu 14.04.5 LTS
    Kernel: OpenVZ VPS (Kernel 2.6.32-042stab127.2)
    CPU: 8 cores Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
    RAM: 16 GB
    Percona: 5.7.20-18-log Percona Server (GPL), Release '18', Revision '7ce54a6deef'
    my.cnf
    Code:
    [client]
    port = 3306
    socket = /var/run/mysqld/mysqld.sock
    
    [isamchk]
    key_buffer_size = 16M
    
    [mysqld]
    basedir = /usr
    bind_address = *
    binlog_cache_size = 1M
    binlog_format = mixed
    bulk_insert_buffer_size = 64M
    datadir = /var/lib/mysql
    expire_logs_days = 2
    innodb_buffer_pool_dump_at_shutdown = ON
    innodb_buffer_pool_load_at_startup = ON
    innodb_buffer_pool_size = 12G
    innodb_doublewrite = OFF
    innodb_flush_log_at_trx_commit = 2
    innodb_io_capacity = 1000
    innodb_log_file_size = 256M
    innodb_read_io_threads = 8
    innodb_write_io_threads = 8
    join_buffer_size = 244K
    key_buffer_size = 100M
    log-bin = mysql-bin
    log-error = /var/log/mysql/error.log
    log_slow_verbosity = full
    long_query_time = 1
    max_allowed_packet = 16M
    max_binlog_files = 200
    max_binlog_size = 100M
    max_connections = 500
    max_heap_table_size = 128M
    max_relay_log_size = 256M
    max_slowlog_files = 1
    max_slowlog_size = 4G
    myisam_sort_buffer_size = 64M
    optimizer_switch = index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=off,block_nested_loop=off,batched_key_access=on,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on
    pid-file = /var/run/mysqld/mysqld.pid
    port = 3306
    query_cache_limit = 1M
    query_cache_size = 128M
    query_cache_type = ON
    read_buffer_size = 244K
    relay_log = mysqld-relay-bin
    server_id = 1
    skip-external-locking
    skip_name_resolve = ON
    slow_query_log = ON
    slow_query_log_file = /var/log/mysql/mysqld.slowlog
    socket = /var/run/mysqld/mysqld.sock
    sort_buffer_size = 2047K
    sql_mode = STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
    ssl = false
    ssl-ca = /etc/mysql/cacert.pem
    ssl-cert = /etc/mysql/server-cert.pem
    ssl-key = /etc/mysql/server-key.pem
    thread_cache_size = 8
    thread_stack = 256K
    tmp_table_size = 128M
    tmpdir = /tmp
    user = mysql
    userstat = ON
    
    [mysqld-5.0]
    myisam-recover = BACKUP
    
    [mysqld-5.1]
    myisam-recover = BACKUP
    
    [mysqld-5.5]
    myisam-recover = BACKUP
    
    [mysqld-5.6]
    myisam-recover-options = BACKUP
    
    [mysqld-5.7]
    myisam-recover-options = BACKUP
    
    [mysqld_safe]
    log-error = /var/log/mysql/error.log
    nice = 0
    socket = /var/run/mysqld/mysqld.sock
    
    [mysqldump]
    max_allowed_packet = 16M
    quick
    quote-names
    Does anybody have an idea how to address the root cause of this type of replication error?

    Thanks and best regards

    Adrian
    [Attached graphs: dbm1 query types, dbm2 query types]