GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

PXC 5.6 crashes while blacklisting some IPs

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • PXC 5.6 crashes while blacklisting some IPs

    Hello Folks,

    We have a setup that is like this :
    3 debian servers
    Those are VMS, on 2 differents physical server (2 on one side, 1 on the other)
    We are running PXC 5.6 from the percona repo
    All writes goes to one server, the other are hot standby, keepalived manages the virtual IP.

    The problem is, at some point in the week there seems to be a replication issue, leading to the nodes resnchronizing from the master, and those events goes for some time until the moment where the master crashes.
    What I can see in the log are something like this :

    The log from the primary got erased unfortunately.

    On one of the 'slave' :
    Code:
    2014-04-12 00:36:45 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.13:4567' pointing to uuid 9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b is blacklisted, skipping
    2014-04-12 00:36:45 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.10.10.14:4567
    2014-04-12 00:36:45 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.13:4567' pointing to uuid 9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b is blacklisted, skipping
    2014-04-12 00:36:46 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.13:4567' pointing to uuid 9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b is blacklisted, skipping
    2014-04-12 00:36:46 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') reconnecting to 96e20aae-bd97-11e3-b30b-a65820d5fee1 (tcp://10.10.10.14:4567), attempt 0
    2014-04-12 00:36:46 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') cleaning up duplicate 0x2a77d60 after established 0x2a89000
    2014-04-12 00:36:46 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.13:4567' pointing to uuid 9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b is blacklisted, skipping
    2014-04-12 00:36:46 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.13:4567' pointing to uuid 9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b is blacklisted, skipping
    2014-04-12 00:36:46 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.13:4567' pointing to uuid 9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b is blacklisted, skipping
    2014-04-12 00:36:46 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') turning message relay requesting off
    2014-04-12 00:36:48 3497 [Note] WSREP: evs::proto(9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, GATHER, view_id(REG,07afbea5-c191-11e3-a533-7fe32c71cd52,24)) suspecting node: 96e20aae-bd97-11e3-b30b-a65820d5fee1
    2014-04-12 00:36:48 3497 [Note] WSREP: evs::proto(9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, GATHER, view_id(REG,07afbea5-c191-11e3-a533-7fe32c71cd52,24)) suspecting node: 96e20aae-bd97-11e3-b30b-a65820d5fee1
    2014-04-12 00:36:49 3497 [Note] WSREP: declaring 07afbea5-c191-11e3-a533-7fe32c71cd52 stable
    2014-04-12 00:36:49 3497 [Note] WSREP: Node 07afbea5-c191-11e3-a533-7fe32c71cd52 state prim
    2014-04-12 00:36:49 3497 [Note] WSREP: view(view_id(PRIM,07afbea5-c191-11e3-a533-7fe32c71cd52,25) memb {
            07afbea5-c191-11e3-a533-7fe32c71cd52,0
            9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b,0
    } joined {
    } left {
    } partitioned {
            96e20aae-bd97-11e3-b30b-a65820d5fee1,0
    })
    2014-04-12 00:36:49 3497 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
    2014-04-12 00:36:49 3497 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    2014-04-12 00:36:49 3497 [Note] WSREP: forgetting 96e20aae-bd97-11e3-b30b-a65820d5fee1 (tcp://10.10.10.14:4567)
    2014-04-12 00:36:49 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.13:4567' pointing to uuid 9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b is blacklisted, skipping
    2014-04-12 00:36:49 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.13:4567' pointing to uuid 9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b is blacklisted, skipping
    2014-04-12 00:36:49 3497 [Note] WSREP: STATE EXCHANGE: sent state msg: c570a799-c1c9-11e3-b3c1-eff1874c545a
    2014-04-12 00:36:49 3497 [Note] WSREP: STATE EXCHANGE: got state msg: c570a799-c1c9-11e3-b3c1-eff1874c545a from 0 (mysql3)
    2014-04-12 00:36:49 3497 [Note] WSREP: STATE EXCHANGE: got state msg: c570a799-c1c9-11e3-b3c1-eff1874c545a from 1 (mysql1)
    2014-04-12 00:36:49 3497 [Note] WSREP: Quorum results:
            version    = 3,
            component  = PRIMARY,
            conf_id    = 19,
            members    = 2/2 (joined/total),
            act_id     = 1527991,
            last_appl. = 1527915,
            protocols  = 0/5/2 (gcs/repl/appl),
            group UUID = 36a9013d-7860-11e3-0800-7fc3d2bd6c58
    2014-04-12 00:36:49 3497 [Note] WSREP: Flow-control interval: [23, 23]
    2014-04-12 00:36:49 3497 [Note] WSREP: New cluster view: global state: 36a9013d-7860-11e3-0800-7fc3d2bd6c58:1527991, view# 20: Primary, number of nodes: 2, my index: 1, protocol version 2
    2014-04-12 00:36:49 3497 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    2014-04-12 00:36:49 3497 [Note] WSREP: REPL Protocols: 5 (3, 1)
    2014-04-12 00:36:49 3497 [Note] WSREP: Assign initial position for certification: 1527991, protocol version: 3
    2014-04-12 00:36:49 3497 [Note] WSREP: Service thread queue flushed.
    2014-04-12 00:36:50 3497 [Warning] WSREP: discarding established (time wait) 96e20aae-bd97-11e3-b30b-a65820d5fee1 (tcp://10.10.10.14:4567)
    2014-04-12 00:36:50 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.13:4567' pointing to uuid 9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b is blacklisted, skipping
    2014-04-12 00:36:52 3497 [Warning] WSREP: discarding established (time wait) 96e20aae-bd97-11e3-b30b-a65820d5fee1 (tcp://10.10.10.14:4567)
    2014-04-12 00:36:52 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.13:4567' pointing to uuid 9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b is blacklisted, skipping
    2014-04-12 00:36:53 3497 [Warning] WSREP: discarding established (time wait) 96e20aae-bd97-11e3-b30b-a65820d5fee1 (tcp://10.10.10.14:4567)
    [...]
    2014-04-12 00:36:55 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.13:4567' pointing to uuid 9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b is blacklisted, skipping
    2014-04-12 00:36:55 3497 [Warning] WSREP: evs::proto(9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, GATHER, view_id(REG,07afbea5-c191-11e3-a533-7fe32c71cd52,25)) source 96e20aae-bd97-11e3-b30b-a65820d5fee1 is not supposed to be representative
    2014-04-12 00:36:57 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.13:4567' pointing to uuid 9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b is blacklisted, skipping
    my config looks like this :
    Code:
    root@mysql1:~# cat /etc/mysql/my.cnf
    # The following options will be passed to all MySQL clients
    [client]
    #password       = your_password
    port            = 3306
    socket          = /var/run/mysqld/mysqld.sock
    
    # Here follows entries for some specific programs
    
    
    [mysqld_safe]
    socket          = /var/run/mysqld/mysqld.sock
    nice            = 0
    
    [mysqld]
    pid-file        = /var/run/mysqld/mysqld.pid
    socket          = /var/run/mysqld/mysqld.sock
    
    user                           = mysql
    default_storage_engine         = InnoDB
    datadir                        = /storage/mysql/
    
    port            = 3306
    socket          = /tmp/mysql.sock
    skip-external-locking
    skip-name-resolve
    key_buffer_size = 384M
    max_allowed_packet = 16M
    table_open_cache = 512
    sort_buffer_size = 4M
    read_buffer_size = 4M
    read_rnd_buffer_size = 8M
    myisam_sort_buffer_size = 64M
    thread_cache_size = 8
    #query_cache_size = 128M
    # Try number of CPU's*2 for thread_concurrency
    thread_concurrency = 8
    
    #skip-networking
    
    # BINARY LOGGING #
    #log-bin                        = /var/lib/mysql/mysql-bin
    #expire-logs-days               = 14
    #sync-binlog                    = 1
    # In order for Galera to work correctly binlog format should be ROW
    binlog_format=ROW
    
    
    # Uncomment the following if you are using InnoDB tables
    innodb_data_home_dir = /storage/mysql/
    innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
    innodb_log_group_home_dir = /storage/mysql/
    # You can set .._buffer_pool_size up to 50 - 80 %
    # of RAM but beware of setting memory usage too high
    innodb_buffer_pool_size = 2G
    innodb_additional_mem_pool_size = 20M
    # Set .._log_file_size to 25 % of buffer pool size
    innodb_log_file_size = 500M
    innodb_log_buffer_size = 8M
    innodb_flush_log_at_trx_commit = 1
    innodb_lock_wait_timeout = 50
    
    
    # LOGGING #
    log_error                      = /storage/mysql/mysql-error.log
    log_queries_not_using_indexes  = 1
    slow_query_log                 = 1
    slow_query_log_file            = /storage/mysql/mysql-slow.log
    slow_query_log_use_global_control = all
    
    ## xtradb
    wsrep_provider=/usr/lib/libgalera_smm.so
    wsrep_cluster_address=gcomm://10.10.10.13:4567,10.10.10.14:4567,10.10.10.16:4567
    wsrep_node_address=10.10.10.13
    #wsrep_slave_threads=2
    wsrep_cluster_name=sqmCluster
    wsrep_sst_method=xtrabackup-v2
    wsrep_node_name=mysql1
    wsrep_sst_auth="USER:PASS"
    
    ### auto increment issue
    #auto_increment_offset=1
    #auto_increment_increment=1
    wsrep_auto_increment_control=OFF
    ###
    innodb_locks_unsafe_for_binlog=1
    innodb_autoinc_lock_mode=2
    
    
    [mysqldump]
    quick
    max_allowed_packet = 16M
    
    [mysql]
    no-auto-rehash
    # Remove the next comment character if you are not familiar with SQL
    #safe-updates
    
    [myisamchk]
    key_buffer_size = 256M
    sort_buffer_size = 256M
    read_buffer = 2M
    write_buffer = 2M
    
    [mysqlhotcopy]
    interactive-timeout
    I would like to add that there is no network issue as far as I can tell, those two physical servers are connected directly back to back.

    Thanks for you help,

    Thomas

  • #2
    No one from percona around ?

    Comment


    • #3
      The warnings about blacklisting are normal and are related the the node seeing itself and excluding himself from communication negotiations. While the warnings appear mostly at a moment where cluster configuration changes, which in this case was the node 10.10.10.14 considered as not available:
      2014-04-12 00:36:45 3497 [Note] WSREP: (9e6115f5-c1aa-11e3-8d85-7a0bafd1c35b, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.10.10.14:4567
      This looks to be network problem where connection with 10.10.10.14 is not reliable hence two other cluster members marked it as partitioned node.

      Comment

      Working...
      X