Cluster Node crashed with strange error

  • Cluster Node crashed with strange error

    Hello!

    We have a Percona XtraDB Cluster with 3 nodes.
    Debian 6.0
    Percona percona-xtradb-cluster-server-5.5 (Version: 5.5.33-23.7.6-496.squeeze)

    Every few days a server node crashes with the error shown below. We tried upgrading the software from the repos, but this didn't solve the problem.

    Configuration file:
    Code:
    # /etc/mysql/my.cnf
    [mysqld]
    
            datadir=/var/lib/mysql
            user=mysql
    
            binlog_format=ROW
    
            wsrep_provider=/usr/lib/libgalera_smm.so
    
            wsrep_cluster_address=gcomm://176.X.X.66,95.X.X.218
    
            wsrep_slave_threads=16
            wsrep_cluster_name=xxx_cluster
            wsrep_node_name=node_2
    
            wsrep_sst_method=rsync
            wsrep_sst_auth=xxxx:xxxxxxxxxx
            log_error=/var/log/mysql/error.log
    
    
            innodb_locks_unsafe_for_binlog=1
            innodb_autoinc_lock_mode=2
    
            innodb_buffer_pool_size=4G
            innodb_log_file_size=128M
            innodb_log_buffer_size=4M
            innodb-file-per-table
    
            query_cache_size = 0
            innodb_flush_log_at_trx_commit = 0
    
            max_connect_errors = 10000
    Error log from the crashed node:

    Code:
    131008 19:27:19 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') reconnecting to 255f747f-2ebf-11e3-ab78-ef46ed89d801 (tcp://176.X.X.66:4567), attempt 0
    131008 19:27:19 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') turning message relay requesting off
    131008 19:27:22 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://176.X.X.66:4567
    131008 19:27:23 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') reconnecting to 255f747f-2ebf-11e3-ab78-ef46ed89d801 (tcp://176.X.X.66:4567), attempt 0
    131008 19:27:36 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') reconnecting to 20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2 (tcp://95.X.X.218:4567), attempt 0
    131008 19:27:36 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') turning message relay requesting off
    131008 19:27:37 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, OPERATIONAL, view_id(REG,20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2,97)) suspecting node: 20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2
    131008 19:27:37 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, OPERATIONAL, view_id(REG,20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2,97)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
    131008 19:27:38 [Note] WSREP: view(view_id(NON_PRIM,20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2,97) memb {
            42794e56-300e-11e3-9431-5a351eb5fbb0,
    } joined {
    } left {
    } partitioned {
            20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2,
            255f747f-2ebf-11e3-ab78-ef46ed89d801,
    })
    131008 19:27:38 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
    131008 19:27:38 [Note] WSREP: view(view_id(NON_PRIM,42794e56-300e-11e3-9431-5a351eb5fbb0,98) memb {
            42794e56-300e-11e3-9431-5a351eb5fbb0,
    } joined {
    } left {
    } partitioned {
            20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2,
            255f747f-2ebf-11e3-ab78-ef46ed89d801,
    })
    131008 19:27:38 [Note] WSREP: Flow-control interval: [16, 16]
    131008 19:27:38 [Note] WSREP: Received NON-PRIMARY.
    131008 19:27:38 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 1526)
    131008 19:27:38 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
    131008 19:27:38 [Note] WSREP: Flow-control interval: [16, 16]
    131008 19:27:38 [Note] WSREP: New cluster view: global state: 18f7464c-2cee-11e3-0800-6a67365ca6cb:1526, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
    131008 19:27:38 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    131008 19:27:38 [Note] WSREP: Received NON-PRIMARY.
    131008 19:27:38 [Note] WSREP: New cluster view: global state: 18f7464c-2cee-11e3-0800-6a67365ca6cb:1526, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
    131008 19:27:38 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    131008 19:27:39 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://176.X.X.66:4567
    131008 19:27:40 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') reconnecting to 255f747f-2ebf-11e3-ab78-ef46ed89d801 (tcp://176.X.X.66:4567), attempt 0
    131008 19:27:41 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') reconnecting to 20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2 (tcp://95.X.X.218:4567), attempt 0
    131008 19:27:43 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') reconnecting to 20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2 (tcp://95.X.X.218:4567), attempt 0
    131008 19:27:43 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
    131008 19:27:44 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
    131008 19:27:44 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
    131008 19:27:45 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
    131008 19:27:46 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
    131008 19:27:47 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
    131008 19:27:49 [Note] WSREP: evs::proto(42794e56-300e-11e3-9431-5a351eb5fbb0, GATHER, view_id(REG,42794e56-300e-11e3-9431-5a351eb5fbb0,98)) suspecting node: 255f747f-2ebf-11e3-ab78-ef46ed89d801
    131008 19:27:53 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') turning message relay requesting off
    131008 19:27:54 [Note] WSREP: declaring 20af6d6d-2ebf-11e3-903a-e2b3dfdc1cb2 stable
    131008 19:27:54 [Note] WSREP: declaring 255f747f-2ebf-11e3-ab78-ef46ed89d801 stable
    131008 19:27:54 [Note] WSREP: (42794e56-300e-11e3-9431-5a351eb5fbb0, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://95.X.X.218:4567
    
    16:27:54 UTC - mysqld got signal 11 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
    We will try our best to scrape up some info that will hopefully help
    diagnose the problem, but since we have already crashed,
    something is definitely wrong and this may fail.
    Please help us make Percona Server better by reporting any
    bugs at http://bugs.percona.com/
    
    key_buffer_size=8388608
    read_buffer_size=131072
    max_used_connections=20
    max_threads=153
    thread_count=17
    connection_count=17
    It is possible that mysqld could use up to
    key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 343054 K  bytes of memory
    Hope that's ok; if not, decrease some variables in the equation.
    
    Thread pointer: 0x0
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    stack_bottom = 0 thread_stack 0x40000
    /usr/sbin/mysqld(my_print_stacktrace+0x35)[0x7f51d5]
    /usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x6c17a4]
    /lib/libpthread.so.0(+0xeff0)[0x7fb8f4b74ff0]
    /lib/libc.so.6(+0x72146)[0x7fb8f37b8146]
    /lib/libc.so.6(+0x73758)[0x7fb8f37b9758]
    /lib/libc.so.6(cfree+0x6c)[0x7fb8f37bcb8c]
    /usr/lib/libgalera_smm.so(_ZN5gcomm13AsioTcpSocket13write_handlerERKN4asio10error_codeEm+0x358)[0x7fb8f1ff0b48]
    /usr/lib/libgalera_smm.so(_ZN4asio6detail8write_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS4_EEEEN5boost5arrayINS_12const_bufferELm2EEENS0_14transfer_all_tENS8_3_bi6bind_tIvNS8_4_mfi3mf2IvN5gcomm13AsioTcpSocketERKNS_10error_codeEmEENSD_5list3INSD_5valueINS8_10shared_ptrISI_EEEEPFNS8_3argILi1EEEvEPFNSS_ILi2EEEvEEEEEEclESL_mi+0xee)[0x7fb8f1ff986e]
    /usr/lib/libgalera_smm.so(_ZN4asio6detail23reactive_socket_send_opINS0_17consuming_buffersINS_12const_bufferEN5boost5arrayIS3_Lm2EEEEENS0_8write_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceISB_EEEES6_NS0_14transfer_all_tENS4_3_bi6bind_tIvNS4_4_mfi3mf2IvN5gcomm13AsioTcpSocketERKNS_10error_codeEmEENSG_5list3INSG_5valueINS4_10shared_ptrISL_EEEEPFNS4_3argILi1EEEvEPFNSV_ILi2EEEvEEEEEEEE11do_completeEPNS0_15task_io_serviceEPNS0_25task_io_service_operationESM_m+0x2bf)[0x7fb8f1ffa32f]
    /usr/lib/libgalera_smm.so(_ZN4asio6detail15task_io_service3runERNS_10error_codeE+0x45a)[0x7fb8f201924a]
    /usr/lib/libgalera_smm.so(_ZN5gcomm12AsioProtonet10event_loopERKN2gu8datetime6PeriodE+0x1d6)[0x7fb8f2012e76]
    /usr/lib/libgalera_smm.so(_ZN9GCommConn3runEv+0x57)[0x7fb8f202c757]
    /usr/lib/libgalera_smm.so(_ZN9GCommConn6run_fnEPv+0x9)[0x7fb8f2030989]
    /lib/libpthread.so.0(+0x68ca)[0x7fb8f4b6c8ca]
    /lib/libc.so.6(clone+0x6d)[0x7fb8f3815b6d]
    You may download the Percona Server operations manual by visiting
    http://www.percona.com/software/percona-server/. You may find information
    in the manual which will help you identify the cause of the crash.
    
    131008 19:27:54 mysqld_safe Number of processes running now: 0
    131008 19:27:54 mysqld_safe WSREP: not restarting wsrep node automatically
    131008 19:27:54 mysqld_safe mysqld from pid file /var/lib/mysql/server.my.pid ended
    What could be the reason for these crashes?
    Thank you!

  • #2
    Hi,

    I do not see the "wsrep_node_address" option in your configuration file.
    Can you please set it and check whether the crash happens again?

    http://www.percona.com/doc/percona-x...p_node_address
    http://www.codership.com/wiki/doku.p...p_node_address
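
    For illustration, a minimal sketch of that change in /etc/mysql/my.cnf on node_2. The address 192.0.2.10 is a hypothetical placeholder, not a value from this thread; it should be replaced with the IP address on which the other two nodes can reach this node. In the posted log the node identifies itself as 'tcp://0.0.0.0:4567'. Whether setting this option stops the crash is not confirmed here; it is only the first thing to try.

    ```ini
    [mysqld]
    # Existing settings stay as posted above; only the line below is new.

    # Address this node advertises to the rest of the cluster.
    # 192.0.2.10 is a placeholder (assumption): use this node's own
    # reachable IP, optionally followed by :port.
    wsrep_node_address=192.0.2.10
    ```

    The node needs a restart for the setting to take effect, and each of the three nodes would get its own address in its own my.cnf.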
