GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

XtraDB Cluster node failure

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • XtraDB Cluster node failure

    How do you diagnose what caused an XtraDB cluster node to fail?

    What are ALL the log files etc that I should be looking at?

    Thanks.

    Dom

    A node (6 node cluster, split three members on two sites (1000MB or more between sites) failed at Nov 2 00:55:14 and failed to restart, several times, mysql_safe seems to have captured the failure but failed to restart it.

    We had to do a HARD OS restart and a manual mysql restart as you can see from the mysql error log.

    There was no user activity on the db at the time of the failure.

    We lost a second node 6 hours after the first and a third about 5 hours later, again nothing seems to explain it.

    Environment:
    scientific 6.2 64bit
    vmware 5
    2vcpu
    8GB ram
    4GB db
    5.9GB innoDB buffer
    Version 5.5.24-23.6

    my.cnf file:

    [client]
    port = 3306
    socket = /var/lib/mysql/mysql.sock

    [mysqld]
    skip-name-resolv
    character-set-server=utf8
    collation-server=utf8_bin
    datadir=/var/lib/mysql
    user=mysql
    expire_logs_days=3
    port = 3306
    socket = /var/lib/mysql/mysql.sock
    back_log = 50
    max_connections = 250
    max_connect_errors = 10
    table_open_cache = 2048
    max_allowed_packet = 32M
    thread_cache_size = 8

    query_cache_size = 0
    query_cache_type = 0
    ft_min_word_len = 4

    default-storage-engine = InnoDB
    thread_stack = 192K
    transaction_isolation = REPEATABLE-READ
    tmp_table_size = 64M

    log-bin=mysql-bin
    binlog_format=ROW
    log_slave_updates
    slow_query_log=1
    slow_query_log_file=/var/log/mysql/slow.log
    long_query_time = 2

    general_log=0
    general_log_file=/var/log/mysql/general.log
    log-error=/var/log/mysql/error.log

    server-id=183371624
    key_buffer_size = 32M
    bulk_insert_buffer_size = 64M
    myisam_sort_buffer_size = 128M
    myisam_max_sort_file_size = 10G
    myisam_repair_threads = 1
    myisam_recover

    innodb_additional_mem_pool_size = 16M
    innodb_buffer_pool_size = 6000M
    innodb_data_file_path = ibdata1:10M:autoextend
    innodb_thread_concurrency = 0
    innodb_flush_log_at_trx_commit = 0
    innodb_doublewrite=1
    innodb_log_buffer_size = 8M
    innodb_log_file_size = 384M
    innodb_log_files_in_group = 2
    innodb_max_dirty_pages_pct = 90
    innodb_file_per_table
    innodb_autoinc_lock_mode=2
    innodb_locks_unsafe_for_binlog=1

    wait_timeout=300

    wsrep_provider=/usr/lib64/libgalera_smm.so
    wsrep_provider_options = "gcache.size=4G; gmcast.listen_addr=tcp://0.0.0.0:4010;evs.send_window=512; evs.user_send_window=512;"
    wsrep_slave_threads=8
    wsrep_cluster_name=lptlive
    wsrep_sst_method=xtrabackup
    wsrep_sst_auth=replic:5vaeNeIiTYBRU
    wsrep_notify_cmd=/usr/bin/wsrep_notify.sh

    [xtrabackup]
    target-dir = /backup

    [mysqldump]
    quick
    max_allowed_packet = 32M


    [mysql]
    no-auto-rehash


    [myisamchk]
    key_buffer_size = 512M
    sort_buffer_size = 512M
    read_buffer = 8M
    write_buffer = 8M


    [mysqlhotcopy]
    interactive-timeout


    [mysqld_safe]
    open-files-limit = 8192
    wsrep_urls= gcomm://192.168.100.101:4010,gcomm://192.168.100.102:4010,gc omm://192.168.100.103:4010,gcomm://192.168.101.104:4010,gcom m://192.168.101.105:4010,gcomm://192.168.101.106:4010,gcomm: //

    I have checked the mysql error log and it says:

    121102 00:55:14 mysqld_safe Number of processes running now: 0
    121102 00:55:14 mysqld_safe mysqld restarted
    121102 0:55:15 [Note] Flashcache bypass: disabled
    121102 0:55:15 [Note] Flashcache setup error is : open flash device failed

    121102 0:55:15 [Note] WSREP: Read nil XID from storage engines, skipping position init
    121102 0:55:15 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
    121102 0:55:15 [Note] WSREP: wsrep_load(): Galera 2.1(r113) by Codership Oy loaded succesfully.
    121102 0:55:15 [Note] WSREP: Found saved state: 72de1b90-022d-11e2-0800-484cdf256d28:-1
    121102 0:55:15 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
    121102 0:55:15 [Note] WSREP: Passing config to GCS: base_host = 192.168.100.103; evs.send_window = 512; evs.user_send_window = 512; gcache.dir = /var/lib/mysql/; gcache.keep_pages
    _size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 4G; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16;
    gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; g
    mcast.listen_addr = tcp://0.0.0.0:4010; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
    121102 0:55:15 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    121102 0:55:15 [Note] WSREP: wsrep_sst_grab()
    121102 0:55:15 [Note] WSREP: Start replication
    121102 0:55:15 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    121102 0:55:15 [Note] WSREP: protonet asio version 0
    121102 0:55:15 [Note] WSREP: backend: asio
    121102 0:55:15 [Note] WSREP: GMCast version 0
    121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, 'tcp://0.0.0.0:4010') listening at tcp://0.0.0.0:4010
    121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, 'tcp://0.0.0.0:4010') multicast: , ttl: 1
    121102 0:55:15 [Note] WSREP: EVS version 0
    121102 0:55:15 [Note] WSREP: PC version 0
    121102 0:55:15 [Note] WSREP: gcomm: connecting to group 'lptlive', peer '192.168.100.101:4010'
    121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, 'tcp://0.0.0.0:4010') turning message relay requesting on, nonlive peers: tcp://192.168.100.57:4010 tcp://192.168.101.105:4010 tcp://192.168.101.104:4010 tcp://192.168.100.102:4010 tcp://192.168.100.139:4010
    121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, 'tcp://0.0.0.0:4010') cleaning up duplicate 0x28b0660 after established 0x289c490
    121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, 'tcp://0.0.0.0:4010') turning message relay requesting off
    121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, 'tcp://0.0.0.0:4010') cleaning up duplicate 0x28ae630 after established 0x28dc140
    121102 0:55:15 [Note] WSREP: (f6ba0383-2487-11e2-0800-0e4230df1ac6, 'tcp://0.0.0.0:4010') cleaning up duplicate 0x28b0000 after established 0x28db420
    121102 0:55:19 [Note] WSREP: declaring 72f89f95-0729-11e2-0800-e80c813e3134 stable
    121102 0:55:19 [Note] WSREP: declaring a8511994-190a-11e2-0800-580d0a80208f stable
    121102 0:55:19 [Note] WSREP: declaring a882f2cb-190a-11e2-0800-00171a908230 stable
    121102 0:55:19 [Note] WSREP: declaring a8b2f666-190a-11e2-0800-56702e7cb352 stable
    121102 0:55:19 [Note] WSREP: declaring a8beb15d-190a-11e2-0800-81fcb2686439 stable
    121102 0:55:19 [Note] WSREP: declaring a8c0cfb1-190a-11e2-0800-7533cf9b72c7 stable
    121102 0:55:19 [Note] WSREP: view(view_id(PRIM,72f89f95-0729-11e2-0800-e80c813e3134,117) memb {
    72f89f95-0729-11e2-0800-e80c813e3134,
    a8511994-190a-11e2-0800-580d0a80208f,
    a882f2cb-190a-11e2-0800-00171a908230,
    a8b2f666-190a-11e2-0800-56702e7cb352,
    a8beb15d-190a-11e2-0800-81fcb2686439,
    a8c0cfb1-190a-11e2-0800-7533cf9b72c7,
    f6ba0383-2487-11e2-0800-0e4230df1ac6,
    } joined {
    } left {
    } partitioned {
    aae97de1-1831-11e2-0800-ee58636f1b39,
    })
    121102 0:55:19 [Note] WSREP: gcomm: connected
    121102 0:55:19 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    121102 0:55:19 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    121102 0:55:19 [Note] WSREP: Opened channel 'lptlive'
    121102 0:55:19 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 6, memb_num = 7
    121102 0:55:19 [Note] WSREP: Waiting for SST to complete.
    121102 0:55:19 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    121102 0:55:19 [Note] WSREP: STATE EXCHANGE: sent state msg: f95d07c6-2487-11e2-0800-d49782a4e38d
    121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 0 (garb)
    121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 1 (mysqldbdrv03)
    121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 2 (mysqldbdrv01)
    121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 4 (mysqldb03)
    121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 5 (mysqldb02)
    121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 3 (mysqldbdrv02)
    121102 0:55:19 [Note] WSREP: STATE EXCHANGE: got state msg: f95d07c6-2487-11e2-0800-d49782a4e38d from 6 (mysqldb01)
    121102 0:55:19 [Note] WSREP: 'garb' demoted SYNCED->PRIMARY due to gap in history: 8585650 - 8585807
    121102 0:55:19 [Note] WSREP: Quorum results:
    version = 2,
    component = PRIMARY,
    conf_id = 112,
    members = 5/7 (joined/total),
    act_id = 8585807,
    last_appl. = -1,
    protocols = 0/4/2 (gcs/repl/appl),
    group UUID = 72de1b90-022d-11e2-0800-484cdf256d28
    121102 0:55:19 [Note] WSREP: Flow-control interval: [21, 42]
    121102 0:55:19 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8585807)
    121102 0:55:19 [Note] WSREP: State transfer required:
    Group state: 72de1b90-022d-11e2-0800-484cdf256d28:8585807
    Local state: 72de1b90-022d-11e2-0800-484cdf256d28:-1
    121102 0:55:19 [Note] WSREP: New cluster view: global state: 72de1b90-022d-11e2-0800-484cdf256d28:8585807, view# 113: Primary, number of nodes: 7, my index: 6, protocol version 2
    121102 0:55:19 [Warning] WSREP: Gap in state sequence. Need state transfer.
    121102 0:55:19 [Note] WSREP: Node 0 (garb) requested state transfer from '*any*'. Selected 1 (mysqldbdrv03)(SYNCED) as donor.
    121102 0:55:19 [Note] WSREP: 0 (garb): State transfer from 1 (mysqldbdrv03) complete.
    121102 0:55:19 [Note] WSREP: Member 0 (garb) synced with group.
    121102 0:55:20 [Warning] WSREP: 1 (mysqldbdrv03): State transfer to 0 (garb) failed: -1 (Operation not permitted)
    121102 0:55:20 [Note] WSREP: Member 1 (mysqldbdrv03) synced with group.
    121102 0:55:21 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' '192.168.100.103' 'replic:5vaeNeIiTYBRU' '/var/lib/mysql/' '/etc/my.cnf' '4981' 2>sst.err'
    121102 0:55:21 [Note] WSREP: Prepared SST request: xtrabackup|192.168.100.103:4444/xtrabackup_sst
    ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
    sed: couldn't flush stdout: Broken pipe
    121102 0:55:21 [ERROR] WSREP: Process completed with error: /usr/bin/wsrep_notify.sh --status Joiner --uuid 72de1b90-022d-11e2-0800-484cdf256d28 --primary yes --index 6 --members 72f89f95-0729-11e2-0800-e80c813e3134/garb/,a8511994-190a-11e 2-0800-580d0a80208f/mysqldbdrv03/192.168.100.
    121102 0:55:21 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): "/usr/bin/wsrep_notify.sh --status Joiner --uuid 72de1b90-022d-11e2-0800-484cdf256d28 --primary yes --index 6 --members 72f89f95-0729-11e2-0800-e80c813e3134/garb/,a8511994-190a-11e 2-0800-580d0a802
    121102 0:55:21 [Note] WSREP: Assign initial position for certification: 8585807, protocol version: 2
    121102 0:55:21 [Note] WSREP: Prepared IST receiver, listening at: tcp://192.168.100.103:4011
    121102 0:55:21 [Note] WSREP: Node 6 (mysqldb01) requested state transfer from '*any*'. Selected 1 (mysqldbdrv03)(SYNCED) as donor.
    121102 0:55:21 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8585830)
    121102 0:55:21 [Note] WSREP: Requesting state transfer: success, donor: 1
    121102 0:56:57 [Note] WSREP: 1 (mysqldbdrv03): State transfer to 6 (mysqldb01) complete.
    121102 0:56:57 [Note] WSREP: Member 1 (mysqldbdrv03) synced with group.
    121102 0:57:17 [Note] WSREP: SST complete, seqno: 8585950
    121102 0:57:17 [Note] Plugin 'FEDERATED' is disabled.
    121102 0:57:17 InnoDB: The InnoDB memory heap is disabled
    121102 0:57:17 InnoDB: Mutexes and rw_locks use GCC atomic builtins
    121102 0:57:17 InnoDB: Compressed tables use zlib 1.2.3
    121102 0:57:17 InnoDB: Using Linux native AIO
    121102 0:57:17 InnoDB: Initializing buffer pool, size = 5.9G
    InnoDB: mmap(6463488000 bytes) failed; errno 12
    121102 0:57:17 InnoDB: Completed initialization of buffer pool
    121102 0:57:17 InnoDB: Fatal error: cannot allocate memory for the buffer pool
    121102 0:57:17 [ERROR] Plugin 'InnoDB' init function returned error.
    121102 0:57:17 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
    121102 0:57:17 [ERROR] Unknown/unsupported storage engine: InnoDB
    121102 0:57:17 [ERROR] Aborting

    121102 0:57:19 [Note] WSREP: Closing send monitor...
    121102 0:57:19 [Note] WSREP: Closed send monitor.
    121102 0:57:19 [Note] WSREP: gcomm: terminating thread
    121102 0:57:19 [Note] WSREP: gcomm: joining thread
    121102 0:57:19 [Note] WSREP: gcomm: closing backend
    121102 0:57:19 [Note] WSREP: view(view_id(NON_PRIM,72f89f95-0729-11e2-0800-e80c813e3134,1 17) memb {
    f6ba0383-2487-11e2-0800-0e4230df1ac6,
    } joined {
    } left {
    } partitioned {
    72f89f95-0729-11e2-0800-e80c813e3134,
    a8511994-190a-11e2-0800-580d0a80208f,
    a882f2cb-190a-11e2-0800-00171a908230,
    a8b2f666-190a-11e2-0800-56702e7cb352,
    a8beb15d-190a-11e2-0800-81fcb2686439,
    a8c0cfb1-190a-11e2-0800-7533cf9b72c7,
    })
    121102 0:57:19 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
    121102 0:57:19 [Note] WSREP: view((empty))
    121102 0:57:19 [Note] WSREP: gcomm: closed
    121102 0:57:19 [Note] WSREP: Flow-control interval: [8, 16]
    121102 0:57:19 [Note] WSREP: Received NON-PRIMARY.
    121102 0:57:19 [Note] WSREP: Shifting JOINER -> OPEN (TO: 8586007)
    121102 0:57:19 [Note] WSREP: Received self-leave message.
    121102 0:57:19 [Note] WSREP: Flow-control interval: [0, 0]
    121102 0:57:19 [Note] WSREP: Received SELF-LEAVE. Closing connection.
    121102 0:57:19 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 8586007)
    121102 0:57:19 [Note] WSREP: RECV thread exiting 0: Success
    121102 0:57:19 [Note] WSREP: recv_thread() joined.
    121102 0:57:19 [Note] WSREP: Closing slave action queue.
    121102 0:57:19 [Note] WSREP: Service disconnected.
    121102 0:57:19 [Note] WSREP: rollbacker thread exiting
    121102 0:57:20 [Note] WSREP: Some threads may fail to exit.
    121102 0:57:20 [Note] /usr/sbin/mysqld: Shutdown complete

    Error in my_thread_global_end(): 2 threads didn't exit
    121102 00:57:25 mysqld_safe mysqld from pid file /var/lib/mysql/mysqldb01.pid ended
    121102 13:02:33 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    121102 13:02:33 [Note] Flashcache bypass: disabled
    121102 13:02:33 [Note] Flashcache setup error is : open flash device failed

    121102 13:02:33 [Note] WSREP: Read nil XID from storage engines, skipping position init
    121102 13:02:33 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
    121102 13:02:33 [Note] WSREP: wsrep_load(): Galera 2.1(r113) by Codership Oy loaded succesfully.
    121102 13:02:33 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    121102 13:02:33 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
    121102 13:02:33 [Note] WSREP: Passing config to GCS: base_host = 192.168.100.103; evs.send_window = 512; evs.user_send_window = 512; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 4G; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4010; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
    121102 13:02:33 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    121102 13:02:33 [Note] WSREP: wsrep_sst_grab()
    121102 13:02:33 [Note] WSREP: Start replication
    121102 13:02:33 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    121102 13:02:33 [Note] WSREP: protonet asio version 0
    121102 13:02:33 [Note] WSREP: backend: asio
    121102 13:02:33 [Note] WSREP: GMCast version 0
    121102 13:02:33 [Note] WSREP: (914eeace-24ed-11e2-0800-f3b6d7ea9afa, 'tcp://0.0.0.0:4010') listening at tcp://0.0.0.0:4010
    121102 13:02:33 [Note] WSREP: (914eeace-24ed-11e2-0800-f3b6d7ea9afa, 'tcp://0.0.0.0:4010') multicast: , ttl: 1
    121102 13:02:33 [Note] WSREP: EVS version 0
    121102 13:02:33 [Note] WSREP: PC version 0
    121102 13:02:33 [Note] WSREP: gcomm: connecting to group 'lptlive', peer '192.168.100.101:4010'
    121102 13:02:33 [Note] WSREP: (914eeace-24ed-11e2-0800-f3b6d7ea9afa, 'tcp://0.0.0.0:4010') turning message relay requesting on, nonlive peers: tcp://192.168.100.57:4010 tcp://192.168.101.105:4010 tcp://192.168.101.104:4010 tcp://192.168.100.139:4010
    121102 13:02:33 [Note] WSREP: (914eeace-24ed-11e2-0800-f3b6d7ea9afa, 'tcp://0.0.0.0:4010') cleaning up duplicate 0x13d9290 after established 0x13d05e0
    121102 13:02:33 [Note] WSREP: (914eeace-24ed-11e2-0800-f3b6d7ea9afa, 'tcp://0.0.0.0:4010') cleaning up duplicate 0x13f3110 after established 0x13ea570
    121102 13:02:34 [Note] WSREP: (914eeace-24ed-11e2-0800-f3b6d7ea9afa, 'tcp://0.0.0.0:4010') turning message relay requesting off
    121102 13:02:34 [Note] WSREP: declaring 72f89f95-0729-11e2-0800-e80c813e3134 stable
    121102 13:02:34 [Note] WSREP: declaring a8511994-190a-11e2-0800-580d0a80208f stable
    121102 13:02:34 [Note] WSREP: declaring a882f2cb-190a-11e2-0800-00171a908230 stable
    121102 13:02:34 [Note] WSREP: declaring a8b2f666-190a-11e2-0800-56702e7cb352 stable
    121102 13:02:34 [Note] WSREP: declaring a8beb15d-190a-11e2-0800-81fcb2686439 stable
    121102 13:02:34 [Note] WSREP: view(view_id(PRIM,72f89f95-0729-11e2-0800-e80c813e3134,121) memb {
    72f89f95-0729-11e2-0800-e80c813e3134,
    914eeace-24ed-11e2-0800-f3b6d7ea9afa,
    a8511994-190a-11e2-0800-580d0a80208f,
    a882f2cb-190a-11e2-0800-00171a908230,
    a8b2f666-190a-11e2-0800-56702e7cb352,
    a8beb15d-190a-11e2-0800-81fcb2686439,
    } joined {
    } left {
    } partitioned {
    })
    121102 13:02:34 [Note] WSREP: gcomm: connected
    121102 13:02:34 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    121102 13:02:34 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    121102 13:02:34 [Note] WSREP: Opened channel 'lptlive'
    121102 13:02:34 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 6
    121102 13:02:34 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    121102 13:02:34 [Note] WSREP: Waiting for SST to complete.
    121102 13:02:34 [Note] WSREP: STATE EXCHANGE: sent state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a
    121102 13:02:34 [Note] WSREP: STATE EXCHANGE: got state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a from 0 (garb)
    121102 13:02:34 [Note] WSREP: STATE EXCHANGE: got state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a from 2 (mysqldbdrv03)
    121102 13:02:34 [Note] WSREP: STATE EXCHANGE: got state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a from 3 (mysqldbdrv01)
    121102 13:02:34 [Note] WSREP: STATE EXCHANGE: got state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a from 4 (mysqldbdrv02)
    121102 13:02:34 [Note] WSREP: STATE EXCHANGE: got state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a from 5 (mysqldb03)
    121102 13:02:34 [Note] WSREP: STATE EXCHANGE: got state msg: 917d7c9d-24ed-11e2-0800-7b9a2dd6372a from 1 (mysqldb01)
    121102 13:02:34 [Note] WSREP: Quorum results:
    version = 2,
    component = PRIMARY,
    conf_id = 116,
    members = 5/6 (joined/total),
    act_id = 8643168,
    last_appl. = -1,
    protocols = 0/4/2 (gcs/repl/appl),
    group UUID = 72de1b90-022d-11e2-0800-484cdf256d28
    121102 13:02:34 [Note] WSREP: Flow-control interval: [20, 39]
    121102 13:02:34 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8643168)
    121102 13:02:34 [Note] WSREP: State transfer required:
    Group state: 72de1b90-022d-11e2-0800-484cdf256d28:8643168
    Local state: 00000000-0000-0000-0000-000000000000:-1
    121102 13:02:34 [Note] WSREP: New cluster view: global state: 72de1b90-022d-11e2-0800-484cdf256d28:8643168, view# 117: Primary, number of nodes: 6, my index: 1, protocol version
    2
    121102 13:02:34 [Warning] WSREP: Gap in state sequence. Need state transfer.
    121102 13:02:36 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' '192.168.100.103' 'replic:5vaeNeIiTYBRU' '/var/lib/mysql/' '/etc/my.cnf' '22659' 2>sst.err'
    121102 13:02:36 [Note] WSREP: Prepared SST request: xtrabackup|192.168.100.103:4444/xtrabackup_sst
    ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
    sed: couldn't flush stdout: Broken pipe
    121102 13:02:36 [ERROR] WSREP: Process completed with error: /usr/bin/wsrep_notify.sh --status Joiner --uuid 72de1b90-022d-11e2-0800-484cdf256d28 --primary yes --index 1 --members 72f89f95-0729-11e2-0800-e80c813e3134/garb/,914eeace-24ed-11e 2-0800-f3b6d7ea9afa/mysqldb01/10.238.
    121102 13:02:36 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): "/usr/bin/wsrep_notify.sh --status Joiner --uuid 72de1b90-022d-11e2-0800-484cdf256d28 --primary yes --index 1 --members 72f89f95-0729-11e2-0800-e80c813e3134/garb/,914eeace-24ed-11e 2-0800-f3b6d7ea9
    121102 13:02:36 [Note] WSREP: Assign initial position for certification: 8643168, protocol version: 2
    121102 13:02:36 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (72de1b90-022d-11e2-0800-484cdf256d28): 1 (Operation not permitted)
    at galera/src/replicator_str.cpprepare_for_IST():439. IST will be unavailable.
    121102 13:02:36 [Note] WSREP: Node 1 (mysqldb01) requested state transfer from '*any*'. Selected 2 (mysqldbdrv03)(SYNCED) as donor.
    121102 13:02:36 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8643172)
    121102 13:02:36 [Note] WSREP: Requesting state transfer: success, donor: 2
    121102 13:04:09 [Note] WSREP: 2 (mysqldbdrv03): State transfer to 1 (mysqldb01) complete.
    121102 13:04:09 [Note] WSREP: Member 2 (mysqldbdrv03) synced with group.
    121102 13:04:32 [Note] WSREP: SST complete, seqno: 8643297
    121102 13:04:32 [Note] Plugin 'FEDERATED' is disabled.
    121102 13:04:32 InnoDB: The InnoDB memory heap is disabled
    121102 13:04:32 InnoDB: Mutexes and rw_locks use GCC atomic builtins
    121102 13:04:32 InnoDB: Compressed tables use zlib 1.2.3
    121102 13:04:32 InnoDB: Using Linux native AIO
    121102 13:04:32 InnoDB: Initializing buffer pool, size = 5.9G
    InnoDB: mmap(6463488000 bytes) failed; errno 12
    121102 13:04:32 InnoDB: Completed initialization of buffer pool
    121102 13:04:32 InnoDB: Fatal error: cannot allocate memory for the buffer pool
    121102 13:04:32 [ERROR] Plugin 'InnoDB' init function returned error.
    121102 13:04:32 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
    121102 13:04:32 [ERROR] Unknown/unsupported storage engine: InnoDB
    121102 13:04:32 [ERROR] Aborting

    121102 13:04:34 [Note] WSREP: Closing send monitor...
    121102 13:04:34 [Note] WSREP: Closed send monitor.
    121102 13:04:34 [Note] WSREP: gcomm: terminating thread
    121102 13:04:34 [Note] WSREP: gcomm: joining thread
    121102 13:04:34 [Note] WSREP: gcomm: closing backend
    121102 13:04:34 [Note] WSREP: view(view_id(NON_PRIM,72f89f95-0729-11e2-0800-e80c813e3134,1 21) memb {
    914eeace-24ed-11e2-0800-f3b6d7ea9afa,
    } joined {
    } left {
    } partitioned {
    72f89f95-0729-11e2-0800-e80c813e3134,
    a8511994-190a-11e2-0800-580d0a80208f,
    a882f2cb-190a-11e2-0800-00171a908230,
    a8b2f666-190a-11e2-0800-56702e7cb352,
    a8beb15d-190a-11e2-0800-81fcb2686439,
    })
    121102 13:04:34 [Note] WSREP: view((empty))
    121102 13:04:34 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
    121102 13:04:34 [Note] WSREP: gcomm: closed
    121102 13:04:34 [Note] WSREP: Flow-control interval: [8, 16]
    121102 13:04:34 [Note] WSREP: Received NON-PRIMARY.
    121102 13:04:34 [Note] WSREP: Shifting JOINER -> OPEN (TO: 8643326)
    121102 13:04:34 [Note] WSREP: Received self-leave message.
    121102 13:04:34 [Note] WSREP: Flow-control interval: [0, 0]
    121102 13:04:34 [Note] WSREP: Received SELF-LEAVE. Closing connection.
    121102 13:04:34 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 8643326)
    121102 13:04:34 [Note] WSREP: RECV thread exiting 0: Success
    121102 13:04:34 [Note] WSREP: recv_thread() joined.
    121102 13:04:34 [Note] WSREP: Closing slave action queue.
    121102 13:04:34 [Note] WSREP: Service disconnected.
    121102 13:04:34 [Note] WSREP: rollbacker thread exiting
    121102 13:04:35 [Note] WSREP: Some threads may fail to exit.
    121102 13:04:35 [Note] /usr/sbin/mysqld: Shutdown complete

    Error in my_thread_global_end(): 2 threads didn't exit
    121102 13:04:40 mysqld_safe mysqld from pid file /var/lib/mysql/mysqldb01.pid ended
    121102 13:11:44 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    121102 13:11:45 [Note] Flashcache bypass: disabled
    121102 13:11:45 [Note] Flashcache setup error is : open flash device failed

    121102 13:11:45 [Note] WSREP: Read nil XID from storage engines, skipping position init
    121102 13:11:45 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
    121102 13:11:45 [Note] WSREP: wsrep_load(): Galera 2.1(r113) by Codership Oy loaded succesfully.
    121102 13:11:45 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    121102 13:11:45 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
    121102 13:11:45 [Note] WSREP: Passing config to GCS: base_host = 192.168.100.103; evs.send_window = 512; evs.user_send_window = 512; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 4G; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://0.0.0.0:4010; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
    121102 13:11:45 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    121102 13:11:45 [Note] WSREP: wsrep_sst_grab()
    121102 13:11:45 [Note] WSREP: Start replication
    121102 13:11:45 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    121102 13:11:45 [Note] WSREP: protonet asio version 0
    121102 13:11:45 [Note] WSREP: backend: asio
    121102 13:11:45 [Note] WSREP: GMCast version 0
    121102 13:11:45 [Note] WSREP: (d9fd24c5-24ee-11e2-0800-044cf4047e4f, 'tcp://0.0.0.0:4010') listening at tcp://0.0.0.0:4010
    121102 13:11:45 [Note] WSREP: (d9fd24c5-24ee-11e2-0800-044cf4047e4f, 'tcp://0.0.0.0:4010') multicast: , ttl: 1
    121102 13:11:45 [Note] WSREP: EVS version 0
    121102 13:11:45 [Note] WSREP: PC version 0
    121102 13:11:45 [Note] WSREP: gcomm: connecting to group 'lptlive', peer '192.168.100.101:4010'
    121102 13:11:45 [Note] WSREP: (d9fd24c5-24ee-11e2-0800-044cf4047e4f, 'tcp://0.0.0.0:4010') turning message relay requesting on, nonlive peers: tcp://192.168.100.57:4010 tcp://192.168.101.105:4010 tcp://192.168.101.104:4010 tcp://192.168.100.139:4010
    121102 13:11:45 [Note] WSREP: (d9fd24c5-24ee-11e2-0800-044cf4047e4f, 'tcp://0.0.0.0:4010') turning message relay requesting off
    121102 13:11:45 [Note] WSREP: declaring 72f89f95-0729-11e2-0800-e80c813e3134 stable
    121102 13:11:45 [Note] WSREP: declaring a8511994-190a-11e2-0800-580d0a80208f stable
    121102 13:11:45 [Note] WSREP: declaring a882f2cb-190a-11e2-0800-00171a908230 stable
    121102 13:11:45 [Note] WSREP: declaring a8b2f666-190a-11e2-0800-56702e7cb352 stable
    121102 13:11:45 [Note] WSREP: declaring a8beb15d-190a-11e2-0800-81fcb2686439 stable
    121102 13:11:45 [Note] WSREP: view(view_id(PRIM,72f89f95-0729-11e2-0800-e80c813e3134,123) memb {
    72f89f95-0729-11e2-0800-e80c813e3134,
    a8511994-190a-11e2-0800-580d0a80208f,
    a882f2cb-190a-11e2-0800-00171a908230,
    a8b2f666-190a-11e2-0800-56702e7cb352,
    a8beb15d-190a-11e2-0800-81fcb2686439,
    d9fd24c5-24ee-11e2-0800-044cf4047e4f,
    } joined {
    } left {
    } partitioned {
    })
    121102 13:11:45 [Note] WSREP: gcomm: connected
    121102 13:11:45 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    121102 13:11:45 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    121102 13:11:45 [Note] WSREP: Opened channel 'lptlive'
    121102 13:11:45 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 5, memb_num = 6
    121102 13:11:45 [Note] WSREP: Waiting for SST to complete.
    121102 13:11:45 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    121102 13:11:45 [Note] WSREP: STATE EXCHANGE: sent state msg: da2023f8-24ee-11e2-0800-d38b758c56bd
    121102 13:11:45 [Note] WSREP: STATE EXCHANGE: got state msg: da2023f8-24ee-11e2-0800-d38b758c56bd from 0 (garb)
    121102 13:11:45 [Note] WSREP: STATE EXCHANGE: got state msg: da2023f8-24ee-11e2-0800-d38b758c56bd from 1 (mysqldbdrv03)
    121102 13:11:45 [Note] WSREP: STATE EXCHANGE: got state msg: da2023f8-24ee-11e2-0800-d38b758c56bd from 3 (mysqldbdrv02)
    121102 13:11:45 [Note] WSREP: STATE EXCHANGE: got state msg: da2023f8-24ee-11e2-0800-d38b758c56bd from 4 (mysqldb03)
    121102 13:11:45 [Note] WSREP: STATE EXCHANGE: got state msg: da2023f8-24ee-11e2-0800-d38b758c56bd from 2 (mysqldbdrv01)
    121102 13:11:45 [Note] WSREP: STATE EXCHANGE: got state msg: da2023f8-24ee-11e2-0800-d38b758c56bd from 5 (mysqldb01)
    121102 13:11:45 [Note] WSREP: Quorum results:
    version = 2,
    component = PRIMARY,
    conf_id = 118,
    members = 5/6 (joined/total),
    act_id = 8643880,
    last_appl. = -1,
    protocols = 0/4/2 (gcs/repl/appl),
    group UUID = 72de1b90-022d-11e2-0800-484cdf256d28
    121102 13:11:45 [Note] WSREP: Flow-control interval: [20, 39]
    121102 13:11:45 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8643880)
    121102 13:11:45 [Note] WSREP: State transfer required:
    Group state: 72de1b90-022d-11e2-0800-484cdf256d28:8643880
    Local state: 00000000-0000-0000-0000-000000000000:-1
    121102 13:11:45 [Note] WSREP: New cluster view: global state: 72de1b90-022d-11e2-0800-484cdf256d28:8643880, view# 119: Primary, number of nodes: 6, my index: 5, protocol version 2
    121102 13:11:45 [Warning] WSREP: Gap in state sequence. Need state transfer.
    121102 13:11:47 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' '192.168.100.103' 'replic:5vaeNeIiTYBRU' '/var/lib/mysql/' '/etc/my.cnf' '12805' 2>sst.err'
    121102 13:11:47 [Note] WSREP: Prepared SST request: xtrabackup|192.168.100.103:4444/xtrabackup_sst
    ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
    sed: couldn't flush stdout: Broken pipe
    121102 13:11:47 [ERROR] WSREP: Process completed with error: /usr/bin/wsrep_notify.sh --status Joiner --uuid 72de1b90-022d-11e2-0800-484cdf256d28 --primary yes --index 5 --members 72f89f95-0729-11e2-0800-e80c813e3134/garb/,a8511994-190a-11e 2-0800-580d0a80208f/mysqldbdrv03/192.168.100.
    121102 13:11:47 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): "/usr/bin/wsrep_notify.sh --status Joiner --uuid 72de1b90-022d-11e2-0800-484cdf256d28 --primary yes --index 5 --members 72f89f95-0729-11e2-0800-e80c813e3134/garb/,a8511994-190a-11e 2-0800-580d0a802
    121102 13:11:47 [Note] WSREP: Assign initial position for certification: 8643880, protocol version: 2
    121102 13:11:47 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (72de1b90-022d-11e2-0800-484cdf256d28): 1 (Operation not permitted)
    at galera/src/replicator_str.cpprepare_for_IST():439. IST will be unavailable.
    121102 13:11:47 [Note] WSREP: Node 5 (mysqldb01) requested state transfer from '*any*'. Selected 1 (mysqldbdrv03)(SYNCED) as donor.
    121102 13:11:47 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8643883)
    121102 13:11:47 [Note] WSREP: Requesting state transfer: success, donor: 1
    121102 13:13:24 [Note] WSREP: 1 (mysqldbdrv03): State transfer to 5 (mysqldb01) complete.
    121102 13:13:24 [Note] WSREP: Member 1 (mysqldbdrv03) synced with group.
    121102 13:13:38 [Note] WSREP: SST complete, seqno: 8644014
    121102 13:13:38 [Note] Plugin 'FEDERATED' is disabled.
    121102 13:13:38 InnoDB: The InnoDB memory heap is disabled
    121102 13:13:38 InnoDB: Mutexes and rw_locks use GCC atomic builtins
    121102 13:13:38 InnoDB: Compressed tables use zlib 1.2.3
    121102 13:13:38 InnoDB: Using Linux native AIO
    121102 13:13:38 InnoDB: Initializing buffer pool, size = 5.9G
    121102 13:13:38 InnoDB: Completed initialization of buffer pool
    121102 13:13:38 InnoDB: highest supported file format is Barracuda.
    121102 13:13:38 InnoDB: Waiting for the background threads to start
    121102 13:13:39 Percona XtraDB (http://www.percona.com) 1.1.8-rel25.3 started; log sequence number 31925464588
    121102 13:13:39 [Note] Recovering after a crash using mysql-bin
    121102 13:13:40 [Note] Starting crash recovery...
    121102 13:13:40 [Note] Crash recovery finished.
    121102 13:13:40 [Note] Server hostname (bind-address): '(null)'; port: 3306
    121102 13:13:40 [Note] - '(null)' resolves to '0.0.0.0';
    121102 13:13:40 [Note] - '(null)' resolves to '::';
    121102 13:13:40 [Note] Server socket created on IP: '0.0.0.0'.
    121102 13:13:40 [Warning] 'proxies_priv' entry '@ root@mysqldb02' ignored in --skip-name-resolve mode.
    121102 13:13:40 [Note] Event Scheduler: Loaded 0 events
    121102 13:13:40 [Note] WSREP: Signalling provider to continue.
    121102 13:13:40 [Note] WSREP: Received SST: 72de1b90-022d-11e2-0800-484cdf256d28:8644014
    121102 13:13:40 [Note] WSREP: SST received: 72de1b90-022d-11e2-0800-484cdf256d28:8644014
    121102 13:13:40 [Note] WSREP: 5 (mysqldb01): State transfer from 1 (mysqldbdrv03) complete


    The /var/log/messages at the time of failure says:

    Nov 2 00:55:12 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4850 from=::ffff:192.168.100.102
    Nov 2 00:55:12 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4851 from=::ffff:192.168.100.101
    Nov 2 00:55:13 mysqldbsrv01 xinetd[8461]: EXIT: mysqlchk status=141 pid=4851 duration=1(sec)
    Nov 2 00:55:13 mysqldbsrv01 xinetd[8461]: EXIT: mysqlchk status=141 pid=4850 duration=1(sec)
    Nov 2 00:55:13 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4909 from=::ffff:192.168.100.101
    Nov 2 00:55:13 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4910 from=::ffff:192.168.100.102
    Nov 2 00:55:14 mysqldbsrv01 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
    Nov 2 00:55:14 mysqldbsrv01 kernel: mysqld cpuset=/ mems_allowed=0
    Nov 2 00:55:14 mysqldbsrv01 kernel: Pid: 4457, comm: mysqld Not tainted 2.6.32-220.el6.x86_64 #1
    Nov 2 00:55:14 mysqldbsrv01 kernel: Call Trace:
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? cpuset_print_task_mems_allowed+0x91/0xb0
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? dump_header+0x90/0x1b0
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? security_real_capable_noaudit+0x3c/0x70
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? oom_kill_process+0x8a/0x2c0
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? select_bad_process+0xe1/0x120
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? out_of_memory+0x220/0x3c0
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? __alloc_pages_nodemask+0x89e/0x940
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? alloc_pages_current+0xaa/0x110
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? __page_cache_alloc+0x87/0x90
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? __do_page_cache_readahead+0xdb/0x210
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? ra_submit+0x21/0x30
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? filemap_fault+0x4c3/0x500
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? __do_fault+0x54/0x510
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? handle_pte_fault+0xf7/0xb50
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? sock_aio_read+0x181/0x190
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? handle_mm_fault+0x1e4/0x2b0
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? __do_page_fault+0x139/0x480
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? lock_sock_nested+0xac/0xc0
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? _spin_unlock_bh+0x1b/0x20
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? release_sock+0xce/0xe0
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? sock_setsockopt+0x16d/0x6f0
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? do_page_fault+0x3e/0xa0
    Nov 2 00:55:14 mysqldbsrv01 kernel: [] ? page_fault+0x25/0x30
    Nov 2 00:55:14 mysqldbsrv01 kernel: Mem-Info:
    Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 DMA per-cpu:
    Nov 2 00:55:14 mysqldbsrv01 kernel: CPU 0: hi: 0, btch: 1 usd: 0
    Nov 2 00:55:14 mysqldbsrv01 kernel: CPU 1: hi: 0, btch: 1 usd: 0
    Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 DMA32 per-cpu:
    Nov 2 00:55:14 mysqldbsrv01 kernel: CPU 0: hi: 186, btch: 31 usd: 21
    Nov 2 00:55:14 mysqldbsrv01 kernel: CPU 1: hi: 186, btch: 31 usd: 46
    Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 Normal per-cpu:
    Nov 2 00:55:14 mysqldbsrv01 kernel: CPU 0: hi: 186, btch: 31 usd: 34
    Nov 2 00:55:14 mysqldbsrv01 kernel: CPU 1: hi: 186, btch: 31 usd: 53
    Nov 2 00:55:14 mysqldbsrv01 kernel: active_anon:1648167 inactive_anon:322976 isolated_anon:0
    Nov 2 00:55:14 mysqldbsrv01 kernel: active_file:210 inactive_file:3276 isolated_file:0
    Nov 2 00:55:14 mysqldbsrv01 kernel: unevictable:0 dirty:0 writeback:0 unstable:0
    Nov 2 00:55:14 mysqldbsrv01 kernel: free:25820 slab_reclaimable:2474 slab_unreclaimable:12998
    Nov 2 00:55:14 mysqldbsrv01 kernel: mapped:759 shmem:0 pagetables:7759 bounce:0
    Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 DMA free:15660kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15268kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
    Nov 2 00:55:14 mysqldbsrv01 kernel: lowmem_reserve[]: 0 3000 8050 8050
    Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 DMA32 free:45348kB min:25140kB low:31424kB high:37708kB active_anon:2310072kB inactive_anon:578088kB active_file:24kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:24kB shmem:0kB slab_reclaimable:144kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:1636kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:33 all_unreclaimable? no
    Nov 2 00:55:14 mysqldbsrv01 kernel: lowmem_reserve[]: 0 0 5050 5050
    Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 Normal free:42272kB min:42316kB low:52892kB high:63472kB active_anon:4282596kB inactive_anon:713816kB active_file:816kB inactive_file:13104kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:5171200kB mlocked:0kB dirty:0kB writeback:0kB mapped:3184kB shmem:0kB slab_reclaimable:9752kB slab_unreclaimable:51992kB kernel_stack:1352kB pagetables:29400kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:608 all_unreclaimable? yes
    Nov 2 00:55:14 mysqldbsrv01 kernel: lowmem_reserve[]: 0 0 0 0
    Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 DMA: 1*4kB 1*8kB 0*16kB 1*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15660kB
    Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 Normal: 416*4kB 282*8kB 205*16kB 148*32kB 122*64kB 64*128kB 28*256kB 10*512kB 2*1024kB 0*2048kB 0*4096kB = 42272kB
    Nov 2 00:55:14 mysqldbsrv01 kernel: 9145 total pagecache pages
    Nov 2 00:55:14 mysqldbsrv01 kernel: 5631 pages in swap cache
    Nov 2 00:55:14 mysqldbsrv01 kernel: Swap cache stats: add 684333, delete 678702, find 182870270/182890443
    Nov 2 00:55:14 mysqldbsrv01 kernel: Free swap = 0kB
    Nov 2 00:55:14 mysqldbsrv01 kernel: Total swap = 999992kB
    Nov 2 00:55:14 mysqldbsrv01 kernel: 2097136 pages RAM
    Nov 2 00:55:14 mysqldbsrv01 kernel: 48654 pages reserved
    Nov 2 00:55:14 mysqldbsrv01 kernel: 2079 pages shared
    Nov 2 00:55:14 mysqldbsrv01 kernel: 2018022 pages non-shared
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 453] 0 453 2880 0 1 -17 -1000 udevd
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1013] 0 1013 37677 114 0 0 0 vmtoolsd
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1266] 0 1266 23306 31 0 -17 -1000 auditd
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1282] 0 1282 62714 105 0 0 0 rsyslogd
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1432] 0 1432 29321 24 1 0 0 crond
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1460] 0 1460 1029 1 0 0 0 mingetty
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1462] 0 1462 1029 1 0 0 0 mingetty
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1464] 0 1464 3077 0 0 -17 -1000 udevd
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1465] 0 1465 3077 0 0 -17 -1000 udevd
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1466] 0 1466 1029 1 1 0 0 mingetty
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1468] 0 1468 1029 1 1 0 0 mingetty
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 1470] 0 1470 1029 1 1 0 0 mingetty
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3338] 81 3338 7920 1 1 0 0 dbus-daemon
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3366] 0 3366 5767 0 0 0 0 oddjobd
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3476] 0 3476 16017 24 1 0 0 sshd
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3527] 38 3527 7014 39 0 0 0 ntpd
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3894] 0 3894 44097 54 1 0 0 sssd
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3896] 0 3896 635966 456032 0 0 0 sssd_be
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3898] 0 3898 156200 41587 0 0 0 sssd_nss
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 3899] 0 3899 41845 28 0 0 0 sssd_pam
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4893] 0 4893 12909 23 1 0 0 master
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4896] 89 4896 13463 17 0 0 0 qmgr
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 5541] 0 5541 33684 8872 0 0 0 ruby
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 6798] 0 6798 1029 1 1 0 0 mingetty
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 8461] 0 8461 5536 32 0 0 0 xinetd
    Nov 2 00:55:14 mysqldbsrv01 kernel: [23283] 0 23283 2873 17 0 0 0 mysqld_safe
    Nov 2 00:55:14 mysqldbsrv01 kernel: [26543] 0 26543 49177 142 1 0 0 snmpd
    Nov 2 00:55:14 mysqldbsrv01 kernel: [15222] 498 15222 3092711 1457974 1 0 0 mysqld
    Nov 2 00:55:14 mysqldbsrv01 kernel: [22692] 20504 22692 2614 38 1 0 0 nrpe
    Nov 2 00:55:14 mysqldbsrv01 kernel: [27036] 89 27036 13445 140 1 0 0 pickup
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4871] 20504 4871 2614 34 0 0 0 nrpe
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4872] 20504 4872 2615 34 1 0 0 nrpe
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4873] 20504 4873 27054 69 1 0 0 pmp-check-mysql
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4875] 20504 4875 2614 34 0 0 0 nrpe
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4876] 20504 4876 2615 34 1 0 0 nrpe
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4877] 20504 4877 27054 69 0 0 0 pmp-check-mysql
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4879] 20504 4879 2614 34 0 0 0 nrpe
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4880] 20504 4880 2615 34 1 0 0 nrpe
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4881] 20504 4881 27054 69 0 0 0 pmp-check-mysql
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4882] 20504 4882 27087 77 1 0 0 pmp-check-mysql
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4883] 20504 4883 27087 77 0 0 0 pmp-check-mysql
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4884] 20504 4884 27087 77 0 0 0 pmp-check-mysql
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4906] 20504 4906 49488 391 0 0 0 mysql
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4907] 20504 4907 49488 391 0 0 0 mysql
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4908] 20504 4908 49488 388 0 0 0 mysql
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4909] 99 4909 26521 176 0 0 0 clustercheck
    Nov 2 00:55:14 mysqldbsrv01 kernel: [ 4910] 99 4910 26521 177 0 0 0 clustercheck
    Nov 2 00:55:14 mysqldbsrv01 kernel: Out of memory: Kill process 15222 (mysqld) score 654 or sacrifice child
    Nov 2 00:55:14 mysqldbsrv01 kernel: Killed process 15222, UID 498, (mysqld) total-vm:12370844kB, anon-rss:5830224kB, file-rss:1672kB
    Nov 2 00:55:14 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4922 from=::ffff:192.168.100.101
    Nov 2 00:55:14 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4923 from=::ffff:192.168.100.102
    Nov 2 00:55:14 mysqldbsrv01 xinetd[8461]: EXIT: mysqlchk status=141 pid=4909 duration=1(sec)
    Nov 2 00:55:14 mysqldbsrv01 xinetd[8461]: EXIT: mysqlchk status=141 pid=4923 duration=0(sec)
    Nov 2 00:55:14 mysqldbsrv01 xinetd[8461]: EXIT: mysqlchk status=141 pid=4922 duration=0(sec)
    Nov 2 00:55:14 mysqldbsrv01 xinetd[8461]: EXIT: mysqlchk status=141 pid=4910 duration=1(sec)
    Nov 2 00:55:15 mysqldbsrv01 xinetd[8461]: START: mysqlchk pid=4983 from=::ffff:192.168.100.101

    Nov 2 00:55:14 mysqldbsrv01 kernel: Node 0 DMA32: 353*4kB 146*8kB 71*16kB 53*32kB 36*64kB 28*128kB 21*256kB 20*512kB 10*1024kB 4*2048kB 0*4096kB = 45348kB

  • #2
    No one able to comment?

    Comment


    • #3
      Well this is just a guess, but in your log messages I see some notices about running out of memory, so that's probably the root cause.


      121102 0:57:17 InnoDB: Fatal error: cannot allocate memory for the buffer pool


      And this point kernel oom killer has step in by killing mysqld:

      Nov 2 00:55:14 mysqldbsrv01 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0


      I'm currently evaluating Galera cluster by myself and I'm having problem with excessive memory usage compared to standalone mysql server, so maybe these issues are related. http://forum.percona.com/index.php?t=msg&th=2871

      Comment


      • #4
        Thanks for the response...

        The error messages that it had run out of memory are noted but there isn't any explanation as to why it happened (why I asked were there some other logs that should be looked at) the servers and databases were not under load high IO or had anything that could obviously explain it ... the nodes run fine, and then pop and go down... but why?

        I also note the apparent memory leak in 5.5.27 cluster and that 5.5.28 standalone doesn't have the same leak after your post.

        Maybe the problems I am seeing are related.

        Dom

        Comment

        Working...
        X