Announcement

Announcement Module
Collapse
No announcement yet.

All nodes in the cluster becomes inaccessible

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • All nodes in the cluster becomes inaccessible

    It has now happened twice.

    Nodes in the cluster become inaccessible (via MySQL), hours from each other, which leaves the whole cluster in an inaccessible state. We're using version 5.6.

    When I checked the states of each node, found out that (node 2 no longer in service):
    - node 1 still has the mysql service running
    - node 3 mysql service no longer running
    - node 4 mysql service no longer running

    The following are the logs from each of the nodes.

    Node1

    Code:
    2014-05-15 02:51:44 25544 [Note] WSREP: (da81411f-d34b-11e3-b420-c65dd499bd5c, 'tcp://0.0.0.0:4567') reconnecting to eb65461e-d34b-11e3-b8e3-16c13b906ff5 (tcp://192.168.0.62:4567), attempt 112080
    2014-05-15 02:52:29 25544 [Note] WSREP: (da81411f-d34b-11e3-b420-c65dd499bd5c, 'tcp://0.0.0.0:4567') reconnecting to eb65461e-d34b-11e3-b8e3-16c13b906ff5 (tcp://192.168.0.62:4567), attempt 112110


    Node3

    Code:
    2014-05-13 01:41:36 29139 [Warning] WSREP: SYNC message from member 0 in non-primary configuration. Ignored.
    2014-05-13 01:41:36 29139 [Note] WSREP: STATE EXCHANGE: sent state msg: 71ba447f-da7a-11e3-8952-f2887c244e09
    2014-05-13 01:41:36 29139 [Note] WSREP: STATE EXCHANGE: got state msg: 71ba447f-da7a-11e3-8952-f2887c244e09 from 0 (node1)
    2014-05-13 01:41:36 29139 [Note] WSREP: STATE EXCHANGE: got state msg: 71ba447f-da7a-11e3-8952-f2887c244e09 from 1 (node3)
    2014-05-13 01:41:36 29139 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 84,
        members    = 2/2 (joined/total),
        act_id     = 1451246,
        last_appl. = 1451221,
        protocols  = 0/5/2 (gcs/repl/appl),
        group UUID = da819e4a-d34b-11e3-bb14-4a770834cff0
    2014-05-13 01:41:36 29139 [Note] WSREP: Flow-control interval: [23, 23]
    2014-05-13 01:41:36 29139 [Note] WSREP: New cluster view: global state: da819e4a-d34b-11e3-bb14-4a770834cff0:1451246, view# 85: Primary, number of nodes: 2, my index: 1, protocol version 2
    2014-05-13 01:41:36 29139 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    2014-05-13 01:41:36 29139 [Note] WSREP: REPL Protocols: 5 (3, 1)
    2014-05-13 01:41:36 29139 [Note] WSREP: Assign initial position for certification: 1451246, protocol version: 3
    2014-05-13 01:41:36 29139 [Note] WSREP: Member 0 (node1) synced with group.
    2014-05-13 01:41:36 29139 [Note] WSREP: Service thread queue flushed.
    2014-05-13 01:41:41 29139 [Note] WSREP:  cleaning up 7ecbfb50-d356-11e3-bc64-ea7ee5869f45 (tcp://192.168.0.63:4567)
    2014-05-13 03:41:50 29139 [ERROR] /usr/sbin/mysqld: Sort aborted: Query execution was interrupted
    2014-05-13 03:41:50 29139 [ERROR] /usr/sbin/mysqld: Sort aborted: Query execution was interrupted
    2014-05-13 03:43:38 29139 [Warning] WSREP: last inactive check more than PT1.5S ago (PT2.52864S), skipping check
    2014-05-13 03:50:40 29139 [ERROR] /usr/sbin/mysqld: Sort aborted: Query execution was interrupted
    Killed
    140513 04:09:47 mysqld_safe Number of processes running now: 0
    140513 04:09:47 mysqld_safe WSREP: not restarting wsrep node automatically
    140513 04:09:47 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

    Node4

    Code:
    2014-05-13 01:41:27 32604 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
    2014-05-13 01:41:27 32604 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 5f6aa149-da7a-11e3-9e55-47df500e7415
    2014-05-13 01:41:27 32604 [Note] WSREP: STATE EXCHANGE: sent state msg: 5f6aa149-da7a-11e3-9e55-47df500e7415
    2014-05-13 01:41:27 32604 [Note] WSREP: STATE EXCHANGE: got state msg: 5f6aa149-da7a-11e3-9e55-47df500e7415 from 0 (node4)
    2014-05-13 01:41:27 32604 [Note] WSREP: STATE EXCHANGE: got state msg: 5f6aa149-da7a-11e3-9e55-47df500e7415 from 1 (node1)
    2014-05-13 01:41:27 32604 [Note] WSREP: STATE EXCHANGE: got state msg: 5f6aa149-da7a-11e3-9e55-47df500e7415 from 2 (node3)
    2014-05-13 01:41:27 32604 [Note] WSREP: Quorum results:
         version    = 3,
         component  = PRIMARY,
         conf_id    = 83,
         members    = 2/3 (joined/total),
         act_id     = 1451231,
         last_appl. = 1451210,
         protocols  = 0/5/2 (gcs/repl/appl),
         group UUID = da819e4a-d34b-11e3-bb14-4a770834cff0
    2014-05-13 01:41:27 32604 [Note] WSREP: Flow-control interval: [28, 28]
    2014-05-13 01:41:27 32604 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 1451231)
    2014-05-13 01:41:27 32604 [Note] WSREP: State transfer required:
         Group state: da819e4a-d34b-11e3-bb14-4a770834cff0:1451231
         Local state: da819e4a-d34b-11e3-bb14-4a770834cff0:1451222
    2014-05-13 01:41:27 32604 [Note] WSREP: New cluster view: global state: da819e4a-d34b-11e3-bb14-4a770834cff0:1451231, view# 84: Primary, number of nodes: 3, my index: 0, protocol version 2
    2014-05-13 01:41:27 32604 [Warning] WSREP: Gap in state sequence. Need state transfer.
    2014-05-13 01:41:29 32604 [Note] WSREP: killing local connection: 166283
    2014-05-13 01:41:29 32604 [ERROR] WSREP: Node IP address not obtained from bind_address, trying alternate methods
    2014-05-13 01:41:29 32604 [Note] WSREP: You have configured 'rsync' state snapshot transfer method which cannot be performed on a running server. Wsrep provider won't be able to fall back to it if other means of state transfer are unavailable. In that case you will need to restart the server.
    2014-05-13 01:41:29 32604 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    2014-05-13 01:41:29 32604 [Note] WSREP: REPL Protocols: 5 (3, 1)
    2014-05-13 01:41:29 32604 [Note] WSREP: Assign initial position for certification: 1451231, protocol version: 3
    2014-05-13 01:41:29 32604 [Note] WSREP: Service thread queue flushed.
    2014-05-13 01:41:29 32604 [Note] WSREP: Prepared IST receiver, listening at: tcp://192.168.0.63:4568
    2014-05-13 01:41:29 32604 [Note] WSREP: Node 0.0 (node4) requested state transfer from '*any*'. Selected 1.0 (node1)(SYNCED) as donor.
    2014-05-13 01:41:29 32604 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1451238)
    2014-05-13 01:41:29 32604 [Note] WSREP: Requesting state transfer: success, donor: 1
    2014-05-13 01:41:29 32604 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): discarded 85194386 bytes
    2014-05-13 01:41:29 32604 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): found 0/7 locked buffers
    2014-05-13 01:41:29 32604 [Note] WSREP: Receiving IST: 9 writesets, seqnos 1451222-1451231
    mysqld: /mnt/workspace/percona-xtradb-cluster-5.6-debs/label_exp/ubuntu-precise-64bit/target/Percona-XtraDB-Cluster-5.6.15/sql/wsrep_applier.cc:321: wsrep_cb_status_t wsrep_commit_cb(void*, uint32_t, const wsrep_trx_meta_t*, wsrep_bool_t*, bool): Assertion `meta->gtid.seqno == wsrep_thd_trx_seqno(thd)' failed.
    08:41:32 UTC - mysqld got signal 6 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
    We will try our best to scrape up some info that will hopefully help
    diagnose the problem, but since we have already crashed,
    something is definitely wrong and this may fail.
    Please help us make Percona XtraDB Cluster better by reporting any
    bugs at https://bugs.launchpad.net/percona-xtradb-cluster
    
    key_buffer_size=8388608
    read_buffer_size=131072
    max_used_connections=325
    max_threads=502
    thread_count=9
    connection_count=9
    It is possible that mysqld could use up to
    key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 208471 K  bytes of memory
    Hope that's ok; if not, decrease some variables in the equation.
    
    Thread pointer: 0x7f01c0000990
    
    stack_bottom = 7f023c2b9a50 thread_stack 0x40000
    /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x8ea20e]
    /usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x68d104]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f028290fcb0]
    /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f0281d69445]
    /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f0281d6cbab]
    /lib/x86_64-linux-gnu/libc.so.6(+0x2f10e)[0x7f0281d6210e]
    /lib/x86_64-linux-gnu/libc.so.6(+0x2f1b2)[0x7f0281d621b2]
    /usr/sbin/mysqld[0x5d81ab]
    /usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x152)[0x7f02621b43b2]
    /usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM8recv_ISTEPv+0x294)[0x7f02621c1e04]
    /usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x332)[0x7f02621b6682]
    /usr/lib/libgalera_smm.so(galera_recv+0x1e)[0x7f02621c8bbe]
    /usr/sbin/mysqld[0x5d83d1]
    /usr/sbin/mysqld(start_wsrep_THD+0x41d)[0x5c0f5d]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f0282907e9a]
    /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f0281e254bd]
    
    Trying to get some variables.
    Some pointers may be invalid and cause the dump to abort.
    Query (0): is an invalid pointer
    Connection ID (thread ID): 5
    Status: NOT_KILLED
    
    140513 01:41:32 mysqld_safe Number of processes running now: 0
    140513 01:41:32 mysqld_safe WSREP: not restarting wsrep node automatically
    140513 01:41:32 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended


    From what I understand from the logs, seems like Node4 crashed while trying to sync with the group, and indicates it could possibly be a bug. Then with Node3, crashed a few hours after syncing with Node1. Could the fact that there were only two remaining nodes running caused this?

    MySQL service in Node1 was still running, but was trying to connect to Node4, and could not be accessed. Why? Why wasn't it able to stand on its own?


    Here's the cluster address configuration for each of the nodes.

    Code:
    wsrep_cluster_address          = "gcomm://192.168.0.60,192.168.0.62,192.168.0.63"


    Any solution on how we can prevent this from happening again?

    Thanks!

  • #2
    Encountered another crash on Node4 just now:

    Code:
    2014-05-16 05:31:11 21958 [Warning] WSREP: Gap in state sequence. Need state transfer.
    2014-05-16 05:31:11 21958 [ERROR] /usr/sbin/mysqld: Sort aborted: Query execution was interrupted
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 1340
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 1134
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 25
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 1173
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 1343
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 1101
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 1268
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 1070
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 983
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 1041
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 1385
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 1238
    2014-05-16 05:31:13 21958 [Note] WSREP: killing local connection: 1215
    2014-05-16 05:31:13 21958 [ERROR] WSREP: Node IP address not obtained from bind_address, trying alternate methods
    2014-05-16 05:31:14 21958 [Note] WSREP: You have configured 'rsync' state snapshot transfer method which cannot be performed on a running server. Wsrep provider won't be able to fall back to it if other means of state transfer are unavailable. In that case you will need to restart the server.
    2014-05-16 05:31:14 21958 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    2014-05-16 05:31:14 21958 [Note] WSREP: REPL Protocols: 5 (3, 1)
    2014-05-16 05:31:14 21958 [Note] WSREP: Assign initial position for certification: 1614379, protocol version: 3
    2014-05-16 05:31:14 21958 [Note] WSREP: Service thread queue flushed.
    2014-05-16 05:31:14 21958 [Note] WSREP: Prepared IST receiver, listening at: tcp://192.168.0.63:4568
    2014-05-16 05:31:14 21958 [Note] WSREP: Node 1.0 (node4) requested state transfer from '*any*'. Selected 0.0 (node3)(SYNCED) as donor.
    2014-05-16 05:31:14 21958 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1614386)
    2014-05-16 05:31:14 21958 [Note] WSREP: Requesting state transfer: success, donor: 0
    2014-05-16 05:31:14 21958 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): discarded 109701967 bytes
    2014-05-16 05:31:14 21958 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): found 0/29 locked buffers
    2014-05-16 05:31:14 21958 [Note] WSREP: Receiving IST: 46 writesets, seqnos 1614333-1614379
    mysqld: /mnt/workspace/percona-xtradb-cluster-5.6-debs/label_exp/ubuntu-precise-64bit/target/Percona-XtraDB-Cluster-5.6.15/sql/wsrep_applier.cc:321: wsrep_cb_status_t wsrep_commit_cb(void*, uint32_t, const wsrep_trx_meta_t*, wsrep_bool_t*, bool): Assertion `meta->gtid.seqno == wsrep_thd_trx_seqno(thd)' failed.
    12:31:14 UTC - mysqld got signal 6 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
    We will try our best to scrape up some info that will hopefully help
    diagnose the problem, but since we have already crashed, 
    something is definitely wrong and this may fail.
    Please help us make Percona XtraDB Cluster better by reporting any
    bugs at https://bugs.launchpad.net/percona-xtradb-cluster
    
    key_buffer_size=8388608
    read_buffer_size=131072
    max_used_connections=36
    max_threads=502
    thread_count=9
    connection_count=9
    It is possible that mysqld could use up to 
    key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 208471 K  bytes of memory
    Hope that's ok; if not, decrease some variables in the equation.
    
    Thread pointer: 0x7f6ca8000990
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    stack_bottom = 7f6d54631a50 thread_stack 0x40000
    /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x8ea20e]
    /usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x68d104]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f6d82382cb0]
    /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f6d817dc445]
    /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f6d817dfbab]
    /lib/x86_64-linux-gnu/libc.so.6(+0x2f10e)[0x7f6d817d510e]
    /lib/x86_64-linux-gnu/libc.so.6(+0x2f1b2)[0x7f6d817d51b2]
    /usr/sbin/mysqld[0x5d81ab]
    /usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x152)[0x7f6d61bcf3b2]
    /usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM8recv_ISTEPv+0x294)[0x7f6d61bdce04]
    /usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x332)[0x7f6d61bd1682]
    /usr/lib/libgalera_smm.so(galera_recv+0x1e)[0x7f6d61be3bbe]
    /usr/sbin/mysqld[0x5d83d1]
    /usr/sbin/mysqld(start_wsrep_THD+0x41d)[0x5c0f5d]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f6d8237ae9a]
    /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6d818984bd]
    
    Trying to get some variables.
    Some pointers may be invalid and cause the dump to abort.
    Query (0): is an invalid pointer
    Connection ID (thread ID): 8
    Status: NOT_KILLED
    
    You may download the Percona XtraDB Cluster operations manual by visiting
    http://www.percona.com/software/percona-xtradb-cluster/. You may find information
    in the manual which will help you identify the cause of the crash.
    140516 05:31:15 mysqld_safe Number of processes running now: 0
    140516 05:31:15 mysqld_safe WSREP: not restarting wsrep node automatically
    140516 05:31:15 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

    Comment


    • #3
      The line "/usr/sbin/mysqld: Sort aborted: Query execution was interrupted" indicates, that there is a problem with one query. Perhaps see here: http://stackoverflow.com/questions/5...r-sort-aborted

      Comment


      • #4
        Sojaducana, it seems you've hit this bug:
        https://bugs.launchpad.net/galera/+bug/1284803
        I was already fixed in Galera 3.5:
        https://launchpad.net/percona-xtradb...one/galera-3.5
        So please upgrade the PXC packages.

        Comment


        • #5
          Hi przemek,

          How will I know the specific package version of Galera our system is running on?

          Edit:

          Nevermind, found the info the I need by:

          Code:
          dpkg -s percona-xtradb-cluster-galera-3.x
          Last edited by sojaducana; 06-06-2014, 12:46 AM.

          Comment


          • #6
            Hi guys,

            I haven't encountered (for some time now) the same issues above (as far as the logs are concerned) after the packages were updated.

            But earlier today, the cluster went down again, where all of the nodes were reportedly inaccessible by our proxy. Upon checking on each of the nodes, found out that the MySQL services were still running.

            From what I minimally understand in the logs, it seems that the nodes suddenly couldn't see each other.

            Please help in understanding what exactly is happening here.

            Node 1
            Code:
            2014-07-02 22:08:12 1835 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 6522848)
            2014-07-02 22:08:12 1835 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
            2014-07-02 22:08:12 1835 [Note] WSREP: Flow-control interval: [16, 16]
            2014-07-02 22:08:12 1835 [Note] WSREP: Received NON-PRIMARY.
            2014-07-02 22:08:12 1835 [Warning] WSREP: Send action {(nil), 45726387, TORDERED} returned -107 (Transport endpoint is not connected)
            2014-07-02 22:08:12 1835 [Warning] WSREP: Send action {(nil), 804, TORDERED} returned -107 (Transport endpoint is not connected)
            2014-07-02 22:08:12 1835 [Warning] WSREP: Send action {(nil), 736, TORDERED} returned -107 (Transport endpoint is not connected)
            2014-07-02 22:08:12 1835 [Warning] WSREP: Send action {(nil), 736, TORDERED} returned -107 (Transport endpoint is not connected)
            2014-07-02 22:08:12 1835 [Warning] WSREP: Send action {(nil), 804, TORDERED} returned -107 (Transport endpoint is not connected)
            2014-07-02 22:08:12 1835 [Note] WSREP: New cluster view: global state: da819e4a-d34b-11e3-bb14-4a770834cff0:6522848, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
            2014-07-02 22:08:12 1835 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2014-07-02 22:08:12 1835 [Note] WSREP: New cluster view: global state: da819e4a-d34b-11e3-bb14-4a770834cff0:6522848, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
            2014-07-02 22:08:12 1835 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2014-07-02 22:08:13 1835 [Note] WSREP: (6d0b8e07-fff9-11e3-9469-feac1f409996, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.60:4567' pointing to uuid 6d0b8e07-fff9-11e3-9469-feac1f409996 is blacklisted, skipping
            2014-07-02 22:08:17 1835 [Note] WSREP: (6d0b8e07-fff9-11e3-9469-feac1f409996, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.60:4567' pointing to uuid 6d0b8e07-fff9-11e3-9469-feac1f409996 is blacklisted, skipping
            2014-07-02 22:08:18 1835 [Note] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) suspecting node: 40d6660e-f75c-11e3-adc6-af1ae442c2de
            2014-07-02 22:08:18 1835 [Note] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) suspecting node: 40d6660e-f75c-11e3-adc6-af1ae442c2de
            2014-07-02 22:08:19 1835 [Note] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) suspecting node: 40d6660e-f75c-11e3-adc6-af1ae442c2de
            2014-07-02 22:08:19 1835 [Note] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) suspecting node: 40d6660e-f75c-11e3-adc6-af1ae442c2de
            2014-07-02 22:08:20 1835 [Note] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) suspecting node: 40d6660e-f75c-11e3-adc6-af1ae442c2de
            2014-07-02 22:08:20 1835 [Note] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) suspecting node: 40d6660e-f75c-11e3-adc6-af1ae442c2de
            2014-07-02 22:08:21 1835 [Note] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) suspecting node: 40d6660e-f75c-11e3-adc6-af1ae442c2de
            2014-07-02 22:08:25 1835 [Note] WSREP: (6d0b8e07-fff9-11e3-9469-feac1f409996, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.60:4567' pointing to uuid 6d0b8e07-fff9-11e3-9469-feac1f409996 is blacklisted, skipping
            2014-07-02 22:08:25 1835 [Note] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) suspecting node: 40d6660e-f75c-11e3-adc6-af1ae442c2de
            2014-07-02 22:08:26 1835 [Note] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) suspecting node: 40d6660e-f75c-11e3-adc6-af1ae442c2de
            2014-07-02 22:08:26 1835 [Note] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) suspecting node: 40d6660e-f75c-11e3-adc6-af1ae442c2de
            2014-07-02 22:08:27 1835 [Note] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) suspecting node: 40d6660e-f75c-11e3-adc6-af1ae442c2de
            2014-07-02 22:08:27 1835 [Note] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) suspecting node: 40d6660e-f75c-11e3-adc6-af1ae442c2de
            2014-07-02 22:08:28 1835 [Warning] WSREP: evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)) install timer expired
            evs::proto(evs::proto(6d0b8e07-fff9-11e3-9469-feac1f409996, GATHER, view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85)), GATHER) {
            current_view=view(view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85) memb {
                 6d0b8e07-fff9-11e3-9469-feac1f409996,0
            } joined {
            } left {
            } partitioned {
            }),
            input_map=evs::input_map: {aru_seq=0,safe_seq=0,node_index=node: {idx=0,range=[1,0],safe_seq=0} },
            fifo_seq=3180871,
            last_sent=0,
            known={
                 40d6660e-f75c-11e3-adc6-af1ae442c2de,evs::node{operational=1,suspected=1,installed=0,fifo_seq=14060586,}
                 6d0b8e07-fff9-11e3-9469-feac1f409996,evs::node{operational=1,suspected=0,installed=0,fifo_seq=-1,join_message=
            evs::msg{version=0,type=4,user_type=255,order=1,seq=0,seq_range=-1,aru_seq=0,flags=0,source=6d0b8e07-fff9-11e3-9469-feac1f409996,source_view_id=view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=3180871,node_list=(     40d6660e-f75c-11e3-adc6-af1ae442c2de,node: {operational=1,suspected=1,leave_seq=-1,view_id=view_id(REG,00000000-0000-0000-0000-000000000000,0),safe_seq=-1,im_range=[-1,-1],}
                 6d0b8e07-fff9-11e3-9469-feac1f409996,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85),safe_seq=0,im_range=[1,0],}
            )
            },
            }
             }
             }
            2014-07-02 22:08:28 1835 [Note] WSREP: no install message received
            2014-07-02 22:08:28 1835 [Warning] WSREP: subsequent views have same members, prev view view(view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,85) memb {
                 6d0b8e07-fff9-11e3-9469-feac1f409996,0
            } joined {
            } left {
            } partitioned {
            }) current view view(view_id(REG,6d0b8e07-fff9-11e3-9469-feac1f409996,86) memb {
                 6d0b8e07-fff9-11e3-9469-feac1f409996,0
            } joined {
            } left {
            } partitioned {
            })
            2014-07-02 22:08:28 1835 [Note] WSREP: view(view_id(NON_PRIM,6d0b8e07-fff9-11e3-9469-feac1f409996,86) memb {
                 6d0b8e07-fff9-11e3-9469-feac1f409996,0
            } joined {
            } left {
            } partitioned {
                 40d6660e-f75c-11e3-adc6-af1ae442c2de,0
                 a774e603-0254-11e4-80f4-977f03eae3c4,0
            })
            2014-07-02 22:08:28 1835 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
            2014-07-02 22:08:28 1835 [Note] WSREP: Flow-control interval: [16, 16]
            2014-07-02 22:08:28 1835 [Note] WSREP: Received NON-PRIMARY.
            2014-07-02 22:08:28 1835 [Note] WSREP: New cluster view: global state: da819e4a-d34b-11e3-bb14-4a770834cff0:6522848, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
            2014-07-02 22:08:28 1835 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2014-07-02 22:08:28 1835 [Note] WSREP: declaring 40d6660e-f75c-11e3-adc6-af1ae442c2de stable
            2014-07-02 22:08:28 1835 [Note] WSREP: view(view_id(NON_PRIM,40d6660e-f75c-11e3-adc6-af1ae442c2de,87) memb {
                 40d6660e-f75c-11e3-adc6-af1ae442c2de,0
                 6d0b8e07-fff9-11e3-9469-feac1f409996,0
            } joined {
            } left {
            } partitioned {
                 a774e603-0254-11e4-80f4-977f03eae3c4,0
            })
            2014-07-02 22:08:28 1835 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
            2014-07-02 22:08:28 1835 [Note] WSREP: Flow-control interval: [23, 23]
            2014-07-02 22:08:28 1835 [Note] WSREP: Received NON-PRIMARY.
            2014-07-02 22:08:28 1835 [Note] WSREP: New cluster view: global state: da819e4a-d34b-11e3-bb14-4a770834cff0:6522848, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version 2
            2014-07-02 22:08:28 1835 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2014-07-02 22:08:29 1835 [Note] WSREP: (6d0b8e07-fff9-11e3-9469-feac1f409996, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.60:4567' pointing to uuid 6d0b8e07-fff9-11e3-9469-feac1f409996 is blacklisted, skipping










            Comment


            • #7
              Node 3 (Part 1)

              Code:
              2014-07-02 22:03:46 3317 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.69202S), skipping check
              2014-07-02 22:04:24 3317 [Note] WSREP: (a774e603-0254-11e4-80f4-977f03eae3c4, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.0.60:4567 tcp://192.168.0.63:4567
              2014-07-02 22:04:28 3317 [Note] WSREP: (a774e603-0254-11e4-80f4-977f03eae3c4, 'tcp://0.0.0.0:4567') reconnecting to 6d0b8e07-fff9-11e3-9469-feac1f409996 (tcp://192.168.0.60:4567), attempt 0
              2014-07-02 22:04:30 3317 [Warning] WSREP: last inactive check more than PT1.5S ago (PT44.3581S), skipping check
              2014-07-02 22:04:30 3317 [Warning] WSREP: evs::proto(a774e603-0254-11e4-80f4-977f03eae3c4, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)) install timer expired
              evs::proto(evs::proto(a774e603-0254-11e4-80f4-977f03eae3c4, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)), GATHER) {
              current_view=view(view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84) memb {
                  40d6660e-f75c-11e3-adc6-af1ae442c2de,0
                  6d0b8e07-fff9-11e3-9469-feac1f409996,0
                  a774e603-0254-11e4-80f4-977f03eae3c4,0
              } joined {
              } left {
              } partitioned {
              }),
              input_map=evs::input_map: {aru_seq=67773,safe_seq=67764,node_index=node: {idx=0,range=[67774,67773],safe_seq=67764} node: {idx=1,range=[67775,67774],safe_seq=67766} node: {idx=2,range=[67774,67773],safe_seq=67773} },
              fifo_seq=133201,
              last_sent=67773,
              known={
                  40d6660e-f75c-11e3-adc6-af1ae442c2de,evs::node{operational=1,suspected=0,installed=0,fifo_seq=14060431,join_message=
              evs::msg{version=0,type=4,user_type=255,order=1,seq=67764,seq_range=-1,aru_seq=67764,flags=4,source=40d6660e-f75c-11e3-adc6-af1ae442c2de,source_view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=14060431,node_list=(     40d6660e-f75c-11e3-adc6-af1ae442c2de,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67764,im_range=[67774,67773],}
                  6d0b8e07-fff9-11e3-9469-feac1f409996,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67766,im_range=[67773,67772],}
                  a774e603-0254-11e4-80f4-977f03eae3c4,node: {operational=1,suspected=1,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67764,im_range=[67765,67764],}
              )
              },
              }
                  6d0b8e07-fff9-11e3-9469-feac1f409996,evs::node{operational=1,suspected=0,installed=0,fifo_seq=3180707,}
                  a774e603-0254-11e4-80f4-977f03eae3c4,evs::node{operational=1,suspected=0,installed=0,fifo_seq=-1,join_message=
              evs::msg{version=0,type=4,user_type=255,order=1,seq=67764,seq_range=-1,aru_seq=67773,flags=0,source=a774e603-0254-11e4-80f4-977f03eae3c4,source_view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=133201,node_list=(     40d6660e-f75c-11e3-adc6-af1ae442c2de,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67764,im_range=[67774,67773],}
                  6d0b8e07-fff9-11e3-9469-feac1f409996,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67766,im_range=[67775,67774],}
                  a774e603-0254-11e4-80f4-977f03eae3c4,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67773,im_range=[67774,67773],}
              )
              },
              }
              }
              }
              2014-07-02 22:04:30 3317 [Note] WSREP: no install message received
              2014-07-02 22:04:30 3317 [Note] WSREP: (a774e603-0254-11e4-80f4-977f03eae3c4, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.62:4567' pointing to uuid a774e603-0254-11e4-80f4-977f03eae3c4 is blacklisted, skipping
              2014-07-02 22:04:46 3317 [Warning] WSREP: evs::proto(a774e603-0254-11e4-80f4-977f03eae3c4, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)) install timer expired
              evs::proto(evs::proto(a774e603-0254-11e4-80f4-977f03eae3c4, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)), GATHER) {
              current_view=view(view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84) memb {
                  40d6660e-f75c-11e3-adc6-af1ae442c2de,0
                  6d0b8e07-fff9-11e3-9469-feac1f409996,0
                  a774e603-0254-11e4-80f4-977f03eae3c4,0
              } joined {
              } left {
              } partitioned {
              }),
              input_map=evs::input_map: {aru_seq=67773,safe_seq=67764,node_index=node: {idx=0,range=[67774,67773],safe_seq=67764} node: {idx=1,range=[67775,67774],safe_seq=67766} node: {idx=2,range=[67774,67773],safe_seq=67773} },
              fifo_seq=133216,
              last_sent=67773,
              known={
                  40d6660e-f75c-11e3-adc6-af1ae442c2de,evs::node{operational=0,suspected=1,installed=0,fifo_seq=14060431,}
                  6d0b8e07-fff9-11e3-9469-feac1f409996,evs::node{operational=0,suspected=1,installed=0,fifo_seq=3180707,}
                  a774e603-0254-11e4-80f4-977f03eae3c4,evs::node{operational=1,suspected=0,installed=0,fifo_seq=-1,join_message=
              evs::msg{version=0,type=4,user_type=255,order=1,seq=67764,seq_range=-1,aru_seq=67773,flags=0,source=a774e603-0254-11e4-80f4-977f03eae3c4,source_view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=133216,node_list=(     40d6660e-f75c-11e3-adc6-af1ae442c2de,node: {operational=0,suspected=1,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67764,im_range=[67774,67773],}
                  6d0b8e07-fff9-11e3-9469-feac1f409996,node: {operational=0,suspected=1,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67766,im_range=[67775,67774],}
                  a774e603-0254-11e4-80f4-977f03eae3c4,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67773,im_range=[67774,67773],}
              )
              },
              }
              }
              }

              Comment


              • #8
                Node 3 (part 2)

                Code:
                2014-07-02 22:04:47 3317 [Note] WSREP: no install message received
                2014-07-02 22:04:48 3317 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.06982S), skipping check
                2014-07-02 22:05:02 3317 [Warning] WSREP: evs::proto(a774e603-0254-11e4-80f4-977f03eae3c4, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)) install timer expired
                evs::proto(evs::proto(a774e603-0254-11e4-80f4-977f03eae3c4, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)), GATHER) {
                current_view=view(view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84) memb {
                     40d6660e-f75c-11e3-adc6-af1ae442c2de,0
                     6d0b8e07-fff9-11e3-9469-feac1f409996,0
                     a774e603-0254-11e4-80f4-977f03eae3c4,0
                } joined {
                } left {
                } partitioned {
                }),
                input_map=evs::input_map: {aru_seq=67773,safe_seq=67764,node_index=node: {idx=0,range=[67774,67773],safe_seq=67764} node: {idx=1,range=[67775,67774],safe_seq=67766} node: {idx=2,range=[67774,67773],safe_seq=67773} },
                fifo_seq=133231,
                last_sent=67773,
                known={
                     40d6660e-f75c-11e3-adc6-af1ae442c2de,evs::node{operational=0,suspected=1,installed=0,fifo_seq=14060431,}
                     6d0b8e07-fff9-11e3-9469-feac1f409996,evs::node{operational=0,suspected=1,installed=0,fifo_seq=3180707,}
                     a774e603-0254-11e4-80f4-977f03eae3c4,evs::node{operational=1,suspected=0,installed=0,fifo_seq=-1,join_message=
                evs::msg{version=0,type=4,user_type=255,order=1,seq=67764,seq_range=-1,aru_seq=67773,flags=0,source=a774e603-0254-11e4-80f4-977f03eae3c4,source_view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=133231,node_list=(     40d6660e-f75c-11e3-adc6-af1ae442c2de,node: {operational=0,suspected=1,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67764,im_range=[67774,67773],}
                     6d0b8e07-fff9-11e3-9469-feac1f409996,node: {operational=0,suspected=1,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67766,im_range=[67775,67774],}
                     a774e603-0254-11e4-80f4-977f03eae3c4,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67773,im_range=[67774,67773],}
                )
                },
                }
                 }
                 }
                2014-07-02 22:05:02 3317 [Note] WSREP: going to give up, state dump for diagnosis:
                evs::proto(evs::proto(a774e603-0254-11e4-80f4-977f03eae3c4, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)), GATHER) {
                current_view=view(view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84) memb {
                     40d6660e-f75c-11e3-adc6-af1ae442c2de,0
                     6d0b8e07-fff9-11e3-9469-feac1f409996,0
                     a774e603-0254-11e4-80f4-977f03eae3c4,0
                } joined {
                } left {
                } partitioned {
                }),
                input_map=evs::input_map: {aru_seq=67773,safe_seq=67764,node_index=node: {idx=0,range=[67774,67773],safe_seq=67764} node: {idx=1,range=[67775,67774],safe_seq=67766} node: {idx=2,range=[67774,67773],safe_seq=67773} },
                fifo_seq=133231,
                last_sent=67773,
                known={
                     40d6660e-f75c-11e3-adc6-af1ae442c2de,evs::node{operational=0,suspected=1,installed=0,fifo_seq=14060431,}
                     6d0b8e07-fff9-11e3-9469-feac1f409996,evs::node{operational=0,suspected=1,installed=0,fifo_seq=3180707,}
                     a774e603-0254-11e4-80f4-977f03eae3c4,evs::node{operational=1,suspected=0,installed=0,fifo_seq=-1,join_message=
                evs::msg{version=0,type=4,user_type=255,order=1,seq=67764,seq_range=-1,aru_seq=67773,flags=0,source=a774e603-0254-11e4-80f4-977f03eae3c4,source_view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=133231,node_list=(     40d6660e-f75c-11e3-adc6-af1ae442c2de,node: {operational=0,suspected=1,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67764,im_range=[67774,67773],}
                     6d0b8e07-fff9-11e3-9469-feac1f409996,node: {operational=0,suspected=1,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67766,im_range=[67775,67774],}
                     a774e603-0254-11e4-80f4-977f03eae3c4,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84),safe_seq=67773,im_range=[67774,67773],}
                )
                },
                }
                 }
                 }
                2014-07-02 22:05:02 3317 [ERROR] WSREP: exception from gcomm, backend must be restarted: evs::proto(a774e603-0254-11e4-80f4-977f03eae3c4, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)) failed to form singleton view after exceeding max_install_timeouts 1, giving up (FATAL)
                      at gcomm/src/evs_proto.cpp:handle_install_timer():577
                2014-07-02 22:05:02 3317 [Note] WSREP: Received self-leave message.
                2014-07-02 22:05:02 3317 [Note] WSREP: Flow-control interval: [0, 0]
                2014-07-02 22:05:02 3317 [Note] WSREP: Received SELF-LEAVE. Closing connection.
                2014-07-02 22:05:02 3317 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 6522848)
                2014-07-02 22:05:02 3317 [Note] WSREP: RECV thread exiting 0: Success
                2014-07-02 22:05:02 3317 [Warning] WSREP: Send action {(nil), 679, TORDERED} returned -103 (Software caused connection abort)
                2014-07-02 22:05:02 3317 [Note] WSREP: New cluster view: global state: da819e4a-d34b-11e3-bb14-4a770834cff0:6522848, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 2
                2014-07-02 22:05:02 3317 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
                2014-07-02 22:05:02 3317 [Note] WSREP: applier thread exiting (code:0)

                Comment


                • #9
                  Node 4

                  Code:
                  2014-07-02 22:03:42 19041 [Note] WSREP: (40d6660e-f75c-11e3-adc6-af1ae442c2de, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.63:4567' pointing to uuid 40d6660e-f75c-11e3-adc6-af1ae442c2de is blacklisted, skipping
                  2014-07-02 22:03:42 19041 [Note] WSREP: (40d6660e-f75c-11e3-adc6-af1ae442c2de, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.0.62:4567
                  2014-07-02 22:03:43 19041 [Note] WSREP: (40d6660e-f75c-11e3-adc6-af1ae442c2de, 'tcp://0.0.0.0:4567') reconnecting to a774e603-0254-11e4-80f4-977f03eae3c4 (tcp://192.168.0.62:4567), attempt 0
                  2014-07-02 22:03:44 19041 [Note] WSREP: evs::proto(40d6660e-f75c-11e3-adc6-af1ae442c2de, OPERATIONAL, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)) suspecting node: a774e603-0254-11e4-80f4-977f03eae3c4
                  2014-07-02 22:03:45 19041 [Note] WSREP: evs::proto(40d6660e-f75c-11e3-adc6-af1ae442c2de, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)) suspecting node: a774e603-0254-11e4-80f4-977f03eae3c4
                  2014-07-02 22:03:45 19041 [Note] WSREP: evs::proto(40d6660e-f75c-11e3-adc6-af1ae442c2de, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)) suspecting node: a774e603-0254-11e4-80f4-977f03eae3c4
                  2014-07-02 22:03:46 19041 [Note] WSREP: evs::proto(40d6660e-f75c-11e3-adc6-af1ae442c2de, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)) suspecting node: a774e603-0254-11e4-80f4-977f03eae3c4
                  2014-07-02 22:03:46 19041 [Note] WSREP: (40d6660e-f75c-11e3-adc6-af1ae442c2de, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.63:4567' pointing to uuid 40d6660e-f75c-11e3-adc6-af1ae442c2de is blacklisted, skipping
                  2014-07-02 22:03:46 19041 [Note] WSREP: evs::proto(40d6660e-f75c-11e3-adc6-af1ae442c2de, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,84)) suspecting node: a774e603-0254-11e4-80f4-977f03eae3c4
                  
                  
                  2014-07-02 22:03:50 19041 [Warning] WSREP: Send action {(nil), 704, TORDERED} returned -107 (Transport endpoint is not connected)
                  2014-07-02 22:03:50 19041 [Warning] WSREP: Send action {(nil), 764, TORDERED} returned -107 (Transport endpoint is not connected)
                  2014-07-02 22:03:50 19041 [Warning] WSREP: Send action {(nil), 704, TORDERED} returned -107 (Transport endpoint is not connected)
                  2014-07-02 22:03:50 19041 [Warning] WSREP: Send action {(nil), 692, TORDERED} returned -107 (Transport endpoint is not connected)
                  2014-07-02 22:03:50 19041 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
                  2014-07-02 22:03:50 19041 [Note] WSREP: Flow-control interval: [16, 16]
                  2014-07-02 22:03:50 19041 [Note] WSREP: Received NON-PRIMARY.
                  2014-07-02 22:03:50 19041 [Warning] WSREP: Send action {(nil), 764, TORDERED} returned -107 (Transport endpoint is not connected)
                  2014-07-02 22:03:50 19041 [Warning] WSREP: Send action {(nil), 692, TORDERED} returned -107 (Transport endpoint is not connected)
                  2014-07-02 22:03:50 19041 [Note] WSREP: New cluster view: global state: da819e4a-d34b-11e3-bb14-4a770834cff0:6522848, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
                  2014-07-02 22:03:50 19041 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
                  2014-07-02 22:03:50 19041 [Note] WSREP: New cluster view: global state: da819e4a-d34b-11e3-bb14-4a770834cff0:6522848, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 2
                  2014-07-02 22:03:50 19041 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
                  2014-07-02 22:03:54 19041 [Note] WSREP: (40d6660e-f75c-11e3-adc6-af1ae442c2de, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.63:4567' pointing to uuid 40d6660e-f75c-11e3-adc6-af1ae442c2de is blacklisted, skipping
                  2014-07-02 22:03:58 19041 [Note] WSREP: (40d6660e-f75c-11e3-adc6-af1ae442c2de, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.63:4567' pointing to uuid 40d6660e-f75c-11e3-adc6-af1ae442c2de is blacklisted, skipping
                  2014-07-02 22:04:02 19041 [Note] WSREP: (40d6660e-f75c-11e3-adc6-af1ae442c2de, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.63:4567' pointing to uuid 40d6660e-f75c-11e3-adc6-af1ae442c2de is blacklisted, skipping
                  2014-07-02 22:04:06 19041 [Note] WSREP: (40d6660e-f75c-11e3-adc6-af1ae442c2de, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.63:4567' pointing to uuid 40d6660e-f75c-11e3-adc6-af1ae442c2de is blacklisted, skipping
                  2014-07-02 22:04:08 19041 [Note] WSREP: (40d6660e-f75c-11e3-adc6-af1ae442c2de, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.63:4567' pointing to uuid 40d6660e-f75c-11e3-adc6-af1ae442c2de is blacklisted, skipping
                  2014-07-02 22:04:08 19041 [Warning] WSREP: evs::proto(40d6660e-f75c-11e3-adc6-af1ae442c2de, GATHER, view_id(REG,40d6660e-f75c-11e3-adc6-af1ae442c2de,85)) source 6d0b8e07-fff9-11e3-9469-feac1f409996 is not supposed to be representative
                  2014-07-02 22:04:08 19041 [Note] WSREP: declaring 6d0b8e07-fff9-11e3-9469-feac1f409996 stable
                  2014-07-02 22:04:08 19041 [Note] WSREP: view(view_id(NON_PRIM,40d6660e-f75c-11e3-adc6-af1ae442c2de,87) memb {
                       40d6660e-f75c-11e3-adc6-af1ae442c2de,0
                       6d0b8e07-fff9-11e3-9469-feac1f409996,0
                  } joined {
                  } left {
                  } partitioned {
                       a774e603-0254-11e4-80f4-977f03eae3c4,0
                  })
                  2014-07-02 22:04:08 19041 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2
                  2014-07-02 22:04:08 19041 [Note] WSREP: Flow-control interval: [23, 23]
                  2014-07-02 22:04:08 19041 [Note] WSREP: Received NON-PRIMARY.
                  2014-07-02 22:04:08 19041 [Note] WSREP: New cluster view: global state: da819e4a-d34b-11e3-bb14-4a770834cff0:6522848, view# -1: non-Primary, number of nodes: 2, my index: 0, protocol version 2
                  2014-07-02 22:04:08 19041 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
                  2014-07-02 22:04:10 19041 [Note] WSREP: (40d6660e-f75c-11e3-adc6-af1ae442c2de, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.63:4567' pointing to uuid 40d6660e-f75c-11e3-adc6-af1ae442c2de is blacklisted, skipping
                  2014-07-02 22:04:14 19041 [Note] WSREP: (40d6660e-f75c-11e3-adc6-af1ae442c2de, 'tcp://0.0.0.0:4567') address 'tcp://192.168.0.63:4567' pointing to uuid 40d6660e-f75c-11e3-adc6-af1ae442c2de is blacklisted, skipping

                  Comment


                  • #10
                    Did you actually check the network connectivity between the nodes once this happened? Are those nodes a separate machines or mabe VMs, possibly sharing the same host machine, or storage?

                    Comment

                    Working...
                    X