Node refuses to re-enter cluster


    Hello,

    I have set up a 5-node Galera cluster with Percona XtraDB. The servers are named galera01 through galera05. I bootstrapped the cluster, added all the nodes, and then loaded a MySQL dump back in. Everything seemed to be working fine. As a test (I had problems before), I restarted one node (service mysql restart) to see whether it could survive leaving the cluster and rejoining. It could not, and now nothing I do gets it back in. Even after removing everything from the server and setting it up again, it still cannot rejoin the cluster.

    galera02 (10.10.10.173) requested a state transfer, and the cluster chose galera05 (10.10.10.177) as the donor. Here is the error log on galera05:
    140602 10:13:16 [Note] WSREP: Quorum results:
    version = 2,
    component = PRIMARY,
    conf_id = 76,
    members = 4/5 (joined/total),
    act_id = 15422,
    last_appl. = 15147,
    protocols = 0/4/2 (gcs/repl/appl),
    group UUID = 655d5286-e9f4-11e3-9ad3-a7361a15dc8a
    140602 10:13:16 [Note] WSREP: Flow-control interval: [36, 36]
    140602 10:13:16 [Note] WSREP: New cluster view: global state: 655d5286-e9f4-11e3-9ad3-a7361a15dc8a:15422, view# 77: Primary, number of nodes: 5, my index: 1, protocol version 2
    140602 10:13:16 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    140602 10:13:16 [Note] WSREP: Assign initial position for certification: 15422, protocol version: 2
    140602 10:13:18 [Note] WSREP: Node 0 (Galera02) requested state transfer from '*any*'. Selected 1 (galera05)(SYNCED) as donor.
    140602 10:13:18 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 15422)
    140602 10:13:18 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    140602 10:13:18 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address '10.10.10.173:4444/xtrabackup_sst' --auth 'root:goingforbroke' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '655d5286-e9f4-11e3-9ad3-a7361a15dc8a:15422''
    140602 10:13:18 [Note] WSREP: sst_donor_thread signaled with 0
    WSREP_SST: [INFO] Streaming with tar (20140602 10:13:18.504)
    WSREP_SST: [INFO] Using socat as streamer (20140602 10:13:18.505)
    WSREP_SST: [INFO] Streaming the backup to joiner at 10.10.10.173 4444 (20140602 10:13:18.511)
    WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf $INNOEXTRA --galera-info --stream=$sfmt ${TMPDIR} 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:10.10.10.173:4444; RC=( ${PIPESTATUS[@]} ) (20140602 10:13:18.512)
    140602 10:13:20 [Note] WSREP: declaring 5a4b91d1-e9fa-11e3-ac44-82a35eb07b16 stable
    140602 10:13:20 [Note] WSREP: declaring 6cea84c0-e9fa-11e3-902a-c37fd410e57e stable
    140602 10:13:20 [Note] WSREP: declaring d645de99-e9f7-11e3-8294-eec3f6cb9a92 stable
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.10.10.173:4567
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:20 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:21 [Note] WSREP: Node 5327be86-e9fa-11e3-8fa6-f628e7172e46 state prim
    140602 10:13:21 [Note] WSREP: view(view_id(PRIM,5327be86-e9fa-11e3-8fa6-f628e7172e46,78) memb {
    5327be86-e9fa-11e3-8fa6-f628e7172e46,
    5a4b91d1-e9fa-11e3-ac44-82a35eb07b16,
    6cea84c0-e9fa-11e3-902a-c37fd410e57e,
    d645de99-e9f7-11e3-8294-eec3f6cb9a92,
    } joined {
    } left {
    } partitioned {
    2f0f5c49-ea79-11e3-b5d0-9f1f1821a243,
    })
    140602 10:13:21 [Note] WSREP: forgetting 2f0f5c49-ea79-11e3-b5d0-9f1f1821a243 (tcp://10.10.10.173:4567)
    140602 10:13:21 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 4
    140602 10:13:21 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:21 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:21 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') address 'tcp://10.10.10.177:4567' pointing to uuid 5327be86-e9fa-11e3-8fa6-f628e7172e46 is blacklisted, skipping
    140602 10:13:21 [Note] WSREP: (5327be86-e9fa-11e3-8fa6-f628e7172e46, 'tcp://0.0.0.0:4567') turning message relay requesting off
    140602 10:13:21 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 32bf19dc-ea79-11e3-abb3-8fa6dbc1e13d
    140602 10:13:21 [Note] WSREP: STATE EXCHANGE: sent state msg: 32bf19dc-ea79-11e3-abb3-8fa6dbc1e13d
    140602 10:13:21 [Note] WSREP: STATE EXCHANGE: got state msg: 32bf19dc-ea79-11e3-abb3-8fa6dbc1e13d from 0 (galera05)
    140602 10:13:21 [Note] WSREP: STATE EXCHANGE: got state msg: 32bf19dc-ea79-11e3-abb3-8fa6dbc1e13d from 1 (galera04)
    140602 10:13:21 [Note] WSREP: STATE EXCHANGE: got state msg: 32bf19dc-ea79-11e3-abb3-8fa6dbc1e13d from 2 (galera03)
    140602 10:13:21 [Note] WSREP: STATE EXCHANGE: got state msg: 32bf19dc-ea79-11e3-abb3-8fa6dbc1e13d from 3 (galera01)
    140602 10:13:21 [Note] WSREP: Quorum results:
    version = 2,
    component = PRIMARY,
    conf_id = 77,
    members = 4/4 (joined/total),
    act_id = 15422,
    last_appl. = 15147,
    protocols = 0/4/2 (gcs/repl/appl),
    group UUID = 655d5286-e9f4-11e3-9ad3-a7361a15dc8a
    140602 10:13:21 [Note] WSREP: Flow-control interval: [32, 32]
    140602 10:13:21 [Note] WSREP: New cluster view: global state: 655d5286-e9f4-11e3-9ad3-a7361a15dc8a:15422, view# 78: Primary, number of nodes: 4, my index: 0, protocol version 2
    140602 10:13:21 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    140602 10:13:21 [Note] WSREP: Assign initial position for certification: 15422, protocol version: 2
    140602 10:13:26 [Note] WSREP: cleaning up 2f0f5c49-ea79-11e3-b5d0-9f1f1821a243 (tcp://10.10.10.173:4567)




    I don't understand what is going on here. Can anyone help?

    Thank you.
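Most of the donor log above is repetitive gossip noise ("blacklisted, skipping"). When comparing it with the joiner's log, it can help to pull out only the state-transfer lifecycle lines. A minimal Python sketch, assuming the marker strings below (picked from this excerpt for illustration, not an official or exhaustive list) are the ones of interest:

```python
from typing import Iterable, List, Sequence

# Substrings that mark the state-transfer lifecycle in a Galera error log.
# ASSUMPTION: chosen from the excerpt above for illustration only.
SST_MARKERS: Sequence[str] = (
    "requested state transfer",  # joiner asks for SST and a donor is selected
    "Shifting",                  # node state change, e.g. SYNCED -> DONOR/DESYNCED
    "Running: 'wsrep_sst",       # the SST script invocation
    "WSREP_SST:",                # output printed by the SST script itself
    "forgetting",                # a member was dropped from the cluster view
    "[ERROR]",                   # any server error line
)


def sst_events(lines: Iterable[str], markers: Sequence[str] = SST_MARKERS) -> List[str]:
    """Return the whitespace-stripped lines that contain at least one marker."""
    return [line.strip() for line in lines if any(m in line for m in markers)]


# Usage sketch; the path is hypothetical, check log-error in /etc/my.cnf:
# with open("/var/lib/mysql/galera02.err") as log:
#     for event in sst_events(log):
#         print(event)
```

Applied to the excerpt above, it keeps the state-transfer request and the SYNCED -> DONOR/DESYNCED shift at 10:13:18, the wsrep_sst_xtrabackup and WSREP_SST streaming lines, and the "forgetting ... (tcp://10.10.10.173:4567)" line at 10:13:21, which makes it easier to see that the joiner left the cluster view three seconds after streaming began.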

  • #2
    This is the log from the donor, but what happened on the joiner (10.10.10.173)? Please paste its error log too. Also, is TCP port 4444 open between the two nodes?
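Per the donor log, SST runs over TCP 4444 with the donor connecting to the joiner (`socat -u stdio TCP:10.10.10.173:4444`); Galera group communication uses 4567, as the log also shows, and IST uses 4568 by default. A minimal Python sketch for checking reachability from the donor, assuming Python 3 is available on galera05. The joiner only listens on 4444 while it waits for an SST, and probing that listener mid-transfer could take the connection meant for the backup stream, so it is safer to stop mysqld on the joiner, start a temporary listener on 4444 there (for example with socat, which is already installed for SST), and then run the check:

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    False means refused, timed out (often a firewall) or unreachable. This only
    tests one direction: from the machine running the script to `host`.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError, socket.timeout, no route, ...
        return False


# Usage sketch, run on the donor (galera05); 10.10.10.173 is the joiner from the log.
# 4567 and 4568 only answer while mysqld is running on the joiner.
# for port in (4444, 4567, 4568):
#     print(port, "open" if tcp_port_open("10.10.10.173", port) else "closed/filtered")
```

A refused connection usually means nothing is listening on that port, while a timeout more often points to a firewall silently dropping packets.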
