XtraDB Cluster node disconnect from cluster after SST and need new SST operation

  • XtraDB Cluster node disconnect from cluster after SST and need new SST operation

    XtraDB Cluster node db1 tries to join the cluster (db3, garbd). The SST succeeds and the connection is established (wsrep_cluster_size shows 3 nodes), but db1 then disconnects from the cluster and needs another SST (Remove '/home/database/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)).

    Cluster worked fine for 2 months.

    The following important options are set on DB1:
    wsrep_sst_method=xtrabackup
    wsrep_cluster_address=gcomm://192.168.1.85,192.168.1.87,192.168.1.60
    wsrep_sst_donor=db3_node
    wsrep_causal_reads=1
    wsrep_provider_options="gcache.size=4G"

    Warning and error messages.

    DB1:
    130806 10:27:37 [Warning] Could not increase number of max_open_files to more than 65535 (request: 1049087)
    130806 10:27:37 [Warning] WSREP: Could not open saved state file for reading: /home/database/mysql//grastate.dat
    130806 10:27:37 [Warning] WSREP: (0d2e5276-fe72-11e2-9d9d-77047f92373b, 'tcp://0.0.0.0:4567') address 'tcp://192.168.1.85:4567' points to own listening address, blacklisting
    130806 10:27:38 [Warning] WSREP: Gap in state sequence. Need state transfer.
    130806 10:27:40 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (38b565ea-acd1-11e2-0800-579a44d43f8f): 1 (Operation not permitted)
    130806 10:30:19 [Warning] WSREP: last inactive check more than PT1.5S ago (PT20.2287S), skipping check
    130806 10:32:00 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.61507S), skipping check
    130806 10:33:03 [Warning] WSREP: last inactive check more than PT1.5S ago (PT2.6398S), skipping check
    130806 10:34:41 [Warning] WSREP: last inactive check more than PT1.5S ago (PT6.47264S), skipping check
    130806 10:38:04 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.56166S), skipping check
    130806 10:44:57 [Warning] WSREP: Could not find peer:
    130806 10:47:04 [Warning] WSREP: last inactive check more than PT1.5S ago (PT10.2864S), skipping check
    130806 10:47:27 [Warning] WSREP: Rejecting JOIN message from 0 (db1_node): new State Transfer required.
    130806 10:47:28 [ERROR] WSREP: Local state seqno (64285056) is greater than group seqno (64276104): states diverged. Aborting to avoid potential data loss. Remove '/home/database/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)
    130806 10:51:14 [Warning] Could not increase number of max_open_files to more than 65535 (request: 1049087)
    130806 10:51:14 [Warning] WSREP: Could not open saved state file for reading: /home/database/mysql//grastate.dat
    130806 10:51:14 [Warning] WSREP: (5a0906fb-fe75-11e2-9944-efb4a1d25eb5, 'tcp://0.0.0.0:4567') address 'tcp://192.168.1.85:4567' points to own listening address, blacklisting
    130806 10:51:15 [Warning] WSREP: Gap in state sequence. Need state transfer.
    130806 10:51:17 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (38b565ea-acd1-11e2-0800-579a44d43f8f): 1 (Operation not permitted)
    130806 10:53:10 [Warning] WSREP: last inactive check more than PT1.5S ago (PT12.3021S), skipping check
    130806 10:53:20 [Warning] WSREP: last inactive check more than PT1.5S ago (PT4.09586S), skipping check
    130806 10:54:21 [Warning] WSREP: last inactive check more than PT1.5S ago (PT2.66241S), skipping check
    130806 11:09:42 [Warning] WSREP: last inactive check more than PT1.5S ago (PT7.10041S), skipping check
    130806 11:12:10 [Warning] WSREP: Rejecting JOIN message from 1 (db1_node): new State Transfer required.
    130806 11:12:10 [ERROR] WSREP: Local state seqno (64301015) is greater than group seqno (64290014): states diverged. Aborting to avoid potential data loss. Remove '/home/database/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)

    DB3:
    130806 9:12:10 [Warning] WSREP: discarding established (time wait) f32bff04-fe66-11e2-0800-31ae218b35a7 (tcp://192.168.1.85:4567)
    130806 9:12:12 [Warning] WSREP: discarding established (time wait) f32bff04-fe66-11e2-0800-31ae218b35a7 (tcp://192.168.1.85:4567)
    130806 9:18:54 [Warning] WSREP: discarding established (time wait) f32bff04-fe66-11e2-0800-31ae218b35a7 (tcp://192.168.1.85:4567)
    130806 9:19:53 [Warning] WSREP: discarding established (time wait) f32bff04-fe66-11e2-0800-31ae218b35a7 (tcp://192.168.1.85:4567)
    130806 9:28:47 [Warning] WSREP: Rejecting JOIN message from 2 (db1_node): new State Transfer required.
    130806 10:34:43 [Warning] WSREP: discarding established (time wait) 0d2e5276-fe72-11e2-9d9d-77047f92373b (tcp://192.168.1.85:4567)
    130806 10:34:44 [Warning] WSREP: discarding established (time wait) 0d2e5276-fe72-11e2-9d9d-77047f92373b (tcp://192.168.1.85:4567)
    130806 10:34:46 [Warning] WSREP: discarding established (time wait) 0d2e5276-fe72-11e2-9d9d-77047f92373b (tcp://192.168.1.85:4567)
    130806 10:47:05 [Warning] WSREP: discarding established (time wait) 0d2e5276-fe72-11e2-9d9d-77047f92373b (tcp://192.168.1.85:4567)
    130806 10:47:27 [Warning] WSREP: Rejecting JOIN message from 0 (db1_node): new State Transfer required.
    130806 10:53:05 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
    130806 11:09:43 [Warning] WSREP: discarding established (time wait) 5a0906fb-fe75-11e2-9944-efb4a1d25eb5 (tcp://192.168.1.85:4567)
    130806 11:09:44 [Warning] WSREP: Could not find peer: 5a0906fb-fe75-11e2-9944-efb4a1d25eb5
    130806 11:09:45 [Warning] WSREP: discarding established (time wait) 5a0906fb-fe75-11e2-9944-efb4a1d25eb5 (tcp://192.168.1.85:4567)
    130806 11:09:46 [Warning] WSREP: discarding established (time wait) 5a0906fb-fe75-11e2-9944-efb4a1d25eb5 (tcp://192.168.1.85:4567)
    130806 11:12:10 [Warning] WSREP: Rejecting JOIN message from 1 (db1_node): new State Transfer required.
    130806 11:12:10 [Warning] WSREP: gcs_caused() returned -1 (Operation not permitted)
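
    To put a number on how far db1 had run ahead of the group in each of the two FATAL errors above, the seqno pairs can be pulled out of the DB1 log. This is only a rough sketch: the function name `seqno_gaps` and the regular expression are illustrative, written against the exact message wording shown in the log lines above.

```python
import re

# Matches the "states diverged" error text exactly as it appears in the DB1 log.
SEQNO_RE = re.compile(
    r"Local state seqno \((\d+)\) is greater than group seqno \((\d+)\)"
)

def seqno_gaps(log_text):
    """Return (local_seqno, group_seqno, gap) for every divergence error found."""
    gaps = []
    for local, group in SEQNO_RE.findall(log_text):
        local, group = int(local), int(group)
        gaps.append((local, group, local - group))
    return gaps

# The two FATAL lines from the DB1 log, shortened to the relevant part.
DB1_ERRORS = """
130806 10:47:28 [ERROR] WSREP: Local state seqno (64285056) is greater than group seqno (64276104): states diverged.
130806 11:12:10 [ERROR] WSREP: Local state seqno (64301015) is greater than group seqno (64290014): states diverged.
"""

if __name__ == "__main__":
    for local, group, gap in seqno_gaps(DB1_ERRORS):
        print(f"local={local} group={group} gap={gap}")
```

    For the two errors above, the gaps come out to 8952 and 11001.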

  • #2
    Generally when Galera tells you things like this, you should listen.

    I also noted these messages:

    130806 10:27:37 [Warning] Could not increase number of max_open_files to more than 65535 (request: 1049087)

    That makes me suspect you have an open files limit problem on DB1 -- have you adjusted your ulimits?
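
    As a quick way to see what limit a process actually gets, something like the following could be run as the same user that starts mysqld. It is a minimal sketch, not a Percona tool: `limit_is_sufficient` is a made-up helper, and the script reports the limits of the Python process itself, so it only reflects mysqld's limits if it is launched from the same environment. On Linux, /proc/<pid>/limits shows the values for the running server directly.

```python
import resource

# Value taken from the "request:" field of the max_open_files warning on DB1.
REQUESTED_OPEN_FILES = 1049087

def limit_is_sufficient(soft_limit, requested):
    """Return True if the soft open-files limit covers the requested count."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= requested

if __name__ == "__main__":
    # RLIMIT_NOFILE is the per-process cap on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft limit: {soft}, hard limit: {hard}")
    if limit_is_sufficient(soft, REQUESTED_OPEN_FILES):
        print("soft limit covers the requested number of open files")
    else:
        print(f"soft limit is below the requested {REQUESTED_OPEN_FILES}")
```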
