Announcement

Announcement Module
Collapse
No announcement yet.

Only able to start first node in cluster

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Only able to start first node in cluster

    Hello,

    I'm having a bit of trouble with my Percona XtraDB cluster. I'm not quite sure what started this as it was working just fine. I had to reboot one of the servers (there are 3 nodes) and it wouldn't start again. Then I restarted a second and it too wouldn't restart. Currently the only one that runs is the one I bootscrap by using the command /etc/init.d/mysql bootstrap-pxc. Below are entries from the log file and other details that I hope will help shed some light on this issue.

    Some details that might help:
    - rsync is the same version on all servers, v3.0.6
    - Percona versin install from rpm: Percona-XtraDB-Cluster-shared-55-5.5.34-25.9.607


    Can anyone assite me in troubleshooting this?

    P.S.
    Yep, I will certainly change the password after I get this working again :-)

    Details:

    mysql.cnf contents on all servers:
    [mysqld]
    wsrep_provider=/usr/lib/libgalera_smm.so
    wsrep_cluster_address=gcomm://192.168.10.31,192.168.10.32,192.168.10.39
    wsrep_slave_threads=8
    wsrep_sst_method=rsync
    binlog_format=ROW
    default_storage_engine=InnoDB
    innodb_locks_unsafe_for_binlog=1
    innodb_autoinc_lock_mode=2
    wsrep_sst_auth=sstuser:s3cret
    wsrep_replicate_myisam=on
    log-error=/var/log/mysqld.log



    From the client log that won't start:
    131206 04:20:54 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    131206 04:20:54 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.7Y8asL' --pid-file='/var/lib/mysql/mail3.devdigital.com-recover.pid'
    131206 04:20:57 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
    131206 4:20:57 [Note] WSREP: wsrep_start_position var submitted: '00000000-0000-0000-0000-000000000000:-1'
    131206 4:20:57 [Note] WSREP: Read nil XID from storage engines, skipping position init
    131206 4:20:57 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
    131206 4:20:57 [Note] WSREP: wsrep_load(): Galera 2.8(r165) by Codership Oy <info@codership.com> loaded successfully.
    131206 4:20:57 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    131206 4:20:57 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
    131206 4:20:57 [Note] WSREP: Passing config to GCS: base_host = 192.168.10.39; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 2147483647; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
    131206 4:20:57 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    131206 4:20:57 [Note] WSREP: wsrep_sst_grab()
    131206 4:20:57 [Note] WSREP: Start replication
    131206 4:20:57 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    131206 4:20:57 [Note] WSREP: protonet asio version 0
    131206 4:20:57 [Note] WSREP: backend: asio
    131206 4:20:57 [Note] WSREP: GMCast version 0
    131206 4:20:57 [Note] WSREP: (1880d3ce-5e60-11e3-bf16-efb486db5fe4, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
    131206 4:20:57 [Note] WSREP: (1880d3ce-5e60-11e3-bf16-efb486db5fe4, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
    131206 4:20:57 [Note] WSREP: EVS version 0
    131206 4:20:57 [Note] WSREP: PC version 0
    131206 4:20:57 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '192.168.10.31:,192.168.10.32:,192.168.10.39:'
    131206 4:20:57 [Warning] WSREP: (1880d3ce-5e60-11e3-bf16-efb486db5fe4, 'tcp://0.0.0.0:4567') address 'tcp://192.168.10.39:4567' points to own listening address, blacklisting
    131206 4:20:57 [Note] WSREP: (1880d3ce-5e60-11e3-bf16-efb486db5fe4, 'tcp://0.0.0.0:4567') address 'tcp://192.168.10.39:4567' pointing to uuid 1880d3ce-5e60-11e3-bf16-efb486db5fe4 is blacklisted, skipping
    131206 4:20:57 [Note] WSREP: declaring 20d04133-5e60-11e3-8215-6772cc3565ab stable
    131206 4:20:57 [Note] WSREP: Node 20d04133-5e60-11e3-8215-6772cc3565ab state prim
    131206 4:20:57 [Note] WSREP: view(view_id(PRIM,1880d3ce-5e60-11e3-bf16-efb486db5fe4,2) memb {
    1880d3ce-5e60-11e3-bf16-efb486db5fe4,
    20d04133-5e60-11e3-8215-6772cc3565ab,
    } joined {
    } left {
    } partitioned {
    })
    131206 4:20:57 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.10.32:4567
    131206 4:20:58 [Note] WSREP: gcomm: connected
    131206 4:20:58 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    131206 4:20:58 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    131206 4:20:58 [Note] WSREP: Opened channel 'my_wsrep_cluster'
    131206 4:20:58 [Note] WSREP: Waiting for SST to complete.
    131206 4:20:58 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
    131206 4:20:58 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 191adee8-5e60-11e3-bc41-7ee12e5406fe
    131206 4:20:58 [Note] WSREP: STATE EXCHANGE: sent state msg: 191adee8-5e60-11e3-bc41-7ee12e5406fe
    131206 4:20:58 [Note] WSREP: STATE EXCHANGE: got state msg: 191adee8-5e60-11e3-bc41-7ee12e5406fe from 0 (mail3.devdigital.com)
    131206 4:20:58 [Note] WSREP: STATE EXCHANGE: got state msg: 191adee8-5e60-11e3-bc41-7ee12e5406fe from 1 (mail1.devdigital.com)
    131206 4:20:58 [Note] WSREP: Quorum results:
    version = 2,
    component = PRIMARY,
    conf_id = 1,
    members = 1/2 (joined/total),
    act_id = 47556,
    last_appl. = -1,
    protocols = 0/4/2 (gcs/repl/appl),
    group UUID = 6ebd7a54-51dd-11e3-bb4c-12d4c65d64a6
    131206 4:20:58 [Note] WSREP: Flow-control interval: [23, 23]
    131206 4:20:58 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 47556)
    131206 4:20:58 [Note] WSREP: State transfer required:
    Group state: 6ebd7a54-51dd-11e3-bb4c-12d4c65d64a6:47556
    Local state: 00000000-0000-0000-0000-000000000000:-1
    131206 4:20:58 [Note] WSREP: New cluster view: global state: 6ebd7a54-51dd-11e3-bb4c-12d4c65d64a6:47556, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 2
    131206 4:20:58 [Warning] WSREP: Gap in state sequence. Need state transfer.
    131206 4:21:00 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.10.39' --auth 'sstuser:s3cret' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '11723''
    131206 4:21:00 [Note] WSREP: Prepared SST request: rsync|192.168.10.39:4444/rsync_sst
    131206 4:21:00 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    131206 4:21:00 [Note] WSREP: Assign initial position for certification: 47556, protocol version: 2
    131206 4:21:00 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (6ebd7a54-51dd-11e3-bb4c-12d4c65d64a6): 1 (Operation not permitted)
    at galera/src/replicator_str.cpprepare_for_IST():445. IST will be unavailable.
    131206 4:21:00 [Note] WSREP: Node 0 (mail3.devdigital.com) requested state transfer from '*any*'. Selected 1 (mail1.devdigital.com)(SYNCED) as donor.
    131206 4:21:00 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 47556)
    131206 4:21:00 [Note] WSREP: Requesting state transfer: success, donor: 1
    131206 4:22:03 [Warning] WSREP: 1 (mail1.devdigital.com): State transfer to 0 (mail3.devdigital.com) failed: -1 (Operation not permitted)
    131206 4:22:03 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():719: Will never receive state. Need to abort.
    131206 4:22:03 [Note] WSREP: gcomm: terminating thread
    131206 4:22:03 [Note] WSREP: gcomm: joining thread
    131206 4:22:03 [Note] WSREP: gcomm: closing backend
    131206 4:22:04 [Note] WSREP: view(view_id(NON_PRIM,1880d3ce-5e60-11e3-bf16-efb486db5fe4,2) memb {
    1880d3ce-5e60-11e3-bf16-efb486db5fe4,
    } joined {
    } left {
    } partitioned {
    20d04133-5e60-11e3-8215-6772cc3565ab,
    })
    131206 4:22:04 [Note] WSREP: view((empty))
    131206 4:22:04 [Note] WSREP: gcomm: closed
    131206 4:22:04 [Note] WSREP: /usr/sbin/mysqld: Terminated.
    131206 04:22:04 mysqld_safe mysqld from pid file /var/lib/mysql/mail3.devdigital.com.pid ended
    WSREP_SST: [ERROR] Parent mysqld process (PID:11723) terminated unexpectedly. (20131206 04:22:04.988)
    WSREP_SST: [INFO] Joiner cleanup. (20131206 04:22:04.992)
    WSREP_SST: [INFO] Joiner cleanup done. (20131206 04:22:05.506)



  • #2
    While bootstrapping keep gcomm address empty(later you can change it). From the above log the state UUID is 00000000-0000-0000-0000-000000000000, that means its trying SST not IST.
    Try to up other nodes, and check what logs is coming.

    Comment

    Working...
    X