Announcement

Announcement Module
Collapse
No announcement yet.

wsrep rsync IST Operation not permitted - how to troubleshoot?

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • wsrep rsync IST Operation not permitted - how to troubleshoot?

    Hi all,

    I have a node that can't join a cluster as wsrep rsync for IST is not permitted. How can I troubleshoot the issue, there isn't much else in the logs.

    The nodes are on the same IP subnet with no firewall. Both are new Centos 6.3 boxes with the latest Percona RPM's.

    Here is the config and log from the joiner (cluster name, hostname and IP addresses removed for security)

    --

    [client]
    port=3306
    socket=/var/lib/mysql/mysql.sock

    [mysqld]
    user=mysql
    datadir=/db/mysql
    port=3306
    socket=/var/lib/mysql/mysql.sock
    log-error=/db/logs/node2.err
    log_slow_queries=/db/logs/node2-slow.log
    tmpdir=/db/mysqltmp
    binlog_format=ROW
    wsrep_provider=/usr/lib64/libgalera_smm.so

    wsrep_cluster_address=gcomm://10.10.10.10

    wsrep_node_address=gcomm://10.10.10.11
    wsrep_slave_threads=2
    wsrep_cluster_name=test
    wsrep_sst_method=rsync
    wsrep_node_name=node2

    innodb_locks_unsafe_for_binlog=1
    innodb_autoinc_lock_mode=2

    [mysqld_safe]
    datadir=/db/mysql

    --

    121201 22:08:48 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    121201 22:08:48 [Note] WSREP: wsrep_sst_grab()
    121201 22:08:48 [Note] WSREP: Start replication
    121201 22:08:48 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    121201 22:08:48 [Note] WSREP: protonet asio version 0
    121201 22:08:48 [Note] WSREP: backend: asio
    121201 22:08:48 [Note] WSREP: GMCast version 0
    121201 22:08:48 [Note] WSREP: (ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
    121201 22:08:48 [Note] WSREP: (ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
    121201 22:08:48 [Note] WSREP: EVS version 0
    121201 22:08:48 [Note] WSREP: PC version 0
    121201 22:08:48 [Note] WSREP: gcomm: connecting to group 'test', peer '10.10.10.10:'
    121201 22:08:48 [Note] WSREP: declaring 77ade5d7-3c00-11e2-0800-5270a3596162 stable
    121201 22:08:48 [Note] WSREP: view(view_id(PRIM,77ade5d7-3c00-11e2-0800-5270a3596162,2) memb {
    77ade5d7-3c00-11e2-0800-5270a3596162,
    ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c,
    } joined {
    } left {
    } partitioned {
    })
    121201 22:08:49 [Note] WSREP: gcomm: connected
    121201 22:08:49 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    121201 22:08:49 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    121201 22:08:49 [Note] WSREP: Opened channel 'test'
    121201 22:08:49 [Note] WSREP: Waiting for SST to complete.
    121201 22:08:49 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
    121201 22:08:49 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    121201 22:08:49 [Note] WSREP: STATE EXCHANGE: sent state msg: af3f5ae2-3c03-11e2-0800-3f9d70376806
    121201 22:08:49 [Note] WSREP: STATE EXCHANGE: got state msg: af3f5ae2-3c03-11e2-0800-3f9d70376806 from 0 (node1)
    121201 22:08:49 [Note] WSREP: STATE EXCHANGE: got state msg: af3f5ae2-3c03-11e2-0800-3f9d70376806 from 1 (node2)
    121201 22:08:49 [Note] WSREP: Quorum results:
    version = 2,
    component = PRIMARY,
    conf_id = 1,
    members = 1/2 (joined/total),
    act_id = 1,
    last_appl. = -1,
    protocols = 0/4/2 (gcs/repl/appl),
    group UUID = 77ae90f5-3c00-11e2-0800-eee19d0a75f0
    121201 22:08:49 [Note] WSREP: Flow-control interval: [23, 23]
    121201 22:08:49 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 1)
    121201 22:08:49 [Note] WSREP: State transfer required:
    Group state: 77ae90f5-3c00-11e2-0800-eee19d0a75f0:1
    Local state: 00000000-0000-0000-0000-000000000000:-1
    121201 22:08:49 [Note] WSREP: New cluster view: global state: 77ae90f5-3c00-11e2-0800-eee19d0a75f0:1, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 2
    121201 22:08:49 [Warning] WSREP: Gap in state sequence. Need state transfer.
    121201 22:08:51 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address 'gcomm' --auth '' --datadir '/db/mysql/' --defaults-file '/etc/my.cnf' --parent '25005''
    cat: /db/mysql//rsync_sst.pid: No such file or directory
    121201 22:08:51 [Note] WSREP: Prepared SST request: rsync|gcomm:4444/rsync_sst
    121201 22:08:51 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    121201 22:08:51 [Note] WSREP: Assign initial position for certification: 1, protocol version: 2
    121201 22:08:51 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (77ae90f5-3c00-11e2-0800-eee19d0a75f0): 1 (Operation not permitted)
    at galera/src/replicator_str.cpprepare_for_IST():440. IST will be unavailable.
    121201 22:08:51 [Note] WSREP: Node 1 (node2) requested state transfer from '*any*'. Selected 0 (node1)(SYNCED) as donor.
    121201 22:08:51 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1)
    121201 22:08:51 [Note] WSREP: Requesting state transfer: success, donor: 0
    121201 22:08:51 [Warning] WSREP: 0 (node1): State transfer to 1 (node2) failed: -1 (Operation not permitted)
    121201 22:08:51 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():712: Will never receive state. Need to abort.
    121201 22:08:51 [Note] WSREP: gcomm: terminating thread
    121201 22:08:51 [Note] WSREP: gcomm: joining thread
    121201 22:08:51 [Note] WSREP: gcomm: closing backend
    121201 22:08:52 [Note] WSREP: view(view_id(NON_PRIM,77ade5d7-3c00-11e2-0800-5270a3596162,2 ) memb {
    ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c,
    } joined {
    } left {
    } partitioned {
    77ade5d7-3c00-11e2-0800-5270a3596162,
    })
    121201 22:08:52 [Note] WSREP: view((empty))
    121201 22:08:52 [Note] WSREP: gcomm: closed
    121201 22:08:52 [Note] WSREP: /usr/sbin/mysqld: Terminated.
    121201 22:08:52 mysqld_safe mysqld from pid file /db/mysql/mysql.pid ended
    Parent mysqld process (PID:25005) terminated unexpectedly.
    WSREP_SST: [INFO] Joiner cleanup. (20121201 22:08:53.613)
    done.

    --

    Here is the info from the donor:

    --

    [client]
    port=3306
    socket=/var/lib/mysql/mysql.sock

    [mysqld]
    user=mysql
    datadir=/db/mysql
    port=3306
    socket=/var/lib/mysql/mysql.sock
    log-error=/db/logs/node1.err
    log_slow_queries=/db/logs/node1-slow.log
    tmpdir=/db/mysqltmp
    binlog_format=ROW
    server-id=1

    wsrep_provider=/usr/lib64/libgalera_smm.so

    wsrep_cluster_address=gcomm://

    wsrep_node_address=gcomm://10.10.10.10
    wsrep_slave_threads=2
    wsrep_cluster_name=test
    wsrep_sst_method=rsync
    wsrep_node_name=node1

    innodb_locks_unsafe_for_binlog=1
    innodb_autoinc_lock_mode=2

    [mysqld_safe]
    datadir=/db/mysql

    --

    121201 22:08:50 [Note] WSREP: New cluster view: global state: 77ae90f5-3c00-11e2-0800-eee19d0a75f0:1, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 2
    121201 22:08:50 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    121201 22:08:50 [Note] WSREP: Assign initial position for certification: 1, protocol version: 2
    121201 22:08:52 [Note] WSREP: Node 1 (node2) requested state transfer from '*any*'. Selected 0 (node1)(SYNCED) as donor.
    121201 22:08:52 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 1)
    121201 22:08:52 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    121201 22:08:52 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address 'gcomm:4444/rsync_sst' --auth '(null)' --socket '/var/lib/mysql/mysql.sock' --datadir '/db/mysql/' --defaults-file '/etc/my.cnf' --gtid '77ae90f5-3c00-11e2-0800-eee19d0a75f0:1''
    121201 22:08:52 [Note] WSREP: sst_donor_thread signaled with 0
    121201 22:08:52 [Note] WSREP: Flushing tables for SST...
    121201 22:08:52 [Note] WSREP: Provider paused at 77ae90f5-3c00-11e2-0800-eee19d0a75f0:1
    121201 22:08:52 [Note] WSREP: Tables flushed.
    rsync: failed to connect to gcomm: Connection refused (111)
    rsync error: error in socket IO (code 10) at clientserver.c(124) [sender=3.0.6]
    rsync returned code 10:
    121201 22:08:52 [ERROR] WSREP: Failed to read from: wsrep_sst_rsync --role 'donor' --address 'gcomm:4444/rsync_sst' --auth '(null)' --socket '/var/lib/mysql/mysql.sock' --datadir '/db/mysql/' --defaults-file '/etc/my.cnf' --gtid '77ae90f5-3c00-11e2-0800-eee19d0a75f0:1'
    121201 22:08:52 [Note] WSREP: Provider resumed.
    121201 22:08:52 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'donor' --address 'gcomm:4444/rsync_sst' --auth '(null)' --socket '/var/lib/mysql/mysql.sock' --datadir '/db/mysql/' --defaults-file '/etc/my.cnf' --gtid '77ae90f5-3c00-11e2-0800-eee19d0a75f0:1': 255 (Unknown error 255)
    121201 22:08:52 [Warning] WSREP: 0 (node1): State transfer to 1 (node2) failed: -1 (Operation not permitted)
    121201 22:08:52 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 1)
    121201 22:08:53 [Note] WSREP: view(view_id(PRIM,77ade5d7-3c00-11e2-0800-5270a3596162,3) memb {
    77ade5d7-3c00-11e2-0800-5270a3596162,
    } joined {
    } left {
    } partitioned {
    ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c,
    })
    121201 22:08:53 [Note] WSREP: forgetting ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c (tcp://10.10.10.11:4567)
    121201 22:08:53 [Note] WSREP: Member 0 (node1) synced with group.
    121201 22:08:53 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 1)
    121201 22:08:53 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
    121201 22:08:53 [Note] WSREP: STATE_EXCHANGE: sent state UUID: b1ad3ae3-3c03-11e2-0800-9d0489fd1a38
    121201 22:08:53 [Note] WSREP: STATE EXCHANGE: sent state msg: b1ad3ae3-3c03-11e2-0800-9d0489fd1a38
    121201 22:08:53 [Note] WSREP: STATE EXCHANGE: got state msg: b1ad3ae3-3c03-11e2-0800-9d0489fd1a38 from 0 (node1)
    121201 22:08:53 [Note] WSREP: Quorum results:
    version = 2,
    component = PRIMARY,
    conf_id = 2,
    members = 1/1 (joined/total),
    act_id = 1,
    last_appl. = 0,
    protocols = 0/4/2 (gcs/repl/appl),
    group UUID = 77ae90f5-3c00-11e2-0800-eee19d0a75f0
    121201 22:08:53 [Note] WSREP: Flow-control interval: [16, 16]
    121201 22:08:53 [Note] WSREP: Synchronized with group, ready for connections
    121201 22:08:53 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    121201 22:08:53 [Note] WSREP: New cluster view: global state: 77ae90f5-3c00-11e2-0800-eee19d0a75f0:1, view# 3: Primary, number of nodes: 1, my index: 0, protocol version 2
    121201 22:08:53 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    121201 22:08:53 [Note] WSREP: Assign initial position for certification: 1, protocol version: 2
    121201 22:08:58 [Note] WSREP: cleaning up ae6e2cf2-3c03-11e2-0800-c9f2a1c90f4c (tcp://10.10.10.11:4567)

    --

    Thanks,
    Frank

  • #2
    I think this was resolved by removing

    wsrep_cluster_name

    Comment

    Working...
    X