WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.

  • WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.

    Hello everyone,

    I ran mysqld to trace the problem and saw this error:


    WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.


    Quote:
    130320 18:49:37 [Warning] You need to use --log-bin to make --log-slave-updates work.
    130320 18:49:37 [Note] WSREP: Read nil XID from storage engines, skipping position init
    130320 18:49:37 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
    130320 18:49:37 [Note] WSREP: wsrep_load(): Galera 2.3(r143) by Codership Oy loaded succesfully.
    130320 18:49:37 [ERROR] WSREP: Process completed with error: /sbin/ifconfig | grep -E '^[[:space:]]+inet addr:' | grep -m1 -v 'inet addr:127' | sed 's/:/ /' | awk '{ print $3 }': 2 (No such file or directory)
    130320 18:49:37 [Warning] WSREP: Failed to guess base node address. Set it explicitly via wsrep_node_address.
    130320 18:49:37 [Warning] WSREP: Guessing address for incoming client connections failed. Try setting wsrep_node_incoming_address explicitly.
    130320 18:49:37 [Note] WSREP: Found saved state: 9ad38334-9082-11e2-0800-6edb31989fd4:-1
    130320 18:49:37 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
    130320 18:49:37 [Note] WSREP: Passing config to GCS: base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; pc.ignore_sb = true; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
    130320 18:49:37 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    130320 18:49:37 [Note] WSREP: wsrep_sst_grab()
    130320 18:49:37 [Note] WSREP: Start replication
    130320 18:49:37 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    130320 18:49:37 [Note] WSREP: protonet asio version 0
    130320 18:49:37 [Note] WSREP: backend: asio
    130320 18:49:37 [Note] WSREP: GMCast version 0
    130320 18:49:37 [Note] WSREP: (70fd13f2-91b0-11e2-0800-73cacdb90501, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
    130320 18:49:37 [Note] WSREP: (70fd13f2-91b0-11e2-0800-73cacdb90501, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
    130320 18:49:37 [Note] WSREP: EVS version 0
    130320 18:49:37 [Note] WSREP: PC version 0
    130320 18:49:37 [Note] WSREP: gcomm: connecting to group 'eclickz', peer '192.168.133.66:'
    130320 18:49:37 [Note] WSREP: (70fd13f2-91b0-11e2-0800-73cacdb90501, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.133.68:4567
    130320 18:49:37 [Note] WSREP: (70fd13f2-91b0-11e2-0800-73cacdb90501, 'tcp://0.0.0.0:4567') turning message relay requesting off
    130320 18:49:37 [Note] WSREP: declaring 60a03c55-919b-11e2-0800-5b5a359d2cb3 stable
    130320 18:49:37 [Note] WSREP: declaring c96d5052-918c-11e2-0800-210b57b08c72 stable
    130320 18:49:38 [Note] WSREP: view(view_id(PRIM,60a03c55-919b-11e2-0800-5b5a359d2cb3,352) memb {
    60a03c55-919b-11e2-0800-5b5a359d2cb3,
    70fd13f2-91b0-11e2-0800-73cacdb90501,
    c96d5052-918c-11e2-0800-210b57b08c72,
    } joined {
    } left {
    } partitioned {
    })
    130320 18:49:38 [Note] WSREP: gcomm: connected
    130320 18:49:38 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    130320 18:49:38 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    130320 18:49:38 [Note] WSREP: Opened channel 'eclickz'
    130320 18:49:38 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
    130320 18:49:38 [Note] WSREP: Waiting for SST to complete.
    130320 18:49:38 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    130320 18:49:38 [Note] WSREP: STATE EXCHANGE: sent state msg: 45816f5a-91b0-11e2-0800-a4fd5a8bbd3c
    130320 18:49:38 [Note] WSREP: STATE EXCHANGE: got state msg: 45816f5a-91b0-11e2-0800-a4fd5a8bbd3c from 0 (node03)
    130320 18:49:38 [Note] WSREP: STATE EXCHANGE: got state msg: 45816f5a-91b0-11e2-0800-a4fd5a8bbd3c from 2 (node1)
    130320 18:49:38 [Note] WSREP: STATE EXCHANGE: got state msg: 45816f5a-91b0-11e2-0800-a4fd5a8bbd3c from 1 (slave5.eclickz.com)
    130320 18:49:38 [Note] WSREP: Quorum results:
    version = 2,
    component = PRIMARY,
    conf_id = 346,
    members = 2/3 (joined/total),
    act_id = 832701,
    last_appl. = -1,
    protocols = 0/4/2 (gcs/repl/appl),
    group UUID = 9ad38334-9082-11e2-0800-6edb31989fd4
    130320 18:49:38 [Note] WSREP: Flow-control interval: [28, 28]
    130320 18:49:38 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 832701)
    130320 18:49:38 [Note] WSREP: State transfer required:
    Group state: 9ad38334-9082-11e2-0800-6edb31989fd4:832701
    Local state: 9ad38334-9082-11e2-0800-6edb31989fd4:-1
    130320 18:49:38 [Note] WSREP: New cluster view: global state: 9ad38334-9082-11e2-0800-6edb31989fd4:832701, view# 347: Primary, number of nodes: 3, my index: 1, protocol version 2
    130320 18:49:38 [Warning] WSREP: Gap in state sequence. Need state transfer.
    130320 18:49:40 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.133.70:4567' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '38994''
    130320 18:49:40 [ERROR] WSREP: Failed to read 'ready ' from: wsrep_sst_rsync --role 'joiner' --address '192.168.133.70:4567' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '38994'
    Read: '(null)'
    130320 18:49:40 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '192.168.133.70:4567' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '38994': 2 (No such file or directory)
    130320 18:49:40 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
    130320 18:49:40 [ERROR] Aborting

    130320 18:49:42 [Note] WSREP: Closing send monitor...
    130320 18:49:42 [Note] WSREP: Closed send monitor.
    130320 18:49:42 [Note] WSREP: gcomm: terminating thread
    130320 18:49:42 [Note] WSREP: gcomm: joining thread
    130320 18:49:42 [Note] WSREP: gcomm: closing backend
    130320 18:49:42 [Note] WSREP: view(view_id(NON_PRIM,60a03c55-919b-11e2-0800-5b5a359d2cb3,352) memb {
    70fd13f2-91b0-11e2-0800-73cacdb90501,
    } joined {
    } left {
    } partitioned {
    60a03c55-919b-11e2-0800-5b5a359d2cb3,
    c96d5052-918c-11e2-0800-210b57b08c72,
    })
    130320 18:49:42 [Note] WSREP: view((empty))
    130320 18:49:42 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
    130320 18:49:42 [Note] WSREP: gcomm: closed
    130320 18:49:42 [Note] WSREP: Flow-control interval: [16, 16]
    130320 18:49:42 [Note] WSREP: Received NON-PRIMARY.
    130320 18:49:42 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 832701)
    130320 18:49:42 [Note] WSREP: Received self-leave message.
    130320 18:49:42 [Note] WSREP: Flow-control interval: [0, 0]
    130320 18:49:42 [Note] WSREP: Received SELF-LEAVE. Closing connection.
    130320 18:49:42 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 832701)
    130320 18:49:42 [Note] WSREP: RECV thread exiting 0: Success
    130320 18:49:42 [Note] WSREP: recv_thread() joined.
    130320 18:49:42 [Note] WSREP: Closing slave action queue.
    130320 18:49:42 [Note] WSREP: Service disconnected.
    130320 18:49:42 [Note] WSREP: rollbacker thread exiting
    130320 18:49:43 [Note] WSREP: Some threads may fail to exit.

    I have not been able to resolve this. Please help me. Thanks.

  • #2
    It seems that XtraDB can't get the list of IPs using ifconfig:

    130320 18:49:37 [ERROR] WSREP: Process completed with error: /sbin/ifconfig | grep -E '^[[:space:]]+inet addr:' | grep -m1 -v 'inet addr:127' | sed 's/:/ /' | awk '{ print $3 }': 2 (No such file or directory)

    It should be able to run that tool to get the information it needs. You can also use wsrep_node_address to set the IP on which rsync and other daemons should listen.
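For reference, the address-guessing pipeline quoted in the error can be tried by hand. The sketch below wraps the same grep/sed/awk stages in a function and feeds it made-up input. `guess_addr` and `sample_ifconfig` are hypothetical names, and the interface text and addresses are illustrative values, not taken from the poster's machine.

```shell
# guess_addr: hypothetical helper that applies the same filter stages as
# the pipeline in the error message, reading ifconfig-style text on stdin.
guess_addr() {
    grep -E '^[[:space:]]+inet addr:' \
        | grep -m1 -v 'inet addr:127' \
        | sed 's/:/ /' \
        | awk '{ print $3 }'
}

# Made-up sample in the old net-tools output format (illustrative values).
sample_ifconfig() {
    cat <<'EOF'
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
eth0      Link encap:Ethernet  HWaddr 00:00:00:00:00:01
          inet addr:192.0.2.10  Bcast:192.0.2.255  Mask:255.255.255.0
EOF
}

# Prints the first non-loopback address found in the sample.
sample_ifconfig | guess_addr
```

If the same function prints nothing when fed the real output of /sbin/ifconfig, or if /sbin/ifconfig is not present at all, setting wsrep_node_address explicitly avoids the guess, as the warning in the log itself suggests.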


    • #3
      I have the same problem (using rsync or xtrabackup sst methods).

      I used wsrep_node_address to mitigate the first error but can't get around the second:
      130320 18:49:38 [Warning] WSREP: Gap in state sequence. Need state transfer.
      130320 18:49:40 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.133.70:4567' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '38994''
      130320 18:49:40 [ERROR] WSREP: Failed to read 'ready ' from: wsrep_sst_rsync --role 'joiner' --address '192.168.133.70:4567' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '38994'
      Read: '(null)'
      130320 18:49:40 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '192.168.133.70:4567' --auth '' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '38994': 2 (No such file or directory)
      130320 18:49:40 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
      130320 18:49:40 [ERROR] Aborting


      Did you have any luck? I'm guessing the mysql process doesn't have permission to run or access /bin/ip and wsrep_sst_rsync, but I haven't yet found a way to solve it (I've chown'd and chmod'd everything I can find that could be causing it).

      Any help would be much appreciated!

      Tristan.
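On the permissions guess above: the log reports code 2 (No such file or directory), not a permission error. From a POSIX shell the two cases can be told apart by exit status, 127 when a command cannot be found and 126 when the file exists but cannot be executed. The sketch below shows this with throwaway files. `classify_exec` is a hypothetical helper, and how mysqld turns such a status into its log message is an assumption here, not something this thread confirms.

```shell
# classify_exec: hypothetical helper; runs the given command and reports
# how the shell classified the attempt, based on the exit status.
# Caveat: a command that runs and itself exits 126 or 127 is misreported.
classify_exec() {
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    case $rc in
        127) echo "not found" ;;
        126) echo "not executable" ;;
        *)   echo "ran" ;;
    esac
}

tmpdir=$(mktemp -d)
printf '#!/bin/sh\nexit 0\n' > "$tmpdir/no_exec_bit.sh"
chmod 644 "$tmpdir/no_exec_bit.sh"      # present, but no execute permission

classify_exec "$tmpdir/does_not_exist"  # path is absent
classify_exec "$tmpdir/no_exec_bit.sh"  # path exists without the x bit
classify_exec true                      # ordinary command
rm -rf "$tmpdir"
```

If the failure really is of the "not found" kind, chown and chmod changes alone would not be expected to help; the missing piece could be the script itself or something it calls.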


      • #4
        Just to give some more details on my setup.

        I have 3 nodes, they were all working beautifully until a switch died taking out all but one node.

        I made that node primary with gcomm:// and have since been trying to add the other two back to the cluster, but I keep having problems.

        At first I was getting the error Miguel identified above, which I mitigated by using wsrep_node_address, but now I can't get past 'No such file or directory', as shown below:

        Using xtrabackup:
        130530 15:31:50 [Warning] WSREP: Gap in state sequence. Need state transfer.
        130530 15:31:50 [Note] WSREP: Setting wsrep_ready to 0
        130530 15:31:50 [Note] WSREP: [debug]: closing client connections for PRIM
        130530 15:31:52 [Note] WSREP: waiting for client connections to close: 2
        130530 15:31:52 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'joiner' --address '176.123.62.2' --auth 'repl:****' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '24455''
        130530 15:31:52 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup --role 'joiner' --address '176.123.62.2' --auth 'repl:****' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '24455'
        Read: '(null)'
        130530 15:31:52 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '176.123.62.2' --auth 'repl:****' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '24455': 2 (No such file or directory)
        130530 15:31:52 [ERROR] WSREP: Failed to prepare for 'xtrabackup' SST. Unrecoverable.
        130530 15:31:52 [ERROR] Aborting
        Using rsync:
        130530 14:28:46 [Note] WSREP: State transfer required:
        Group state: 079517f3-9dd3-11e2-0800-91e2f9f7eca8:42281
        Local state: 079517f3-9dd3-11e2-0800-91e2f9f7eca8:19746
        130530 14:28:46 [Note] WSREP: New cluster view: global state: 079517f3-9dd3-11e2-0800-91e2f9f7eca8:42281, view# 10: Primary, number of nodes: 2, my index: 1, protocol version 2
        130530 14:28:46 [Warning] WSREP: Gap in state sequence. Need state transfer.
        130530 14:28:48 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '176.123.62.2' --auth 'repl:****' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '19421''
        130530 14:28:48 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '176.123.62.2' --auth 'repl:****'' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '19421'
        Read: '(null)'
        130530 14:28:48 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '176.123.62.2' --auth 'repl:****' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '19421': 2 (No such file or directory)
        130530 14:28:48 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
        130530 14:28:48 [ERROR] Aborting

        In my quest to find the problem I ran the above command at the shell:
        # wsrep_sst_rsync --role 'joiner' --address '176.123.62.2' --auth 'repl:****' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '20314'
        ready 176.123.62.2:4444/rsync_sst
        WSREP_SST: [ERROR] Parent mysqld process (PID:20314) terminated unexpectedly. (20130530 16:44:21.212)
        WSREP_SST: [INFO] Joiner cleanup. (20130530 16:44:21.214)
        done.

        As you can see, it works as expected and returns the <addr> that WSREP seems to be expecting (apart from a complaint that the PID of mysql isn't there, which is correct because mysql has aborted).
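The message `Failed to read 'ready <addr>'` suggests that mysqld expects the first line of the script's output to be `ready` followed by an address, which is exactly what the manual run printed. A minimal sketch of that check, with `parse_ready` as a hypothetical helper name (this is not mysqld's actual parsing code):

```shell
# parse_ready: hypothetical helper; reads the first line on stdin and prints
# the address if the line has the form "ready <addr>", else returns non-zero.
parse_ready() {
    IFS= read -r line || [ -n "$line" ] || return 1
    case $line in
        "ready "?*) printf '%s\n' "${line#"ready "}" ;;
        *)          return 1 ;;
    esac
}

# The line printed by the manual run quoted above.
printf 'ready 176.123.62.2:4444/rsync_sst\n' | parse_ready
```

In the failing runs mysqld logged `Read: '(null)'`, meaning no such line arrived. That would be consistent with the script never starting under mysqld, rather than starting and failing later.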

        This is happening from two different nodes with the same error (both Ubuntu 12.04). I've upgraded one to 2.1.3 and the other two are still running 2.0.6.

        To me it looks like there's a problem spawning wsrep_sst_rsync from within mysql, but I'm having no luck finding the cause. I've also tried symlinking wsrep_sst* and innobackup* into /usr/sbin and mysql's data dir, to no avail.
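Since the script works from an interactive shell, one thing worth comparing is the PATH that mysqld runs with against the shell's own. The sketch below lists which commands cannot be resolved under a given PATH value. `missing_under_path` is a hypothetical helper, and the example PATH and command list are assumptions for illustration, not the actual environment of the mysqld in this thread.

```shell
# missing_under_path: hypothetical helper; prints each named command that
# has no executable file in any directory of the given PATH-style string.
missing_under_path() {
    search_path=$1
    shift
    for cmd in "$@"; do
        found=no
        old_ifs=$IFS
        IFS=:
        for dir in $search_path; do
            if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
                found=yes
                break
            fi
        done
        IFS=$old_ifs
        [ "$found" = yes ] || echo "$cmd"
    done
}

# Illustrative call: a deliberately narrow PATH and the two programs the
# rsync SST method implies (the script itself and rsync).
missing_under_path "/usr/bin:/bin" wsrep_sst_rsync rsync
```

Any name printed by the call is a candidate for the "No such file or directory" failure under that PATH; an empty result would point the search elsewhere.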

        I expect there's something stupid that I'm missing, but I just can't find it.

        Your help would be greatly appreciated!

        Tristan.
