wsrep_sst_xtrabackup-v2 failing on joiner node

  • wsrep_sst_xtrabackup-v2 failing on joiner node

    Hi, I'm trying to bootstrap a new cluster. I started node 1 with the following settings:

    [mysqld]

    wsrep_slave_threads = 2
    wsrep_cluster_address = gcomm://
    wsrep_provider = /usr/lib/galera3/libgalera_smm.so
    wsrep_node_address = 52.20.191.45
    wsrep_node_name = pxc-server1-772723051-1lc1x
    wsrep_cluster_name = k8scluster1
    wsrep_sst_method = xtrabackup-v2
    wsrep_sst_auth = "xtrabackup:XXXXXX"

    Node 1 starts OK.

    I'm trying to start node 2 with the following settings:

    [mysqld]

    wsrep_slave_threads = 2
    wsrep_cluster_address = gcomm://52.20.191.45   # address of node 1
    wsrep_provider = /usr/lib/galera3/libgalera_smm.so
    wsrep_node_address = 34.225.88.182
    wsrep_node_name = pxc-server2-1251698107-hhph0
    wsrep_cluster_name = k8scluster1
    wsrep_sst_method = xtrabackup-v2
    wsrep_sst_auth = "xtrabackup:XXXXXX"


    Node 2 appears to find node 1 and tries to initiate an SST, but then fails with an error:

    2017-11-07T21:17:59.822566Z 0 [Note] WSREP: Member 0.0 (pxc-server1-772723051-1lc1x) synced with group.
    2017-11-07T21:19:26.597647Z WSREP_SST: [ERROR] ******************* FATAL ERROR **********************
    2017-11-07T21:19:26.598557Z WSREP_SST: [ERROR] Possible timeout in receving first data from donor in
    /usr/bin/wsrep_sst_xtrabackup-v2: line 971: gtid/keyring stage: No such file or directory
    2017-11-07T21:19:26.600029Z WSREP_SST: [ERROR] Cleanup after exit with status:1
    2017-11-07T21:19:26.605762Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '34.225.88.182' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '1' '' : 1 (Operation not permitted)
    2017-11-07T21:19:26.605789Z 0 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
    2017-11-07T21:19:26.605797Z 0 [ERROR] WSREP: SST script aborted with error 1 (Operation not permitted)
    2017-11-07T21:19:26.605824Z 0 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
    2017-11-07T21:19:26.605837Z 0 [ERROR] Aborting

    However, innobackup.backup.log on node 1 (the donor, I assume) seems to indicate that the backup completed OK, though I'm not sure about this:

    171107 21:20:48 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
    xtrabackup: The latest check point (for incremental): '12137188'
    xtrabackup: Stopping log copying thread.
    .171107 21:20:48 >> log scanned up to (12137206)

    171107 21:20:48 Executing UNLOCK BINLOG
    171107 21:20:48 Executing UNLOCK TABLES
    171107 21:20:48 All tables unlocked
    171107 21:20:48 [00] Streaming ib_buffer_pool to <STDOUT>
    171107 21:20:48 [00] ...done
    171107 21:20:48 Backup created in directory '/tmp/pxc_sst_lHBlQHrf/donor_xb_mSeLlyOV/'
    MySQL binlog position: filename 'pxc-server1-772723051-1lc1x-bin.000002', position '2995820'
    171107 21:20:48 [00] Streaming <STDOUT>
    171107 21:20:48 [00] ...done
    171107 21:20:48 [00] Streaming <STDOUT>
    171107 21:20:48 [00] ...done
    xtrabackup: Transaction log of lsn (12137188) to (12137206) was copied.
    171107 21:20:48 completed OK!

    Any ideas on what I could have done wrong here, or where I can look to troubleshoot?

    Please note that the wsrep_node_address I used for each instance is an AWS ELB that forwards TCP traffic to that node alone, and only on ports 3306, 4567, 4568 and 4444.

    Logs are here: http://s000.tinyupload.com/download....35967047689973
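
Because the joiner reports a possible timeout while waiting for the first data from the donor, and each node is reached through a load balancer, one thing that could be ruled out is basic TCP reachability on the four ports listed above. The following is only a minimal sketch, assuming Python 3 is available on the donor; the function name `check_ports` and the `PXC_PORTS` table are hypothetical helpers, not part of Percona XtraDB Cluster. The port roles in the table are the usual defaults, not something confirmed by the configuration shown.

```python
import socket

# Ports named in the post, with their usual default roles (an assumption).
PXC_PORTS = {
    3306: "MySQL client",
    4567: "Galera group communication",
    4568: "IST",
    4444: "SST",
}


def check_ports(host, ports=PXC_PORTS, timeout=3.0):
    """Return {port: bool}: True if a TCP connect to host:port succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


# Example (run on the donor, pointing at the joiner's wsrep_node_address):
#   for port, ok in check_ports("34.225.88.182").items():
#       print(port, PXC_PORTS[port], "open" if ok else "unreachable")
```

Two caveats apply to any result. The joiner normally listens on the SST port only while an SST is in progress, so 4444 may look closed at other times. And when the address belongs to a load balancer, a successful connect may only show that the load balancer accepted the connection, not that the node behind it did.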

  • #2
    Do you have a keyring configuration active?

    2017-11-07T21:19:26.598557Z WSREP_SST: [ERROR] Possible timeout in receving first data from donor in
    /usr/bin/wsrep_sst_xtrabackup-v2: line 971: gtid/keyring stage: No such file or directory

    • #3
      Not intentionally. I have nothing set in conf files for it and
      SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME LIKE 'keyring%';
      returns
      Empty set (0.20 sec)

      • #4
        Hmmm. The joiner is timing out. Can you set wsrep_debug=ON in the [sst] section of the config file? This will log more output during an SST.
