Error joining an XtraDB cluster

  • Error joining an XtraDB cluster

    Hi, I'm testing an XtraDB Cluster 5.5 setup on 64-bit Debian, installed from the deb packages.

    The master comes up fine and so does the first node, but the last node aborts because it can't complete the state transfer:

    Logs follow:

    120531 19:08:15 [Note] WSREP: Read nil XID from storage engines, skipping position init
    120531 19:08:15 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/'
    120531 19:08:15 [Note] WSREP: wsrep_load(): Galera 2.1dev(r112) by Codership Oy loaded succesfully.
    120531 19:08:15 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    120531 19:08:15 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
    120531 19:08:15 [Note] WSREP: Passing config to GCS: base_host =; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
    120531 19:08:15 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    120531 19:08:15 [Note] WSREP: wsrep_sst_grab()
    120531 19:08:15 [Note] WSREP: Start replication
    120531 19:08:15 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    120531 19:08:15 [Note] WSREP: protonet asio version 0
    120531 19:08:15 [Note] WSREP: backend: asio
    120531 19:08:15 [Note] WSREP: GMCast version 0
    120531 19:08:15 [Note] WSREP: (36412058-ab43-11e1-0800-85a41e0edd94, 'tcp://') listening at tcp://
    19:08:15 [Note] WSREP: (36412058-ab43-11e1-0800-85a41e0edd94, 'tcp://') multicast: , ttl: 1
    120531 19:08:15 [Note] WSREP: EVS version 0
    120531 19:08:15 [Note] WSREP: PC version 0
    120531 19:08:15 [Note] WSREP: gcomm: connecting to group 'perconatest', peer ''
    120531 19:08:15 [Note] WSREP: (36412058-ab43-11e1-0800-85a41e0edd94, 'tcp://') turning message relay requesting on, nonlive peers: tcp://
    120531 19:08:16 [Note] WSREP: declaring 89f786ac-ab4d-11e1-0800-83b84a64d80b stable
    120531 19:08:16 [Note] WSREP: declaring 968b58d2-ab2d-11e1-0800-f2cbfe2d8ed2 stable
    120531 19:08:16 [Note] WSREP: (36412058-ab43-11e1-0800-85a41e0edd94, 'tcp://') turning message relay requesting off
    120531 19:08:16 [Note] WSREP: view(view_id(PRIM,36412058-ab43-11e1-0800-85a41e0edd94,13) memb { 36412058-ab43-11e1-0800-85a41e0edd94, 89f786ac-ab4d-11e1-0800-83b84a64d80b, 968b58d2-ab2d-11e1-0800-f2cbfe2d8ed2,} joined {} left {} partitioned {})
    120531 19:08:16 [Note] WSREP: gcomm: connected
    120531 19:08:16 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    120531 19:08:16 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    120531 19:08:16 [Note] WSREP: Opened channel 'perconatest'
    120531 19:08:16 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
    120531 19:08:16 [Note] WSREP: Waiting for SST to complete.
    120531 19:08:16 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 368de0a6-ab43-11e1-0800-d3c3b66b74d2
    120531 19:08:16 [Note] WSREP: STATE EXCHANGE: sent state msg: 368de0a6-ab43-11e1-0800-d3c3b66b74d2
    120531 19:08:16 [Note] WSREP: STATE EXCHANGE: got state msg: 368de0a6-ab43-11e1-0800-d3c3b66b74d2 from 0 (percona3)
    120531 19:08:16 [Note] WSREP: STATE EXCHANGE: got state msg: 368de0a6-ab43-11e1-0800-d3c3b66b74d2 from 1 (percona2)
    120531 19:08:16 [Note] WSREP: STATE EXCHANGE: got state msg: 368de0a6-ab43-11e1-0800-d3c3b66b74d2 from 2 (percona1)
    120531 19:08:16 [Note] WSREP: Quorum results: version = 2, component = PRIMARY, conf_id = 12, members = 2/3 (joined/total), act_id = 0, last_appl. = -1, protocols = 0/4/1 (gcs/repl/appl), group UUID = 4c3288d4-aa70-11e1-0800-de2b28cfee00
    120531 19:08:16 [Note] WSREP: Flow-control interval: [14, 28]
    120531 19:08:16 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
    120531 19:08:16 [Note] WSREP: State transfer required: Group state: 4c3288d4-aa70-11e1-0800-de2b28cfee00:0 Local state: 00000000-0000-0000-0000-000000000000:-1
    120531 19:08:16 [Note] WSREP: New cluster view: global state: 4c3288d4-aa70-11e1-0800-de2b28cfee00:0, view# 13: Primary, number of nodes: 3, my index: 0, protocol version 1
    120531 19:08:16 [Warning] WSREP: Gap in state sequence. Need state transfer.
    120531 19:08:18 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' '' '' '/var/lib/mysql/' '/etc/mysql/my.cnf' '9005' 2>sst.err'
    120531 19:08:18 [Note] WSREP: Prepared SST request: xtrabackup|
    19:08:18 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    120531 19:08:18 [Note] WSREP: Assign initial position for certification: 0, protocol version: 2
    120531 19:08:18 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (4c3288d4-aa70-11e1-0800-de2b28cfee00): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.
    120531 19:08:18 [Note] WSREP: Node 0 (percona3) requested state transfer from '*any*'. Selected 1 (percona2)(SYNCED) as donor.
    120531 19:08:18 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
    120531 19:08:18 [Note] WSREP: Requesting state transfer: success, donor: 1
    120531 19:08:23 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup 'joiner' '' '' '/var/lib/mysql/' '/etc/mysql/my.cnf' '9005' 2>sst.err: 32 (Broken pipe)
    120531 19:08:23 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
    120531 19:08:23 [ERROR] WSREP: SST failed: 32 (Broken pipe)
    120531 19:08:23 [ERROR] Aborting
    120531 19:08:23 [Warning] WSREP: 1 (percona2): State transfer to 0 (percona3) failed: -1 (Operation not permitted)
    120531 19:08:23 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():712: Will never receive state. Need to abort.
    120531 19:08:23 [Note] WSREP: gcomm: terminating thread
    120531 19:08:23 [Note] WSREP: gcomm: joining thread
    120531 19:08:23 [Note] WSREP: gcomm: closing backend
    120531 19:08:23 [Note] WSREP: view(view_id(NON_PRIM,36412058-ab43-11e1-0800-85a41e0edd94,13) memb { 36412058-ab43-11e1-0800-85a41e0edd94,} joined {} left {} partitioned { 89f786ac-ab4d-11e1-0800-83b84a64d80b, 968b58d2-ab2d-11e1-0800-f2cbfe2d8ed2,})
    120531 19:08:23 [Note] WSREP: view((empty))
    120531 19:08:23 [Note] WSREP: gcomm: closed
    120531 19:08:23 [Note] WSREP: /usr/sbin/mysqld: Terminated.
    Abortado

    There's also an sst.err, which contains:

    tar: Esto no parece un archivo tar (this doesn't seem to be a tar file)
    tar: Exiting with failure status due to previous errors
    Error while getting st data from donor node

    Any clues? Seems like a permission or file path configuration problem to me.
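
For anyone triaging a similar joiner log, here is a minimal Python sketch that splits a pasted mysqld error log into entries and keeps only the warnings and errors. It assumes every entry starts with a `YYMMDD HH:MM:SS [Level]` prefix, as in the log above; `split_entries` and `problems` are names made up for illustration.

```python
import re

# Assumption: each entry begins with "YYMMDD HH:MM:SS [", as in the log above.
# The lookahead splits in front of each prefix without consuming it, so this
# also works when the entries were pasted without line breaks.
ENTRY_START = re.compile(r"(?=\d{6} \d{1,2}:\d{2}:\d{2} \[)")


def split_entries(raw):
    """Return the individual log entries found in a pasted log."""
    return [part.strip() for part in ENTRY_START.split(raw) if part.strip()]


def problems(raw):
    """Keep only the entries tagged [ERROR] or [Warning]."""
    return [e for e in split_entries(raw) if "[ERROR]" in e or "[Warning]" in e]


if __name__ == "__main__":
    sample = (
        "120531 19:08:23 [ERROR] WSREP: SST failed: 32 (Broken pipe)"
        "120531 19:08:23 [ERROR] Aborting"
        "120531 19:08:23 [Note] WSREP: gcomm: closed"
    )
    for entry in problems(sample):
        print(entry)
```

Entries whose timestamp prefix is incomplete stay attached to the entry before them, so the output is only as clean as the paste.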

  • #2
    Please post the corresponding error log interval and sst.err from the donor node (percona2)


    • #3
      Hmm, the logs were lost in the void of stderr, but you pointed me in the right direction.

      On the donor, sst.err said that it couldn't find a process to kill. On another join attempt I was able to see:

      120601 11:43:24 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'donor' '******:4444/xtrabackup_sst' '(null)' '/var/lib/mysql/' '/etc/mysql/my.cnf' '4c3288d4-aa70-11e1-0800-de2b28cfee00' '0' '0''
      120601 11:43:24 [Note] WSREP: sst_donor_thread signaled with 0
      innobackupex finished with error: 9. Check /var/lib/mysql//innobackup.backup.log

      And that file showed where the problem was:

      innobackupex: Error: mysql child process has died: ERROR 1045 (28000): Access denied for user 'mysql'@'localhost' (using password: NO)

      Creating this user with all privileges solved the issue.

      But why? I couldn't find any doc explaining why this is necessary. And the other node in the cluster started and replicated without this user.


      • #4
        Not sure. I think this must have something to do with the local configuration: something is different on that donor. It might also make sense to double-check the consistency of the mysql.user table between the nodes.
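
The consistency check suggested here can be sketched in Python. It assumes you have already collected the (user, host) pairs from each node, for example with `SELECT user, host FROM mysql.user`. The node names and accounts below are made-up sample data, and `missing_accounts` is a hypothetical helper, not part of any Percona tool.

```python
def missing_accounts(accounts_by_node):
    """Map each node to the accounts that exist on some other node but not on it."""
    everything = set()
    for accounts in accounts_by_node.values():
        everything |= set(accounts)

    report = {}
    for node, accounts in accounts_by_node.items():
        missing = sorted(everything - set(accounts))
        if missing:
            report[node] = missing
    return report


if __name__ == "__main__":
    # Made-up sample data: (user, host) pairs as returned by each node.
    nodes = {
        "node_a": {("root", "localhost")},
        "node_b": {("root", "localhost")},
        "node_c": {("root", "localhost"), ("mysql", "localhost")},
    }
    for node, missing in missing_accounts(nodes).items():
        print(node, "is missing", missing)
```

An empty result means every node reports the same set of accounts. It says nothing about whether passwords or privileges match.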


        • #5
          Same here...
          After updating a node to the new version of XtraDB Cluster, replication doesn't work because of problems with SST.

          After creating the mysql user on the donor, replication works. Is there a file where I can set the credentials for the xtrabackup replication?


          • #6
            Hmm, could this be a side effect of Percona's cluster limitations?

            I mean, the documentation says that the mysql database isn't replicated, but you can issue user commands to the cluster, and those are replicated across it. User commands alter the mysql database, but are then lost. Is it possible that the first node gets the user command replicated on joining and then, once it's considered propagated, successive nodes never get this command issued? If that's true, it could be a pitfall.


            • #7
              Hi everybody,

              I have the same problem and the same error after upgrading Percona to 5.5.24.
              My cluster doesn't work; I have to reinstall it.
              On 5.5.23 it works without problems, so I don't understand why this happens. I haven't changed my config.
              Is this a bug in the patch upgrade?

              Best regards



              • #8
                Do you have a [mysql] section in /etc/my.cnf or ~/.my.cnf that includes a username=mysql line?


                • #9
                  No, but I found the issue.
                  It was because the permissions on the my.cnf file were 600 instead of 644.


                  • #10
                    I think it SHOULD be 600, but the file must belong to the mysql user.
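
A quick way to see which of the two situations applies on a given node is to print the mode and owner of the config file. This is a minimal sketch, assuming a Unix host; `mode_and_owner` is a hypothetical helper, and the path in the usage block is the one that appears in the logs earlier in this thread.

```python
import os
import pwd
import stat


def mode_and_owner(path):
    """Return (octal permission string, owner name) for a file."""
    st = os.stat(path)
    mode = format(stat.S_IMODE(st.st_mode), "03o")
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry; fall back to the numeric id.
        owner = str(st.st_uid)
    return mode, owner


if __name__ == "__main__":
    path = "/etc/mysql/my.cnf"  # path taken from the logs in this thread
    if os.path.exists(path):
        mode, owner = mode_and_owner(path)
        print(path, "mode", mode, "owner", owner)
```

With mode 600 only the owner can read the file, so whichever user runs the SST script needs to be that owner. With 644 any local user can read it, including any credentials stored in it.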


                    • #11

                      You need to provide this configuration option in my.cnf:




                      • #12
                        lefred, thanks.

                        It worked for me.
                        I have set this in /etc/my.cnf for the DONOR.
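
The option posted in #11 did not survive in this copy of the thread. In Percona XtraDB Cluster, the variable that supplies credentials for an xtrabackup-based SST is `wsrep_sst_auth`, given as `user:password`. Whether that is what #11 actually contained is an assumption. The sketch below checks a my.cnf-style text for that variable. The sample values (`sstuser`, `s3cret`) and the helper name `sst_auth_user` are made up, and Python's `configparser` is only an approximation of MySQL's option-file parser (it does not handle `!include` directives, for instance).

```python
import configparser

# Made-up sample; wsrep_sst_auth is assumed to be the option meant in #11.
SAMPLE = """
[mysqld]
wsrep_sst_method = xtrabackup
wsrep_sst_auth = sstuser:s3cret
"""


def sst_auth_user(cnf_text):
    """Return the user part of wsrep_sst_auth in [mysqld], or None if unset."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare flags with no value
        strict=False,         # tolerate repeated sections and options
        interpolation=None,   # passwords may contain '%'
    )
    parser.read_string(cnf_text)
    value = parser.get("mysqld", "wsrep_sst_auth", fallback=None)
    if not value:
        return None
    return value.strip("\"'").split(":", 1)[0]


if __name__ == "__main__":
    print(sst_auth_user(SAMPLE))
```

Whatever account is named there has to exist on the donor with enough privileges to run the backup, which ties in with the access-denied error reported in #3.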