Running Docker PXC container joiner node always fails


  • Running Docker PXC container joiner node always fails

    I am trying to set this up without a discovery service, just two nodes connected across the network. Can someone check my work and let me know if you have any ideas on what is going wrong? I do not have any InnoDB logs (I see those requested every time someone has an issue), and the mysql.log on the donor is all but empty. I am not sure whether the container is set up to save the logs elsewhere, but I have spent a good amount of time in that container looking for them.

    Docker run command for the donor:
    Code:
    docker run -d --name="xtradb-cluster-master" --restart=unless-stopped -v /data/xtradb:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=secret -e CLUSTER_NAME=XtraDBCluster -p 3306:3306 -p 4567-4568:4567-4568 percona/percona-xtradb-cluster
    Docker run command for the joiner:
    Code:
    docker run -i -t --name="xtradb-cluster-joiner" --restart=unless-stopped -v /data/xtradb:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=secret -e CLUSTER_NAME=XtraDBCluster -e CLUSTER_JOIN=10.21.1.34 -p 3306:3306 -p 4567-4568:4567-4568 percona/percona-xtradb-cluster
    I run the joiner with -i -t so I can see the logs. Here are the logs for that joiner:

    Log coming in the next post because of the limit on characters used.

  • #2
    Code:
    2017-03-20T13:55:07.067549Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
    2017-03-20T13:55:07.069023Z 0 [Note] mysqld (mysqld 5.7.17-11-57) starting as process 1 ...
    2017-03-20T13:55:07.071533Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
    2017-03-20T13:55:07.071549Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
    2017-03-20T13:55:07.083303Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <info@codership.com> loaded successfully.
    2017-03-20T13:55:07.083892Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
    2017-03-20T13:55:07.084305Z 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootsrap: 1
    2017-03-20T13:55:07.086212Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.17.0.2; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
    2017-03-20T13:55:07.099685Z 0 [Note] WSREP: GCache history reset: old(da904b65-0b5a-11e7-99d0-b7fed22124dd:0) -> new(00000000-0000-0000-0000-000000000000:-1)
    2017-03-20T13:55:07.100300Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    2017-03-20T13:55:07.100317Z 0 [Note] WSREP: wsrep_sst_grab()
    2017-03-20T13:55:07.100322Z 0 [Note] WSREP: Start replication
    2017-03-20T13:55:07.100332Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    2017-03-20T13:55:07.100404Z 0 [Note] WSREP: protonet asio version 0
    2017-03-20T13:55:07.100503Z 0 [Note] WSREP: Using CRC-32C for message checksums.
    2017-03-20T13:55:07.100558Z 0 [Note] WSREP: backend: asio
    2017-03-20T13:55:07.100628Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
    2017-03-20T13:55:07.100738Z 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
    2017-03-20T13:55:07.100745Z 0 [Note] WSREP: restore pc from disk failed
    2017-03-20T13:55:07.101198Z 0 [Note] WSREP: GMCast version 0
    2017-03-20T13:55:07.101377Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
    2017-03-20T13:55:07.101385Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
    2017-03-20T13:55:07.101750Z 0 [Note] WSREP: EVS version 0
    2017-03-20T13:55:07.101843Z 0 [Note] WSREP: gcomm: connecting to group 'XtraDBCluster', peer '10.21.1.34:,10.20.1.35:'
    2017-03-20T13:55:07.104095Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') connection established to d354cff5 tcp://172.17.0.1:4567
    2017-03-20T13:55:07.122258Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') connection established to e0e3516f tcp://10.21.1.34:4567
    2017-03-20T13:55:07.122447Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
    2017-03-20T13:55:07.610659Z 0 [Note] WSREP: declaring e0e3516f at tcp://10.21.1.34:4567 stable
    2017-03-20T13:55:07.614763Z 0 [Note] WSREP: Node e0e3516f state prim
    2017-03-20T13:55:07.618465Z 0 [Note] WSREP: view(view_id(PRIM,d354cff5,294) memb {
            d354cff5,0
            e0e3516f,0
    } joined {
    } left {
    } partitioned {
    })
    2017-03-20T13:55:07.618490Z 0 [Note] WSREP: save pc into disk
    2017-03-20T13:55:07.618767Z 0 [Note] WSREP: discarding pending addr without UUID: tcp://10.20.1.35:4567
    2017-03-20T13:55:07.618778Z 0 [Note] WSREP: discarding pending addr proto entry 0x36ff300
    2017-03-20T13:55:08.103116Z 0 [Note] WSREP: gcomm: connected
    2017-03-20T13:55:08.103174Z 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    2017-03-20T13:55:08.103249Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    2017-03-20T13:55:08.103255Z 0 [Note] WSREP: Opened channel 'XtraDBCluster'
    2017-03-20T13:55:08.103426Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
    2017-03-20T13:55:08.103642Z 0 [Note] WSREP: Waiting for SST to complete.
    2017-03-20T13:55:08.104028Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: d3edcee8-0d74-11e7-9760-cb6538dddaa7
    2017-03-20T13:55:08.107560Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: d3edcee8-0d74-11e7-9760-cb6538dddaa7
    2017-03-20T13:55:08.110874Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: d3edcee8-0d74-11e7-9760-cb6538dddaa7 from 0 (dca214d836ed)
    2017-03-20T13:55:08.114658Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: d3edcee8-0d74-11e7-9760-cb6538dddaa7 from 1 (7c3c607f46b4)
    2017-03-20T13:55:08.114671Z 0 [Note] WSREP: Quorum results:
            version    = 4,
            component  = PRIMARY,
            conf_id    = 15,
            members    = 1/2 (joined/total),
            act_id     = 0,
            last_appl. = -1,
            protocols  = 0/7/3 (gcs/repl/appl),
            group UUID = da904b65-0b5a-11e7-99d0-b7fed22124dd
    2017-03-20T13:55:08.114685Z 0 [Note] WSREP: Flow-control interval: [23, 23]
    2017-03-20T13:55:08.114690Z 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 0)
    2017-03-20T13:55:08.114765Z 1 [Note] WSREP: State transfer required:
            Group state: da904b65-0b5a-11e7-99d0-b7fed22124dd:0
            Local state: 00000000-0000-0000-0000-000000000000:-1
    2017-03-20T13:55:08.114792Z 1 [Note] WSREP: New cluster view: global state: da904b65-0b5a-11e7-99d0-b7fed22124dd:0, view# 16: Primary, number of nodes: 2, my index: 0, protocol version 3
    2017-03-20T13:55:08.114798Z 1 [Warning] WSREP: Gap in state sequence. Need state transfer.
    2017-03-20T13:55:08.115077Z 0 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '172.17.0.2' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix '' --parent '1'  '' '
    WSREP_SST: [INFO] The xtrabackup version is 2.4.6 (20170320 13:55:08.231)
    WSREP_SST: [INFO] Streaming with xbstream (20170320 13:55:08.422)
    WSREP_SST: [INFO] Using socat as streamer (20170320 13:55:08.424)
    WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql//sst_in_progress (20170320 13:55:08.428)
    WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20170320 13:55:08.467)
    2017-03-20T13:55:08.674571Z 1 [Note] WSREP: Prepared SST request: xtrabackup-v2|172.17.0.2:4444/xtrabackup_sst//1
    2017-03-20T13:55:08.674615Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    2017-03-20T13:55:08.674643Z 1 [Note] WSREP: REPL Protocols: 7 (3, 2)
    2017-03-20T13:55:08.674653Z 1 [Note] WSREP: Assign initial position for certification: 0, protocol version: 3
    2017-03-20T13:55:08.674742Z 0 [Note] WSREP: Service thread queue flushed.
    2017-03-20T13:55:08.674871Z 1 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (da904b65-0b5a-11e7-99d0-b7fed22124dd): 1 (Operation not permitted)
             at galera/src/replicator_str.cpp:prepare_for_IST():535. IST will be unavailable.
    2017-03-20T13:55:08.678612Z 0 [Note] WSREP: Member 0.0 (dca214d836ed) requested state transfer from '*any*'. Selected 1.0 (7c3c607f46b4)(SYNCED) as donor.
    2017-03-20T13:55:08.678626Z 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 0)
    2017-03-20T13:55:08.678698Z 1 [Note] WSREP: Requesting state transfer: success, donor: 1
    2017-03-20T13:55:08.678713Z 1 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(da904b65-0b5a-11e7-99d0-b7fed22124dd:0)
    2017-03-20T13:55:09.492715Z 0 [Warning] WSREP: 1.0 (7c3c607f46b4): State transfer to 0.0 (dca214d836ed) failed: -32 (Broken pipe)
    2017-03-20T13:55:09.492742Z 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():765: Will never receive state. Need to abort.
    2017-03-20T13:55:09.492781Z 0 [Note] WSREP: gcomm: terminating thread
    2017-03-20T13:55:09.492813Z 0 [Note] WSREP: gcomm: joining thread

    • #3
      Code:
      2017-03-20T13:55:09.492917Z 0 [Note] WSREP: gcomm: closing backend
      2017-03-20T13:55:10.603335Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2017-03-20T13:55:12.603363Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') connection to peer e0e3516f with addr tcp://10.21.1.34:4567 timed out, no messages seen in PT3S
      2017-03-20T13:55:12.603507Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.21.1.34:4567
      2017-03-20T13:55:14.103368Z 0 [Note] WSREP: (d354cff5, 'tcp://0.0.0.0:4567') reconnecting to e0e3516f (tcp://10.21.1.34:4567), attempt 0
      2017-03-20T13:55:14.993268Z 0 [Note] WSREP: evs::proto(d354cff5, LEAVING, view_id(REG,d354cff5,294)) suspecting node: e0e3516f
      2017-03-20T13:55:14.993301Z 0 [Note] WSREP: evs::proto(d354cff5, LEAVING, view_id(REG,d354cff5,294)) suspected node without join message, declaring inactive
      2017-03-20T13:55:14.993342Z 0 [Note] WSREP: view(view_id(NON_PRIM,d354cff5,294) memb {
              d354cff5,0
      } joined {
      } left {
      } partitioned {
              e0e3516f,0
      })
      2017-03-20T13:55:14.993386Z 0 [Note] WSREP: view((empty))
      2017-03-20T13:55:14.993725Z 0 [Note] WSREP: gcomm: closed
      2017-03-20T13:55:14.993745Z 0 [Note] WSREP: mysqld: Terminated.
      13:55:14 UTC - mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
      Attempting to collect some information that could help diagnose the problem.
      As this is a crash and something is definitely wrong, the information
      collection process might fail.
      Please help us make Percona XtraDB Cluster better by reporting any
      bugs at https://bugs.launchpad.net/percona-xtradb-cluster
      
      key_buffer_size=0
      read_buffer_size=131072
      max_used_connections=0
      max_threads=152
      thread_count=2
      connection_count=0
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60215 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
      
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0 thread_stack 0x30000
      mysqld(my_print_stacktrace+0x2c)[0xebe56c]
      mysqld(handle_fatal_signal+0x479)[0x7a4b89]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f07c458e890]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x232)[0x7f07c2515532]
      /usr/lib/galera3/libgalera_smm.so(+0x77c2b)[0x7f07b7225c2b]
      /usr/lib/galera3/libgalera_smm.so(_Z13gcs_core_recvP8gcs_coreP12gcs_act_rcvdx+0x626)[0x7f07b7360be6]
      /usr/lib/galera3/libgalera_smm.so(+0x1b7904)[0x7f07b7365904]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x8064)[0x7f07c4587064]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f07c25c762d]
      You may download the Percona XtraDB Cluster operations manual by visiting
      http://www.percona.com/software/percona-xtradb-cluster/. You may find information
      in the manual which will help you identify the cause of the crash.

      • #4
        YeeP,

        Can you also provide the log from the donor?

        You do not need to run the container with "-i -t" to see the logs.

        To get the logs from a container running in the background, you can execute "docker logs -f <container_name>".
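
        For example, with the joiner container name used earlier in this thread, that would be:
        Code:
        docker logs -f xtradb-cluster-joiner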

        • #5
          Originally posted by vadimtk:
          YeeP,

          Can you also provide the log from the donor?

          You do not need to run the container with "-i -t" to see the logs.

          To get the logs from a container running in the background, you can execute "docker logs -f <container_name>".
          vadimtk - thanks for the tip on the logs. Here is the log from the donor:
          Code:
          2017-03-20T20:04:06.726406Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
          2017-03-20T20:04:06.728941Z 0 [Note] mysqld (mysqld 5.7.17-11-57) starting as process 1 ...
          2017-03-20T20:04:06.733653Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
          2017-03-20T20:04:06.733686Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
          2017-03-20T20:04:06.740248Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <info@codership.com> loaded successfully.
          2017-03-20T20:04:06.741140Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
          2017-03-20T20:04:06.741693Z 0 [Note] WSREP: Found saved state: c3e581cb-0d76-11e7-836a-0a2b20e07bd6:0, safe_to_bootsrap: 0
          2017-03-20T20:04:06.743363Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.17.0.2; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
          2017-03-20T20:04:06.767899Z 0 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
          2017-03-20T20:04:06.767961Z 0 [Note] WSREP: wsrep_sst_grab()
          2017-03-20T20:04:06.767969Z 0 [Note] WSREP: Start replication
          2017-03-20T20:04:06.767985Z 0 [Note] WSREP: Setting initial position to c3e581cb-0d76-11e7-836a-0a2b20e07bd6:0
          2017-03-20T20:04:06.767992Z 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
          2017-03-20T20:04:06.767998Z 0 [ERROR] WSREP: wsrep::connect(gcomm://) failed: 7
          2017-03-20T20:04:06.768001Z 0 [ERROR] Aborting
          2017-03-20T20:04:06.768009Z 0 [Note] Giving 0 client threads a chance to die gracefully
          2017-03-20T20:04:06.768019Z 0 [Note] WSREP: Service disconnected.
          2017-03-20T20:04:09.768135Z 0 [Note] WSREP: Some threads may fail to exit.
          2017-03-20T20:04:09.768199Z 0 [Note] Binlog end
          2017-03-20T20:04:09.770835Z 0 [Note] mysqld: Shutdown complete
          2017-03-20T20:04:14.965787Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
          2017-03-20T20:04:14.968036Z 0 [Note] mysqld (mysqld 5.7.17-11-57) starting as process 1 ...
          2017-03-20T20:04:14.971438Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
          2017-03-20T20:04:14.971462Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
          2017-03-20T20:04:14.976203Z 0 [Note] WSREP: wsrep_load(): Galera 3.20(r7e383f7) by Codership Oy <info@codership.com> loaded successfully.
          2017-03-20T20:04:14.976477Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
          2017-03-20T20:04:14.977023Z 0 [Note] WSREP: Found saved state: c3e581cb-0d76-11e7-836a-0a2b20e07bd6:0, safe_to_bootsrap: 0
          2017-03-20T20:04:14.978883Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.17.0.2; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
          2017-03-20T20:04:15.000769Z 0 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
          2017-03-20T20:04:15.000831Z 0 [Note] WSREP: wsrep_sst_grab()
          2017-03-20T20:04:15.000838Z 0 [Note] WSREP: Start replication
          2017-03-20T20:04:15.000854Z 0 [Note] WSREP: Setting initial position to c3e581cb-0d76-11e7-836a-0a2b20e07bd6:0
          2017-03-20T20:04:15.000862Z 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
          2017-03-20T20:04:15.000868Z 0 [ERROR] WSREP: wsrep::connect(gcomm://) failed: 7
          2017-03-20T20:04:15.000872Z 0 [ERROR] Aborting
          2017-03-20T20:04:15.000879Z 0 [Note] Giving 0 client threads a chance to die gracefully
          2017-03-20T20:04:15.000890Z 0 [Note] WSREP: Service disconnected.
          2017-03-20T20:04:18.001036Z 0 [Note] WSREP: Some threads may fail to exit.
          2017-03-20T20:04:18.001107Z 0 [Note] Binlog end
          2017-03-20T20:04:18.003658Z 0 [Note] mysqld: Shutdown complete

          • #6
            vadimtk: this one is a little better. I cannot delete the previous post, but I deleted the data dir and started over, since the donor had not been crashing before (I am still setting this up). This is a brand-new instance run with the same commands; here is the log from the donor:
            Code:
            2017-03-20T21:10:44.506085Z 0 [Note] WSREP: (6673b6dc, 'tcp://0.0.0.0:4567') connection established to ae6ea4bf tcp://10.20.1.35:4567
            2017-03-20T21:10:44.509764Z 0 [Note] WSREP: (6673b6dc, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
            2017-03-20T21:10:45.005918Z 0 [Note] WSREP: declaring ae6ea4bf at tcp://10.20.1.35:4567 stable
            2017-03-20T21:10:45.009498Z 0 [Note] WSREP: Node 6673b6dc state prim
            2017-03-20T21:10:45.012880Z 0 [Note] WSREP: view(view_id(PRIM,6673b6dc,12) memb {
                    6673b6dc,0
                    ae6ea4bf,0
            } joined {
            } left {
            } partitioned {
            })
            2017-03-20T21:10:45.012910Z 0 [Note] WSREP: save pc into disk
            2017-03-20T21:10:45.013390Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
            2017-03-20T21:10:45.013884Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: aebdc4ca-0db1-11e7-9c0f-db7e1711d337
            2017-03-20T21:10:45.017229Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: aebdc4ca-0db1-11e7-9c0f-db7e1711d337
            2017-03-20T21:10:45.020509Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: aebdc4ca-0db1-11e7-9c0f-db7e1711d337 from 0 (c8e314fdc5d4)
            2017-03-20T21:10:45.504119Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: aebdc4ca-0db1-11e7-9c0f-db7e1711d337 from 1 (92495e38bf2d)
            2017-03-20T21:10:45.504162Z 0 [Note] WSREP: Quorum results:
                    version    = 4,
                    component  = PRIMARY,
                    conf_id    = 11,
                    members    = 1/2 (joined/total),
                    act_id     = 14,
                    last_appl. = 0,
                    protocols  = 0/7/3 (gcs/repl/appl),
                    group UUID = 581a154c-0db1-11e7-9a69-ff24de2d16d2
            2017-03-20T21:10:45.504175Z 0 [Note] WSREP: Flow-control interval: [23, 23]
            2017-03-20T21:10:45.504421Z 4 [Note] WSREP: New cluster view: global state: 581a154c-0db1-11e7-9a69-ff24de2d16d2:14, view# 12: Primary, number of nodes: 2, my index: 0, protocol version 3
            2017-03-20T21:10:45.504444Z 4 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2017-03-20T21:10:45.504476Z 4 [Note] WSREP: REPL Protocols: 7 (3, 2)
            2017-03-20T21:10:45.504488Z 4 [Note] WSREP: Assign initial position for certification: 14, protocol version: 3
            2017-03-20T21:10:45.504511Z 0 [Note] WSREP: Service thread queue flushed.
            2017-03-20T21:10:45.993086Z 0 [Note] WSREP: Member 1.0 (92495e38bf2d) requested state transfer from '*any*'. Selected 0.0 (c8e314fdc5d4)(SYNCED) as donor.
            2017-03-20T21:10:45.993133Z 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 14)
            2017-03-20T21:10:45.993306Z 4 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2017-03-20T21:10:45.993431Z 0 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.2:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''   '' --gtid '581a154c-0db1-11e7-9a69-ff24de2d16d2:14''
            2017-03-20T21:10:45.994039Z 4 [Note] WSREP: sst_donor_thread signaled with 0
            WSREP_SST: [INFO] The xtrabackup version is 2.4.6 (20170320 21:10:46.038)
            WSREP_SST: [INFO] Streaming with xbstream (20170320 21:10:46.229)
            WSREP_SST: [INFO] Using socat as streamer (20170320 21:10:46.232)
            WSREP_SST: [INFO] Using /tmp/tmp.eS5cYXKuIu as innobackupex temporary directory (20170320 21:10:46.245)
            WSREP_SST: [INFO] Streaming GTID file before SST (20170320 21:10:46.250)
            WSREP_SST: [INFO] Evaluating xbstream -c ${FILE_TO_STREAM} | socat -u stdio TCP:172.17.0.2:4444; RC=( ${PIPESTATUS[@]} ) (20170320 21:10:46.252)
            2017/03/20 21:10:46 socat[2225] E connect(6, AF=2 172.17.0.2:4444, 16): Connection refused
            WSREP_SST: [ERROR] Error while sending data to joiner node:  exit codes: 141 1 (20170320 21:10:46.258)
            WSREP_SST: [ERROR] Cleanup after exit with status:32 (20170320 21:10:46.260)
            WSREP_SST: [INFO] Cleaning up temporary directories (20170320 21:10:46.263)
            2017-03-20T21:10:46.269260Z 0 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.2:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''   '' --gtid '581a154c-0db1-11e7-9a69-ff24de2d16d2:14'
            2017-03-20T21:10:46.269308Z 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.2:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''   '' --gtid '581a154c-0db1-11e7-9a69-ff24de2d16d2:14': 32 (Broken pipe)
            2017-03-20T21:10:46.269393Z 0 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.2:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''   '' --gtid '581a154c-0db1-11e7-9a69-ff24de2d16d2:14'
            2017-03-20T21:10:46.273444Z 0 [Warning] WSREP: 0.0 (c8e314fdc5d4): State transfer to 1.0 (92495e38bf2d) failed: -32 (Broken pipe)
            2017-03-20T21:10:46.273473Z 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 14)
            2017-03-20T21:10:46.276951Z 0 [Note] WSREP: Member 0.0 (c8e314fdc5d4) synced with group.
            2017-03-20T21:10:46.276965Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 14)
            2017-03-20T21:10:46.277035Z 4 [Note] WSREP: Synchronized with group, ready for connections
            2017-03-20T21:10:46.277050Z 4 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2017-03-20T21:10:47.278086Z 0 [Note] WSREP: forgetting ae6ea4bf (tcp://10.20.1.35:4567)
            2017-03-20T21:10:47.278171Z 0 [Note] WSREP: Node 6673b6dc state prim
            2017-03-20T21:10:47.278221Z 0 [Note] WSREP: view(view_id(PRIM,6673b6dc,13) memb {
                    6673b6dc,0
            } joined {
            } left {
            } partitioned {
                    ae6ea4bf,0
            })
            2017-03-20T21:10:47.278237Z 0 [Note] WSREP: save pc into disk
            2017-03-20T21:10:47.278757Z 0 [Note] WSREP: forgetting ae6ea4bf (tcp://10.20.1.35:4567)
            2017-03-20T21:10:47.278776Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
            2017-03-20T21:10:47.279500Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: b0176ff9-0db1-11e7-895f-d653150a4e73
            2017-03-20T21:10:47.279531Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: b0176ff9-0db1-11e7-895f-d653150a4e73
            2017-03-20T21:10:47.279540Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: b0176ff9-0db1-11e7-895f-d653150a4e73 from 0 (c8e314fdc5d4)
            2017-03-20T21:10:47.279551Z 0 [Note] WSREP: Quorum results:
                    version    = 4,
                    component  = PRIMARY,
                    conf_id    = 12,
                    members    = 1/1 (joined/total),
                    act_id     = 14,
                    last_appl. = 0,
                    protocols  = 0/7/3 (gcs/repl/appl),
                    group UUID = 581a154c-0db1-11e7-9a69-ff24de2d16d2
            2017-03-20T21:10:47.279558Z 0 [Note] WSREP: Flow-control interval: [16, 16]
            2017-03-20T21:10:47.279714Z 1 [Note] WSREP: New cluster view: global state: 581a154c-0db1-11e7-9a69-ff24de2d16d2:14, view# 13: Primary, number of nodes: 1, my index: 0, protocol version 3
            2017-03-20T21:10:47.279743Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2017-03-20T21:10:47.279756Z 1 [Note] WSREP: REPL Protocols: 7 (3, 2)
            2017-03-20T21:10:47.279766Z 1 [Note] WSREP: Assign initial position for certification: 14, protocol version: 3
            2017-03-20T21:10:47.279787Z 0 [Note] WSREP: Service thread queue flushed.
            2017-03-20T21:10:47.761362Z 0 [Note] WSREP: (6673b6dc, 'tcp://0.0.0.0:4567') turning message relay requesting off
            2017-03-20T21:10:51.006660Z 0 [Note] WSREP: (6673b6dc, 'tcp://0.0.0.0:4567') connection established to ae6ea4bf tcp://10.20.1.35:4567
            2017-03-20T21:10:51.006697Z 0 [Warning] WSREP: discarding established (time wait) ae6ea4bf (tcp://10.20.1.35:4567)
            2017-03-20T21:10:52.762135Z 0 [Note] WSREP:  cleaning up ae6ea4bf (tcp://10.20.1.35:4567)

            • #7
              It looks to me like Percona provides the config file(s) inside the container that define the wsrep settings, so I would not need to mount a volume with my own settings unless I wanted to change them. That said, I just got a vanilla MariaDB 10.1 cluster to stand up with the following settings. If anyone can tell me what the difference is and what I can do to get the XtraDB cluster working, I would much prefer that. One thing I did notice: MariaDB has you pass in the IP address of the node the Docker container is running on, while the percona-xtradb-cluster container uses the CLUSTER_JOIN variable, where (as I understand it) you give the address of the other nodes in the cluster to join.

              Node 1 (10.21.1.34) Docker run command:
              Code:
              docker run \
                --name mariadb-0 \
                -d \
                -v /root/mariadb:/etc/mysql/conf.d \
                -v /data/mariadb:/var/lib/mysql \
                -e MYSQL_INITDB_SKIP_TZINFO=yes \
                -e MYSQL_ROOT_PASSWORD=secret_pw \
                -p 3306:3306 \
                -p 4567:4567/udp \
                -p 4567-4568:4567-4568 \
                -p 4444:4444 \
                mariadb:10.1 \
                --wsrep-new-cluster \
                --wsrep_node_address=10.21.1.34
              Node 2 (10.20.1.35) Docker run command:
              Code:
              docker run \
                --name mariadb-1 \
                -d \
                -v /root/mariadb:/etc/mysql/conf.d \
                -v /data/mariadb:/var/lib/mysql \
                -e MYSQL_ROOT_PASSWORD=secret_pw \
                -p 3306:3306 \
                -p 4567:4567/udp \
                -p 4567-4568:4567-4568 \
                -p 4444:4444 \
                mariadb:10.1 \
                --wsrep_node_address=10.20.1.35
              Config file that was brought in (/root/mariadb/mysql_server.cnf):
              Code:
              #
              # Galera Cluster: mandatory settings
              #
              
              [server]
              bind-address=0.0.0.0
              binlog_format=row
              default_storage_engine=InnoDB
              innodb_autoinc_lock_mode=2
              innodb_locks_unsafe_for_binlog=1
              query_cache_size=0
              query_cache_type=0
              
              [galera]
              wsrep_on=ON
              wsrep_provider="/usr/lib/galera/libgalera_smm.so"
              wsrep_cluster_address="gcomm://10.21.1.34,10.20.1.35"
              wsrep-sst-method=rsync
              
              #
              # Optional setting
              #
              
              # Tune this value for your system, roughly 2x cores; see https://mariadb.com/kb/en/mariadb/galera-cluster-system-variables/#wsrep_slave_threads
              # wsrep_slave_threads=1
              
              # innodb_flush_log_at_trx_commit=0

              • #8
                For comparison and future reference, here are my run commands using the percona-xtradb-cluster Docker container (currently not working). The joiner log is in post #2 of this thread; the donor log is in post #6.

                Node 1 (10.21.1.34) Docker run command:
                Code:
                docker run \
                -d \
                --name="xtradb-cluster-master" \
                --restart=unless-stopped \
                -v /data/xtradb:/var/lib/mysql \
                -p 3306:3306 \
                -p 4567:4567/udp \
                -p 4567-4568:4567-4568 \
                -p 4444:4444 \
                -e MYSQL_ROOT_PASSWORD=secret_pw \
                -e CLUSTER_NAME=XtraDBCluster \
                percona/percona-xtradb-cluster
                Node 2 (10.20.1.35) Docker run command:
                Code:
                docker run \
                -d \
                --name="xtradb-cluster-joiner" \
                --restart=unless-stopped \
                -v /data/xtradb:/var/lib/mysql \
                -p 3306:3306 \
                -p 4567:4567/udp \
                -p 4567-4568:4567-4568 \
                -p 4444:4444 \
                -e MYSQL_ROOT_PASSWORD=secret_pw \
                -e CLUSTER_NAME=XtraDBCluster \
                -e CLUSTER_JOIN=10.21.1.34 \
                percona/percona-xtradb-cluster

                • #9
                  On the donor you can see:
                  2017-03-20T20:04:06.767992Z 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
                  2017-03-20T20:04:06.767998Z 0 [ERROR] WSREP: wsrep::connect(gcomm://) failed: 7
                  2017-03-20T20:04:06.768001Z 0 [ERROR] Aborting

                  So please do what the error suggests: edit the grastate.dat file manually and set safe_to_bootstrap to 1. You will need to shut down the donor, edit the file, and then start the donor again.
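
                  A minimal sketch of that procedure, assuming the donor's datadir is the /data/xtradb bind mount and the container name is xtradb-cluster-master, as in the run commands earlier in this thread (adjust if yours differ):
                  Code:
                  # stop the donor container so mysqld is not holding the file
                  docker stop xtradb-cluster-master
                  # flip the flag in grastate.dat, which lives in the mounted datadir on the host
                  sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /data/xtradb/grastate.dat
                  # start the donor again; it is now allowed to bootstrap the cluster
                  docker start xtradb-cluster-master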
