Buy Percona ServicesBuy Now!

Diagnosing SST errors with Percona XtraDB Cluster for MySQL

 | December 30, 2014 |  Posted In: MySQL, Percona XtraDB Cluster

PREVIOUS POST
NEXT POST

State Snapshot Transfer (SST) is used in Percona XtraDB Cluster (PXC) when a new node joins the cluster or to resync a failed node if Incremental State Transfer (IST) is no longer available. SST is triggered automatically but there is no magic: If it is not configured properly, it will not work and new nodes will never be able to join the cluster. Let’s have a look at a few classic issues.

Port for SST is not open

The donor and the joiner communicate on port 4444, and if the port is closed on one side, SST will always fail.

You will see in the error log of the donor that SST is started:

But then nothing happens, and some time later you will see a bunch of errors:

On the joiner side, you will see a similar sequence: SST is started, then hangs and is finally aborted:

The solution is of course to make sure that the ports are open on both sides.

SST is not correctly configured

Sometimes you will see an error like this on the donor:

And if you look at innobackup.backup.log:

What happened?

The default SST method is xtrabackup-v2 and for it to work, you need to specify a username/password in the my.cnf file:

And you also need to create the corresponding MySQL user:

So you should check that the user has been correctly created in MySQL and that wsrep_sst_auth is correctly set.

Galera versions do not match

Here is another set of errors you may see in the error log of the donor:

Here the issue is that you try to connect a node using Galera 2.x and a node running Galera 3.x. This can happen if you try to use a PXC 5.5 node and a PXC 5.6 node.

The right solution is probably to understand why you ended up with such inconsistent versions and make sure all nodes are using the same Percona XtraDB Cluster version and Galera version.

But if you know what you are doing, you can also instruct the node using Galera 3.x that it will communicate with Galera 2.x nodes by specifying in the my.cnf file:

Conclusion

SST errors can have multiple reasons for occurring, and the best way to diagnose the issue is to have a look at the error log of the donor and the joiner. Galera is in general quite verbose so you can follow the progress of SST on both nodes and see where it fails. Then it is mostly about being able to interpret the error messages.

PREVIOUS POST
NEXT POST
Stephane Combaudon

Stéphane joined Percona in July 2012, after working as a MySQL DBA for leading French companies such as Dailymotion and France Telecom. In real life, he lives in Paris with his wife and their twin daughters. When not in front of a computer or not spending time with his family, he likes playing chess and hiking.

2 Comments

  • 180424 12:36:07 [Note] WSREP: Read nil XID from storage engines, skipping position init
    180424 12:36:07 [Note] WSREP: wsrep_load(): loading provider library ‘none’
    180424 12:36:07 [Note] WSREP: Service disconnected.
    180424 12:36:08 [Note] WSREP: Some threads may fail to exit.
    Error in my_thread_global_end(): 1 threads didn’t exit
    180424 12:36:13 [Note] WSREP: Read nil XID from storage engines, skipping position init
    180424 12:36:13 [Note] WSREP: wsrep_load(): loading provider library ‘none’
    180424 12:36:13 [Note] WSREP: Service disconnected.
    180424 12:36:14 [Note] WSREP: Some threads may fail to exit.
    Error in my_thread_global_end(): 1 threads didn’t exit
    180424 12:36:20 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    180424 12:36:20 mysqld_safe Skipping wsrep-recover for empty datadir: /var/lib/mysql
    180424 12:36:20 mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
    180424 12:36:20 [Note] WSREP: wsrep_start_position var submitted: ‘00000000-0000-0000-0000-000000000000:-1’
    180424 12:36:20 [Note] WSREP: Read nil XID from storage engines, skipping position init
    180424 12:36:20 [Note] WSREP: wsrep_load(): loading provider library ‘/usr/lib64/libgalera_smm.so’
    180424 12:36:20 [Note] WSREP: wsrep_load(): Galera 2.12(r318911d) by Codership Oy loaded successfully.
    180424 12:36:20 [Warning] WSREP: Could not open saved state file for reading: /var/lib/mysql//grastate.dat
    180424 12:36:20 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    180424 12:36:20 [Note] WSREP: Reusing existing ‘/var/lib/mysql//galera.cache’.
    180424 12:36:20 [Note] WSREP: Passing config to GCS: base_host = 10.225.213.207; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S; pc.weight = 1; protonet.backend = asio;
    180424 12:36:20 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    180424 12:36:20 [Note] WSREP: wsrep_sst_grab()
    180424 12:36:20 [Note] WSREP: Start replication
    180424 12:36:20 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    180424 12:36:20 [Note] WSREP: protonet asio version 0
    180424 12:36:20 [Note] WSREP: backend: asio
    180424 12:36:20 [Note] WSREP: GMCast version 0
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) listening at tcp://0.0.0.0:4567
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) multicast: , ttl: 1
    180424 12:36:20 [Note] WSREP: EVS version 0
    180424 12:36:20 [Note] WSREP: PC version 0
    180424 12:36:20 [Note] WSREP: gcomm: connecting to group ‘XSODB-Cluster’, peer ‘10.225.213.207:,10.225.213.208:,10.226.222.155:,10.226.222.156:,10.226.222.157:,10.225.213.206:’
    180424 12:36:20 [Warning] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ points to own listening address, blacklisting
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) turning message relay requesting on, nonlive peers:
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:20 [Note] WSREP: (fd675a7d, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid fd675a7d is blacklisted, skipping
    180424 12:36:21 [Note] WSREP: declaring 4a777bd8 at tcp://10.226.222.156:4567 stable
    180424 12:36:21 [Note] WSREP: declaring 8fee1978 at tcp://10.225.213.206:4567 stable
    180424 12:36:21 [Note] WSREP: declaring c94810b6 at tcp://10.226.222.155:4567 stable
    180424 12:36:21 [Note] WSREP: Node 4a777bd8 state prim
    180424 12:36:21 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
    pc::Proto{uuid=fd675a7d,start_prim=0,npvo=0,ignore_sb=0,ignore_quorum=0,state=1,last_sent_seq=0,checksum=0,instances=
    4a777bd8,prim=1,un=0,last_seq=51,last_prim=view_id(PRIM,4a777bd8,213),to_seq=1307338,weight=1
    8fee1978,prim=1,un=0,last_seq=3,last_prim=view_id(PRIM,8fee1978,1),to_seq=2,weight=1
    c94810b6,prim=1,un=0,last_seq=102,last_prim=view_id(PRIM,4a777bd8,213),to_seq=1307338,weight=1
    fd675a7d,prim=0,un=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,00000000,0),to_seq=-1,weight=1
    ,state_msgs=
    4a777bd8,pcmsg{ type=STATE, seq=0, flags= 0, node_map { 4a777bd8,prim=1,un=0,last_seq=51,last_prim=view_id(PRIM,4a777bd8,213),to_seq=1307338,weight=1
    c94810b6,prim=1,un=0,last_seq=102,last_prim=view_id(PRIM,4a777bd8,213),to_seq=1307338,weight=1
    }}
    8fee1978,pcmsg{ type=STATE, seq=0, flags= 0, node_map { 8fee1978,prim=1,un=0,last_seq=3,last_prim=view_id(PRIM,8fee1978,1),to_seq=2,weight=1
    }}
    c94810b6,pcmsg{ type=STATE, seq=0, flags= 0, node_map { 4a777bd8,prim=1,un=0,last_seq=51,last_prim=view_id(PRIM,4a777bd8,213),to_seq=1307338,weight=1
    c94810b6,prim=1,un=0,last_seq=102,last_prim=view_id(PRIM,4a777bd8,213),to_seq=1307338,weight=1
    }}
    fd675a7d,pcmsg{ type=STATE, seq=0, flags= 0, node_map { fd675a7d,prim=0,un=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,00000000,0),to_seq=-1,weight=1
    }}
    ,current_view=view(view_id(REG,4a777bd8,214) memb {
    4a777bd8,
    8fee1978,
    c94810b6,
    fd675a7d,
    } joined {
    4a777bd8,
    8fee1978,
    c94810b6,
    } left {
    } partitioned {
    }),pc_view=view((empty)),mtu=2147483647}
    180424 12:36:21 [Note] WSREP: {v=0,t=1,ut=255,o=4,s=0,sr=0,as=-1,f=0,src=fd675a7d,srcvid=view_id(REG,4a777bd8,214),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=12,nl=(
    )
    } 64
    180424 12:36:21 [ERROR] WSREP: exception caused by message: {v=0,t=3,ut=255,o=1,s=0,sr=-1,as=0,f=4,src=4a777bd8,srcvid=view_id(REG,4a777bd8,214),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=5659928,nl=(
    )
    }
    state after handling message: evs::proto(evs::proto(fd675a7d, OPERATIONAL, view_id(REG,4a777bd8,214)), OPERATIONAL) {
    current_view=view(view_id(REG,4a777bd8,214) memb {
    4a777bd8,
    8fee1978,
    c94810b6,
    fd675a7d,
    } joined {
    } left {
    } partitioned {
    }),
    input_map=evs::input_map: {aru_seq=0,safe_seq=0,node_index=node: {idx=0,range=[1,0],safe_seq=0} node: {idx=1,range=[1,0],safe_seq=0} node: {idx=2,range=[1,0],safe_seq=0} node: {idx=3,range=[1,0],safe_seq=0} },
    fifo_seq=14,
    last_sent=0,
    known:
    4a777bd8 at tcp://10.226.222.156:4567
    {o=1,s=0,i=1,fs=5659928,}
    8fee1978 at tcp://10.225.213.206:4567
    {o=1,s=0,i=1,fs=151,}
    c94810b6 at tcp://10.226.222.155:4567
    {o=1,s=0,i=1,fs=5659855,}
    fd675a7d at
    {o=1,s=0,i=1,fs=-1,}
    }180424 12:36:21 [ERROR] WSREP: failed to open gcomm backend connection: 131: fd675a7d last prims not consistent (FATAL)
    at gcomm/src/pc_proto.cpp:is_prim():735
    180424 12:36:21 [ERROR] WSREP: gcs/src/gcs_core.cpp:long int gcs_core_open(gcs_core_t*, const char*, const char*, bool)():204: Failed to open backend connection: -131 (State not recoverable)
    180424 12:36:21 [ERROR] WSREP: gcs/src/gcs.cpp:long int gcs_open(gcs_conn_t*, const char*, const char*, bool)():1303: Failed to open channel ‘XSODB-Cluster’ at ‘gcomm://10.225.213.207,10.225.213.208,10.226.222.155,10.226.222.156,10.226.222.157,10.225.213.206’: -131 (State not recoverable)
    180424 12:36:21 [ERROR] WSREP: gcs connect failed: State not recoverable
    180424 12:36:21 [ERROR] WSREP: wsrep::connect() failed: 7
    180424 12:36:21 [ERROR] Aborting

    180424 12:36:21 [Note] WSREP: Service disconnected.
    180424 12:36:22 [Note] WSREP: Some threads may fail to exit.
    180424 12:36:22 [Note] /usr/sbin/mysqld: Shutdown complete

    180424 12:36:22 mysqld_safe mysqld from pid file /run/mysqld/mysql.pid ended
    180424 12:42:07 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    180424 12:42:07 mysqld_safe Skipping wsrep-recover for empty datadir: /var/lib/mysql
    180424 12:42:07 mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
    180424 12:42:07 [Note] WSREP: wsrep_start_position var submitted: ‘00000000-0000-0000-0000-000000000000:-1’
    180424 12:42:07 [Note] WSREP: Read nil XID from storage engines, skipping position init
    180424 12:42:07 [Note] WSREP: wsrep_load(): loading provider library ‘/usr/lib64/libgalera_smm.so’
    180424 12:42:07 [Note] WSREP: wsrep_load(): Galera 2.12(r318911d) by Codership Oy loaded successfully.
    180424 12:42:07 [Warning] WSREP: Could not open saved state file for reading: /var/lib/mysql//grastate.dat
    180424 12:42:07 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    180424 12:42:07 [Note] WSREP: Reusing existing ‘/var/lib/mysql//galera.cache’.
    180424 12:42:07 [Note] WSREP: Passing config to GCS: base_host = 10.225.213.207; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S; pc.weight = 1; protonet.backend = asio;
    180424 12:42:07 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    180424 12:42:07 [Note] WSREP: wsrep_sst_grab()
    180424 12:42:07 [Note] WSREP: Start replication
    180424 12:42:07 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    180424 12:42:07 [Note] WSREP: protonet asio version 0
    180424 12:42:07 [Note] WSREP: backend: asio
    180424 12:42:07 [Note] WSREP: GMCast version 0
    180424 12:42:07 [Note] WSREP: (cc725d7a, ‘tcp://0.0.0.0:4567’) listening at tcp://0.0.0.0:4567
    180424 12:42:07 [Note] WSREP: (cc725d7a, ‘tcp://0.0.0.0:4567’) multicast: , ttl: 1
    180424 12:42:07 [Note] WSREP: EVS version 0
    180424 12:42:07 [Note] WSREP: PC version 0
    180424 12:42:07 [Note] WSREP: gcomm: connecting to group ‘XSODB-Cluster’, peer ‘10.225.213.207:,10.225.213.208:,10.226.222.155:,10.226.222.156:,10.226.222.157:,10.225.213.206:’
    180424 12:42:07 [Warning] WSREP: (cc725d7a, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ points to own listening address, blacklisting
    180424 12:42:07 [Note] WSREP: (cc725d7a, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid cc725d7a is blacklisted, skipping
    180424 12:42:07 [Note] WSREP: (cc725d7a, ‘tcp://0.0.0.0:4567’) turning message relay requesting on, nonlive peers:
    180424 12:42:07 [Note] WSREP: (cc725d7a, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid cc725d7a is blacklisted, skipping
    180424 12:42:07 [Note] WSREP: (cc725d7a, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid cc725d7a is blacklisted, skipping
    180424 12:42:07 [Note] WSREP: (cc725d7a, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid cc725d7a is blacklisted, skipping
    180424 12:42:09 [Note] WSREP: declaring 4a777bd8 at tcp://10.226.222.156:4567 stable
    180424 12:42:09 [Note] WSREP: declaring c94810b6 at tcp://10.226.222.155:4567 stable
    180424 12:42:09 [Note] WSREP: Node 4a777bd8 state prim
    180424 12:42:09 [Note] WSREP: discarding pending addr without UUID: tcp://10.225.213.206:4567
    180424 12:42:09 [Note] WSREP: discarding pending addr proto entry 0x178c010
    180424 12:42:09 [Note] WSREP: discarding pending addr without UUID: tcp://10.225.213.208:4567
    180424 12:42:09 [Note] WSREP: discarding pending addr without UUID: tcp://10.226.222.157:4567
    180424 12:42:09 [Note] WSREP: gcomm: connected
    180424 12:42:09 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    180424 12:42:09 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    180424 12:42:09 [Note] WSREP: Opened channel ‘XSODB-Cluster’
    180424 12:42:09 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
    180424 12:42:09 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    180424 12:42:09 [Note] WSREP: Waiting for SST to complete.
    180424 12:42:09 [Note] WSREP: STATE EXCHANGE: sent state msg: 0f94c833-478e-11e8-ae83-8f5ee429bcd6
    180424 12:42:09 [Note] WSREP: STATE EXCHANGE: got state msg: 0f94c833-478e-11e8-ae83-8f5ee429bcd6 from 0 (XSODB-Node5)
    180424 12:42:09 [Note] WSREP: STATE EXCHANGE: got state msg: 0f94c833-478e-11e8-ae83-8f5ee429bcd6 from 1 (XSODB-Node4)
    180424 12:42:09 [Note] WSREP: STATE EXCHANGE: got state msg: 0f94c833-478e-11e8-ae83-8f5ee429bcd6 from 2 (XSODB-Node2)
    180424 12:42:09 [Note] WSREP: Quorum results:
    version = 2,
    component = PRIMARY,
    conf_id = 177,
    members = 2/3 (joined/total),
    act_id = 1211995,
    last_appl. = -1,
    protocols = 0/4/3 (gcs/repl/appl),
    group UUID = 3d52910e-229b-11e8-83d2-ae20bb88815f
    180424 12:42:09 [Note] WSREP: Flow-control interval: [28, 28]
    180424 12:42:09 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 1211995)
    180424 12:42:09 [Note] WSREP: State transfer required:
    Group state: 3d52910e-229b-11e8-83d2-ae20bb88815f:1211995
    Local state: 00000000-0000-0000-0000-000000000000:-1
    180424 12:42:09 [Note] WSREP: New cluster view: global state: 3d52910e-229b-11e8-83d2-ae20bb88815f:1211995, view# 178: Primary, number of nodes: 3, my index: 2, protocol version 3
    180424 12:42:09 [Warning] WSREP: Gap in state sequence. Need state transfer.
    180424 12:42:09 [Note] WSREP: Running: ‘wsrep_sst_xtrabackup-v2 –role ‘joiner’ –address ‘10.225.213.207’ –auth ‘sst:secret’ –datadir ‘/var/lib/mysql/’ –defaults-file ‘/etc/my.cnf’ –parent ‘2042”
    WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20180424 12:42:10.076)
    180424 12:42:10 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.225.213.207:4444/xtrabackup_sst
    180424 12:42:10 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    180424 12:42:10 [Note] WSREP: Assign initial position for certification: 1211995, protocol version: 2
    180424 12:42:10 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (3d52910e-229b-11e8-83d2-ae20bb88815f): 1 (Operation not permitted)
    at galera/src/replicator_str.cpp:prepare_for_IST():447. IST will be unavailable.
    180424 12:42:10 [Note] WSREP: Node 2 (XSODB-Node2) requested state transfer from ‘*any*’. Selected 0 (XSODB-Node5)(SYNCED) as donor.
    180424 12:42:10 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1211995)
    180424 12:42:10 [Note] WSREP: Requesting state transfer: success, donor: 0
    180424 12:42:11 [Note] WSREP: (cc725d7a, ‘tcp://0.0.0.0:4567’) turning message relay requesting off
    180424 12:42:20 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 –role ‘joiner’ –address ‘10.225.213.207’ –auth ‘sst:secret’ –datadir ‘/var/lib/mysql/’ –defaults-file ‘/etc/my.cnf’ –parent ‘2042’: 2 (No such file or directory)
    180424 12:42:20 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
    180424 12:42:20 [ERROR] WSREP: SST failed: 2 (No such file or directory)
    180424 12:42:20 [ERROR] Aborting

    180424 12:42:20 [Warning] WSREP: 0 (XSODB-Node5): State transfer to 2 (XSODB-Node2) failed: -22 (Invalid argument)
    180424 12:42:20 [ERROR] WSREP: gcs/src/gcs_group.cpp:long int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():717: Will never receive state. Need to abort.
    180424 12:42:20 [Note] WSREP: gcomm: terminating thread
    180424 12:42:20 [Note] WSREP: gcomm: joining thread
    180424 12:42:20 [Note] WSREP: gcomm: closing backend
    180424 12:42:20 [Note] WSREP: gcomm: closed
    180424 12:42:20 [Note] WSREP: /usr/sbin/mysqld: Terminated.
    180424 12:42:20 mysqld_safe mysqld from pid file /run/mysqld/mysql.pid ended
    180424 14:16:07 [Note] WSREP: Read nil XID from storage engines, skipping position init
    180424 14:16:07 [Note] WSREP: wsrep_load(): loading provider library ‘none’
    180424 14:16:07 [Note] WSREP: Service disconnected.
    180424 14:16:08 [Note] WSREP: Some threads may fail to exit.
    Error in my_thread_global_end(): 1 threads didn’t exit
    180424 14:16:13 [Note] WSREP: Read nil XID from storage engines, skipping position init
    180424 14:16:13 [Note] WSREP: wsrep_load(): loading provider library ‘none’
    180424 14:16:13 [Note] WSREP: Service disconnected.
    180424 14:16:14 [Note] WSREP: Some threads may fail to exit.
    Error in my_thread_global_end(): 1 threads didn’t exit
    180424 14:16:20 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    180424 14:16:20 mysqld_safe Skipping wsrep-recover for empty datadir: /var/lib/mysql
    180424 14:16:20 mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
    180424 14:16:20 [Note] WSREP: wsrep_start_position var submitted: ‘00000000-0000-0000-0000-000000000000:-1’
    180424 14:16:20 [Note] WSREP: Read nil XID from storage engines, skipping position init
    180424 14:16:20 [Note] WSREP: wsrep_load(): loading provider library ‘/usr/lib64/libgalera_smm.so’
    180424 14:16:20 [Note] WSREP: wsrep_load(): Galera 2.12(r318911d) by Codership Oy loaded successfully.
    180424 14:16:20 [Warning] WSREP: Could not open saved state file for reading: /var/lib/mysql//grastate.dat
    180424 14:16:20 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
    180424 14:16:20 [Note] WSREP: Reusing existing ‘/var/lib/mysql//galera.cache’.
    180424 14:16:20 [Note] WSREP: Passing config to GCS: base_host = 10.225.213.207; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S; pc.weight = 1; protonet.backend = asio;
    180424 14:16:20 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
    180424 14:16:20 [Note] WSREP: wsrep_sst_grab()
    180424 14:16:20 [Note] WSREP: Start replication
    180424 14:16:20 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
    180424 14:16:20 [Note] WSREP: protonet asio version 0
    180424 14:16:20 [Note] WSREP: backend: asio
    180424 14:16:20 [Note] WSREP: GMCast version 0
    180424 14:16:20 [Note] WSREP: (f5acba63, ‘tcp://0.0.0.0:4567’) listening at tcp://0.0.0.0:4567
    180424 14:16:20 [Note] WSREP: (f5acba63, ‘tcp://0.0.0.0:4567’) multicast: , ttl: 1
    180424 14:16:20 [Note] WSREP: EVS version 0
    180424 14:16:20 [Note] WSREP: PC version 0
    180424 14:16:20 [Note] WSREP: gcomm: connecting to group ‘XSODB-Cluster’, peer ‘10.225.213.207:,10.225.213.208:,10.226.222.155:,10.226.222.156:,10.226.222.157:,10.225.213.206:’
    180424 14:16:20 [Warning] WSREP: (f5acba63, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ points to own listening address, blacklisting
    180424 14:16:20 [Note] WSREP: (f5acba63, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid f5acba63 is blacklisted, skipping
    180424 14:16:20 [Note] WSREP: (f5acba63, ‘tcp://0.0.0.0:4567’) turning message relay requesting on, nonlive peers:
    180424 14:16:20 [Note] WSREP: (f5acba63, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid f5acba63 is blacklisted, skipping
    180424 14:16:20 [Note] WSREP: (f5acba63, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid f5acba63 is blacklisted, skipping
    180424 14:16:20 [Note] WSREP: (f5acba63, ‘tcp://0.0.0.0:4567’) address ‘tcp://10.225.213.207:4567’ pointing to uuid f5acba63 is blacklisted, skipping
    180424 14:16:21 [Note] WSREP: declaring 4a777bd8 at tcp://10.226.222.156:4567 stable
    180424 14:16:21 [Note] WSREP: declaring c94810b6 at tcp://10.226.222.155:4567 stable
    180424 14:16:22 [Note] WSREP: Node 4a777bd8 state prim
    180424 14:16:22 [Note] WSREP: discarding pending addr without UUID: tcp://10.225.213.206:4567
    180424 14:16:22 [Note] WSREP: discarding pending addr proto entry 0x162b010
    180424 14:16:22 [Note] WSREP: discarding pending addr without UUID: tcp://10.225.213.208:4567
    180424 14:16:22 [Note] WSREP: discarding pending addr without UUID: tcp://10.226.222.157:4567
    180424 14:16:23 [Note] WSREP: gcomm: connected
    180424 14:16:23 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
    180424 14:16:23 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
    180424 14:16:23 [Note] WSREP: Opened channel ‘XSODB-Cluster’
    180424 14:16:23 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
    180424 14:16:23 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
    180424 14:16:23 [Note] WSREP: Waiting for SST to complete.
    180424 14:16:23 [Note] WSREP: STATE EXCHANGE: sent state msg: 395827bd-479b-11e8-bac2-9a831b445722
    180424 14:16:23 [Note] WSREP: STATE EXCHANGE: got state msg: 395827bd-479b-11e8-bac2-9a831b445722 from 0 (XSODB-Node5)
    180424 14:16:23 [Note] WSREP: STATE EXCHANGE: got state msg: 395827bd-479b-11e8-bac2-9a831b445722 from 1 (XSODB-Node4)
    180424 14:16:23 [Note] WSREP: STATE EXCHANGE: got state msg: 395827bd-479b-11e8-bac2-9a831b445722 from 2 (XSODB-Node2)
    180424 14:16:23 [Note] WSREP: Quorum results:
    version = 2,
    component = PRIMARY,
    conf_id = 181,
    members = 2/3 (joined/total),
    act_id = 1211997,
    last_appl. = -1,
    protocols = 0/4/3 (gcs/repl/appl),
    group UUID = 3d52910e-229b-11e8-83d2-ae20bb88815f
    180424 14:16:23 [Note] WSREP: Flow-control interval: [28, 28]
    180424 14:16:23 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 1211997)
    180424 14:16:23 [Note] WSREP: State transfer required:
    Group state: 3d52910e-229b-11e8-83d2-ae20bb88815f:1211997
    Local state: 00000000-0000-0000-0000-000000000000:-1
    180424 14:16:23 [Note] WSREP: New cluster view: global state: 3d52910e-229b-11e8-83d2-ae20bb88815f:1211997, view# 182: Primary, number of nodes: 3, my index: 2, protocol version 3
    180424 14:16:23 [Warning] WSREP: Gap in state sequence. Need state transfer.
    180424 14:16:23 [Note] WSREP: Running: ‘wsrep_sst_xtrabackup-v2 –role ‘joiner’ –address ‘10.225.213.207’ –auth ‘sst:secret’ –datadir ‘/var/lib/mysql/’ –defaults-file ‘/etc/my.cnf’ –parent ‘4258”
    WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20180424 14:16:23.634)
    180424 14:16:23 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.225.213.207:4444/xtrabackup_sst
    180424 14:16:23 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
    180424 14:16:23 [Note] WSREP: Assign initial position for certification: 1211997, protocol version: 2
    180424 14:16:23 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (3d52910e-229b-11e8-83d2-ae20bb88815f): 1 (Operation not permitted)
    at galera/src/replicator_str.cpp:prepare_for_IST():447. IST will be unavailable.
    180424 14:16:23 [Note] WSREP: Node 2 (XSODB-Node2) requested state transfer from ‘*any*’. Selected 0 (XSODB-Node5)(SYNCED) as donor.
    180424 14:16:23 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1211997)
    180424 14:16:23 [Note] WSREP: Requesting state transfer: success, donor: 0
    180424 14:16:23 [Note] WSREP: (f5acba63, ‘tcp://0.0.0.0:4567’) turning message relay requesting off
    180424 14:16:34 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 –role ‘joiner’ –address ‘10.225.213.207’ –auth ‘sst:secret’ –datadir ‘/var/lib/mysql/’ –defaults-file ‘/etc/my.cnf’ –parent ‘4258’: 2 (No such file or directory)
    180424 14:16:34 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
    180424 14:16:34 [ERROR] WSREP: SST failed: 2 (No such file or directory)
    180424 14:16:34 [ERROR] Aborting

    180424 14:16:34 [Warning] WSREP: 0 (XSODB-Node5): State transfer to 2 (XSODB-Node2) failed: -22 (Invalid argument)
    180424 14:16:34 [ERROR] WSREP: gcs/src/gcs_group.cpp:long int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():717: Will never receive state. Need to abort.
    180424 14:16:34 [Note] WSREP: gcomm: terminating thread
    180424 14:16:34 [Note] WSREP: gcomm: joining thread
    180424 14:16:34 [Note] WSREP: gcomm: closing backend
    180424 14:16:35 [Note] WSREP: gcomm: closed
    180424 14:16:35 [Note] WSREP: /usr/sbin/mysqld: Terminated.
    180424 14:16:35 mysqld_safe mysqld from pid file /run/mysqld/mysql.pid ended

    • Percona DB cluster of 3 nodes and m able to start the master node properly but unable to start the other two nodes

Leave a Reply