Primary node restart failed

  • Primary node restart failed

    I've set up a simple master-slave configuration using weighted quorum: http://www.codership.com/wiki/doku.p...eighted_quorum

    node-1 is a master:
    wsrep_provider_options="pc.weight=1; gcs.fc_master_slave=yes"

    node-2 is a slave:
    wsrep_provider_options="pc.weight=0; gcs.fc_master_slave=yes"

    /etc/hosts on both servers contains the IP address mappings for node-1 and node-2.

    When I restart mysql on the master with `service mysql restart`, it fails to rejoin the cluster, with a connection timeout error in the server log:

    130529 8:59:31 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer 'node-1:,node-2:'
    130529 8:59:31 [Warning] WSREP: (9a694329-c85f-11e2-0800-3f3cae6230d1, 'tcp://') address 'tcp://' points to own listening address, blacklisting
    130529 8:59:31 [Note] WSREP: (9a694329-c85f-11e2-0800-3f3cae6230d1, 'tcp://') address 'tcp://' pointing to uuid 9a694329-c85f-11e2-0800-3f3cae6230d1 is blacklisted, skipping
    130529 8:59:31 [Note] WSREP: declaring b27d9646-c85b-11e2-0800-eeeca83be563 stable
    130529 8:59:31 [Note] WSREP: view(view_id(NON_PRIM,9a694329-c85f-11e2-0800-3f3cae6230d1,16) memb {
    } joined {
    } left {
    } partitioned {
    130529 9:00:01 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
    at gcomm/src/pc.cpp:connect():139
    130529 9:00:01 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
    130529 9:00:01 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'my_wsrep_cluster' at 'gcomm://node-1,node-2': -110 (Connection timed out)
    130529 9:00:01 [ERROR] WSREP: gcs connect failed: Connection timed out
    130529 9:00:01 [ERROR] WSREP: wsrep::connect() failed: 6
    130529 9:00:01 [ERROR] Aborting

    To get node-1 back into the cluster, I started it with `service mysql start --wsrep-cluster-address="gcomm://"`

    My question: is this expected behavior? It's a bit strange to get a connectivity error while node-2 is up and listening on port 4567. When the cluster contains only one primary node, is it safe to restart the mysql daemon on it?
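
For reference, here is a minimal sketch of how a weighted quorum check could be computed for the two weights above (pc.weight=1 and pc.weight=0). It assumes the formula as I understand it from the Galera documentation: a component stays primary only if the sum of its members' weights is strictly greater than half of the previous primary component's weight, after subtracting the weights of nodes that left gracefully. The function name `has_quorum` and the dictionaries are hypothetical and only for illustration; this is not Galera's actual code.

```python
def has_quorum(prev_primary, left_gracefully, current):
    """Weighted quorum check (assumed formula, see lead-in):

        (sum(prev_primary) - sum(left_gracefully)) / 2 < sum(current)

    Each argument maps a node name to its pc.weight.
    """
    baseline = sum(prev_primary.values()) - sum(left_gracefully.values())
    # Multiply instead of dividing so everything stays in integers.
    return 2 * sum(current.values()) > baseline


# Weights from the configuration in this post.
weighted = {"node-1": 1, "node-2": 0}

# node-2 disappears without a clean shutdown: node-1 alone keeps quorum.
print(has_quorum(weighted, {}, {"node-1": 1}))               # True

# node-1 is stopped cleanly (as on a restart): node-2 alone has weight 0.
print(has_quorum(weighted, {"node-1": 1}, {"node-2": 0}))    # False

# Same clean stop with default equal weights: the survivor keeps quorum.
equal = {"node-1": 1, "node-2": 1}
print(has_quorum(equal, {"node-1": 1}, {"node-2": 1}))       # True
```

Under this assumed formula, node-2 with weight 0 can never hold quorum on its own, which would be consistent with the NON_PRIM view in the log above. That is a guess from the arithmetic, not a confirmed diagnosis.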

  • #2
    Can you pin this behavior on the weighted quorum? Does it behave normally if you get rid of the weight?


    • #3
      Yes, without 'weight' settings it works normally.


      • #4
        Then by all means file a bug! http://www.percona.com/doc/percona-x...bugreport.html

        I haven't tested this feature, so I can't vouch for it.


        • #5

          I'm experiencing a similar situation in an identical two-node configuration (useful for small shops with no shared storage).
          After restarting both nodes, the quorum has to be reset (`service mysql bootstrap-pxc` on the master node; it probably works on the slave too) before the Galera cluster restarts cleanly. So every restart of both nodes requires manual intervention.
          I tried to reproduce the problem by stopping each of the two nodes in turn. The sequence of tests follows. Is there any patch or configuration of the evs.* timing parameters that avoids these problems?

          I cannot post the tests due to the character limit.

          Thanks in advance,