Primary node restart failed

  • Primary node restart failed

    I've set up a simple master-slave configuration using weighted quorum: http://www.codership.com/wiki/doku.p...eighted_quorum

    node-1 is a master:
    wsrep_provider_options="pc.weight=1; gcs.fc_master_slave=yes"
    wsrep_cluster_address=gcomm://node-1,node-2

    node-2 is a slave:
    wsrep_provider_options="pc.weight=0; gcs.fc_master_slave=yes"
    wsrep_cluster_address=gcomm://node-1,node-2

    /etc/hosts on both servers has the IP address mappings.

    When I restart mysql on the master with `service mysql restart`, it fails to rejoin the cluster, with a "Connection timed out" error in the server log:


    130529 8:59:31 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer 'node-1:,node-2:'
    130529 8:59:31 [Warning] WSREP: (9a694329-c85f-11e2-0800-3f3cae6230d1, 'tcp://0.0.0.0:4567') address 'tcp://10.137.48.175:4567' points to own listening address, blacklisting
    130529 8:59:31 [Note] WSREP: (9a694329-c85f-11e2-0800-3f3cae6230d1, 'tcp://0.0.0.0:4567') address 'tcp://10.137.48.175:4567' pointing to uuid 9a694329-c85f-11e2-0800-3f3cae6230d1 is blacklisted, skipping
    130529 8:59:31 [Note] WSREP: declaring b27d9646-c85b-11e2-0800-eeeca83be563 stable
    130529 8:59:31 [Note] WSREP: view(view_id(NON_PRIM,9a694329-c85f-11e2-0800-3f3cae6230d1,16) memb {
    9a694329-c85f-11e2-0800-3f3cae6230d1,
    b27d9646-c85b-11e2-0800-eeeca83be563,
    } joined {
    } left {
    } partitioned {
    faec897c-c85e-11e2-0800-2721f22ff1fc,
    })
    130529 9:00:01 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
    at gcomm/src/pc.cpp:connect():139
    130529 9:00:01 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():195: Failed to open backend connection: -110 (Connection timed out)
    130529 9:00:01 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'my_wsrep_cluster' at 'gcomm://node-1,node-2': -110 (Connection timed out)
    130529 9:00:01 [ERROR] WSREP: gcs connect failed: Connection timed out
    130529 9:00:01 [ERROR] WSREP: wsrep::connect() failed: 6
    130529 9:00:01 [ERROR] Aborting


    To get node-1 back into the cluster, I started it with `service mysql start --wsrep-cluster-address="gcomm://"`

    My question: is this expected behavior? It's a bit strange to get a connectivity error while node-2 is up and listening on port 4567. When the cluster contains only one primary node, is it safe to restart the mysql daemon on it?

  • #2
    Can you pin this behavior on the weighted quorum? Does it behave normally if you get rid of the weight?


    • #3
      Yes, without 'weight' settings it works normally.
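
One possible reading of this, as a sketch only: the Galera weighted quorum documentation describes a component as primary when half the weight of the last primary component, minus the weight of nodes that left gracefully, is strictly less than the weight of the current members. Assuming that rule applies here and that `service mysql restart` counts as a graceful leave (neither is confirmed in this thread), node-2 with `pc.weight=0` cannot stay primary on its own, which would leave node-1 with no primary component to rejoin. The function name `has_quorum` and the dictionaries below are hypothetical and only illustrate the arithmetic.

```python
def has_quorum(last_primary, left_gracefully, current):
    """Return True if the current component would be primary.

    Each argument maps a node name to its pc.weight value.
    Assumed rule: (sum(p) - sum(l)) / 2 < sum(m), where p is the last
    primary component, l the nodes that left gracefully, and m the
    current members.
    """
    p = sum(last_primary.values())
    l = sum(left_gracefully.values())
    m = sum(current.values())
    return (p - l) / 2 < m


# Weights from this thread: node-1 has weight 1, node-2 has weight 0.
# node-1 stops gracefully, node-2 is the only member left.
weighted = has_quorum(
    last_primary={"node-1": 1, "node-2": 0},
    left_gracefully={"node-1": 1},
    current={"node-2": 0},
)

# Default weights (1 each) in the same scenario.
default = has_quorum(
    last_primary={"node-1": 1, "node-2": 1},
    left_gracefully={"node-1": 1},
    current={"node-2": 1},
)

print("weighted setup, node-2 alone is primary:", weighted)
print("default weights, node-2 alone is primary:", default)
```

Under these assumptions the weighted setup loses the primary component as soon as node-1 stops, while the default weights keep node-2 primary, which would match the difference reported above.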


      • #4
        Then by all means file a bug! http://www.percona.com/doc/percona-x...bugreport.html

        I haven't tested this feature, so I can't vouch for it.
