Percona - all nodes down, how could this happen?

  • #1

    Please see the link

  • #2
    Hi abdel,

    Thanks for joining our community. However, please post your entire question here on this forum vs. linking to another community.



    • abdel commented:
      I tried that before getting "Error: Maximum number of characters exceeded. It cannot be more than 10000 characters. The current number of characters is 11599"

      We are running the following version of Percona XtraDB Cluster on 3 nodes, but something strange happened today: node1 and node3 were requesting sync while node2 was in the Donor state. How could this happen? This is not fault tolerant.

      [root@afdoz mysql]# yum list installed | grep percona
      Percona-Server-shared-compat.x86_64 5.5.25a-rel27.1.277.rhel6 @percona
      Percona-XtraDB-Cluster-client.x86_64 1:5.5.24-23.6.340.rhel6 @percona-testing
      Percona-XtraDB-Cluster-galera.x86_64 2.0-1.113.rhel6 @percona-testing
      Percona-XtraDB-Cluster-server.x86_64 1:5.5.24-23.6.340.rhel6 @percona-testing
      percona-release.x86_64 0.0-1 installed
      percona-testing.noarch 0.0-1 installed
      percona-toolkit.noarch 2.1.3-2 @percona
      percona-xtrabackup.x86_64 2.0.1-446.rhel6 @percona
      Taking a close look at the logs, node3 goes out of sync first; fine. Then suddenly node1 requests a transfer from any node. Should node2 become a Donor, or stay in the Synced state and wait for a manual sync scheduled by the administrator? Is there a way to make it so that when 2 out of 3 nodes are requesting sync, the remaining node stays Synced and waits for a manual sync? Please see the logs below:

      130527 16:25:19 [Note] WSREP: (dc330826-5989-11e2-0800-bd955f45c72d, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://1.2.3.4:4567
      130527 16:25:20 [Note] WSREP: (dc330826-5989-11e2-0800-bd955f45c72d, 'tcp://0.0.0.0:4567') reconnecting to 597dbd56-6634-11e2-0800-a8763b7492b0 (tcp://1.2.3.4:4567), attempt 0
      130527 16:25:21 [Note] WSREP: evs::proto(dc330826-5989-11e2-0800-bd955f45c72d, OPERATIONAL, view_id(REG,1dd58aab-900e-11e2-0800-c1b9888cad0c,35)) suspecting node: 597dbd56-6634-11e2-0800-a8763b7492b0
      130527 16:25:21 [Note] WSREP: evs::proto(dc330826-5989-11e2-0800-bd955f45c72d, GATHER, view_id(REG,1dd58aab-900e-11e2-0800-c1b9888cad0c,35)) suspecting node: 597dbd56-6634-11e2-0800-a8763b7492b0
      130527 16:25:21 [Note] WSREP: declaring 1dd58aab-900e-11e2-0800-c1b9888cad0c stable
      130527 16:25:21 [Note] WSREP: view(view_id(PRIM,1dd58aab-900e-11e2-0800-c1b9888cad0c,36) memb {
      1dd58aab-900e-11e2-0800-c1b9888cad0c,
      dc330826-5989-11e2-0800-bd955f45c72d,
      } joined {
      } left {
      } partitioned {
      597dbd56-6634-11e2-0800-a8763b7492b0,
      })

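      A note on the state transfers requested in these logs: Galera can serve a rejoining node two ways, a full SST (which desyncs the donor for the whole copy) or an incremental IST, which only replays the write-sets the node missed from the donor's gcache. If the gcache is large enough to cover the outage, a rejoin does not tie up a donor for an hour-long full copy. The following my.cnf fragment is only a sketch; the 2G size is illustrative, not taken from this thread, and the right value depends on write volume and how long a node may stay disconnected:

      ```ini
      # /etc/my.cnf on each cluster node -- a sketch, size is an assumption
      [mysqld]
      wsrep_provider=/usr/lib64/libgalera_smm.so
      # Enlarge the Galera write-set cache (default is small, e.g. 128M)
      # so a briefly-disconnected node can catch up via IST instead of
      # requesting a full SST from a donor.
      wsrep_provider_options="gcache.size=2G"
      ```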

  • #3
    Any answer will be appreciated.



    • #4
      Abdel, I'm not sure I fully follow your question, but I think you are asking why two nodes can be joining at the same time when there is only one donor?



      • abdel commented:
        The issue I'm having is that all the nodes should not be down at the same time; the Donor state is effectively a down state. If two nodes are down, do not use the third as a donor; wait for a manual sync at a time when nobody is using the system. Otherwise my sites are down until at least 2 nodes are synced. If the sync took minutes, no problem, but it takes an hour because it's a big database. How can I make it so that at least one node is always up? Is there a configuration option? Thank you!

      • percona.jayj commented:
        I see -- you're asking for a feature to prevent SST in the case there is only 1 node in the cluster. This doesn't exist, and donor management is left up to the operator.
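        Since donor management is left to the operator, one documented knob is `wsrep_sst_donor`, which names the node(s) a joiner should ask first, so an operator can steer transfers away from nodes that must stay up. A sketch of how this might look (the node names here are assumptions, they would have to match each node's `wsrep_node_name`):

        ```ini
        # my.cnf on node1 and node3 -- illustrative only
        [mysqld]
        # Ask node2 first when a state transfer is needed; the trailing
        # comma tells Galera it may fall back to any other node if
        # node2 cannot serve the transfer.
        wsrep_sst_donor="node2,"
        ```

        This does not prevent an SST outright, but it makes the donor choice predictable instead of leaving it to the cluster.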