Questions about two events being logged in the error log

  • Questions about two events being logged in the error log

    1. I have been noticing in my PXC error log that roughly every day (sometimes several times a day), I get about 16 of the exact same error messages logged consecutively:

    130729 15:44:53 [Note] WSREP: (29f1d3b8-f86d-11e2-0800-28466944d053, 'tcp://0.0.0.0:4567') address 'tcp://XXX.XXX.XXX.XXX:4567' pointing to uuid 29f1d3b8-f86d-11e2-0800-28466944d053 is blacklisted, skipping

    The only difference between the lines is the timestamp, all of which occur within about a 2 second period.

    2. The other issue that I am seeing occurs at 1am and 3am when our automated backup solution kicks in on this specific node:

    130730 1:02:28 [Note] WSREP: Provider paused at a8e8a277-6f03-11e2-0800-5896d9f10d3c:14031249
    130730 1:02:28 [Note] WSREP: Provider resumed.

    Several of these are logged consecutively. Does this mean that when this occurs this node is no longer a member of the cluster? I do not get a wsrep_notify_cmd hit and our load balancer has never logged that the server was down when polling the clustercheck xinetd script (I do realize that the time in which it is reporting as paused is very short and it would be hard for the poller to catch it in a down state). Does it mean it is just pausing for flow control reasons? The other nodes in the cluster do not log any events during the time period in which this is occurring.

    Thanks for any insight.

  • #2
    Forgot to mention this is PXC 5.5.30 wsrep_23.7.4.r3843.

    Thanks.



    • #3
      Nobody else is seeing this?



      • #4
        Originally posted by gabeguillen

        130729 15:44:53 [Note] WSREP: (29f1d3b8-f86d-11e2-0800-28466944d053, 'tcp://0.0.0.0:4567') address 'tcp://XXX.XXX.XXX.XXX:4567' pointing to uuid 29f1d3b8-f86d-11e2-0800-28466944d053 is blacklisted, skipping
        These are harmless by themselves, but they indicate that some extra state checking is happening (AFAICT).


        Originally posted by gabeguillen

        2. The other issue that I am seeing occurs at 1am and 3am when our automated backup solution kicks in on this specific node:

        130730 1:02:28 [Note] WSREP: Provider paused at a8e8a277-6f03-11e2-0800-5896d9f10d3c:14031249
        130730 1:02:28 [Note] WSREP: Provider resumed.

        Several of these are logged consecutively. Does this mean that when this occurs this node is no longer a member of the cluster? I do not get a wsrep_notify_cmd hit and our load balancer has never logged that the server was down when polling the clustercheck xinetd script (I do realize that the time in which it is reporting as paused is very short and it would be hard for the poller to catch it in a down state). Does it mean it is just pausing for flow control reasons? The other nodes in the cluster do not log any events during the time period in which this is occurring.
        This is triggered by the Galera provider on this node not being able to write locally. Typically this would be caused by an FTWRL (FLUSH TABLES WITH READ LOCK), probably from your backup.

        A paused provider cannot write, which backs up the local recv queue. That in turn may trigger flow control (FC), depending on your fc_limit and associated fc* settings in wsrep_provider_options, and only if your node is in the 'Synced' state. So FC is related, but seeing this message does not necessarily mean FC kicked in.
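
To make that condition concrete, here is a rough Python sketch of the check described above. It assumes you have already fetched the `wsrep_provider_options` string and the `wsrep_%` status values yourself (for example from `SHOW GLOBAL VARIABLES` and `SHOW GLOBAL STATUS`). The function names are made up for illustration, and the comparison is a simplification: it ignores the other fc* settings and any scaling Galera may apply to the limit.

```python
def parse_provider_options(raw: str) -> dict:
    """Split a 'key = value; key = value; ' options string into a dict."""
    options = {}
    for item in raw.split(";"):
        if "=" not in item:
            continue  # skip empty trailing pieces
        key, value = item.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def flow_control_likely(status: dict, provider_options: str) -> bool:
    """Hypothetical helper: True if the node is Synced AND its local
    recv queue is longer than gcs.fc_limit (simplified condition)."""
    opts = parse_provider_options(provider_options)
    if "gcs.fc_limit" not in opts:
        raise ValueError("gcs.fc_limit not found in provider options")
    fc_limit = int(opts["gcs.fc_limit"])
    synced = status.get("wsrep_local_state_comment") == "Synced"
    recv_queue = int(status.get("wsrep_local_recv_queue", 0))
    return synced and recv_queue > fc_limit
```

If this returns False while "Provider paused" is being logged, the pause was probably too short to fill the queue, which would fit with the other nodes logging nothing.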

        I can't necessarily account for why you'd see many of these, but what backup method are you using? That may explain it.
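
If it helps to line the pauses up against your backup schedule, something like the following Python sketch could count "Provider paused" entries per hour. It assumes the error log lines look exactly like the two quoted above (`YYMMDD H:MM:SS [Note] WSREP: Provider paused at ...`); the function name and the choice to group by hour are mine, not anything PXC provides.

```python
import re
from collections import Counter

# Matches lines such as:
# 130730 1:02:28 [Note] WSREP: Provider paused at <uuid>:<seqno>
PAUSE_RE = re.compile(
    r"^(\d{6})\s+(\d{1,2}):(\d{2}):(\d{2}) \[Note\] WSREP: Provider (paused|resumed)"
)


def count_pauses_by_hour(lines):
    """Return a Counter keyed by (date, hour) of 'Provider paused' events."""
    counts = Counter()
    for line in lines:
        match = PAUSE_RE.match(line.strip())
        if match and match.group(5) == "paused":
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "130730 1:02:28 [Note] WSREP: Provider paused at a8e8a277-6f03-11e2-0800-5896d9f10d3c:14031249",
        "130730 1:02:28 [Note] WSREP: Provider resumed.",
    ]
    for (date, hour), n in sorted(count_pauses_by_hour(sample).items()):
        print(date, hour, n)
```

Hours with a count well above one would suggest the backup tool is taking the lock several times per run.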

