WSREP Error Failed to initialize backend

  • WSREP Error Failed to initialize backend

    Hello everyone -

    I am dealing with another DBA's installation of Percona Cluster this afternoon - and he's not available.

    I have searched the forums here and I have failed to find a post regarding the problem I am seeing.

    The sysadmin restarted NODE01 and MySQL (Percona Server) is not starting due to:

    120816 13:11:33 [ERROR] WSREP: gcs/src/gcs_backend.c:gcs_backend_init():87: Invalid backend URI: 0
    120816 13:11:33 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():202: Failed to initialize backend using '0': -22 (Invalid argument)
    120816 13:11:33 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1290: Failed to open channel 'c_l' at '0': -22 (Invalid argument)
    120816 13:11:33 [ERROR] WSREP: gcs connect failed: Invalid argument
    120816 13:11:33 [ERROR] WSREP: wsrep::connect() failed: 6
    120816 13:11:33 [ERROR] Aborting


    That's from the NODE01 error log.
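The log above shows the backend rejecting '0' as a URI. A minimal sketch of a sanity check, assuming the backend URI is taken from the wsrep_cluster_address setting and is expected to start with gcomm:// (the helper name is_gcomm_uri is hypothetical, not part of any Percona or Galera tool):

```shell
# Hypothetical helper: succeeds (exit 0) if the value looks like a gcomm:// URI,
# fails (exit 1) otherwise. It only checks the scheme prefix, nothing more.
is_gcomm_uri() {
    case "$1" in
        gcomm://*) return 0 ;;
        *)         return 1 ;;
    esac
}

# The value from the error log fails the check; a gcomm:// address passes.
is_gcomm_uri "0" || echo "not a gcomm:// URI: 0"
is_gcomm_uri "gcomm://10.10.13.142" && echo "looks like a gcomm:// URI"
```

This does not explain how the value ended up as '0'; it only makes that kind of bad value visible before mysqld is started.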

    Any idea at all what's going on here?

    The system has been previously up and running in a four node arrangement. The other three nodes have MySQL up and running.

    Thanks in advance for ANY advice, thoughts or suggestions you may be able to provide.

    /David C.

  • #2
    Additional information that may help:

    uname -a
    Linux hsdb01 3.2.0-24-virtual #37-Ubuntu SMP Wed Apr 25 10:17:19 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

    free -m
                 total       used       free     shared    buffers     cached
    Mem:         16050        605      15444          0         30        155
    -/+ buffers/cache:        419      15630
    Swap:         3317          0       3317


    Thanks!

    /David C.


    • #3
      OK - from what I can tell, WSREP is provided by something called Galera.

      I've been searching the Galera wiki - but the troubleshooting page is not active.
      http://www.codership.com/wiki/doku.php

      Thanks -

      /David C.


      • #4
        Don't everybody jump in at once! Heh heh -

        What I did to get the node rejoined to the cluster was to:

        1) Copy the full command line of the running process from the output of ps aux | grep ^mysql
        2) Change the wsrep_cluster_address value in that command line from the network where none of the other nodes were listening to the network where they were listening

        Original process values shown by ps aux | grep ^mysql:

        /usr/local/mysql/bin/mysqld_safe --basedir=/usr/local/mysql --datadir=/data/cluster-data --plugin-dir=/usr/local/mysql/lib/mysql/plugin --user=mysql --log-error=/data/cluster-data/pcdb01.err --pid-file=/data/cluster-data/pcdb01.pid --wsrep_cluster_address=gcomm://10.4.8.142:4567

        Values passed to system as start up command that resulted in the node rejoining the cluster:

        /usr/local/mysql/bin/mysqld_safe --basedir=/usr/local/mysql --datadir=/data/cluster-data --plugin-dir=/usr/local/mysql/lib/mysql/plugin --user=mysql --log-error=/data/cluster-data/pcdb01.err --pid-file=/data/cluster-data/pcdb01.pid --wsrep_cluster_address=gcomm://10.10.13.142
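The only difference between the two command lines above is the --wsrep_cluster_address value. A minimal sketch of making that substitution with sed, assuming the option appears once and its value contains no spaces (the helper name set_cluster_address is hypothetical, and the command line is abbreviated here for readability):

```shell
# Hypothetical helper: replace the --wsrep_cluster_address value in a command line.
#   $1 = full command line
#   $2 = new address, e.g. gcomm://10.10.13.142
# Assumes the new address contains no '|' or '&' characters (sed delimiters/specials).
set_cluster_address() {
    printf '%s\n' "$1" | sed "s|--wsrep_cluster_address=[^ ]*|--wsrep_cluster_address=$2|"
}

# Abbreviated version of the original command line from the post.
old_cmd='/usr/local/mysql/bin/mysqld_safe --user=mysql --wsrep_cluster_address=gcomm://10.4.8.142:4567'

# Prints the same command line with the address swapped.
set_cluster_address "$old_cmd" 'gcomm://10.10.13.142'
```

The function only prints the rewritten command line; running it is left as a separate, deliberate step.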

        Discussion:
        The maintenance that required the reboot was the sysadmin adding an additional network to eth0.

        Because none of the other nodes in the cluster had that network configured they were not able to communicate with that node.

        That node was using the last network added to eth0 as its default network route for all traffic.

        So - Percona Cluster (a third-party MySQL product) relies heavily on Galera (a fourth party, from Codership), which plugs into MySQL through the wsrep API and provides the communications layer between the nodes. (From what I am told, this is actually fairly common practice.)

        Once I had all that teased apart and understood the error message more accurately, I found the following page to be VERY helpful:

        http://www.codership.com/wiki/doku.php?id=info

        especially this section on joining a new node to the cluster - because rejoining an existing node works the same way:

        http://www.codership.com/wiki/doku.php?id=info#adding_another_node_to_a_cluster

        I tried using the network address in the running process list above (10.4.8.142:4567) - but that failed with a 'cluster not found' error.

        At this point I knew that I needed to find which address one of the other nodes was running on.

        A quick ps aux | grep ^mysql on node 2 showed that it was participating on the 10.10.13.142 address, not the 10.4.8.142 address.
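Picking the address out of a long ps line by eye is error-prone. A minimal sketch of pulling out just the --wsrep_cluster_address value, assuming the option is passed on the command line as in the examples above (the helper name get_cluster_address is hypothetical):

```shell
# Hypothetical helper: read command-line text on stdin and print the value of
# every --wsrep_cluster_address=... option found in it, one per line.
get_cluster_address() {
    # Put each space-separated word on its own line, then keep only the
    # matching option and strip the option name from it.
    tr ' ' '\n' | sed -n 's/^--wsrep_cluster_address=//p'
}

# Demonstration on an abbreviated command line like the ones in this thread.
printf '%s\n' 'mysqld_safe --user=mysql --wsrep_cluster_address=gcomm://10.10.13.142' | get_cluster_address
```

On a live node the same function could be fed from ps, for example ps aux | grep ^mysql | get_cluster_address.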

        Once I passed the correct network address value to node 1 for start up, it came up right away and began using rsync to catch itself up with the other nodes on the cluster.

        I did not have to specify the port 4567 because that is the default port for Percona Cluster.
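To make the port explicit when comparing addresses between nodes, the default can be filled in. A minimal sketch, assuming a single IPv4 address and 4567 as the default port mentioned above (the helper name with_default_port is hypothetical):

```shell
# Hypothetical helper: print a gcomm:// address with an explicit port,
# appending :4567 when the address does not already carry one.
# Assumes one IPv4 host; IPv6 and comma-separated lists are not handled.
with_default_port() {
    _addr=${1#gcomm://}          # strip the scheme if present
    case "$_addr" in
        *:*) printf 'gcomm://%s\n' "$_addr" ;;
        *)   printf 'gcomm://%s:4567\n' "$_addr" ;;
    esac
}

with_default_port 'gcomm://10.10.13.142'
with_default_port 'gcomm://10.4.8.142:4567'
```

With both forms normalized this way, two nodes' addresses can be compared as plain strings.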

        I think that's about it.

        Any questions?

        /David C.
