pmp-check-mysql-replication-running returning UNKNOWN for SQL running, IO Running, Error:


    Hi all,
    I am new to the Percona Toolkit, and I just downloaded the Percona Monitoring Plugins for Nagios.

    My slaves are replicating OK, but the replication running check returns UNKNOWN for them. Here is one good slave:

    SHOW SLAVE STATUS;
    Slave_IO_State: Waiting for master to send event
    Slave_IO_Running: Yes
    Slave_SQL_Running: Yes
    Last_Errno: 0
    Last_Error:

    The logic in the 0.9 plugin seems to fall through to the default EXITSTATUS of STATE_UNKNOWN (3) when both threads are running and there is no error, a case that by my reckoning should be OK.

    I can't imagine that this is a bug, so I presume I am missing the purpose of this check. Is it supposed to depend on some other check?

    Thanks
    Tom


    --




    # ########################################################################
    # Set up constants, etc.
    # ########################################################################
    STATE_OK=0
    STATE_WARNING=1
    STATE_CRITICAL=2
    STATE_UNKNOWN=3
    STATE_DEPENDENT=4
    EXITSTATUS=$STATE_UNKNOWN

    # ########################################################################
    # Run the program.
    # ########################################################################
    main() {
       # Get options
       for o; do
          case "${o}" in
             -c)              shift; OPT_CRIT="${1}"; shift; ;;
             --defaults-file) shift; OPT_DEFT="${1}"; shift; ;;
             -H)              shift; OPT_HOST="${1}"; shift; ;;
             -l)              shift; OPT_USER="${1}"; shift; ;;
             -p)              shift; OPT_PASS="${1}"; shift; ;;
             -P)              shift; OPT_PORT="${1}"; shift; ;;
             -S)              shift; OPT_SOCK="${1}"; shift; ;;
             -w)              shift; OPT_WARN="${1}"; shift; ;;
             --version)       grep -A2 '^=head1 VERSION' "$0" | tail -n1; exit 0 ;;
             --help)          perl -00 -ne 'm/^ Usage:/ && print' "$0"; exit 0 ;;
             -*)              echo "Unknown option ${o}. Try --help."; exit 1; ;;
          esac
       done
       if [ -e '/etc/nagios/mysql.cnf' ]; then
          OPT_DEFT="${OPT_DEFT:-/etc/nagios/mysql.cnf}"
       fi

       # Get replication status into a temp file. TODO: move this into a subroutine
       # and test it.
       local TEMP=$(mktemp "/tmp/${0##*/}.XXXX") || exit $?
       trap 'rm -rf "${TEMP}" >/dev/null 2>&1' EXIT
       mysql_exec 'SHOW SLAVE STATUS\G' > "${TEMP}"
       if [ $? = 0 ]; then
          # SHOW SLAVE STATUS isn't an error if the server isn't a replica. The file
          # will be empty if that happens.
          if [ -s "${TEMP}" ]; then
             NOTE=$(awk '$1 ~ /_Running:|Last_Error:/{print substr($0, 1, 100)}' "${TEMP}")
             if grep 'Last_Error: .' "${TEMP}" >/dev/null 2>&1; then
                EXITSTATUS=$STATE_CRITICAL
                NOTE="CRIT $NOTE"
             elif grep '_Running: No' "${TEMP}" >/dev/null 2>&1; then
                if [ "${OPT_CRIT}" ]; then
                   EXITSTATUS=$STATE_CRITICAL
                   NOTE="CRIT $NOTE"
                else
                   EXITSTATUS=$STATE_WARNING
                   NOTE="WARN $NOTE"
                fi
             fi
             # No else branch here: with both threads running and no error,
             # EXITSTATUS keeps its default of STATE_UNKNOWN.
          elif [ "${OPT_WARN}" ]; then
             # Empty file; not a replica, but that's not supposed to happen.
             NOTE="WARN This server is not configured as a replica."
             EXITSTATUS=$STATE_WARNING
          else
             # Empty file; not a replica.
             NOTE="OK This server is not configured as a replica."
             EXITSTATUS=$STATE_OK
          fi
       else
          EXITSTATUS=$STATE_UNKNOWN
          NOTE="UNK could not determine replication status"
       fi

       echo $NOTE
       exit $EXITSTATUS
    }
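    To see why a healthy slave ends up as UNKNOWN, the two grep tests from the quoted script can be replayed against the status from the post. This is a minimal sketch under a few assumptions: the sample text is copied from the post (real `\G` output pads field names with leading spaces, which these patterns tolerate), a here-document stands in for `mysql_exec`, and the `-c` sub-branch is dropped because it only matters when a thread is stopped.

```shell
#!/usr/bin/env bash
# Replays the two grep tests from the 0.9 check against the healthy
# replica status from the post, to show which branch is taken.
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
EXITSTATUS=$STATE_UNKNOWN   # the script's default, as in the quoted code

# Stand-in for: mysql_exec 'SHOW SLAVE STATUS\G' > "${TEMP}"
TEMP=$(mktemp) || exit $?
trap 'rm -f "${TEMP}"' EXIT
cat > "${TEMP}" <<'EOF'
Slave_IO_State: Waiting for master to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:
EOF

if grep 'Last_Error: .' "${TEMP}" >/dev/null 2>&1; then
  EXITSTATUS=$STATE_CRITICAL
elif grep '_Running: No' "${TEMP}" >/dev/null 2>&1; then
  EXITSTATUS=$STATE_WARNING
fi
# Neither pattern matches a healthy replica, and there is no else branch,
# so EXITSTATUS keeps its default.
echo "EXITSTATUS=${EXITSTATUS}"   # prints EXITSTATUS=3 (UNKNOWN)
```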

  • #2
    This is a really embarrassing bug. It turns out that I didn't have a test case for "OK", and in the process of fixing a failure for "Replication isn't even set up on this server," I introduced this bug.

    This is https://bugs.launchpad.net/percona-monitoring-plugins/+bug/936571
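    One way to close the gap is to give the non-empty branch an explicit else and cover all outcomes with tests, including the missing "OK" case. The sketch below is not the fix from the bug report. `classify_replica` is a hypothetical helper that reads SHOW SLAVE STATUS\G text on stdin, takes an optional `crit` argument mirroring `-c`, and prints the Nagios exit code followed by the note. The samples are trimmed versions of the status in the first post.

```shell
#!/usr/bin/env bash
# Sketch only: the replica-status decision from the thread with the missing
# "OK" branch added. classify_replica is a hypothetical helper, not part of
# the plugin. It reads SHOW SLAVE STATUS\G text on stdin and prints
# "<exit code> <note>".
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2

classify_replica() {
  local crit="$1" status note
  status=$(cat)
  # Same fields the plugin reports (the *_Running lines and Last_Error),
  # joined onto one line.
  note=$(printf '%s\n' "$status" |
    awk '$1 ~ /_Running:|Last_Error:/ {printf "%s%s", sep, substr($0, 1, 100); sep=" "}')
  if printf '%s\n' "$status" | grep -q 'Last_Error: .'; then
    echo "$STATE_CRITICAL CRIT $note"
  elif printf '%s\n' "$status" | grep -q '_Running: No'; then
    if [ -n "$crit" ]; then
      echo "$STATE_CRITICAL CRIT $note"
    else
      echo "$STATE_WARNING WARN $note"
    fi
  else
    # The branch missing in 0.9: both threads running and no error is OK.
    echo "$STATE_OK OK $note"
  fi
}

healthy='Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:'

stopped='Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 0
Last_Error:'

printf '%s\n' "$healthy" | classify_replica        # 0 OK ...
printf '%s\n' "$stopped" | classify_replica        # 1 WARN ...
printf '%s\n' "$stopped" | classify_replica crit   # 2 CRIT ...
```

    The first line printed is `0 OK Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error:`. Keeping the decision in a function that never talks to MySQL is what makes the "OK" case cheap to test, which is the direction the script's own TODO comment points in.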
