Percona XtraDB Cluster / Galera with Percona Monitoring Plugins

The Percona Monitoring Plugins (PMP) provide free tools to make it easier to monitor PXC/Galera nodes. Monitoring broadly falls into two categories, alerting and historical graphing, and the plugins support Nagios and Cacti, respectively, for those purposes.


An update to the PMP this summer (thanks to our Remote DBA team for supporting this!) added a Galera-specific host template that includes a variety of Galera-related stats, including:

  • Replication traffic and transaction counts and average trx size
  • Inbound and outbound (Send and Recv) queue sizes
  • Parallelization efficiency
  • Write conflicts (Local Cert Failures and Brute Force Aborts)
  • Cluster size
  • Flow control

You can see examples and descriptions of all the graphs in the manual.


There is no Galera-specific Nagios plugin in the PMP yet, but there is a check called pmp-check-mysql-status that can test practically any status variable you like.  We can easily adapt it to check some key, action-worthy Galera stats, though I hadn't worked out the details until a customer requested it recently.

Checking for a Primary Cluster

Technically this is a cluster (or cluster-partition) state for whatever part of the cluster the queried node belongs to.  However, any single node could be disconnected from the rest of the cluster, so checking this on each node should be fine.  We can verify it with a check like this:
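Using pmp-check-mysql-status, such a check might look like the following (the plugin path is the default RPM install location; adjust for your environment):

```shell
# Go critical when wsrep_cluster_status equals 'non-Primary', i.e. the
# queried node is not currently part of a Primary component.
/usr/lib64/nagios/plugins/pmp-check-mysql-status \
    -x wsrep_cluster_status -C '==' -T str -c non-Primary
```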

Local node state

We also want to verify the given node is ‘Synced’ into the cluster and not in some other state:
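The same plugin can do a string comparison against wsrep_local_state_comment; a check along these lines (same default path assumption as above) works:

```shell
# Warn (not critical) when the node state is anything other than 'Synced'.
/usr/lib64/nagios/plugins/pmp-check-mysql-status \
    -x wsrep_local_state_comment -C '!=' -T str -w Synced
```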

Note that we only warn when the state is not Synced — this is because it is perfectly valid for a node to be in the Donor/Desynced state.  This warning can alert us to a node in a less-than-ideal state without screaming about it, but you could certainly go critical instead.

Verify the Cluster Size

This is a bit of a sanity check, but we want to know how many nodes are in the cluster and either warn if we’re down a single node or go critical if we’re down more.  For a three node cluster, your check might look like this:
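One way to express this with pmp-check-mysql-status — a sketch, with thresholds assuming a three-node cluster:

```shell
# OK at 3 nodes, WARNING at 2, CRITICAL at 1 (no redundancy left).
/usr/lib64/nagios/plugins/pmp-check-mysql-status \
    -x wsrep_cluster_size -C '<=' -w 2 -c 1
```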

This is OK when we have 3 nodes, warns at 2 nodes, and goes critical at 1 node (when we have no redundancy left).  You could certainly adjust the thresholds depending on your cluster's normal size.  This check is likely meaningless unless we're also in a Primary cluster, so you could set a Nagios service dependency on the Primary cluster check here.

Check for Flow Control

Flow control is really something to keep an eye on in your cluster. We can monitor the recent state of flow control like this:
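A sketch of such a check with pmp-check-mysql-status, assuming the plugin's default numeric comparison; wsrep_flow_control_paused reports the fraction of time (0.0–1.0) replication has been paused by flow control since the counter was last reset:

```shell
# WARNING above 10% paused time, CRITICAL above 90%.
/usr/lib64/nagios/plugins/pmp-check-mysql-status \
    -x wsrep_flow_control_paused -w 0.1 -c 0.9
```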

This warns when FC exceeds 10% and goes critical at 90%.  These thresholds may need some fine-tuning, but as a general principle some small amount of FC may be normal; you want to know when it starts to become more excessive.


Alerting with Nagios and graphing with Cacti tend to work best with per-host checks and graphs, but there are aspects of a PXC cluster that you may want to monitor from a cluster-wide perspective.  However, most of the things that can "go wrong" are easily detectable with per-host checks, so you can get by without a custom, Galera-aware script.

I’d also always recommend what I call a “service check” that connects through your VIP or load balancer to ensure that MySQL is available (regardless of underlying cluster state) and can do a query.  As long as that works (proving there is at least 1 Primary cluster node), you can likely sleep through any other cluster event.  🙂
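A minimal sketch of such a service check, assuming a hypothetical VIP hostname and monitoring user (replace both with your own):

```shell
# Connect through the VIP/load balancer and run a trivial query; a zero
# exit status proves at least one Primary node is serving traffic.
mysql -h mysql-vip.example.com -u monitor -p'secret' -e 'SELECT 1' \
    && echo "OK: MySQL reachable through VIP" \
    || echo "CRITICAL: no MySQL behind VIP"
```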


Comments (4)

  • AdriannaY


I have some problems with the "T" option when I run the command:

[root@mypriv-bd3 ~]# /usr/lib64/nagios/plugins/pmp-check-mysql-status -x wsrep_local_state_status -C '!=' -T str -w Synced

I have this error: you specified -T but not -y. Try --help.

ditto when I run: /usr/lib64/nagios/plugins/pmp-check-mysql-status -x wsrep_cluster_status -C == -T str -c non-Primary

    Is there a configuration setting or a particular change to do?

    Thanks in advance.

    November 19, 2013 at 10:23 am
  • Jay Janssen


    It works for me:

[root@node1 ~]# /usr/lib64/nagios/plugins/pmp-check-mysql-status -x wsrep_local_state_comment -C '!=' -T str -w Synced
    OK wsrep_local_state_comment (str) = Synced | wsrep_local_state_comment=Synced;Synced;;0;

    Make sure you are running the latest version of the plugins, it’s possible that this flag was modified in recent releases:

    [root@node1 ~]# rpm -qa | grep nagios

    November 20, 2013 at 8:16 am
  • AdriannaY

    Hello Jay;

Thanks for your answer. I had percona-nagios-plugins-1.0.3-1.noarch.rpm on my machine; I installed the specified package and now everything works well.


    November 20, 2013 at 10:19 am
  • Rares Dumitrescu

hi. this is a super necro but:

/usr/lib64/nagios/plugins/pmp-check-mysql-status -x wsrep_local_state_comment -C '!=' -T str -w Synced
ERROR 1682 (HY000) at line 1: Native table 'performance_schema'.'global_variables' has the wrong structure
    UNK could not get MySQL status/variables.

    this is happening on a percona 5.7 installation. mysql_upgrade has been run. i got nothing there.

    June 26, 2017 at 8:15 am
