The important bit about the cluster is that each node should be monitored independently. There is no centralized node; the cluster is the set of active, connected nodes, and each node can have a different view of it. Further, many of these variables are relative to the node you query them from: for example, replication sent (from this node) and received (from writes on the rest of the cluster). Having data from all nodes helps track down the source of issues (e.g., where are the flow control messages coming from? Where did that 100MB transaction come from?).
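As a rough illustration of querying every node on its own and comparing the results, here is a minimal Python sketch. The `fetch_status` callable is a hypothetical hook that would run something like `SHOW GLOBAL STATUS LIKE 'wsrep_%'` against a single node and return the rows as a dictionary; the node names and the use of `wsrep_cluster_size` as the compared variable are assumptions for the example, not part of the text above.

```python
from typing import Callable, Dict, List, Optional

Status = Dict[str, str]


def collect_per_node(nodes: List[str],
                     fetch_status: Callable[[str], Status]) -> Dict[str, Status]:
    """Query each node separately; an unreachable node gets an empty status."""
    views: Dict[str, Status] = {}
    for node in nodes:
        try:
            views[node] = fetch_status(node)
        except Exception:
            # Keep going: one failed node must not hide data from the others.
            views[node] = {}
    return views


def differing_views(views: Dict[str, Status],
                    variable: str) -> Dict[str, Optional[str]]:
    """Return each node's value of `variable` if the nodes disagree, else {}."""
    values = {node: status.get(variable) for node, status in views.items()}
    return values if len(set(values.values())) > 1 else {}
```

With a real `fetch_status` plugged in, a non-empty result from `differing_views(views, "wsrep_cluster_size")` would suggest that the nodes do not currently agree on the cluster membership, or that one of them could not be queried.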
Standard MySQL alerting should apply here. Percona XtraDB Cluster specific alerting should include:
Other optional alerting could be done on:
Metrics collection (i.e., long-term graphing) on the cluster should be done on:
- Queue sizes (wsrep_local_recv_queue, wsrep_local_send_queue)
- Flow control (wsrep_flow_control_sent, wsrep_flow_control_recv)
- Number of transactions in and out of this node (wsrep_replicated, wsrep_received)
- Number of transactions in and out in bytes (wsrep_replicated_bytes, wsrep_received_bytes)
- Replication conflicts (wsrep_local_cert_failures and wsrep_local_bf_aborts)
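For graphing, the queue sizes are point-in-time values, while the other variables in the list are cumulative counters that are more useful as per-second rates. The following Python sketch shows one way to turn two samples from the same node into graph points. It assumes the rows arrive as `(name, value)` pairs, as a `SHOW GLOBAL STATUS LIKE 'wsrep_%'` query would return them; the function names and the `_per_s` suffix are made up for this example.

```python
from typing import Dict, Iterable, Tuple

# Point-in-time values: graph them as they are.
GAUGES = ("wsrep_local_recv_queue", "wsrep_local_send_queue")

# Cumulative counters: graph the change per second between two samples.
COUNTERS = (
    "wsrep_flow_control_sent", "wsrep_flow_control_recv",
    "wsrep_replicated", "wsrep_received",
    "wsrep_replicated_bytes", "wsrep_received_bytes",
    "wsrep_local_cert_failures", "wsrep_local_bf_aborts",
)


def to_sample(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Keep only the variables listed above and convert them to integers."""
    wanted = set(GAUGES) | set(COUNTERS)
    return {name: int(value) for name, value in rows if name in wanted}


def graph_points(prev: Dict[str, int], curr: Dict[str, int],
                 interval_s: float) -> Dict[str, float]:
    """Build graph points for one node from two consecutive samples."""
    points = {name: float(curr[name]) for name in GAUGES if name in curr}
    for name in COUNTERS:
        if name in prev and name in curr:
            delta = curr[name] - prev[name]
            # A negative delta means the counter was reset (for example by a
            # node restart); skip the point rather than graph a bogus rate.
            if delta >= 0:
                points[name + "_per_s"] = delta / interval_s
    return points
```

Run this separately for every node and store the node name alongside each point, since sent and received values are relative to the node they were read from.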