Percona Monitoring and Management (PMM) Graphs Explained: Custom MongoDB Graphs and Metrics

March 10, 2017 | Posted In: MongoDB, Percona Monitoring and Management

This blog post is another in the series on the Percona Server for MongoDB 3.4 bundle release. In this blog post, we will cover how to add custom MongoDB graphs to Percona Monitoring and Management (PMM) and (for the daring Golang developers out there) how to add custom metrics to PMM’s percona/mongodb_exporter metric exporter.

To get to adding new graphs and metrics, we first need to go over how PMM gets metrics from your database nodes and how they become graphs.

Percona Monitoring and Management (PMM)

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL and MongoDB. It was developed by Percona on top of open-source technology. Behind the scenes, the graphing features this article covers use Prometheus (a popular time-series data store), Grafana (a popular visualisation tool), mongodb_exporter (our MongoDB database metric exporter) plus other technologies to provide database and operating system metric graphs for your database instances.

Prometheus

As mentioned, Percona Monitoring and Management uses Prometheus to gather and store database and operating system metrics. Prometheus works on an HTTP(s) pull-based architecture, where Prometheus “pulls” metrics from “exporters” on a schedule.
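
To make the pull model concrete, here is a minimal sketch of the exporter side, using only the Go standard library. The metric name "example_scrapes_total" and the handler are hypothetical and are not part of PMM; real exporters normally use the Prometheus Go client rather than formatting the text by hand. Each simulated "pull" is an HTTP GET against the handler, which answers in the Prometheus text exposition format.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// scrapesTotal is a hypothetical counter that exists only for this sketch.
var scrapesTotal uint64

// renderMetrics formats one counter in the Prometheus text exposition format.
func renderMetrics(scrapes uint64) string {
	return fmt.Sprintf(
		"# HELP example_scrapes_total Pulls answered by this example.\n"+
			"# TYPE example_scrapes_total counter\n"+
			"example_scrapes_total %d\n", scrapes)
}

// metricsHandler answers one "pull" and counts it.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	n := atomic.AddUint64(&scrapesTotal, 1)
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, renderMetrics(n))
}

func main() {
	// Simulate two pulls in-process instead of opening a network port.
	for i := 0; i < 2; i++ {
		req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
		rec := httptest.NewRecorder()
		metricsHandler(rec, req)
		fmt.Print(rec.Body.String())
	}
}
```

In a real deployment the handler would be registered with http.HandleFunc and served on a port that Prometheus is configured to scrape.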

To provide a detailed view of your database hosts, you must enable two PMM monitoring services for MongoDB graphing capabilities:

  1. linux:metrics
  2. mongodb:metrics

See this link for more details on adding monitoring services to Percona Monitoring and Management: https://www.percona.com/doc/percona-monitoring-and-management/pmm-admin.html#adding-monitoring-services

It is important to note that not all metrics gathered by Percona Monitoring and Management are graphed. This is by design. Storing more metrics than are graphed is very useful when more advanced insight is necessary. We also aim for PMM to be simple and straightforward to use, which is why we don’t graph all of the nearly 1,000 metrics we collect per MongoDB node on each poll.

My personal monitoring philosophy is “monitor until it hurts and then take one step back.” In other words, try to get as much data as you can without impacting the database or adding monitoring resources/cost. Also, Prometheus stores large volumes of metrics efficiently due to compression and highly optimized data structures on disk. This offsets a lot of the cost of collecting extra metrics.

Later in this blog, I will show how to add a graph for an example metric that is currently gathered by PMM but is not graphed in PMM as of today. To see what metrics are available on PMM’s Prometheus instance, visit “http://<pmm-server>/prometheus/graph”.

prometheus/node_exporter (linux:metrics)

PMM’s OS-level metrics are provided to Prometheus via the 3rd-party exporter: prometheus/node_exporter.

This exporter provides 712 metrics per “pull” on my test CentOS 7 host that has:

  • 2 x CPUs
  • 3 x disks with 6 x LVMs
  • 2 x network interfaces

Note: more physical or virtual devices add to the number of node_exporter metrics.

The inner workings and addition of metrics to this exporter will not be covered in this blog post. Generally, the current metrics offered by node_exporter are more than enough.

Below is a full example of a single “pull” of metrics from node_exporter on my test host:

percona/mongodb_exporter (mongodb:metrics)

Behind the scenes of the “mongodb:metrics” Percona Monitoring and Management service, percona/mongodb_exporter is the Prometheus exporter that provides detailed database metrics for graphing on the Percona Monitoring and Management server. percona/mongodb_exporter is a Percona fork of the project dcu/mongodb_exporter, with some valuable additional metrics for MongoDB sharding, storage engines, etc.

The mongodb_exporter process is designed to automatically detect the MongoDB node type, storage engine, etc., without any special configuration aside from the connection string.

As of Percona Monitoring and Management 1.1.1, here is the number of metrics collected from a single replication-enabled MongoDB instance on a single “pull” from Prometheus:

  • Percona Server for MongoDB 3.2 w/WiredTiger: 173 metrics per “pull”
  • Percona Server for MongoDB 3.2 w/RocksDB (with 3 x levels): 239 metrics per “pull”
  • Percona Server for MongoDB 3.2 w/MMAPv1: 172 metrics per “pull”

Note: each additional replica set member adds seven metrics to the list.

On the sharding side of things, a “mongos” process within a cluster with one shard reports 58 metrics, with one extra metric added for each additional cluster shard and 3-4 extra metrics added for each additional “mongos” instance.

Below is a full example of a single “pull” of metrics from one RocksDB instance. Prometheus exporters provide metrics at the HTTP URL: “/metrics”. This is the exact same payload Prometheus would poll from the exporter:
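
To count the metrics in such a payload yourself, a small Go sketch like the following may help. It assumes the payload has already been fetched from the exporter's “/metrics” URL. The sample payload is hand-written for illustration (the values and help text are invented, not real exporter output), and countSamples is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countSamples counts metric samples in a Prometheus text-format payload,
// skipping blank lines and "# HELP" / "# TYPE" comment lines.
func countSamples(payload string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(payload))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		n++
	}
	return n
}

func main() {
	// Hand-written sample with invented values, not real exporter output.
	payload := `# HELP mongodb_mongod_metrics_document_total Illustrative help text.
# TYPE mongodb_mongod_metrics_document_total untyped
mongodb_mongod_metrics_document_total{state="returned"} 1200
# HELP mongodb_mongod_metrics_query_executor_total Illustrative help text.
# TYPE mongodb_mongod_metrics_query_executor_total untyped
mongodb_mongod_metrics_query_executor_total{state="scanned_objects"} 4800
`
	fmt.Println(countSamples(payload)) // prints 2
}
```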

Grafana

PMM uses Grafana to visualize metrics stored in Prometheus. Grafana uses a concept of “dashboards” to store the definitions of what it should visualize. PMM’s dashboards are hosted under the GitHub project: percona/grafana-dashboards.

Grafana can support multiple “data sources” for metrics. As Percona Monitoring and Management uses Prometheus for metric storage, this is the “data source” used in Grafana.

Adding Custom Graphs to Grafana

In this section I will create an example graph to help indicate the “efficiency” of queries on a mongod instance over the last five minutes. This is a good example to use because we plan to add this exact metric in an upcoming version of PMM.

This graph will rely on two metrics that are already provided to PMM via percona/mongodb_exporter since at least version 1.0.0 (“$host” = the hostname of a given node – explained later):

  1. mongodb_mongod_metrics_query_executor_total{instance="$host", state="scanned_objects"} – A metric representing the total number of documents scanned by the server.
  2. mongodb_mongod_metrics_document_total{instance="$host", state="returned"} – A metric representing the total number of documents returned by the server.

The graph will compute the change in these two metrics over five minutes and express returned documents as a percentage of scanned documents. A host that scanned ten documents to return one would have 10% efficiency, and a host that scanned 100 documents to return 100 would have 100% efficiency (a 1:1 ratio).
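
The arithmetic behind those two examples can be sketched in a few lines of Go. The function name queryEfficiency is hypothetical, and returning 0 when nothing was scanned is a choice made for this sketch (a plain division would yield NaN in that case).

```go
package main

import "fmt"

// queryEfficiency returns documents returned divided by documents scanned.
// It returns 0 when nothing was scanned, to avoid dividing by zero.
func queryEfficiency(returned, scanned float64) float64 {
	if scanned == 0 {
		return 0
	}
	return returned / scanned
}

func main() {
	fmt.Printf("%.0f%%\n", queryEfficiency(1, 10)*100)    // prints "10%"
	fmt.Printf("%.0f%%\n", queryEfficiency(100, 100)*100) // prints "100%"
}
```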

Often you will encounter Prometheus metrics that are “total” metric counters that increment from the time the server is (re)started. Both of the metrics our new graph requires are incremented counters and thus need to be “scaled” or “trimmed” to only show the last five minutes of metrics (in this example), not the total since the server was (re)started.

Prometheus offers a very useful query function for incremented counters called “increase()”: https://prometheus.io/docs/querying/functions/#increase(). The Prometheus increase() function allows queries to return the amount a metric counter has increased over a given time period, making this trivial to do! It is also unaffected by counter “resets” due to server restarts as increase() only returns increases in counters.
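
As a rough illustration of the idea, here is a simplified Go sketch of how growth can be computed from raw counter samples while tolerating a reset. It is not Prometheus's actual implementation: the real increase() also extrapolates to the edges of the time range, which this sketch omits. The sample values are invented.

```go
package main

import "fmt"

// increase returns how much a counter grew across the samples (oldest first).
// A drop between two samples is treated as a counter reset: the counter is
// assumed to have restarted from zero, so the new value counts as growth.
func increase(samples []float64) float64 {
	total := 0.0
	for i := 1; i < len(samples); i++ {
		if samples[i] >= samples[i-1] {
			total += samples[i] - samples[i-1]
		} else {
			total += samples[i]
		}
	}
	return total
}

func main() {
	// Invented samples: the counter resets between 150 and 20.
	fmt.Println(increase([]float64{100, 150, 20, 60})) // prints 110
}
```

The three steps contribute 50, 20 and 40, so the restart does not produce a negative or inflated result.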

The increase() syntax requires a time range to be specified before the closing round-bracket. In our case we will ask increase() to consider the last five minutes, which is expressed with “[5m]” at the end of the increase() function, seen in the following example.

The full Prometheus query I will use to create the query efficiency graph is:
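
The following Go sketch assembles such a query from the pieces described in this section: the two metrics, increase() over “[5m]”, and sum() around each side of the division. Treat the resulting expression as a reconstruction from that description rather than a verbatim copy; the helper name efficiencyQuery is hypothetical.

```go
package main

import "fmt"

// efficiencyQuery builds a PromQL expression dividing returned documents
// by scanned documents over the given time window.
func efficiencyQuery(instance, window string) string {
	returned := fmt.Sprintf(
		`sum(increase(mongodb_mongod_metrics_document_total{instance=%q, state="returned"}[%s]))`,
		instance, window)
	scanned := fmt.Sprintf(
		`sum(increase(mongodb_mongod_metrics_query_executor_total{instance=%q, state="scanned_objects"}[%s]))`,
		instance, window)
	return returned + " / " + scanned
}

func main() {
	// "$host" is left as-is for Grafana to substitute.
	fmt.Println(efficiencyQuery("$host", "5m"))
}
```

Running it prints a single-line expression that can be pasted into a Grafana query field.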

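Note: sum() is needed around each increase() here because the two series carry different “state” label values. Prometheus only divides series whose labels match, and sum() aggregates those labels away so the two results can be divided.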

Now, let’s make this a new graph! To do this you can create a new dashboard in PMM’s Grafana or add to an existing dashboard. In this example I’ll create a new dashboard with a single graph.

To add a new dashboard, press the Dashboard selector in PMM’s Grafana and select “Create New”:

This will create a new dashboard named “New Dashboard”.

Most of PMM’s graphing uses a variable named “$host” in place of a hostname/IP. You’ll notice “$host” was used in the “query efficiency” Prometheus query earlier. The variable is set using a Grafana feature called Templating.

Let’s add a “$host” variable to our new dashboard so we can change what host we graph without modifying our queries. First, press the gear icon at the top of the dashboard and select “Templating”:

Then press “New” to create a new Templating variable.

Set “Name” to be host, set “Data Source” to Prometheus and set “Query” to label_values(instance). Leave all other settings default:

Press “Add” to add the template variable, then save and reload the dashboard.

This will add a drop-down of unique hosts in Prometheus like this:

On the first dashboard row, let’s add a new graph by opening the row menu on the far left of the first row and selecting “Add Panel”:

Select “Graph” as the type. Click the title of the blank graph to open a menu, then press “Edit”:

This opens Grafana’s graph editor with an empty graph and some input boxes seen below.

Next, let’s add our “query efficiency” Prometheus query (earlier in this article) to the “Query” input field and add a legend name to “Legend format”:

Now we have some graph data, but the Y-axis and title don’t explain very much about the metric. What does “1.0K” on the Y-Axis mean?

As our metric is a ratio, let’s display the Y-axis as a percentage by switching to the “Axes” tab, then selecting “percent (0.0-1.0)” as the “Unit” selection for the “Left Y” axis, like so:

Next let’s set a graph title. To do this, go to the “General” tab of the graph editor and set the “Title” field:

And now we have a “Query Efficiency” graph with an accurate Y-axis and Title(!):

“Back to Dashboard” on the top-right will take you back to the dashboard view. Always remember to save your dashboards after making changes!

Adding Custom Metrics to percona/mongodb_exporter

For those familiar with the Go programming language, adding custom metrics to percona/mongodb_exporter is fairly straightforward. The percona/mongodb_exporter uses the Prometheus Go client to export metrics gathered from queries to MongoDB.

Adding a completely new metric to the exporter is unfortunately too open-ended to explain in a blog. Instead, I will cover how an existing metric is exported by percona/mongodb_exporter. The process for a new metric will be similar.

To follow our previous example, here is a simplified example of how the metric “mongodb_mongod_metrics_query_executor_total” is exported via percona/mongodb_exporter. The source of this metric is “db.serverStatus().metrics.queryExecutor” from the MongoDB shell’s perspective.

  1. First, a new Prometheus metric is defined as a Go “var”:
    https://github.com/percona/mongodb_exporter/blob/master/collector/mongod/metrics.go#L58-L64
  2. A struct for unmarshaling the BSON response from MongoDB is defined, along with an “.Export()” function for that struct (.Export() is called on the metric structs):
    Notice that the float64 values unmarshaled from the BSON are used in the .Set() for the Prometheus metric. All Prometheus values must be float64.
    https://github.com/percona/mongodb_exporter/blob/master/collector/mongod/metrics.go#L271-L281
  3. In this case, “QueryExecutorStats” is a sub-struct of a larger “MetricStats” struct above it, which also has its own “.Export()” function:
    https://github.com/percona/mongodb_exporter/blob/master/collector/mongod/metrics.go#L405-L430
  4. Finally, “.Collect()” and “.Describe()” (also required functions) are called on the metric to collect and describe it:
    https://github.com/percona/mongodb_exporter/blob/master/collector/mongod/metrics.go#L451
    https://github.com/percona/mongodb_exporter/blob/master/collector/mongod/metrics.go#L485
  5. Later on in this code, “MetricStats” is passed the result of the query “db.serverStatus().metrics”. This can be seen at:
    https://github.com/percona/mongodb_exporter/blob/master/collector/mongod/server_status.go#L60
    https://github.com/percona/mongodb_exporter/blob/master/collector/mongod/server_status.go#L177-L179
    and
    https://github.com/percona/mongodb_exporter/blob/master/collector/mongod/server_status.go#L197-L207

For those unfamiliar with Go and/or unable to contribute new metrics to the project, we suggest you open a JIRA ticket for any feature requests for new metrics here: https://jira.percona.com/projects/PMM.

Conclusion

With the flexibility of the monitoring components of Percona Monitoring and Management, the sky is the limit on what can be done with database monitoring! Hopefully this blog gives you a taste of what is possible if you need to add a new graph, a new metric or both to Percona Monitoring and Management. Also, it is worth repeating that a large number of metrics gathered in Percona Monitoring and Management are not graphed. Perhaps what you’re looking for is already collected. See “http://<pmm-server>/prometheus” for more details on what metrics are stored in Prometheus.

We are always open to improving our dashboards, and we would love to hear about any custom graphs you create and how they help solve problems!
