Adding eBPF-Based Metrics to Percona Monitoring and Management

I wanted to start this post with the words “eBPF is the hot new thing,” but I think it’s already too late for that. Running eBPF programs in production is becoming less of a peculiarity and more of a normal way to operate. Famously, Facebook runs about 40 BPF programs on each server. There are multiple things you can do with BPF, but as a support engineer here at Percona, I’m mostly interested in the performance observability side of things. Running tools from the bcc project and writing short bpftrace programs is a great way to get insight into a particular performance problem. However, that’s still mostly not monitoring, at least not in a continuous way. Recently, I became curious whether it’s possible to add metrics from BPF programs to Percona Monitoring and Management (PMM), and in this blog post, I’ll show that it’s surprisingly easy.

This post will not cover what BPF is or how to write BPF programs. You can watch Peter Zaitsev’s recent webinar “Using eBPF for Linux Performance Analysis” or read “Learn eBPF Tracing: Tutorial and Examples” by Brendan Gregg to understand the basics.


PMM is built on top of Prometheus, so the first thing we need is an exporter capable of converting a BPF program’s output into metrics. Luckily, there’s already ebpf_exporter from Cloudflare. Next, we’ll need a reasonably modern kernel (ebpf_exporter’s readme mentions at least 4.14). It would probably be easier to use Ubuntu 20.04, which ships with the 5.4 LTS kernel, but I habitually went for CentOS, specifically version 8. On a side note, it’s possible to use the same approach with CentOS 7, but not all of the example BPF programs work there. Even with CentOS 8, there are some issues along the way. Finally, we’ll need a BPF program. The ebpf_exporter project comes with a set of examples mimicking the aforementioned bcc tools, and I decided to go with a biolatency equivalent based on kernel tracepoints.

Setting up the Environment

I used an extremely simple environment based on two CentOS 8 VMs run with Vagrant. The Vagrantfile used includes installation of the PMM2 server and setting up a monitored node with pmm-admin. Once the VMs are up, the other steps are manual to better show the process.

There’s just one prerequisite for ebpf_exporter, and that’s the bcc package with its own dependencies. Unfortunately, at the time of writing this blog, CentOS 8.1 has only version 0.8.0 of bcc, a rather old one, even though CentOS 7 provides version 0.10.0. Fortunately, in the “Stream” version of the OS, you can get version 0.11.0. I hope that the non-rolling versions of the OS will get updated packages soon, but for now, that’s how things are.

To switch CentOS 8 to Stream, a single command should be run:
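A sketch of the switch, assuming a CentOS 8.1-era system (the `centos-release-stream` package name applies to that era; later point releases switch via `dnf swap centos-linux-repos centos-stream-repos` instead):

```shell
# Switch CentOS 8 to CentOS Stream by installing the Stream release
# package, then sync installed packages to the Stream repositories.
STREAM_PKG=centos-release-stream
sudo dnf install -y "$STREAM_PKG"
sudo dnf distro-sync -y
```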

With the OS side of things sorted out, we can proceed to more interesting stuff.

1. We’ll need to actually install the bcc package and its dependencies.

2. Once that’s done, we can test that the tools actually work. We should see PMM’s node_exporter monitoring the system.
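A quick smoke test might look like this (the tools directory is per CentOS bcc packaging, and execsnoop is my illustrative pick; any bcc tool will do):

```shell
# bcc tools are installed under /usr/share/bcc/tools on CentOS.
# execsnoop prints every newly started process; activity from PMM's
# node_exporter should show up within a scrape interval or two.
TOOLS_DIR=/usr/share/bcc/tools
sudo "$TOOLS_DIR/execsnoop"
```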

A common issue with the bcc tools is a version mismatch between the kernel and the kernel-devel package. Just make sure that your kernel-devel version matches your running kernel.
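One way to guarantee the match is to pin the package to the running kernel’s exact version string:

```shell
# Install kernel-devel matching the running kernel exactly;
# uname -r yields the full version-release string dnf expects.
sudo dnf install -y "kernel-devel-$(uname -r)"
```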

3. Install the ebpf_exporter. You can build it, but it’s easier to get the release version.
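A sketch of fetching a release build (the version number and asset name here are assumptions; check the project’s GitHub releases page for the current ones):

```shell
# Download and unpack a prebuilt ebpf_exporter release; adjust VERSION
# and the asset name to match what the releases page currently offers.
VERSION=1.2.2
curl -fsSL -o ebpf_exporter.tar.gz \
  "https://github.com/cloudflare/ebpf_exporter/releases/download/v${VERSION}/ebpf_exporter-${VERSION}.tar.gz"
tar xzf ebpf_exporter.tar.gz
```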

4. We’ll also need the bio-tracepoints.yaml file from the examples. You can get it alone, but I recommend cloning the whole repo so that you can explore the other examples as well.
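Cloning the repository gets you the bio example and the rest in one go:

```shell
# Clone the whole ebpf_exporter repo to get bio-tracepoints.yaml
# along with the other example configs.
REPO=https://github.com/cloudflare/ebpf_exporter.git
git clone "$REPO"
ls ebpf_exporter/examples/bio-tracepoints.yaml
```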

5. Run the exporter and test its output.
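A sketch, assuming the 1.x command-line flags and ebpf_exporter’s default listen port of 9435:

```shell
# Start the exporter with the bio example config (CAP_SYS_ADMIN is
# required to load BPF programs, hence sudo), then check the metrics
# endpoint for the bio histograms.
PORT=9435
sudo ./ebpf_exporter --config.file=ebpf_exporter/examples/bio-tracepoints.yaml &
sleep 2
curl -s "http://localhost:${PORT}/metrics" | grep bio_
```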

It’s pretty easy to set up ebpf_exporter as a systemd service by modifying the example files provided with node_exporter. Note that you’ll either need to run the program as root or set up capabilities, as ebpf_exporter needs CAP_SYS_ADMIN.
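A minimal unit file sketch, modeled on node_exporter’s example (the binary and config paths are assumptions; adjust to wherever you placed them):

```shell
# Install a minimal systemd unit for ebpf_exporter. Running as root
# satisfies the CAP_SYS_ADMIN requirement; AmbientCapabilities= in the
# [Service] section is the alternative.
UNIT=/etc/systemd/system/ebpf_exporter.service
sudo tee "$UNIT" <<'EOF'
[Unit]
Description=Cloudflare ebpf_exporter
After=network.target

[Service]
ExecStart=/usr/local/bin/ebpf_exporter --config.file=/etc/ebpf_exporter/bio-tracepoints.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now ebpf_exporter
```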

6. Register the newly added exporter with the local pmm-agent.
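Assuming a PMM 2 version with external-service support, the registration might look like this (the flags shown are illustrative; consult `pmm-admin add external --help` on your version):

```shell
# Register ebpf_exporter as an external service so pmm-agent scrapes
# it alongside the built-in exporters.
PORT=9435
sudo pmm-admin add external --service-name=ebpf_exporter --listen-port="$PORT"
```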

7. We don’t have any dashboards for the data yet, but we can check raw metrics in Prometheus. Navigate to http://pmm-url/prometheus/ to access its UI.
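Besides the UI, you can pull the metrics through the Prometheus HTTP API. The hostname and the metric name below are assumptions based on the bio example config (ebpf_exporter prefixes its metrics with `ebpf_exporter_`); verify the names against your exporter’s /metrics output:

```shell
# Query PMM's Prometheus for the bio latency histogram buckets;
# replace pmm-server with your PMM server's address.
QUERY=ebpf_exporter_bio_latency_seconds_bucket
curl -s 'http://pmm-server/prometheus/api/v1/query' --data-urlencode "query=${QUERY}"
```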

Prometheus dashboard showing the newly added eBPF metrics

Setting up Grafana Dashboard

Now that we have the data, we need to represent it in a clear and convenient way. PMM’s Grafana allows you to set up custom dashboards, so that’s what we’ll need to do. I’ve prepared a very simple dashboard that has panels for BPF-based metrics alongside panels taken from the existing PMM dashboards, based on node_exporter. Dashboard’s JSON source is available alongside the Vagrantfile.

When running a bio program, ebpf_exporter provides metrics in the form of a histogram, whose data can be used to calculate percentiles and averages, or used raw. Grafana has built-in support for histograms and heatmaps, and in this case, we’re going to use heatmap panels. Heatmaps let us view the distribution over time, unlike a histogram, which shows an instant snapshot of the distribution. Looking at distribution changes over time can add more detail to otherwise “flat” data (like a 95th percentile) and potentially reveal some otherwise hidden discontinuities. For example, in the next screenshot, you can see steady streams of 512 KiB, 8 KiB, and 4 KiB write requests on the “bio bytes write” panel. Percentile and average values are based on the same Prometheus histogram data.
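For reference, a 95th-percentile panel over the same histogram could use a query along these lines (metric and label names are assumptions based on the bio example config):

```
histogram_quantile(0.95,
  sum(rate(ebpf_exporter_bio_latency_seconds_bucket[5m])) by (le, device))
```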

The screenshot below shows IO characteristics with sysbench running a very basic rw load.

Grafana Dashboard

Testing to See We Got The Right Data

No monitoring can be considered valid until it has been checked for correctness. As I’ve mentioned, the dashboard includes existing metrics, which we can treat as a source of truth. Thus, we need to test if the new metrics are showing the same load profile without skew. Inherently there will be some misalignment, because of different scraping intervals, but it shouldn’t be significant.

We can use the Flexible I/O Tester (fio for short) to generate an I/O load profile, and then compare the performance observations made by fio itself, node_exporter, and ebpf_exporter. This is not a benchmark, just a way to generate some load with actual performance metrics.
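As an illustration, a basic mixed random read/write job could look like this (the file path, size, and other parameters are arbitrary picks, not the exact profile used here):

```shell
# Generate a mixed random read/write load against a 1 GiB test file;
# direct=1 bypasses the page cache so the bio layer sees every request.
BS=4k
fio --name=randrw --filename=/tmp/fio.test --size=1G --rw=randrw \
    --bs="$BS" --ioengine=libaio --iodepth=16 --direct=1 \
    --runtime=300 --time_based --group_reporting
```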