In Percona Monitoring and Management (PMM) 1.13 we have adopted Prometheus 2, and with this comes a dramatic improvement in resource usage, along with performance improvements!
What does it mean for you? This means you can have a significantly larger number of servers and database instances monitored by the same PMM installation. Or you can reduce the instance size you use to monitor your environment and save some money.
Let’s look at some stats!
We can see an approximate 5x and 8x reduction of CPU usage on these two PMM Servers. Depending on the workload, we see CPU usage reductions to range between 3x and 10x.
There is also less disk write bandwidth required:
On this instance, the bandwidth reduction is “just” 1.5x times. Note this is disk IO for the entire PMM system, which includes more than only the Prometheus component. Prometheus 2 itself promises much more significant IO bandwidth reduction according to official benchmarks
According to the same benchmark, you should expect disk space usage reduction by 33-50% for Prometheus 2 vs Prometheus 1.8. The numbers will be less for Percona Monitoring and Management, as it also stores Query Statistics outside of Prometheus.
Resource usage on the monitored hosts
Also, resource usage on the monitored hosts is significantly reduced:
Why does CPU usage go down on a monitored host with a Prometheus 2 upgrade? This is because PMM uses TLS for the Prometheus to monitored host communication. Before Prometheus 2, a full handshake was performed for every scrape, taking a lot of CPU time. This was optimized with Prometheus 2, resulting in a dramatic CPU usage decrease.
Query performance is also a lot better with Prometheus 2, meaning dashboards visually load a lot faster, though we did not do any specific benchmarks here to share the hard numbers. Note though this improvement only applies when you’re querying the data which is stored in Prometheus 2.
If you’re querying data that was originally stored in Prometheus 1.8, it will be queried through the much slower and less efficient “Remote Read” interface, being quite a bit slower and using a lot more CPU and memory resources.
If you love better efficiency and Performance, consider upgrading to PMM 1.13!