
Evaluation of PMP Profiling Tools

April 5, 2017 | Posted In: Benchmarks, MySQL, Percona Toolkit


In this blog post, we’ll look at some of the available PMP profiling tools.

While debugging or analyzing issues with Percona Server for MySQL, we often need a quick understanding of what’s happening on the server. Percona experts frequently use the pt-pmp tool from Percona Toolkit (inspired by

The pt-pmp tool collects application stack traces with GDB and then post-processes them into a condensed list of stack traces, ordered by how often each one occurs. This list helps you understand where the application spends most of its time: either running something or waiting for something.
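To make the post-processing step concrete, here is a minimal sketch of the classic poor man’s profiler aggregation in shell and awk. It is not pt-pmp’s actual code: the function name `collapse_gdb_stacks` is hypothetical, and it assumes GDB’s usual `thread apply all bt` frame formats (`#N  0xADDR in func (...)` or `#N  func (...) at file:line`).

```shell
#!/bin/sh
# Hypothetical helper (a sketch, not pt-pmp's code): read GDB
# "thread apply all bt" output on stdin, fold each thread's frames into one
# comma-separated line of function names, then count identical stacks,
# most frequent first.
collapse_gdb_stacks() {
  awk '
    # A new thread header ends the previous thread stack.
    /^Thread / { if (s != "") print s; s = ""; next }
    /^#[0-9]/ {
      # "#1  0x00007f... in func (...) from lib" -> func
      # "#0  func (...) at file.c:42"             -> func
      if ($3 == "in") fn = $4; else fn = $2
      if (s == "") s = fn; else s = s "," fn
    }
    END { if (s != "") print s }
  ' | LC_ALL=C sort | uniq -c | awk '{ print $1, $2 }' | LC_ALL=C sort -k1,1nr -k2,2
}
```

On a live server the input would typically come from something like `gdb -p "$(pidof mysqld)" -batch -nx -ex "thread apply all bt" | collapse_gdb_stacks`, which attaches to every thread and therefore carries the overhead discussed next.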

Getting a profile with pt-pmp is handy, but it comes at a cost: it’s quite intrusive. To get stack traces, GDB has to attach to each thread of your application, which pauses the application. Under high load, these pauses can be significant (up to 15, 30, or even 60 seconds). This means that the pt-pmp approach is not really usable in production.

Below I’ll describe how to reduce GDB overhead, and which other tools can be used instead of GDB to get stack traces.

  • GDB
    By default, the symbol resolution process in GDB is very slow. As a result, getting stack traces with GDB is quite intrusive (especially under high load). There are two options that can notably reduce GDB tracing overhead:

      1. Use the readnever patch. RHEL and other distros based on it ship a GDB with the readnever patch applied. This patch lets you skip unnecessary symbol resolution with the --readnever option, which can make collecting stack traces up to 10 times faster.
      2. Use gdb_index. This feature was added to address the symbol resolution issue by creating a special index and embedding it into the binaries. The index is quite compact: when I created and embedded a gdb_index for the Percona Server binary, it increased the binary size by around 7-8 MB. Adding the gdb_index speeds up obtaining stack traces and resolving symbols by two to three times.

  • eu-stack (elfutils)
    The eu-stack tool from the elfutils package prints the stack of each thread in a process or core file. Symbol resolution is not well optimized in eu-stack either: by default, under load it takes even more time than GDB. However, eu-stack lets you skip symbol resolution entirely, so it can collect stack frames quickly and resolve them later without any impact on the workload.
  • Quickstack
    Quickstack is a tool from Facebook that gets stack traces with minimal overhead.
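The capture options above can be sketched as a small wrapper. `pmp_capture` and its tool names are hypothetical; the flags it passes are GDB’s `-p`, `-batch`, `-nx` and `-ex`, the `--readnever` option (present only in GDB builds that carry the readnever patch), and eu-stack’s `-q`, which skips resolving addresses to function names. Quickstack is left out because its command line is not covered in this post.

```shell
#!/bin/sh
# Hypothetical helper (not part of Percona Toolkit): run a stack-capture
# command for a PID. With DRY_RUN=1 it only prints the command.
#   gdb                plain GDB with full symbol loading
#   gdb-readnever      requires a GDB built with the readnever patch (e.g. RHEL)
#   eu-stack           elfutils, resolves symbol names by default
#   eu-stack-noresolve elfutils with -q: raw addresses only, resolve later
pmp_capture() {
  case $1 in
    gdb)                set -- gdb -p "$2" -batch -nx -ex "thread apply all bt" ;;
    gdb-readnever)      set -- gdb --readnever -p "$2" -batch -nx -ex "thread apply all bt" ;;
    eu-stack)           set -- eu-stack -p "$2" ;;
    eu-stack-noresolve) set -- eu-stack -q -p "$2" ;;
    *) echo "unknown tool: $1" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"   # print the command instead of attaching to the process
  else
    "$@"
  fi
}
```

For example, `pmp_capture eu-stack-noresolve "$(pidof mysqld)" > stacks.raw` keeps only raw addresses for later resolution. For the gdb_index route, GDB provides a `gdb-add-index` script (packaged with GDB on RHEL and Fedora, for example) that embeds the index into a binary, e.g. `gdb-add-index /usr/sbin/mysqld`; the mysqld path is an assumption, and running it on a copy of the binary is the more cautious choice.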

Now let’s compare all the profilers above. We will measure the time needed to collect all stack traces from Percona Server for MySQL under high load (sysbench OLTP_RW with 512 threads).

The results show that eu-stack (without symbol resolution) collected all stack traces in less than a second, and that Quickstack and GDB (with the readnever patch) came very close. For the other profilers, the time was around two to five times higher, which is quite unacceptable for profiling (especially in production).
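A comparison like this boils down to timing each capture command against the loaded server. The helper below is a hypothetical sketch, not the harness used for these results; it relies on GNU `date` and its `%N` nanosecond field, discards the capture output, prints the elapsed wall-clock time in milliseconds, and passes through the command’s exit status.

```shell
#!/bin/sh
# Hypothetical timing helper: run a command, discard its output, and print
# the elapsed wall-clock time in milliseconds. Requires GNU date (%N).
time_ms() {
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  rc=$?
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
  return "$rc"   # keep the capture command's exit status
}
```

For example, run `time_ms eu-stack -q -p "$(pidof mysqld)"` while sysbench is running. This measures total collection time, as the comparison above does, not the longest stall seen by any single thread.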

There is one more note regarding the pt-pmp tool. The current version supports only GDB as the profiler. However, there is a development version of this tool that supports GDB, Quickstack, eu-stack, and eu-stack with offline symbol resolution. It also lets you look at stack traces for specific threads (tids). For instance, in the case of Percona Server for MySQL, we can analyze just the purge, cleaner, or IO threads.
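With the current GDB-only version, a rough approximation is to filter a full capture down to a few threads by their LWP ids. The sketch below is hypothetical (`filter_gdb_threads` is not a pt-pmp option) and assumes GDB’s `Thread N (Thread 0x... (LWP tid)):` header format. It only narrows the report: the capture itself still stops every thread.

```shell
#!/bin/sh
# Hypothetical filter: keep only the backtraces of the given OS thread ids
# (GDB's LWP numbers) from "thread apply all bt" output on stdin.
# Usage: filter_gdb_threads "101 103" < bt.txt
filter_gdb_threads() {
  awk -v tids="$1" '
    BEGIN { n = split(tids, t, " "); for (i = 1; i <= n; i++) want[t[i]] = 1 }
    /^Thread / {
      keep = 0
      # Extract the number from "(LWP 12345)" in the thread header.
      if (match($0, /\(LWP [0-9]+\)/)) {
        lwp = substr($0, RSTART + 5, RLENGTH - 6)
        keep = (lwp in want)
      }
    }
    keep   # print header and frames of selected threads only
  '
}
```

On MySQL/Percona Server 5.7 and later, the OS thread ids of InnoDB background threads should be available from a query such as `SELECT NAME, THREAD_OS_ID FROM performance_schema.threads WHERE NAME LIKE 'thread/innodb/%'`.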

Below are the command lines used in testing:

Alexey Stroganov

Alexey Stroganov is a Performance Engineer at Percona, where he works on improvements and features that make Percona Server even more flexible, faster, and more scalable. Before joining Percona, he worked on performance testing and analysis of MySQL Server and its components at MySQL AB/Sun/Oracle for more than ten years. During this time he focused on performance evaluations, benchmarks, analysis, profiling, and various optimizations and tuning.


  • Just a shout out for which uses libunwind and is fast

    • Peter,

      We know about , but it has not been updated for 5 years and it is very hard to compile in modern environments; that’s why we did not put it on the recommended list.

      • Thanks for the explanation, Vadim,

        I just use get_stacktrace from knielsen-pmp, and on RHEL6 and RHEL7 it compiles with:

        > make get_stacktrace
        g++ -g -O3 -fomit-frame-pointer -o get_stacktrace -Llib -lunwind-ptrace -lunwind-generic -lrt

        For quickstack, with cmake 3.1 or higher, it builds on RHEL7. On RHEL6, however, the default gcc is 4.4.7, which does not support nullptr; using a workaround from , plus three casts for the “overloaded ‘to_string(const int&)’ is ambiguous” errors, I did get it to build on EL6.

        So for me (on RHEL 6 and 7), knielsen-pmp was easier to compile than quickstack.

  • Alexey,

    I wonder whether the time it takes to collect stack traces is the most important measure of the overhead. I would assume the “stall” (the maximum time MySQL, or another process, gets blocked from serving traffic) would be more critical?
