pt-pmp is a profiler tool that creates and summarizes full stack traces of processes on Linux. It was inspired by http://poormansprofiler.org and helped Percona Support resolve many performance issues. In this blog post, I will present an improved pt-pmp that can collect stack traces with minimal impact on the production environment.

TL;DR: Starting with Percona Toolkit 3.6.0, set option --dumper to eu (eu-stack) or pteu (pt-eustack-resolver) instead of the default dumper, gdb. This makes pt-pmp about seven times faster with dumper eu and about 65 times faster with dumper pteu, with no information loss. You need to install elfutils to use these options.

If you need a summary only for specific threads, pass their numbers to option --tids (regular expressions are supported).

In highly concurrent environments, when you notice performance issues, the stack traces that pt-pmp summarizes can help you quickly find the solution.

For example, in the listing below, we clearly see that 101 threads are waiting for a metadata lock (MDL_wait::timed_wait(mdl.cc:1837)) that they try to acquire when opening tables for the query (open_tables_for_query(sql_base.cc:6824)):

 

Once the reason for the stall is found, the DBA can consider ways to recover from the situation.

However, this ease of use is not free. pt-pmp works by attaching gdb to the database server process and running the backtrace (bt) command for all running threads. This has consequences that Baron Schwartz described in his blog post in 2011:

  • The server freezes for the duration of the process. This is the most obvious impact on the running server: GDB forklifts the process and gets a stack trace from every thread, then lets it go on working. But this can take a while. It’s usually a couple of seconds, but on big servers with a lot of memory and many threads, it can take much longer (I’ve seen tens of seconds but heard reports of minutes).
  • The server can crash. I haven’t seen this myself, but I’ve heard reports from others.
  • The server can be left in an unstable state. Sometimes, when GDB detaches, the process’s state isn’t quite as it was before. I have rarely seen this, but the other day I had a customer experience a very slow-running server that was using tons of CPU time and lots of system CPU, and exhibiting the classic signs of InnoDB kernel_mutex contention. I was not able to find anything wrong, and was still trying to determine whether the sudden slowness was due to some cause such as an increase in traffic or just InnoDB having trouble, when the customer Skyped me to say that they’d restarted the server and it resolved the problems. This server had previously been analyzed with GDB around the time the problems began.

This is still true in 2024. That is why we added performance improvements for pt-pmp in Percona Toolkit 3.6.0.
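As a rough illustration of the collect-and-summarize idea described above, here is a minimal shell sketch in the spirit of the original poor man's profiler. It is not pt-pmp's actual code: the function names collect_stacks and summarize_stacks are hypothetical, and the frame parsing is simplified to the two most common gdb frame shapes.

```shell
# Hypothetical helper: dump backtraces of all threads of process $1 with gdb.
# The target process is paused while gdb is attached.
collect_stacks() {
  gdb -batch -ex "set pagination 0" -ex "thread apply all bt" -p "$1" 2>/dev/null
}

# Hypothetical helper: read gdb backtraces on stdin, join the function names
# of each thread into one comma-separated line, and count identical stacks.
summarize_stacks() {
  awk '
    # A "Thread N (...)" header starts a new stack: flush the previous one.
    /^Thread/ { if (stack != "") print stack; stack = "" }
    # Frames look like "#1  0x... in func (...)" or "#0  func (...)".
    substr($0, 1, 1) == "#" {
      fn = ($2 ~ /^0x/) ? $4 : $2
      stack = (stack == "") ? fn : stack "," fn
    }
    END { if (stack != "") print stack }
  ' | sort | uniq -c | sort -rn
}
```

Running collect_stacks "$(pidof mysqld)" | summarize_stacks would print the most frequent stacks first, which is how a pile-up of threads in the same wait becomes visible at a glance.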

eu-stack support

eu-stack is a utility from the elfutils package. eu-stack prints a stack for each thread in a process but, unlike gdb, does it with minimal impact.

pt-pmp now lets users choose whether to obtain stack traces with gdb (default) or with eu-stack. Additionally, Percona Toolkit now includes the utility pt-eustack-resolver, which collects stack traces even faster. To collect summaries with eu-stack, set option --dumper (short form: -d) to eu (eu-stack) or pteu (pt-eustack-resolver).

eu-stack summaries have the same information as gdb summaries, so you won’t lose any data.
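Because the eu dumper depends on elfutils, a wrapper script may want to check for the eu-stack binary before requesting it. The sketch below is one way to do that. The helper names dumper_args and pmp_for_pid are hypothetical, and silently falling back to the default dumper when eu-stack is missing is a design choice of this example, not pt-pmp behavior.

```shell
# Hypothetical helper: print "--dumper eu" when eu-stack (from elfutils) is
# found on PATH; print nothing otherwise so pt-pmp keeps its default (gdb).
dumper_args() {
  if command -v eu-stack >/dev/null 2>&1; then
    printf '%s\n' "--dumper eu"
  fi
}

# Hypothetical wrapper: summarize stack traces of the process with PID $1.
# $(dumper_args) is left unquoted on purpose so that it splits into two
# arguments, or disappears entirely when eu-stack is not installed.
pmp_for_pid() {
  pt-pmp $(dumper_args) --pid "$1"
}
```

For example, pmp_for_pid "$(pidof mysqld)" would use eu-stack where it is available and gdb elsewhere.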

Let’s compare by example:

gdb:

eu-stack:

pt-eustack-resolver:

You see that pt-eustack-resolver runs significantly faster but removes source code coordinates. Otherwise, stack traces are not cropped.

--readnever support for gdb

Another performance improvement for pt-pmp is support for the gdb option --readnever. This option tells gdb not to read debug symbol files when producing summaries. gdb runs much faster with it than with default options, but it may produce unusable summaries when run against a release build of the server:

While you can get traces similar to the example below if you use a debug version of the database server, we recommend using this option only when all other options cause a noticeable performance impact.

Thread-specific traces

pt-pmp now supports option --tids (short form: -t). If you pass comma-separated regular expressions as the argument for this option, pt-pmp will print traces only for threads whose numbers match those expressions.

For example,

pt-pmp -t 21201,23846 will print summaries only for threads 21201 and 23846

pt-pmp -t ^25 will print summaries only for threads whose numbers start with 25

pt-pmp -t 21201,237.8 will print summaries only for thread 21201 and for threads whose numbers match the regular expression 237.8, e.g. 23708 or 23758
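The matching rule in these examples can be sketched in a few lines of shell. This is only an illustration, not pt-pmp's implementation: tid_selected is a hypothetical helper, and it assumes each expression is an unanchored extended regular expression, which the ^25 example suggests but which is not stated explicitly.

```shell
# Hypothetical helper: succeed when thread number $1 matches at least one of
# the comma-separated regular expressions in $2.
# Assumption: expressions are unanchored extended regular expressions, so the
# list "a,b" is treated like the single alternation "a|b".
tid_selected() {
  printf '%s\n' "$1" | grep -Eq -- "$(printf '%s' "$2" | tr ',' '|')"
}
```

With this helper, tid_selected 23758 '21201,237.8' succeeds, while tid_selected 1250 '^25' fails because 1250 does not start with 25.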

You will find examples of the original stack trace, full summary, and summary for specified threads only in the Percona Toolkit test suite:

Option --tids takes effect when pt-pmp produces a summary; therefore, you can use it on saved samples created by any version of pt-pmp. This can be handy for Support engineers who work with data sent by their customers.

Acknowledgments

These improvements are based on the work by Alexey Stroganov, originally published in Percona Labs. We intentionally removed quickstack support from the original patch because it does not produce useful stack traces with MySQL 8.0, and we made the gdb parameter --readnever optional.

Summary

Collecting stack traces and summarizing them with eu-stack is much faster than with the default dumper, gdb. Install elfutils and use pt-pmp with option --dumper set to either eu or pteu unless the use of gdb is absolutely necessary.

Filter examined threads with option --tids if needed.
