Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Profiling Software Using perf and Flame Graphs

November 20, 2019

Author

Agustín Gallego

MySQL

Percona Software

Share this Post:

In this blog post, we will see how to use perf (a.k.a.: perf_events) together with Flame Graphs. They are used to generate a graphical representation of what functions are being called within our software of choice. Percona Server for MySQL is used here, but it can be extended to any software you can take a resolved stack trace from.

Before moving forward, a word of caution. As with any profiling tool, DON’T run this in production systems unless you know what you are doing.

Installing Packages Needed

For simplicity, I’ll use commands for CentOS 7, but things should be the same for Debian-based distros (apt-get install linux-tools-$(uname -r) instead of the yum command is the only difference in the steps).

To install perf, simply issue:

SHELL> sudo yum install -y perf

1	SHELL> sudo yum install -y perf

To get Flame Graphs project:

SHELL> mkdir -p ~/src
SHELL> cd ~/src
SHELL> git clone https://github.com/brendangregg/FlameGraph

SHELL> mkdir -p ~/src

SHELL> cd ~/src

SHELL> git clone https://github.com/brendangregg/FlameGraph

That’s it! We are good to go.

Capturing Samples

Flame Graphs are a way of visualizing data, so we need to have some samples we can base off of. There are three ways in which we can do this. (Note that we will use the -p flag to only capture data from our process of interest, but we can potentially capture data from all the running processes if needed.)

1- Capture for a set amount of time only (ten seconds here):

SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld) -- sleep 10

1	SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld) -- sleep 10

2- Capture until we send the interrupt signal (CTRL-C):

SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld)

1	SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld)

3- Capture for the whole lifetime of the process:

SHELL> sudo perf record -a -F 99 -g -- /sbin/mysqld 
--defaults-file=/etc/percona-server.conf.d/mysqld.cnf --user=mysql

1 2	SHELL> sudo perf record -a -F 99 -g -- /sbin/mysqld --defaults-file=/etc/percona-server.conf.d/mysqld.cnf --user=mysql

SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld) -- mysql -e "SELECT * FROM db.table"

1	SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld) -- mysql -e "SELECT * FROM db.table"

We are forced to capture data from all processes in the first case of the third variant since it’s impossible to know the process ID (PID) number beforehand (with the command executed, we are actually starting the MySQL service). This type of command comes in handy when you want to have data from the exact beginning of the process, which is not possible otherwise.

In the second variant, we are running a query on an already-running MySQL service, so we can use the -p flag to capture data on the server process. This is handy if you want to capture data at the exact moment a job is running, for instance.

Preparing the Samples

After the initial capture, we will need to make the collected data “readable”. This is needed because it is stored in binary format by perf record. For this we will use:

SHELL> sudo perf script > perf.script

1	SHELL> sudo perf script > perf.script

It will read perf.data by default, which is the same default perf record uses for its output file. It can be overridden by using the -i flag and -o flag, respectively.

We will now be able to read the generated text file, as it will be in a human-readable form. However, when doing so, you will quickly realize why we need to aggregate all this data into a more intelligible form.

Generating the Flame Graphs

We can do the following in a one-liner, by piping the output of the first as input to the second. Since we didn’t add the FlameGraph git folder to our path, we will need to use full paths.

SHELL> ~/src/FlameGraph/stackcollapse-perf.pl perf.script | ~/src/FlameGraph/flamegraph.pl > flamegraph.svg

1	SHELL> ~/src/FlameGraph/stackcollapse-perf.pl perf.script \| ~/src/FlameGraph/flamegraph.pl > flamegraph.svg

We can now open the .svg file in any browser and start analyzing the information-rich graphs.

How Does it Look?

As an example, I will leave full commands, their outputs, and a screenshot of a flame graph generated by the process using data capture method #2. We will run an INSERT INTO … SELECT query to the database, so we can then analyze its execution.

SHELL> time sudo perf record -a -F 99 -g 
-p $(pgrep -x mysqld) 
-- mysql test -e "INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;"
Warning:
PID/TID switch overriding SYSTEM
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 1.909 MB perf.data (8214 samples) ]

real 1m24.366s
user 0m0.133s
sys 0m0.378s

SHELL> time sudo perf record -a -F 99 -g

-p $(pgrep -x mysqld)

-- mysql test -e "INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;"

Warning:

PID/TID switch overriding SYSTEM

[ perf record: Woken up 7 times to write data ]

[ perf record: Captured and wrote 1.909 MB perf.data (8214 samples) ]

real 1m24.366s

user 0m0.133s

sys 0m0.378s

SHELL> sudo perf script |  
~/src/FlameGraph/stackcollapse-perf.pl perf.script | 
~/src/FlameGraph/flamegraph.pl > mysql_select_into_flamegraph.svg

SHELL> sudo perf script |

~/src/FlameGraph/stackcollapse-perf.pl perf.script |

~/src/FlameGraph/flamegraph.pl > mysql_select_into_flamegraph.svg

The keen-eyed reader will notice we went one step further here and joined steps #2 and #3 via a pipe (|) to avoid writing to and reading from the perf.script output file. Additionally, there are time outputs so we can get an estimation on the amount of data the tool generates (~2Mb in 1min 25secs); this will, of course, vary depending on many factors, so take it with a pinch of salt, and test in your own environment.

The resulting flame graph is:

perf and Flame Graphs

One clear candidate for optimization is work around write_record: if we can make that function faster, there is a lot of potential for reducing overall execution time (squared in blue in the bottom left corner, we can see a total of ~60% of the samples were taken in this codepath). In the last section below we link to a blog post explaining more on how to interpret a Flame Graph, but for now, know you can mouse-over the function names and it will dynamically change the information shown at the bottom left corner. You may also visualize it better with the following guides in place:

flame graphs

Conclusion

For the Support team, we use this procedure in many cases where we need to have an in-depth view of what MySQL is executing, and for how long. This way, we can have a better insight into what operations are behind a specific workload and act accordingly. This procedure can be used either for optimizing or troubleshooting and is a very powerful tool in our tool belt! It’s known that humans are better at processing images rather than text, and this tool exploits that brilliantly, in my opinion.