In this blog post, we will see how to use perf (a.k.a.: perf_events) together with Flame Graphs. They are used to generate a graphical representation of what functions are being called within our software of choice. Percona Server for MySQL is used here, but it can be extended to any software you can take a resolved stack trace from.
Before moving forward, a word of caution. As with any profiling tool, DON’T run this in production systems unless you know what you are doing.
Installing Packages Needed
For simplicity, I’ll use commands for CentOS 7, but things should be the same for Debian-based distros (
apt-get install linux-tools-$(uname -r) instead of the yum command is the only difference in the steps).
To install perf, simply issue:
SHELL> sudo yum install -y perf
To get Flame Graphs project:
SHELL> mkdir -p ~/src
SHELL> cd ~/src
SHELL> git clone https://github.com/brendangregg/FlameGraph
That’s it! We are good to go.
Flame Graphs are a way of visualizing data, so we need to have some samples we can base off of. There are three ways in which we can do this. (Note that we will use the -p flag to only capture data from our process of interest, but we can potentially capture data from all the running processes if needed.)
1- Capture for a set amount of time only (ten seconds here):
SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld) -- sleep 10
2- Capture until we send the interrupt signal (CTRL-C):
SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld)
3- Capture for the whole lifetime of the process:
SHELL> sudo perf record -a -F 99 -g -- /sbin/mysqld \
SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld) -- mysql -e "SELECT * FROM db.table"
We are forced to capture data from all processes in the first case of the third variant since it’s impossible to know the process ID (PID) number beforehand (with the command executed, we are actually starting the MySQL service). This type of command comes in handy when you want to have data from the exact beginning of the process, which is not possible otherwise.
In the second variant, we are running a query on an already-running MySQL service, so we can use the -p flag to capture data on the server process. This is handy if you want to capture data at the exact moment a job is running, for instance.
Preparing the Samples
After the initial capture, we will need to make the collected data “readable”. This is needed because it is stored in binary format by perf record. For this we will use:
SHELL> sudo perf script > perf.script
It will read perf.data by default, which is the same default perf record uses for its output file. It can be overridden by using the -i flag and -o flag, respectively.
We will now be able to read the generated text file, as it will be in a human-readable form. However, when doing so, you will quickly realize why we need to aggregate all this data into a more intelligible form.
Generating the Flame Graphs
We can do the following in a one-liner, by piping the output of the first as input to the second. Since we didn’t add the FlameGraph git folder to our path, we will need to use full paths.
SHELL> ~/src/FlameGraph/stackcollapse-perf.pl perf.script | ~/src/FlameGraph/flamegraph.pl > flamegraph.svg
We can now open the .svg file in any browser and start analyzing the information-rich graphs.
How Does it Look?
As an example, I will leave full commands, their outputs, and a screenshot of a flame graph generated by the process using data capture method #2. We will run an
INSERT INTO … SELECT query to the database, so we can then analyze its execution.
SHELL> time sudo perf record -a -F 99 -g \
-p $(pgrep -x mysqld) \
-- mysql test -e "INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;"
PID/TID switch overriding SYSTEM
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 1.909 MB perf.data (8214 samples) ]
SHELL> sudo perf script | \
~/src/FlameGraph/stackcollapse-perf.pl perf.script | \
~/src/FlameGraph/flamegraph.pl > mysql_select_into_flamegraph.svg
The keen-eyed reader will notice we went one step further here and joined steps #2 and #3 via a pipe (
|) to avoid writing to and reading from the perf.script output file. Additionally, there are time outputs so we can get an estimation on the amount of data the tool generates (~2Mb in 1min 25secs); this will, of course, vary depending on many factors, so take it with a pinch of salt, and test in your own environment.
The resulting flame graph is:
One clear candidate for optimization is work around write_record: if we can make that function faster, there is a lot of potential for reducing overall execution time (squared in blue in the bottom left corner, we can see a total of ~60% of the samples were taken in this codepath). In the last section below we link to a blog post explaining more on how to interpret a Flame Graph, but for now, know you can mouse-over the function names and it will dynamically change the information shown at the bottom left corner. You may also visualize it better with the following guides in place:
For the Support team, we use this procedure in many cases where we need to have an in-depth view of what MySQL is executing, and for how long. This way, we can have a better insight into what operations are behind a specific workload and act accordingly. This procedure can be used either for optimizing or troubleshooting and is a very powerful tool in our tool belt! It’s known that humans are better at processing images rather than text, and this tool exploits that brilliantly, in my opinion.
Interpreting Flame Graphs (scroll down to the “Flame Graph Interpretation” section)
Flame Graphs 201, a great webinar by Marcos, if you want to dig deeper into this
Of course, Brendan Gregg (the mastermind behind the Flame Graph project) has even more information on this