High-load problems? Investigate them with ‘pt-query-digest’Fernando Ipar
I had the chance to work on an interesting case last week, and I thought I’d share what I think is a little known goodie from Percona Toolkit for MySQL called pt-query-digest.
One customer was suffering from periods of high load on their database server, leading to degraded application performance, and sometimes even short moments of downtime. As usual with these cases, we set up pt-stalk to trigger a capture if Threads_running went above a known ‘good’ value for this workload, so we would have lots of diagnostics information to review even if we couldn’t be logged in to the server at the time the problem happened.
As I began reviewing available captures, I was able to correlate most load peaks with temporary increases in user activity, so it became a matter of figuring out if the server was hitting its capacity, or if there was another limiting factor in the workload that made it scale poorly. This is when the ‘SHOW FULL PROCESSLIST’ captures generated by pt-stalk come in handy, but if you’ve used them, you know it can be a daunting task to review them in servers with thousands of connections.
Enter pt-query-digest. The tool supports its share of format parsers, but the one that saved the day here is also the simplest: the ‘raw’ log parser. If you specify –format rawlog, then the tool expects the input to have one query per line, and it will do it’s aggregation job on that. Granted, this means there is no run time data available, but at least you get a nice ranking of queries sorted by the number of times they appear in processlist.
In this case, I extracted the first snapshot from each *-processlist capture (under normal circumstances, this file will have –run-time snapshots separated by timestamps) and ran that by pt-query-digest –type rawlog –report-format profile, to get just the ranking of queries. To my luck, one query stood out right away, always appearing many times more than any other one during peaks. Rewriting that query (actually the customer removed it altogether, by merging it with another query that ended up having a much better execution plan, but that is beyond the scope of this post!) made the problem go away. The server took the next round of peak traffic time without issues.
If you regularly work with MySQL databases, getting familiar with Percona Toolkit for MySQL will let you do more work, in a safer way, and in less time. If you need to get started, there are good recorded webinars on the subject, and the upcoming Percona Live conference’s tutorials and sessions will include a wealth on information on everything MySQL-related, including Percona Toolkit.