For months when speaking with customers I have been positioning Percona Cloud Tools (PCT) as a valuable tool for the DBA/Developer/SysAdmin but only recently have I truly been able to harness the data and make a technical recommendation to a customer that I feel would have been very difficult to accomplish otherwise.
Let me provide some background: I was tasked with performing a Performance Audit for one of our customers (Performance Audits are extremely popular as they allow you to have a MySQL Expert confirm or reveal challenges within your MySQL environment and make your database run faster!) and as part of our conversation we discussed and agreed to install Percona Cloud Tools. We let the site run for a few days, and then I started my audit. What I noticed was that at regular intervals there was often a CPU spike, along with a corresponding drop in Queries Per Second (QPS), but that lasted only for a few seconds. We decided that further investigation was warranted as the customer was concerned the spikes impacted their users’ experience with the application.
Here are the tasks that Percona Cloud Tools made easy while I worked to identify the source of the CPU spike and QPS drop:
Once we were able to identify the problematic queries then the rest was routine query optimization – 10 minutes work using Percona Cloud Tools for what might have been an hour using traditional methods!
For those of you wondering how else this can be done, assuming you detected the CPU spike / QPS drop (perhaps you are using Graphite or other tool that can deliver per-second resolution) then you’d also need to be capturing the slow query log at a good enough resolution level (I prefer long_query_time=0 to just get it all), and then be adept at leveraging pt-query-digest with –since and –until options to narrow down your range of queries. The significant drawback to this approach is that each time you want to tweak your time range you probably need to stream through a fairly large slow log file multiple times which can be both CPU and disk intensive operations, which means it can take some time (probably minutes, maybe hours) depending on the size of your log file. Certainly a workable approach but nowhere near as quick as reloading a page in your browser 🙂
Resources
RELATED POSTS