November 26, 2014

What do we optimize with mk-query-digest ?

When we’re looking at mk-query-digest report we typically look at the Queries causing the most impact (sum of the query execution times) as well as queries having some longest samples. Why are we looking at these ?

Queries with highest Impact are important because looking at these queries and optimizing them typically helps to improve system capacity, which in the end often improves response time of the system for variety of queries. Also queries causing highest impact often correspond to either frequent or slowest transactions in the system which makes these queries important to look at.

Slowest Queries correspond to bad response times. If there is 30 sec query this means somebody had to wait for 30 seconds which is typically not acceptable for interactive queries. It is best to look at 95 percentile “time” rather than slowest time for the queries as there are often exceptionally high time which you can see which are unrelated to query itself – general system overload, row level locks, table locks all could cause query response time to be unreasonable. If Slowest query is not interactive query, for example part of a batch job it is often fine, assuming it is not causing system overload, interactive queries, which have real users waiting for them are often high priority to be optimized. It helps if batch jobs use different MySQL user compared to the web site itself as it makes it easier to remove such queries from analyzes.

What is Important to mention neither of these two query types really guarantees improvement for application performance (and even more so transaction we’re trying to optimize), it typically does so as transactions you’re trying to optimize typically will manifest itself in one of these query sets.

If you are looking for improving response time of particular transaction/user interaction you really have to look at it, rather than aggregating data for the all server – understanding what queries are being ran (together with other interactions – memcache, remote sites etc) is a key to understanding what causes performance issues.

If you have slow queries from interactive application it is true somebody is suffering (though it is unclear how critical this suffering is) – it is however perfectly possible to get bad performance with MySQL responsible for large portion of response time without any slow queries. We’ve seen pages which take 10.000, 20.000 and even more queries to generate. In such cases even with queries taking 1 ms in average you’re looking at 10,20 or more seconds to respond.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. dalin says:

    And if improving user interaction response times in a web site/application is what you are wanting, then you need to know that only ~15% of time is spent generating the HTML page and getting it to the client. The rest is things like transfering images, CSS, and JavaScript. If you haven’t yet, get YSlow and Google Page Speed and analyze your front-end performance.

  2. peter says:

    Dalin,

    Right. You’ve got to check your pages with YSlow as well. People often confront me with these 15% numbers quoting Steve Sounders and as with any general number it is oversimplification. The number applies to the cases when backend is well optimized already. If you have HTML generated in 100ms it is quite likely you’ve got to invest in Front End optimization first. However if you have backend performing very badly – think 30 seconds to load the page, or even time out all together the distribution is going to be completely broken.

    The “right” approach is of course to start from the top and go to the bottom – if HTML is leading contributer to client response time you optimize it, else focus on other things.

Speak Your Mind

*