Write contentions on the query cache


While doing a performance audit for a customer a few weeks ago, I tried to improve the response time of their top slow query according to pt-query-digest's report. This query was run very frequently and had very unstable performance: during the time data was collected, response time varied from 50µs to 1s.

When I ran the query myself (a two-table join with a WHERE condition, the whole dataset was in memory), I always got a consistent response time (about 160ms). Of course, I wanted to know more about how MySQL executes this query. So I used commands you’re probably familiar with: EXPLAIN, SHOW PROFILE, SHOW STATUS LIKE 'Handler%'.
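As a sketch, the diagnostic session looked something like the following (table and column names are hypothetical stand-ins, not the customer's actual schema):

```sql
-- Hypothetical reconstruction; t1/t2 and the WHERE condition stand in
-- for the customer's real two-table join.
SET profiling = 1;

EXPLAIN
SELECT SQL_NO_CACHE t1.*, t2.*
FROM t1
JOIN t2 ON t2.t1_id = t1.id
WHERE t1.status = 'active';

FLUSH STATUS;
SELECT SQL_NO_CACHE t1.*, t2.*
FROM t1
JOIN t2 ON t2.t1_id = t1.id
WHERE t1.status = 'active';

SHOW PROFILE;                 -- per-stage timings for the last query
SHOW STATUS LIKE 'Handler%';  -- row access counters since FLUSH STATUS
```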
EXPLAIN and Handler counters only confirmed that the execution plan seemed reasonable and that fields were correctly indexed.

With SHOW PROFILE, I saw that most of the time was spent sending the result set to the client, which was not surprising as the result set was around 30,000 rows:

So the unstable response times did not come from a bad execution plan, but rather from contention or excessive waiting somewhere in the server. A good candidate for contention issues was the query cache, as it was enabled. Contention on the query cache mutex when checking whether a result set can be served from the cache is quite common.

Then I remembered that I had executed SHOW PROFILE and SHOW STATUS LIKE 'Handler%' with the SQL_NO_CACHE hint. Would the output be different without SQL_NO_CACHE?

Indeed it was. While the Handler counters were the same, I got around 200 lines of output from SHOW PROFILE instead of the 15 lines above. Particularly interesting, one sequence was repeated over and over:

Total response time was still around 160ms, but with so many waits on the query cache lock, it was easy to imagine this query taking ages as soon as there was competition for the lock.

Of course, the question was: why did MySQL need so many accesses to the query cache lock?

The answer lies in the way the query cache works. Simply stated, the server must lock the query cache both when checking whether a result is in the cache and when writing a result set into it. When writing, locking can occur several times: the server sends results to the cache before the entire result set has been computed (so the total size of the result set is not known in advance), and the cache therefore has to allocate memory block by block. If a block is full and the server keeps sending rows, a new block must be allocated, which requires locking the cache again!

More specifically, the size of the blocks is set by the value of the query_cache_min_res_unit variable (4KB by default). For the query I was working on, each row was about 6 bytes, so the result set was around 180KB. It means that around 50 accesses to the query cache were needed to cache the entire result set. This is what SHOW PROFILE revealed.
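A back-of-envelope check of that figure, using the numbers above:

```sql
-- ~30,000 rows x ~6 bytes/row ≈ 180KB of result data;
-- with query_cache_min_res_unit = 4096 bytes, each new block
-- allocation takes the query cache lock.
SELECT CEIL((30000 * 6) / 4096) AS blocks_allocated;  -- ≈ 44
```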

A solution to the problem could have been to raise the value of query_cache_min_res_unit. But this could increase fragmentation of the cache and degrade performance for other queries. The best solution was simply to turn off the query cache.
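For reference, turning the cache off involves two server variables; in this sketch both are set, so the cache is neither consulted nor kept allocated:

```sql
-- Disable at runtime:
SET GLOBAL query_cache_size = 0;   -- free the cache's memory
SET GLOBAL query_cache_type = OFF; -- stop using the cache

-- And persist the change in my.cnf so it survives a restart:
-- [mysqld]
-- query_cache_type = 0
-- query_cache_size = 0
```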

Conclusions:

  • Once again, the query cache showed poor scalability, though in an unusual way.
  • When you have unstable response times for a query, you should suspect either some change in data access pattern (reads on disk vs reads in memory, execution plan instability) or contention somewhere while the query is executing.
  • You should not forget that improving a query also means stabilizing its response time, not only achieving a low response time some of the time. In this case, a stable 160ms response time was preferable to one ranging from 50µs to 1s.

And a final note for sharp-eyed readers: why do we have two lines showing waits on the query cache lock instead of one in the SHOW PROFILE output? This is specific to Percona Server 5.5 and is used only for debugging purposes, to provide finer granularity.


Comments

  1. says

    The “Sending data” phase in the Query Profile does not mean it’s actually sending data. A better description for that phase is “executing query plan”. Some queries will spend a lot of time in the “Sending data” phase but don’t actually send any data.

  2. Stephane Combaudon says

    Daniël,

    Sure, “Sending data” can have different meanings and is used in different steps of execution. If you run SHOW PROFILE on many non-trivial queries, you will see that “Sending data” appears several times in the output, whereas sending the result set is only one step of the execution.

  3. aftab says

    >The best solution was simply to turn off the query cache
    Alternatively you could have used SQL_NO_CACHE, would that help? Because turning off the query cache globally might affect other queries? Or perhaps disable the query cache in the session?

  4. Stephane Combaudon says

    aftab,

    Yes, SQL_NO_CACHE means that the query won't be cached, but it doesn't prevent other queries from being cached. It can be a good solution if caching a specific query degrades its response time while caching helps other queries.
    In this specific case (that was simplified for this blog post), the query cache was not beneficial for any of the most used queries, so it was easier to disable it completely.

  5. Susane says

    Hi Stephane,

    Great post! Thank you!
    I work at a web hosting company, and we have a server with lots of databases; queries on different databases constantly appear with the state “Waiting on query cache mutex”. I tried disabling the query cache, but this state kept appearing. What else can I do to eliminate or reduce this state on my server?

  6. Stephane Combaudon says

    Hi Susane,

    How did you disable the query cache?
    SET GLOBAL query_cache_size = 0;
    SET GLOBAL query_cache_type = OFF;

    Or something else?

  7. Stephane Combaudon says

    Susane,

    As you’re using Percona Server, your best option would be to add query_cache_type = 0 in my.cnf and restart the server.
