MySQL Query CachePeter Zaitsev
MySQL has a great feature called “Query Cache” which is quite helpful for MySQL Performance optimization tasks but there are number of things you need to know.
First let me clarify what MySQL Query Cache is – I’ve seen number of people being confused, thinking MySQL Query Cache is the same as Oracle Query Cache – meaning cache where execution plans are cached. MySQL Query Cache is not. It does not cache the plan but full result sets. This means it is much more efficient as query which required processing millions of rows now can be instantly summoned from query cache. It also means query has to be exactly the same and deterministic, so hit rate would generally be less. In any case it is completely different.
Query cache is great for certain applications, typically simple applications deployed on limited scale or applications dealing with small data sets. For example I’m using Query Cache on server which runs this blog. Updates are rather rare so per-table granularity is not the problem, I have only one server and number of queries is small so cache duplication is not the problem. Finally I do not want to hack wordpress to support eaccelerator cache or memcached. Well honestly speaking if performance would be problem I should have started with full page caching rather than MySQL level caching but it is other story.
Lets talk a bit about features and limitations of Query Cache:
Transparent Caching – Caching is fully transparent to the application, and what is very important it does not change semantics of the queries – you always get actual query results. Really there are some chevats – if you’re not using query_cache_wlock_invalidate=ON locking table for write would not invalidate query cache so you can get results even
if table is locked and is being prepared to be updated. So if you’re using query cache in default configuration you can’t assume locking table for write will mean no one will be able to read it – results still can come from query cache unless you enable query_cache_wlock_invalidate=ON.
Caching full queries only – Meaning it does not work for subselects, inline views, parts of the UNION. This is also common missunderstanding.
Works on packet level – This is one of the reason for previous item. Query cache catches network packets as they sent from client to the server, which means it can serve responses very fast doing no extra conversion or processing.
Works before parsing – One more reason for high performance is Query Cache performs query lookup in the cache before query parsing, so if result is served from query cache, query parsing step is saved.
Queries must be absolutely the same As no parsing is done before lookup queries are not normalized (would require parsing) before cache lookup, so they have to match byte by byte for cache hit to happen. This means if you would place dynamic comments in the query, have extra space or use different case – these would be different queries for query cache.
Only SELECT queries are cached SHOW commands or stored procedure calls are not, even if stored procedure would simply preform select to retrieve data from table.
Avoid comment (and space) in the start of the query – Query Cache does simple optimization to check if query can be cached. As I mentioned only SELECT queries are cached – so it looks at first letter of the query and if it is “S” it proceeds with query lookup in cache if not – skips it.
Does not support prepared statements and cursors Query Cache works with query text and want full result set at once. In prepared statements there is query with placeholders and additional parameter values which would need extra care – it is not implemented. Cursors get data in chunks so it is even harder to implement.
Might not work with transactions – Different transactions may see different states of the database, depending on the updates they have performed and even depending on snapshot they are working on. If you’re using statements outside of transaction you have best chance for them to be cached.
Query must be deterministic – Query might provide same result no matter how many times it is run, if data remains the same. So if query works with current data, uses non-deterministic functions such as UUID(), RAND(), CONNECTION_ID() etc it will not be cached.
Table level granularity in invalidation – If table gets modification all queries derived from this table are invalidated at once. Most of them quite likely would not have change their result set but MySQL has no way to identify which one of them would so it gets rid of all of them. This is one of the main features which limits query cache effectiveness – if you have high write application such as forums, query cache efficiency might be pretty low due to this. There is also way to set minimal TTL or anything like it which is allowed by other caching systems. Also note – all queries are removed from cache on table modifications – if there are a lot of queries being cached this might reduce update speed a bit.
Fragmentation over time – Over time Query Cache might get fragmented, which reduces performance. This can be seen as large value of Qcache_free_blocks relatively to Qcache_free_memory. FLUSH QUERY CACHE command can be used for query cache defragmentation but it may block query cache for rather long time for large query caches, which might be unsuitable for online applications.
Limited amount of usable memory – Queries are constantly being invalidated from query cache by table updates, this means number of queries in cache and memory used can’t grow forever even if your have very large amount of different queries being run. Of course in some cases you have tables which are never modified which would flood query cahe but it unusual. So you might want to set query cache to certain value and watch Qcache_free_memory and Qcache_lowmem_prunes – If you’re not getting much of lowmem_prunes and free_memory stays high you can reduce query_cache appropriately. Otherwise you might wish to increase it and see if efficiency increases.
Demand operating mode If you just enable qury cache it will operate in “Cache everything” mode. In certain caches you might want to cache only some of the queries – in this case you can set query_cache_type to “DEMAND” and use only SQL_CACHE hint for queries which you want to have cached – such as SELECT SQL_CACHE col from foo where id=5. If you run in default mode you can also use SQL_NO_CACHE to block caching for certain queries, which you know do not need to be cached.
Counting query cache efficiency There are few ways you can look at query_cache efficiency. First looking at number of your selects – Com_select and see how many of them are cached. Query Cache efficiency would be Qcache_hits/(Com_select+Qcache_hits). As you can see we have to add Qcache_hits to Com_select to get total number of queries as if query cache hit happens Com_select is not incremented. But if you have just 20% Cache hit rate does it mean it is not worth it ? Not really it depends on which queries are cached, as well as overhead query cache provides. One portion of query cache overhead is of course inserts so you can see how much of inserted queries are used: Qcache_hits/Qcache_inserts Other portion of overhead comes from modification statements which you can calculate by (Com_insert+Com_delete+Com_update+Com_replace)/Qcache_hits
. These are some numbers you can play with but it is hard to tell what is good or bad as a lot depends on statement complexity as well as how much work is saved by query cache.
I would just mention couple of points – as protection from one query wiping your all query cache option query_cache_limit was implemented which limits result set which can be stored in query cache. If you need larger queries to be cached you might increase it, if you most important queries are smaller you can decrease it. The other one is Qcache_lowmem_prunes – This one is used to identify if you have enough memory for query cache. Note however due to fragmentation lowmem_prunes can be triggered even if there is some free space, just badly fragmented.
Looking at performance I’ve seen query cache offering about double performance for simple queries with select done by primary key, obviously there is no upper boundary – Very complex queries producing small result set will be offering best speed up.
So when it is good idea to use query cache ?
Third party application – You can’t change how it works with MySQL to add caching but you can enable query cache so it works faster.
Low load applications – If you’re building application which is not designed for extreme load, like many personal application query cache might be all you need. Especially if it is mostly read only scenario.
Why Look for alternatives ?
There are few reasons why Query Cache might be not cache for your application:
It caches queries Application objects might need several queries to compose so it is efficient to cache whole objects rather than individual queries.
No control on invalidation Table granularity is often too bad. With other caches you may implement version based or timeout based invalidation which can offer much better hit ratio for certain application.
It is not that fast Query Cache is fast compared to running the queries but it is still not as fast as specially designed systems such as memcached or local shared memory.
It can’t retrieve multiple objects at the same time You have to query cache object by object which adds latency, there is no way you can request all objects you need to be retrieved at the same time (again memcached has it)
It is not distributed If you have 10 slaves and use query cache on all of them cache content will likely be the same, so you have multiple copies of the same data in cache effectively wasting memory. Distirbuted caching systems can effectively use memory on multiple systems so there is no duplication.
Memcached is probably the most popular distributed caching system and it works great. I should write an article comparing performance of various caching systems some time.