Should we give a MySQL Query Cache a second chance ?Peter Zaitsev
Over last few years I’ve been suggesting more people to disable Query Cache than to enable it. It can cause contention problems as well as stalls and due to coarse invalidation is not as efficient as it could be. These are however mostly due to neglect Query Cache received over almost 10 years, with very little changes done to the core of original implementation which appeared in MySQL 4.0 Query Cache was designed to work with single core systems and relatively small memory amounts, both of which are things of the past.
However, if you think about Core idea of the MySQL Query Cache – it is great. The transparent cache which does not require any extra handling from an application side and which just makes things faster without adding complexity to the application (as memcached does for example). Query Cache is also one of the few cache implementations which I’d call an Active Cache meaning it takes care of the misses itself, without you have to do anything about them. This property is different from completely transparent cache and you’ll see why below.
So what I think could be done with MySQL Query Cache so it once again is appealing to improve performance for MySQL Applications ?
Make it Lockless Can we re-implement MySQL Query Cache so it is lockless ? I believe it may be possible doe to rather simple nature of algorithms involved, at least we can use same “cheat” as MySQL 5.5 uses for Innodb Buffer Pool and have multiple MySQL Query Cache instances hashed by query hash. It would help to reduce complexity dramatically.
Fast Invalidation Right now when you update the table all queries have to be invalidated from the query cache… which can take a lot of time if you have millions of queries cache for given table. Instead we could track versions for tables (increment each time table is updated) and check whenever version matches for all tables on which query depends on. If not you need to re-generate the query result. Removing of stale queries in this case can be done through some sort of garbage collection.
Support for Delayed Invalidation Many MySQL application do not need absolutely up to date data, something which is couple of seconds or sometimes minutes old can do just as well. This especially applies to reads issued to Slaves – because these are potentially lagging we can’t expect to get absolutely up to date data anyway. I’d like to see support for something like SQL_CACHE_TTL=60 to specify we’re OK to get query result which is up to 60 stable. We can also get query_cache_ttl as session variable, which is very helpful for connections opened to the slaves, which would allow to read old data as application is designed to deal with it anyway.
Smart Query Matching Query Cache requires queries to be exactly the same which does not work well with queries containing dynamic comments as well as simply multiple instances by the same query typed in by different developers. Supporting different comments is especially important as many application put extra information in the comments to assist diagnosing the application, such as where this query comes from or what application user triggered such query. Query Cache matching can be done a lot smarter. At Percona Server we’ve done the first step by allowing you to strip comments when checking query cache for match.
Protection from Multiple Misses So you have item invalidated from query cache and there are multiple connection requesting the same item. What MySQL will do ? It will allow all of them to execute concurrently and when one of them will produce result which can be cached and used by future queries. What would be smarter thing to do is to detect the result for exactly same query is already being generated, wait for completion and when simply use it.
More Granular Invalidation Can we get something less coarse than per table invalidation ? It is tricky but I believe it can be done for rather large set of applications. Many applications in MySQL have very large number of queries doing updates through primary key and use primary key for selects. I believe in such cases we can track it in a more granular fashion. This is one I’m most unsure about though as MySQL Query Cache is most useful for complex expensive queries, for which tracking exact rows used from query syntax may be hard.
Query Cache Storage Engines MySQL Query Cache now stores queries in local memory, which is great however memory amount might be limited especially in constrained environments as EC2. We could implement API which would allow one to use local memory, Memcache or for example local flash storage for storage.
Compression You can see it as a property of Storage Engine but let me put it separately as well. I believe Query Storage would benefit a lot from fast compression, such as QuickLZ. Results sets for most queries is highly compressible and light compression can compress results at faster than wire speed for 1Gb networks. Plus if we can get MySQL protocol to support such light compression in addition to zlib we could stream compressed results directly from storage and only de-compress it on the client.
Delayed Updates Delayed Updates is complement to delayed invalidation. In many applications you have updates which are kind of not important you just need them to propagate sometime. For example if I’m maintaining counter I may not want changing that to have complete invalidation. Having the option to have some updates not to invalidate query cache can be powerful, though sometimes dangerous feature. I’m not quire sure if it is worth doing or if delayed invalidation is enough.
Improve Manageability What is in the MySQL Query Cache ? How many hits did we get for this query item ? Right now none of such information is available which makes it hard to understand how exactly MySQL Query Cache is working in many cases. This however can be improved and data about Query Cache content and operation can be presented as some INFORMATION_SCHEMA tables.
Self Refresh MySQL Query Cache causes not uniform performance as you get very fast responses for hits but you might need to take significant hit for misses. This is especially the problem for very expensive queries. The good thing about MySQL Query Cache is it knows exactly what is needed to refresh its content and so it can do it automatically, preventing some queries with SQL_CACHE_TTL from expiring as well as re-populating the cache after table update invalidated a lot of items. Frankly if you improve query cache manageability and make stats available this can even be done by external script which can have application-specific tunings to prioritize which queries are more important to refresh for given applications.
As a Summary I believe there is a lot of things which can be done to Query Cache to make it cool again. The question is whenever anyone from “MySQL Market Players” – Oracle, MariaDB, Drizzle, Percona Server, will allocate resources to implement at least some portion of items from this list.
I also challenge you to post your own query cache wishes and ideas here, as far as I’m concerned this list is far from being complete.