Downloads

Blog

Active Cache for MySQL

January 10, 2010

Author

Peter Zaitsev

Insight for Developers

MySQL

Share this Post:

One of the problems I have with Memcache is this cache is passive, this means it only stores cached data. This means application using Memcache has to has to special logic to handle misses from the cache, being careful updating the cache – you may have multiple data modifications happening at the same time. Finally you have to pay with increased latency constructing the items expired from the cache, while they could have been refreshed in the background. I think all of these problems could be solved with concept of active cache

The idea with Active Cache is very simple – for any data retrieval operation cache would actually know how to construct the object, so you will never get a miss from the cache, unless there is an error. From existing tools this probably lies out best on registering the jobs with Gearman.

The updates of the data in this case should go through the same system so you can get serialization (or other logic) for your data updates.

You could also use the same functions updating the data when it expires. This could be exposed as explicit logic, something like expires in 300 seconds, start refresh in 200 seconds as well as automated.

The logic for automatic handling could be as follows – after the key has expired we can purge its value but keep it in cache with “expired” flag. If we can see for the same key we get a lot of requests when it is expired cache could decide to refresh such keys based on available bandwidth.

Another extension to common caching methods I’d like to see is having max_age specified on GET request. In many applications expiration is not data driven but rather request driven. Consider for example posting the blog comment on this blog. If you’re the user who posted the comment you have to see it instantly to avoid bad experience. At the same time other users can continue reading stale data – if they see comment appearing 10 seconds later they will not have any bad user experience.

Finally I think Active Cache could be very helpful handling write back scenarios. There are many cases when there is a lot of updates happening to the data – counters, last login, scores etc which do not really need to be reflected in the database instantly. If cache itself “knows” how to update the data you could define the policies on how frequently the data object needs to be synced to database.

I’d like to hear some feedback if you think such concept would be helpful for your applications and if you think there are existing tools and technologies which can be used to conveniently build things like this.

0 0 votes

Article Rating

11 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Patrick Casey

16 years ago

Sounds like you’ve just re-invented a three tier application with the “active cache” taking the role of the traditional application server and/or container. If you push enough logic into your cache so that it can properly handle object construction, generation, etc, its either redundant with your application server or replacing its persistence layer.

Peter van Dijk

16 years ago

Just a thought, but having a more active cache infrastructure would definitely be helpful if you wanted to, for example:
– SELECT * FROM table WHERE x; // put the result set into the cache per row
– SELECT specific_field FROM table WHERE x; // could then automagically pull the specific field from the data cached by the previous query

It’s probably not for everyone, but it was just a random idea of the sort of things that might be possible for a system that’s “loosely aware” of the underlying data structures being used.

I think in a general sense, i still prefer to go for the “other approach” of storing cooked data in memcached if possible, rather than raw query data. It doesnt really make sense in all circumstances though.
In one specific case, in our system there is a specific object which is frequently used and takes 5 queries to load. Despite we could cache the 5 queries individually, we’d rather just cache the object as it’s capable of invalidating itself from the cache, because it saves us a lot of overhead.

Have you considered that your active cache suggestion could potentially be implemented by somehow extending the mysql client infrastructure?
ie. mysql_query(“SELECT * FROM X”, ….., cache=true);
That way you could provide seamless integration for those people that want a simple but highly effective caching solution.

Author

Peter Zaitsev

16 years ago

Patrick,

History tend to repeat itself with slight variations. Modern shift to Web Based technologies is conceptually very similar to use of terminals to mainframe in 60s-70s. NoSQL existed in form of non relational databases, there are many such examples.

In general a lot of things done for years in J2EE are coming to LAMP in modified form, tuned for a lot more adhoc form.

Designing three tier application is one thing, evolving simply PHP (for example) application by adding memcache, gearman or any other components to take care of important issues in a simple way is completely different.

http://krow.livejournal.com/

16 years ago

Hi!

You are describing what a number of people have been looking at creating 🙂

Not a crazy idea at all, the problem is picking the correct interface (aka… how will someone use this).

Cheers,
-Brian

Darius Jahandarie

16 years ago

Regarding Patrick’s comment, I think it is important to realize what memcached has actually brought to the table. Memcached is merely a utility for a distributed key-value in-memory store.

One could argue that memcached is “replacing” the data-tier of your architecture, but in reality it is only evolving it to handle certain cases differently (e.x., hot data -> memcached, cold data -> MySQL).

Going on to what Peter is saying here, I believe it is a middle-ware abstraction of a lot of redundant communications that could happen between the logic-tier and data-tier of your architecture.

If created properly, the logic would still happen in the logic-tier, and the data would still be handled in the data-tier, it is just that the data-tier would be aware of what type of data it is handling and therefore be able to store and return it efficiently.

Naturally, any idea taken to any extreme may result in problems, but I believe what Peter is suggesting could have some serious chances if the core idea is realized to its fullest.

Darius

Eric Bergen

16 years ago

The problem with moving the object creation logic into the cache layer is you end up maintaining two separate codebases, one for the application and the other in the active cache. I don’t think it buys you much other than eliminating some race conditions. Two separate codebases probably won’t be a big deal if you maintain a one to one mapping between memcache key and database row but if memcache is stored whole objects which are normalized in the database the logic in the caching layer can be quite complex trying to translate between the two for a write through cache.

Peter van Dijk

16 years ago

Eric – Even though it’d be heaps of fun to handling object creation at the cache layer, like you said, i’m sure it’d be an absolute nightmare to implement and maintain. By implementing it at the application layer, we do have a race condition which would result in the cache can become inconsistent, however we’ve always mitigated the effect of this by mandating that the objects in question can only be saved if loaded directly from the database rather than the cache. Not a solution for everyone i’m sure, but it works great for us.

Back to the original topic, I think that being able to specify what gets cached, and how it’s cached, is fairly important – there’s no way that mysql (or by extension any cache) can automatically understand what the application is trying to do. This is (sort of) the fundamental reason why things like memcached exist, and also the reason why the query caches in MySql dont work particularly well. (dont shoot me for that last comment, i’m sure they work just fine for some people)

If you could push some caching logic down closer to the data, then it might be useful in a number of circumstances, but at the end of the day you still need to be able to control what goes into your cache and what doesnt – which is really only something that you’d want the application to handle unless you’re supremely confident in your ability to write caching software at the database level.

The hard part is finding a balance between the two – retaining application control, and proximity to the data.
– Memcached sits on the application side of the fence.
– MySql has it’s own caches which sit completely on the other side.
– The ‘Active Cache’ would sit somewhere between the two – the question is where and how.

Author

Peter Zaitsev

16 years ago

Eric,

I think Active cache is kind of merged API and Cache layer, so the point is your application should be only calling that code. I liked the Gearman concept here as it allows to glue multiple applications together – think for some objects constructed by Java code while others by Perl 🙂

alex

16 years ago

i’m looking for exactly this! 🙂

i’m wondering, why there’s no extended mysql-query-cache, which allows you to use cached results, even the table was alreasy updated. i think that would be much faster than to realize this via php and memcached.

Eric Bergen

16 years ago

I did some work on a query cache patch that could specify a ttl for specific queries instead of flushing the cache when the table was updated. After experimenting with it for a while it became obvious that memcache and caching the results outside of mysql was the way to go.

Daniel

16 years ago

I think you are running into the same problem that I’ve been wondering about.

I like mysql/RDBMS because if the transaction handling, indexing capability, and “information density”, which is a 4 byte integer takes up 4 bytes, and is not converted to its string representation.

On the other hand, I like memcache, redis, tokyotyrant, and other such key/value stores, because of the performance and distribution (the document storage format)

I think some of the core of the problem you are talking about depends on how you look at the data in the database.

If you look at the database as an “interesting” form of data storage, then the objects (or tables…) are probably already in the format you are talking about.

If you look at the database as a sets of data (foreign keys, 3rd/boyce codd normal form, etc..), then your idea looks like it might make a good model front end, especially if you can solve the replication delay issue.