One of the problems I have with Memcache is that it is passive – it only stores cached data. This means the application using Memcache has to have special logic to handle misses from the cache, and has to be careful updating the cache – you may have multiple data modifications happening at the same time. Finally, you have to pay with increased latency when constructing items that have expired from the cache, while they could have been refreshed in the background. I think all of these problems could be solved with the concept of an active cache.
The idea with Active Cache is very simple – for any data retrieval operation the cache would actually know how to construct the object, so you will never get a miss from the cache unless there is an error. Among existing tools, this probably maps best onto registering the jobs with Gearman.
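A minimal sketch of this idea in Python – the class and method names are made up, and a real deployment would put this logic in the cache server (or a Gearman worker) rather than in-process; the point is only that a constructor is registered per key, so a GET never returns a miss:

```python
import time

class ActiveCache:
    """Sketch of an "active" cache: each key is registered together with a
    constructor function, so a GET that finds nothing (or an expired entry)
    rebuilds the value instead of returning a miss to the application."""

    def __init__(self):
        self._store = {}     # key -> (value, expires_at)
        self._builders = {}  # key -> (constructor callable, ttl)

    def register(self, key, builder, ttl=300):
        self._builders[key] = (builder, ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]
        # No valid entry: construct it here -- the caller never sees a miss.
        builder, ttl = self._builders[key]
        value = builder()
        self._store[key] = (value, time.time() + ttl)
        return value
```

Updates would go through the same object, which is what makes serialization of concurrent modifications possible.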
Updates of the data in this case should go through the same system, so you get serialization (or other logic) for your data updates.
You could also use the same functions to refresh the data when it expires. This could be exposed as explicit logic – something like "expires in 300 seconds, start refreshing at 200 seconds" – as well as handled automatically.
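The explicit policy could look something like the following sketch. The probabilistic trigger inside the refresh window is my assumption – it is one common way to have "some" requests volunteer to refresh without a stampede of simultaneous rebuilds:

```python
import time
import random

HARD_TTL = 300     # entry becomes unusable this many seconds after creation
REFRESH_AT = 200   # start refreshing this many seconds after creation

def should_refresh(created_at, now=None, chance=0.1):
    """Decide whether this request should kick off a refresh of the entry.

    Before REFRESH_AT: never.  Between REFRESH_AT and HARD_TTL: a small
    random fraction of requests trigger a background rebuild while everyone
    keeps being served the current value.  Past HARD_TTL: always rebuild."""
    now = now if now is not None else time.time()
    age = now - created_at
    if age >= HARD_TTL:
        return True                      # too old, must rebuild now
    if age >= REFRESH_AT:
        return random.random() < chance  # refresh window: some requests volunteer
    return False
```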
The logic for automatic handling could be as follows – after a key has expired we can purge its value but keep the key in the cache with an "expired" flag. If we see that an expired key is getting a lot of requests, the cache could decide to refresh such keys itself, based on available bandwidth.
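A sketch of that automatic handling, under stated assumptions: the refresh threshold, the builder registry, and the `refresh_hot_keys` pass are all made up, and a real system would run that pass from a background worker with a bandwidth budget rather than synchronously:

```python
import time

class AutoRefreshCache:
    """Sketch of automatic refresh: when a key's TTL passes, its value is
    purged but the key stays in the cache marked "expired", and requests
    arriving for it are counted.  A periodic pass (refresh_hot_keys)
    proactively rebuilds the expired keys that keep getting traffic."""

    REFRESH_THRESHOLD = 5  # hypothetical: this many requests marks a key as hot

    def __init__(self):
        self._store = {}     # key -> {"value", "expires_at", "expired", "misses"}
        self._builders = {}  # key -> (constructor callable, ttl)

    def register(self, key, builder, ttl=300):
        self._builders[key] = (builder, ttl)

    def get(self, key):
        now = time.time()
        entry = self._store.get(key)
        if entry and not entry["expired"] and now < entry["expires_at"]:
            return entry["value"]
        if entry and now >= entry["expires_at"]:
            # Purge the value but keep the key with an "expired" flag.
            entry.update(value=None, expired=True)
        # Count demand for the expired/missing key, then build on demand
        # (an active cache never returns a miss to the caller).
        misses = (entry["misses"] + 1) if entry else 1
        builder, ttl = self._builders[key]
        value = builder()
        self._store[key] = {"value": value, "expires_at": now + ttl,
                            "expired": False, "misses": misses}
        return value

    def refresh_hot_keys(self):
        """Background pass: rebuild expired keys that have proven hot."""
        now = time.time()
        for key, entry in self._store.items():
            if now >= entry["expires_at"] and entry["misses"] >= self.REFRESH_THRESHOLD:
                builder, ttl = self._builders[key]
                self._store[key] = {"value": builder(), "expires_at": now + ttl,
                                    "expired": False, "misses": entry["misses"]}
```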
Another extension to common caching methods I'd like to see is having max_age specified on a GET request. In many applications expiration is not data driven but rather request driven. Consider, for example, posting a comment on this blog. If you're the user who posted the comment, you have to see it instantly to avoid a bad experience. At the same time other users can continue reading stale data – if they see the comment appear 10 seconds later, they will not have a bad user experience.
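Per-request max_age could look like this sketch (standard memcached has no such parameter – the class and its builder argument are hypothetical). Each entry keeps its creation time, and every caller states how much staleness it will tolerate:

```python
import time

class MaxAgeCache:
    """Sketch of a GET with per-request max_age: the commenter asks for
    max_age=0 and always sees fresh data; other readers pass a larger
    tolerance and may be served a stale copy."""

    def __init__(self, builder):
        self._builder = builder  # hypothetical constructor for the value
        self._store = {}         # key -> (value, created_at)

    def get(self, key, max_age=300):
        now = time.time()
        entry = self._store.get(key)
        if entry is not None and now - entry[1] <= max_age:
            return entry[0]      # young enough for this caller
        value = self._builder(key)
        self._store[key] = (value, now)
        return value
```

Note the expiration decision has moved entirely to the request side: the same entry can be "fresh enough" for one caller and expired for another.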
Finally, I think Active Cache could be very helpful for handling write-back scenarios. There are many cases where a lot of updates happen to the data – counters, last login times, scores etc. – which do not really need to be reflected in the database instantly. If the cache itself "knows" how to update the data, you could define policies on how frequently each data object needs to be synced to the database.
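A write-back counter might be sketched as follows – `flush_to_db` is a hypothetical stand-in for the real database write, and the "every N writes" policy is just one example (a time-based policy such as "sync at most once per 30 seconds" would work the same way):

```python
class WriteBackCache:
    """Sketch of write-back caching for hot counters: increments go to the
    cache only, and a per-instance policy (here: flush every N writes)
    decides when the accumulated value is synced to the database."""

    def __init__(self, flush_to_db, writes_per_flush=100):
        self._flush = flush_to_db          # callable(key, value) doing the DB write
        self._writes_per_flush = writes_per_flush
        self._counters = {}                # key -> current value
        self._pending = {}                 # key -> writes since last DB sync

    def incr(self, key, delta=1):
        self._counters[key] = self._counters.get(key, 0) + delta
        self._pending[key] = self._pending.get(key, 0) + 1
        if self._pending[key] >= self._writes_per_flush:
            # Policy says it is time to sync this object to the database.
            self._flush(key, self._counters[key])
            self._pending[key] = 0

    def get(self, key):
        return self._counters.get(key, 0)  # reads always see the latest value
```

With `writes_per_flush=100`, a hundred increments cost one database write instead of a hundred, while reads through the cache stay exact.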
I’d like to hear some feedback on whether you think such a concept would be helpful for your applications, and whether there are existing tools and technologies which could be used to conveniently build things like this.