November 29, 2014

Caching could be the last thing you want to do

I recently had a run-in with a very popular PHP ecommerce package which makes me want to voice a recurring mistake I see in how many web applications are architected.

What is that mistake?

The ecommerce package I was working with depended on caching.  Out of the box it couldn’t serve 10 pages/second unless I enabled some features which were designed to be “optional” (but clearly they weren’t).

I think with great tools like memcached it is easy to get carried away and use it as the mallet for every performance problem, but in many cases it should not be your first choice.  Here is why:

  • Caching might not work for all visitors - You look at a page, it loads fast.  But is this the same for every user?  Caching can sometimes be an optimization that makes the average user have a faster experience, but in reality you should be caring more that all users get a good experience (Peter explains why here, talking about six sigma).  In practice it can often be the same user that has all the cache misses, which can make this problem even worse.
  • Caching can reduce visibility – You look at the performance profile of what takes the most time for a page to load and start trying to apply optimization.  The problem is that the profile you are looking at may skew what you should really be optimizing.  The real need (thinking six sigma again) is to know what the miss path costs, but it is somewhat hidden.
  • Cache management is really hard – have you planned for cache stampeding, or many cache items being invalidated at the same time?

What alternative approach should be taken?

Caching should be seen more as a burden that many applications just can’t live without.  You don’t want that burden until you have exhausted all other easily reachable optimizations.

What other optimizations are possible?

Before implementing caching, here is a non-exhaustive checklist to run through:

  • Do you understand every execution plan of every query? If you don’t, set long_query_time=0 and use mk-query-digest to capture queries.  Run them through MySQL’s EXPLAIN command.
  • Do your queries SELECT *, only to use subset of columns?  Or do you extract many rows, only to use a subset? If so, you are extracting too much data, and (potentially) limiting further optimizations like covering indexes.
  • Do you have information about how many queries were required to generate each page? Or more specifically do you know that each one of those queries is required, and that none of those queries could potentially be eliminated or merged?

I believe this post can be summed up as “Optimization rarely decreases complexity. Avoid adding complexity by only optimizing what is necessary to meet your goals.”  – a quote from Justin’s slides on instrumentation-for-php.  In terms of future-proofing design, many applications are better off keeping it simple and (at least initially) refusing the temptation to try and solve some problems “like the big guys do”.

About Morgan Tocker

Morgan is a former Percona employee.
He was the Director of Training at Percona. He was formerly a Technical Instructor for MySQL and Sun Microsystems. He has also previously worked in the MySQL Support Team, and provided DRBD support.

Comments

  1. Twirrim says:

    If it’s the ecommerce solution I think you’re talking about (name like a colour), it’s truly horrible what it can do to a server. It does appear to go a long way out of its way to ensure from a code level that everything meets ACID compliance, which for ecommerce is understandably important.
    I spent a month or so, about a year ago now, around other jobs trying to create an optimised single-server hosting solution for it that my employer at the time could sell. I found some benefit, but not significant, using memcached. There were huge gains to be made in three main areas, and only one of those was the database:

    1) Opcode caching. It’s highly modularised code. That’s great in some ways, but you make other sacrifices along the way. I can’t remember the exact figures but I seem to recall most page accesses resulting in something like 100 separate php pages being called. They have an option in the app itself to compile (or words to that effect) but I still found standard opcode caching to be effective. Install and enable APC for very easy and large performance boosts.

    2) Switching to the OurDelta binaries saw a big improvement on the database end. I’m sure using InnoDB Plugin would cover most of those too. CentOS/RHES that I was dealing with don’t have MySQL 5.1.xx available natively, and I could have gone with the official binaries but decided to take advantage of all optimisations possible.

    3) Ditch Apache, use nginx & FastCGI. Hard to understate how much of a performance drag Apache can be, especially when the server is under a lot of load. On the top end system (16 cores, lots of RAM) I could near double the number of simultaneous users. The more visitors, the more memory Apache sucks up with new instances (and CPU time loading said instances).

    I never got around to experimenting with Varnish in front of it, that might also present some benefits if someone is experimenting / benchmarking.

    Most of the database stuff that seemed wrong or suboptimal about it weren’t ones that had any obvious low-hanging fruit, short of digging into the code and trying to overhaul it’s approach, something I don’t have the skill for (I’m a Sysadmin, not a PHP programmer / DBA.)

  2. @Twirrim – I should clarify and say I am not recommending disabling opcode caching. I don’t think you are implying that, but just so other readers understand as well ;)

    This problem is not necessarily isolated to one package, so I won’t confirm or deny the name. Someone else recently asked me for advice on a CMS that out of the box was doing 300+ queries to MySQL. On EC2 each query was taking up to a few ms, and they wanted to serve the total pages in ~50ms. I told them me helping tune MySQL queries was like asking “how do I make the speed of light faster?”.

    The underlying architecture that led to 300 queries is where the fingers should be pointed.

  3. carumba says:

    @Twirrim – similar experience here. We did experiment with varnish fronting things and did see a nice speedup. However some code changes are needed.

    Here’s a site we did some work for a hosting provider. http://nginx.openstream.ch/

    But you are right, out the box performance is abysmal.

  4. Evil Buck says:

    Far too often is caching the solution. This reminds me of an old colleague’s solution to bad architecture and code. More thought ended up going into the caching than fixing the code.

  5. Sheeri says:

    Morgan,

    I agree 100%. It’s all too tempting to do the easier option — using memcached is far less work than doing the methodical (and boring) work of optimizing each and every single query that is actually used.

    mk-query-digest is an awesome tool — I use it to do query reviews for clients, and it’s tedious, but very beneficial. But like anything, you have to do the work, and in the long run, shortcuts will just add to the problems.

  6. Andrey K. Korolyov says:

    Morgan, is it about Magento?

  7. Ulf Wendel says:

    I agree that caches are not always the best solution. However, if you use MySQL and PHP 5.3 and a simple client side query cache may cure your problem, check out the mysqlnd query cache plugin: http://www.slideshare.net/nixnutz/buildin-query-caching-for-all-php-mysql-extensionsapishttp://pecl.php.net/package/mysqlnd_qc . Its very much transparent, it works with all PHP MySQL APIs, it has flexible storage, it has slam defense built-in and it can cure those ecommerce applications that make you sometimes wonder how the software could win a price – http://www.slideshare.net/nixnutz/mysqlnd-query-cache-plugin-benchmark-report-4734189 .

  8. Ulf – I would consider the mysqlnd query cache a mitigation strategy.

    It’s good to see that it has stampeding protection, but it will still stuffer the reduced visibility / not working for all visitors problem :( If I inherit an application and I’m stuck in the same situation again, this will be one of the tools I try.

    .. but if I am in charge of designing architecture for a new application, caches will be still be disabled :)

  9. Andrey – I’m not telling :)

  10. peter says:

    Hi,

    It is worth to say you need to pay attention to the COULD keyword, and yes it is the bait title – there are many better candidates on what you could do LAST with your application. The point is caching is way to often overrated. I’d also suggest to:

    – It is good to use cache to improve capacity of the system, but not relay on it for response time. In too many cases uncached path is bad and visitors are suffering. There are cases like with Opcode Cache when you can get effectively 100% hit ratio so it is less of the problem.

    – Think about warmup. Whatever cache is it can be lost and you may need to start operating with empty cache. The behavior of application in such case often comes as surprise. Best test it.

    – Disable caching (memcache, mysql query cache etc) for development so you can see performance of uncached execution path. You need to test things with caching too to ensure it works well fo course.

  11. Brian says:

    Caching is about common sense not rules. Too often I see people looking at a framework or product as a black box without looking at the architecture of how it works. If I don’t understand how something making a decision about when and how to cache is difficult.

  12. I agree with Morgan’s point here (at least the underlying one). Know what your application is doing and don’t over complicate your system before you need to. If you are doing 300 queries on page load, at least know that you are and why are. We run with a toolbar in development that tells us how many queries and how long each took and the number of result rows. This helps us key an eye on what is going on during the page load.

  13. squid says:

    LoL, Magento

  14. Sean says:

    Magento is bloatware. That overly complex Zen framework, heavy OOP coding style, and of course too many queries…all means that it will never be fast. Parse times of 1sec per page on average hardware is ridiculous. You need to have a large hardware budget and/or money to hire Percona to do some great consulting.

Speak Your Mind

*