I feel like I’ve been seeing this a lot lately.
“Occasionally, seemingly innocuous selects take unacceptably long.”
Or
“Over the past few weeks, we’ve been having bizarre outages during which everything seems to grind to a halt… and then fixes itself within 5 minutes. We’ve got plenty of memory, we’re not running into swap, and we can’t find any queries that would seem to trigger outages: just tons of simple session requests all hung up for no obvious reason.”
Problems like this are always hard to debug. If it happens twice a week for 5 minutes at a time, your chances of getting someone logged onto the machine to watch it in action are pretty slim. And of course, when they do look at it, they see nothing wrong on the surface; it takes some very clever, very fast work with OS-level debugging and tracing utilities to really prove what’s happening.
The two cases mentioned above were caused by scalability/concurrency/locking problems in the query cache. (One was on Windows, and we fixed it by guessing. The other was on GNU/Linux and Maciek isolated it with his elite skillz.) So if you’re having random lockups, you might try disabling the query cache and see whether that solves it.
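If you do manage to capture the process list during one of these stalls, a quick tally of thread states can hint at whether the query cache is involved before you go as far as disabling it. The following is only a minimal sketch, assuming the server is MySQL and that you already have the rows of SHOW FULL PROCESSLIST as dictionaries with a "State" key; the function name and input shape are hypothetical, and the state names are the query cache thread states listed in the MySQL manual, which can differ between versions.

```python
from collections import Counter

# Thread states the MySQL manual lists for query cache activity,
# lowercased for case-insensitive matching. Older or newer servers
# may report a different set (assumption: a 5.x-era server).
QUERY_CACHE_STATES = (
    "checking query cache for query",
    "checking privileges on cached query",
    "invalidating query cache entries",
    "sending cached result to client",
    "storing result in query cache",
    "waiting for query cache lock",
)


def query_cache_state_counts(processlist_rows):
    """Count threads per query-cache-related state.

    processlist_rows is an iterable of dicts with a "State" key,
    for example rows fetched from SHOW FULL PROCESSLIST through a
    dict-style cursor (hypothetical input shape).
    """
    counts = Counter()
    for row in processlist_rows:
        # State can be NULL for idle connections, so guard against None.
        state = (row.get("State") or "").strip().lower()
        if state in QUERY_CACHE_STATES:
            counts[state] += 1
    return counts
```

A pile of threads in these states during a stall is only circumstantial evidence, but it is a cheap check. On servers that still have the query cache, setting `query_cache_size` to 0 is the usual way to switch it off for the experiment.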
Hopefully this blog post will show up on Google and save someone time and money!