I had an interesting tuning case few days ago. The system serving high traffic using Innodb tables would be stalling every so often causing even very simple queries both reads and writes taking long time to complete, with progress almost paused (dropping from thousands to tens of queries per second).
On the surface the problem looked simple – in the processlist every so often you would see a lot of queries, mostly selects taking 10+ seconds while at the same time there was no significant iowait, neither high CPU usage. Closer examination showed there were hundreds of queries stuck in the innodb queue, with innodb_thread_concurrency set to 8
Happily enough innodb_thread_concurrency is the variable which can be set online so it is easy to try a few different values and see what works best. In this case we decided to try removing restriction on runnable queries all together by setting it to 0.
The change made things different. Now there was no more trivial selects taking a lot of time but there were a lot of write queries taking a lot of time being… many of which SHOW INNODB STATUS showed as waiting on famous Innodb AUTO-INC lock.
That would be an easy excuse to blame everything on this lock, suggest upgrading to MySQL 5.1 and call it a day. Learning more about the system I however found out there are only some 20 inserts per second going to such table (which should not be the big deal even if they are serialized) plus it is not only inserts to table with auto increment were stalling but also inserts to the tables which did not have auto-increment columns and general updates. The stall was not caused by row level locks too.
Taking a closer look at Innodb flush activity showed this is checkpoint related problem. Unfortunately Innodb fuzzy checkpointing algorithm is not very smart causing stalls waiting for large portion of buffer pool is flushed before queries can proceed. As this is a common problem we have the patch for it.
What is very interesting in this case how true issues can really hide because of layers of false problems. The queries we first saw in the processlist were SELECTs but this is just because most queries in the application are selects and as soon as the thread concurrency slots were busy it is mostly selects which were stuck waiting.
The next symptom was AUTO-INC lock which again was a false one – because single insert slows down a lot of inserts pile up waiting on the AUTO-INC lock. Though this is not the main problem the main problem lives below – why the original insert is taking so long letting hundreds of AUTO-INC insert to pile up.
This was also very interesting case about optimal innodb_thread_concurrency. I continue reading (and hearing at the conferences) conflicting recommendations – some suggest to set 0 to avoid queue blocks, other suggest some low value to reduce internal contention. In reality though this option best setting is highly application dependent and you should not blindly use somebody elses advice on it. Try different values and see what works best for you. As I mentioned in the start of this post changing innodb_thread_concurrency is happily online operation.