October 25, 2014

Drilling down to the source of the problem

I had an interesting tuning case few days ago. The system serving high traffic using Innodb tables would be stalling every so often causing even very simple queries both reads and writes taking long time to complete, with progress almost paused (dropping from thousands to tens of queries per second).

On the surface the problem looked simple – in the processlist every so often you would see a lot of queries, mostly selects taking 10+ seconds while at the same time there was no significant iowait, neither high CPU usage. Closer examination showed there were hundreds of queries stuck in the innodb queue, with innodb_thread_concurrency set to 8

Happily enough innodb_thread_concurrency is the variable which can be set online so it is easy to try a few different values and see what works best. In this case we decided to try removing restriction on runnable queries all together by setting it to 0.

The change made things different. Now there was no more trivial selects taking a lot of time but there were a lot of write queries taking a lot of time being… many of which SHOW INNODB STATUS showed as waiting on famous Innodb AUTO-INC lock.

That would be an easy excuse to blame everything on this lock, suggest upgrading to MySQL 5.1 and call it a day. Learning more about the system I however found out there are only some 20 inserts per second going to such table (which should not be the big deal even if they are serialized) plus it is not only inserts to table with auto increment were stalling but also inserts to the tables which did not have auto-increment columns and general updates. The stall was not caused by row level locks too.

Taking a closer look at Innodb flush activity showed this is checkpoint related problem. Unfortunately Innodb fuzzy checkpointing algorithm is not very smart causing stalls waiting for large portion of buffer pool is flushed before queries can proceed. As this is a common problem we have the patch for it.

What is very interesting in this case how true issues can really hide because of layers of false problems. The queries we first saw in the processlist were SELECTs but this is just because most queries in the application are selects and as soon as the thread concurrency slots were busy it is mostly selects which were stuck waiting.

The next symptom was AUTO-INC lock which again was a false one – because single insert slows down a lot of inserts pile up waiting on the AUTO-INC lock. Though this is not the main problem the main problem lives below – why the original insert is taking so long letting hundreds of AUTO-INC insert to pile up.

This was also very interesting case about optimal innodb_thread_concurrency. I continue reading (and hearing at the conferences) conflicting recommendations – some suggest to set 0 to avoid queue blocks, other suggest some low value to reduce internal contention. In reality though this option best setting is highly application dependent and you should not blindly use somebody elses advice on it. Try different values and see what works best for you. As I mentioned in the start of this post changing innodb_thread_concurrency is happily online operation.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Peter,

    thanks – indeed quite an interesting case and makes feel good knowing we have the patch for it :) Also, I didn’t know innodb_thread_concurrency can be changed online – this is useful to experiment with.

  2. Hey Peter,

    I had a question about changing innodb_thread_concurrency. I changed it on a server yesterday from 8 to 0 and it caused the server to lockup. I am running the hi-perf 5.0.67 release, and I have some tables that use BEFORE INSERT triggers to update another value in a different table. I was curious if you might have any idea why this would happen. Here are a couple outputs from my log file when I made the change:

    081113 16:41:32 InnoDB: Error: Freeing a trx which is declared to be processing
    InnoDB: inside InnoDB.
    TRANSACTION 1 2897690526, not started, process no 17811, OS thread id 1197627712, thread declared inside InnoDB 499
    MySQL thread id 243641148, query id 2415637111 XX.XX.XX.XX MY_USERNAME

    081113 16:41:32 InnoDB: Error: Freeing a trx which is declared to be processing
    InnoDB: inside InnoDB.
    TRANSACTION 1 2897690560, not started, process no 17811, OS thread id 1196562752, thread declared inside InnoDB 500
    MySQL thread id 243641179, query id 2415637163 XX.XX.XX.XX MY_USERNAME

    I had 7 of these show up in my logs, and then I had a couple hundred statements show up in processlist that wouldn’t finish.

  3. Also, was curious what is the name of the patch that addresses Innodb fuzzy checkpointing algorithm? You just linked to the page with all the patches. Thanks!

  4. peter says:

    Davy,

    Did you restart with zero innodb_thread_concurrency or did you change it online ?
    I’ve seen minor glitches sometimes (even in stick release) changing it online, though I thougth these were only related to counting transactions.

    About patch name – it is part of Innodb IO Patches:
    innodb_io_patches.patch

  5. Peter,

    Thanks for the response. I changed the value online. When you say counting transactions, do you mean queries like the one below? This is one of the main queries that I saw that was stuck:

    SELECT @rank:=@rank-1 AS rank, a.user_id, a.card_id, b.user_name, b.image_url FROM game_winners a
    INNER JOIN users b ON a.user_id = b.user_id WHERE a.game_id = 5 ORDER BY a.win_time DESC LIMIT 16;

  6. peter says:

    No. I mean in SHOW INNODB STATUS:

    1 queries inside InnoDB, 0 queries in queue
    1 read views open inside InnoDB

    Sometimes when you check it online the number of “queries inside innodb” will stuck on 1 or something because MySQL will simply stop counting the queries entering/exiting the queue.

  7. Yasufumi says:

    Davy,

    I succeeded to reproduce the problem you met. I think it is MySQL’s bug. I have posted the bug report.

    http://bugs.mysql.com/bug.php?id=40760

Speak Your Mind

*