Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Drilling down to the source of the problem

November 9, 2008

Author

Peter Zaitsev

Insight for Developers

Share this Post:

I had an interesting tuning case few days ago. The system serving high traffic using Innodb tables would be stalling every so often causing even very simple queries both reads and writes taking long time to complete, with progress almost paused (dropping from thousands to tens of queries per second).

On the surface the problem looked simple – in the processlist every so often you would see a lot of queries, mostly selects taking 10+ seconds while at the same time there was no significant iowait, neither high CPU usage. Closer examination showed there were hundreds of queries stuck in the innodb queue, with innodb_thread_concurrency set to 8

Happily enough innodb_thread_concurrency is the variable which can be set online so it is easy to try a few different values and see what works best. In this case we decided to try removing restriction on runnable queries all together by setting it to 0.

The change made things different. Now there was no more trivial selects taking a lot of time but there were a lot of write queries taking a lot of time being… many of which SHOW INNODB STATUS showed as waiting on famous Innodb AUTO-INC lock.

That would be an easy excuse to blame everything on this lock, suggest upgrading to MySQL 5.1 and call it a day. Learning more about the system I however found out there are only some 20 inserts per second going to such table (which should not be the big deal even if they are serialized) plus it is not only inserts to table with auto increment were stalling but also inserts to the tables which did not have auto-increment columns and general updates. The stall was not caused by row level locks too.

Taking a closer look at Innodb flush activity showed this is checkpoint related problem. Unfortunately Innodb fuzzy checkpointing algorithm is not very smart causing stalls waiting for large portion of buffer pool is flushed before queries can proceed. As this is a common problem we have the patch for it.

What is very interesting in this case how true issues can really hide because of layers of false problems. The queries we first saw in the processlist were SELECTs but this is just because most queries in the application are selects and as soon as the thread concurrency slots were busy it is mostly selects which were stuck waiting.

The next symptom was AUTO-INC lock which again was a false one – because single insert slows down a lot of inserts pile up waiting on the AUTO-INC lock. Though this is not the main problem the main problem lives below – why the original insert is taking so long letting hundreds of AUTO-INC insert to pile up.

This was also very interesting case about optimal innodb_thread_concurrency. I continue reading (and hearing at the conferences) conflicting recommendations – some suggest to set 0 to avoid queue blocks, other suggest some low value to reduce internal contention. In reality though this option best setting is highly application dependent and you should not blindly use somebody elses advice on it. Try different values and see what works best for you. As I mentioned in the start of this post changing innodb_thread_concurrency is happily online operation.

0 0 votes

Article Rating

7 Comments

Oldest

Newest Most Voted

Aurimas Mikalauskas

17 years ago

Peter,

thanks – indeed quite an interesting case and makes feel good knowing we have the patch for it 🙂 Also, I didn’t know innodb_thread_concurrency can be changed online – this is useful to experiment with.

Davy Campano

17 years ago

Hey Peter,

I had a question about changing innodb_thread_concurrency. I changed it on a server yesterday from 8 to 0 and it caused the server to lockup. I am running the hi-perf 5.0.67 release, and I have some tables that use BEFORE INSERT triggers to update another value in a different table. I was curious if you might have any idea why this would happen. Here are a couple outputs from my log file when I made the change:

081113 16:41:32 InnoDB: Error: Freeing a trx which is declared to be processing
InnoDB: inside InnoDB.
TRANSACTION 1 2897690526, not started, process no 17811, OS thread id 1197627712, thread declared inside InnoDB 499
MySQL thread id 243641148, query id 2415637111 XX.XX.XX.XX MY_USERNAME

081113 16:41:32 InnoDB: Error: Freeing a trx which is declared to be processing
InnoDB: inside InnoDB.
TRANSACTION 1 2897690560, not started, process no 17811, OS thread id 1196562752, thread declared inside InnoDB 500
MySQL thread id 243641179, query id 2415637163 XX.XX.XX.XX MY_USERNAME

I had 7 of these show up in my logs, and then I had a couple hundred statements show up in processlist that wouldn’t finish.

Davy Campano

17 years ago

Also, was curious what is the name of the patch that addresses Innodb fuzzy checkpointing algorithm? You just linked to the page with all the patches. Thanks!

Author

Peter Zaitsev

17 years ago

Davy,

Did you restart with zero innodb_thread_concurrency or did you change it online ?
I’ve seen minor glitches sometimes (even in stick release) changing it online, though I thougth these were only related to counting transactions.

About patch name – it is part of Innodb IO Patches:
innodb_io_patches.patch

Davy Campano

17 years ago

Peter,

Thanks for the response. I changed the value online. When you say counting transactions, do you mean queries like the one below? This is one of the main queries that I saw that was stuck:

SELECT @rank:=@rank-1 AS rank, a.user_id, a.card_id, b.user_name, b.image_url FROM game_winners a
INNER JOIN users b ON a.user_id = b.user_id WHERE a.game_id = 5 ORDER BY a.win_time DESC LIMIT 16;

Author

Peter Zaitsev

17 years ago

No. I mean in SHOW INNODB STATUS:

1 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB

Sometimes when you check it online the number of “queries inside innodb” will stuck on 1 or something because MySQL will simply stop counting the queries entering/exiting the queue.