I’m returning once again to InnoDB scalability and the related bug #15815, as it hurts many users and customers running multi-CPU servers.
A short introduction to the problem:
On a 4-CPU box, a single thread executes a full-table-scan SELECT query in 8 sec,
but with 4 threads each thread takes 240 sec to execute the same query.
This is very strange, as the threads run only SELECT queries, and ideally there should be no
problem at all in a concurrent environment, especially for a CPU-bound workload.
I profiled it, and the profile shows that the problem is the “buffer pool” mutex, which protects innodb_buffer_pool.
In detail, for each scanned row InnoDB calls the block_get / block_release functions,
which acquire/release the block holding the current row. The problem is that
block_get / block_release call mutex_lock(buffer_pool_mutex) / mutex_unlock(buffer_pool_mutex).
So a single global mutex is taken for every scanned row, which in a multi-CPU/multi-threaded
environment results in the “mutex ping-pong” problem: the mutex, and the cache line holding it, keeps bouncing between CPUs.
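To make the access pattern concrete, here is a minimal sketch in C, assuming a heavily simplified buffer pool. It is not InnoDB's real code: block_t, fix_count, table, scan_table, scan_thread, ROWS_PER_BLOCK and NBLOCKS are hypothetical names, and a plain pthread mutex stands in for InnoDB's own mutex_lock/mutex_unlock wrappers. It only illustrates that every pin and unpin of every row goes through one global lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified model of a buffer pool; names are
   illustrative only and do not match InnoDB's real source. */
#define ROWS_PER_BLOCK 64
#define NBLOCKS        16

typedef struct {
    int  fix_count;             /* threads currently pinning the block */
    long rows[ROWS_PER_BLOCK];  /* stand-in for the rows on the page   */
} block_t;

static block_t table[NBLOCKS];

/* One global mutex: every pin/unpin by every thread serializes here,
   so its cache line bounces between CPUs ("mutex ping-pong"). */
static pthread_mutex_t buffer_pool_mutex = PTHREAD_MUTEX_INITIALIZER;

static void block_get(block_t *b) {
    pthread_mutex_lock(&buffer_pool_mutex);
    b->fix_count++;
    pthread_mutex_unlock(&buffer_pool_mutex);
}

static void block_release(block_t *b) {
    pthread_mutex_lock(&buffer_pool_mutex);
    assert(b->fix_count > 0);
    b->fix_count--;
    pthread_mutex_unlock(&buffer_pool_mutex);
}

/* Full-table scan: two global-mutex round trips per scanned row. */
static long scan_table(void) {
    long sum = 0;
    for (size_t i = 0; i < NBLOCKS; i++)
        for (size_t r = 0; r < ROWS_PER_BLOCK; r++) {
            block_get(&table[i]);
            sum += table[i].rows[r];
            block_release(&table[i]);
        }
    return sum;
}

/* Thread entry point: each SELECT thread runs its own scan. */
static void *scan_thread(void *out) {
    *(long *)out = scan_table();
    return NULL;
}
```

Starting scan_thread on four threads at once reproduces the shape of the problem: no thread ever writes a row, yet every thread still queues on buffer_pool_mutex twice per scanned row.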
Looking at the source code of block_get / block_release, I don’t see any obvious reason
why the global lock cannot be weakened to a block-level one. I tried replacing buffer_pool_mutex in
these places with block_mutex and got impressive results: now
each of the 4 threads executes the query in 11 sec.
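The idea of the change can be sketched the same way, again assuming the simplified model: the pin count of each block is protected by that block's own mutex instead of the global one. All names here (block_t, fix_count, table, table_init, scan_table, scan_thread, ROWS_PER_BLOCK, NBLOCKS) are hypothetical; only block_get, block_release and block_mutex echo the text above, and this is not the actual patch. The sketch assumes the caller already holds a pointer to the block and ignores page lookup and LRU-list maintenance.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified model; not InnoDB's real source or patch. */
#define ROWS_PER_BLOCK 64
#define NBLOCKS        16

typedef struct {
    pthread_mutex_t block_mutex; /* protects fix_count of this block only */
    int  fix_count;              /* threads currently pinning the block   */
    long rows[ROWS_PER_BLOCK];   /* stand-in for the rows on the page     */
} block_t;

static block_t table[NBLOCKS];

static void table_init(void) {
    for (size_t i = 0; i < NBLOCKS; i++) {
        pthread_mutex_init(&table[i].block_mutex, NULL);
        table[i].fix_count = 0;
    }
}

/* Pin/unpin now lock only the block being touched, so threads working
   on different blocks never contend with each other. */
static void block_get(block_t *b) {
    pthread_mutex_lock(&b->block_mutex);
    b->fix_count++;
    pthread_mutex_unlock(&b->block_mutex);
}

static void block_release(block_t *b) {
    pthread_mutex_lock(&b->block_mutex);
    assert(b->fix_count > 0);
    b->fix_count--;
    pthread_mutex_unlock(&b->block_mutex);
}

/* Same full-table scan as before; only the locking changed. */
static long scan_table(void) {
    long sum = 0;
    for (size_t i = 0; i < NBLOCKS; i++)
        for (size_t r = 0; r < ROWS_PER_BLOCK; r++) {
            block_get(&table[i]);
            sum += table[i].rows[r];
            block_release(&table[i]);
        }
    return sum;
}

static void *scan_thread(void *out) {
    *(long *)out = scan_table();
    return NULL;
}
```

With this layout, threads contend only when they pin the same block at the same moment, instead of on every row. A real buffer pool still has to coordinate finding a block and maintaining its lists, which this sketch deliberately leaves out.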
So performance increased by 240/11 ≈ 22 times, and where before we had negative scalability, the
scalability factor is now 2.9 (throughput with 4 threads = 2.9 × throughput with 1 thread, since 4 × 8/11 ≈ 2.9).
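The arithmetic behind these numbers can be written down as two small helpers. The function names speedup and scalability are illustrative; the scalability factor is taken as throughput with n threads divided by throughput with 1 thread, i.e. (n / t_n) / (1 / t_1) = n × t_1 / t_n, where t_1 and t_n are per-query times.

```c
/* Ratio of per-query times before and after a change. */
static double speedup(double t_before, double t_after) {
    return t_before / t_after;
}

/* Throughput with n threads relative to 1 thread:
   (n queries / t_n sec) / (1 query / t_1 sec) = n * t_1 / t_n. */
static double scalability(int n, double t_1, double t_n) {
    return n * t_1 / t_n;
}
```

For the original code, scalability(4, 8, 240) is about 0.13: four threads together complete less work per second than one thread alone, which is what negative scalability means here. After the change, scalability(4, 8, 11) is about 2.9 and speedup(240, 11) is about 21.8.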
The current patch cannot be considered stable, as it touches many InnoDB subsystems,
but it looks like the right direction for solving the problem.