Internals of InnoDB mutexesVadim Tkachenko
InnoDB uses its own mutexes and read-write locks instead of POSIX-mutexes pthread_mutex*, the main reason for that is performance, but InnoDB’s implementation isn’t ideal and on modern SMP boxes can cause serious performance problems.
Let’s look on InnoDB mutex (schematic for simplification):
for (i=0; i< innodb_spin_locks;i++)
if (mutex_try_lock()) return; // success
// we are here if spin loop was not successful
reserve_cell in array of cell;
wait on condition_variable in reserved cell;
innodb_spin_locks is configurable via system variable innodb_sync_spin_loops (default value is 20)
There we have:
1. Spin-loop – InnoDB uses spin-loop in hopes thread locked mutex is very fast and will release mutex while current thread runs in spins, so there is saving of expensive context switching. However we can’t run spin very long as it eats CPU resources and after some loops it is reasonable to give CPU resource to concurrent threads.
2. Wait on condition variable using cells from synchronization array. Synchronization array becomes from age of dinosaurs and Windows 95, where count of condition variables was limited, so InnoDB had to implement workaround. The same synchronization array is performance problematic, as it by itself uses mutex inside to protect cells.
With big respect to InnoDB team they still take care of users of Windows 95 I think it is better to remove synchronization array and use condition variables directly.
I’ve made the patch which simply replace array. We have reports from partners that this patch increase scalability on Solaris/Opteron boxes dramatically – so you can test it if you have multi-CPU boxes and high-concurrency load on InnoDB.