Yesterday, over at my personal blog, I blogged about the impact of the MySQL slow query log. Since we’re working on Percona Server 5.6, I did wonder if this was a good opportunity to re-examine how we could provide slow query log type functionality to our users.
The slow query log code inside the MySQL server does several things that aren’t good for concurrency on modern systems. For starters, it takes a mutex while both doing formatting of what to write and while doing the write(2) system call itself. This is, in fact, entirely unneccesary.
From the man page for the write system call:
If the file was open(2)ed with O_APPEND, the file offset is first set to the end of the file before writing. The adjustment of the file offset and the write operation are performed as an atomic step.
So we can pretty easily see that for our case of writing the slow query log, if we had a buffer that was the log entry, we could just call write(2) and never take a mutex. We trade some memory usage (for the buffer) for concurrency – pushing all the concurrency problems down into the operating system.
The main challenge with this strategy is log file rotation, as if you just close(2) a file descriptor that others may be write(2)ing to then bad things may happen (namely that the file descriptor gets reused for something else.. say an innodb data file, and writing part of your slow query log as an innodb page will not end in hugs and puppies).
So, in order to test my theory, I sat down and wrote an audit plugin. It turns out I had to extend the MySQL plugin API to be able to reproduce all the information in the slow query log. The code for my prototype can be found at lp:~stewart/percona-server/5.6-slow-query-plugin-prototype (this revision) – this will probably build fine against 5.5 too, but I haven’t tested. To build it you’ll need the Intel Threading Building Blocks headers installed as I used their tbb::atomic<> implementation.
My solution to log file rotation is to basically count how many people may be using it and waiting until this drops to zero before closing the file descriptor. This avoids taking any locks around calling write(), which is the most time consuming part of writing the log. I’m not convinced if this is the best solution or not, but it appears to work and I think it’s race free.
No improvement like this is complete without benchmarks. My benchmarks (“mysqlslap -q ‘select 1;’ –number-of-queries=1000000 –concurrency=64” on my laptop) come out at the following:
So that’s a quite promising start.