There is the rare bug which I ran into every so often. Last time I’ve seen it about 3 years ago on MySQL 4.1 and I hoped it is long fixed since… but it looks like it is not. I now get to see MySQL 5.4.2 in the funny state.
When you see bug happening you would see MySQL log flooded with error messages like this:
091119 23:03:34 [ERROR] Error in accept: Resource temporarily unavailable
091119 23:03:34 [ERROR] Error in accept: Resource temporarily unavailable
091119 23:03:34 [ERROR] Error in accept: Resource temporarily unavailable
091119 23:03:34 [ERROR] Error in accept: Resource temporarily unavailable
filling out disk space
Depending on the case you may be able to connect to MySQL through Unix Socket or TCP/IP or neither.
It also looks like there is a correlation between having a lot of tables and such condition.
Previously I was unlucky with seeing this issue in production so we had to restart MySQL quickly currently I have a test MySQL showing some behavior.
Here is what strace tells me:
[percona@test9 msb_5_4_2]$ strace -f -p 19229
Process 19229 attached with 23 threads – interrupt to quit
[pid 19286] rt_sigtimedwait([HUP QUIT ALRM TERM TSTP],
[pid 19285] futex(0x165962ec, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19284] select(0, NULL, NULL, NULL, {0, 765000}
[pid 19283] select(0, NULL, NULL, NULL, {0, 412000}
[pid 19248] futex(0x1781193c, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19247] futex(0x178118bc, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19246] futex(0x1781183c, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19245] futex(0x178117bc, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19244] futex(0x1781173c, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19232] futex(0x1781113c, FUTEX_WAIT_PRIVATE, 165, NULL
[pid 19229] accept(16392,
[pid 19241] futex(0x178115bc, FUTEX_WAIT_PRIVATE, 319, NULL
[pid 19243] futex(0x178116bc, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19242] futex(0x1781163c, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19240] futex(0x1781153c, FUTEX_WAIT_PRIVATE, 3, NULL
[pid 19239] futex(0x178114bc, FUTEX_WAIT_PRIVATE, 7, NULL
[pid 19238] futex(0x1781143c, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19237] futex(0x178113bc, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19236] futex(0x1781133c, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19235] futex(0x178112bc, FUTEX_WAIT_PRIVATE, 7, NULL
[pid 19234] futex(0x1781123c, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19233] futex(0x178111bc, FUTEX_WAIT_PRIVATE, 207, NULL
[pid 19231] futex(0x178110bc, FUTEX_WAIT_PRIVATE, 1, NULL
[pid 19229] <... accept resumed> 0x7fffffac70b0, [18423225245113516048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 19229] accept(16392, 0x7fffffac70b0, [18423225245113516048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 19229] accept(16392, 0x7fffffac70b0, [18423225245113516048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 19229] accept(16392, 0x7fffffac70b0, [18423225245113516048]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 19229] fcntl(16392, F_SETFL, O_RDWR) = 0
[pid 19229] fcntl(16392, F_SETFL, O_RDWR) = 0
[pid 19229] select(16394, [1025 1040 1042 1044
….
8 9709 9714 9716 9717 15368 15369], NULL, NULL, NULL*** buffer overflow detected ***: strace terminated
======= Backtrace: =========
/lib64/libc.so.6(__chk_fail+0x2f)[0x35f36e6aff]
/lib64/libc.so.6[0x35f36e5ad3]
strace[0x408b60]
So as you can see accept gets pretty high socket number – probably because of large innodb_open_files I tested with in this case – which all can be used during recovery.
Note the process gets accept on the socket to fail with EAGAIN (which is not well described in the manual) and later getting call to select call with 16384 sockets. This does not seems to be the healthy number to work with SELECT call and It is quite possible something goes wrong because of it.
If you have any ideas why could be going wrong in this case.