One thing I noticed during the observation was that there were roughly 2,000 new connections to MySQL per second during peak times. This is a high number by any account.
When a new connection to MySQL is made, it can go into the back_log, which effectively serves as a queue for new connections on the operating system side, allowing MySQL to handle spikes. Although MySQL connections are quite fast compared to many other databases, connection handling can still become the bottleneck. For a more in-depth discussion of what goes on during a connection and ideas for improvement, see this post by Domas.
With the MySQL 5.5 default back_log of 50 and 2,000 connections created per second, it takes just 0.025 seconds to fill the queue completely if requests are not being served. This means even a very short stall in the main thread, which accepts connections, will cause some connection attempts to be refused.
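The arithmetic above can be checked with a small sketch. The function name seconds_to_fill is hypothetical, and the model assumes the worst case: nothing is accepted from the queue while the stall lasts.

```python
def seconds_to_fill(back_log: int, connections_per_second: float) -> float:
    """Time until the listen queue is full if nothing is accepted (worst case)."""
    if connections_per_second <= 0:
        raise ValueError("connections_per_second must be positive")
    return back_log / connections_per_second


# MySQL 5.5 default back_log with the peak connection rate observed here
print(seconds_to_fill(50, 2000))  # 0.025
```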
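The back_log variable is passed as the second parameter to the listen() operating system call. The maximum value on Linux is governed by the tcp_max_syn_backlog sysctl parameter unless syncookies are enabled. Note that the kernel also silently caps the listen() backlog at net.core.somaxconn, so that limit may need raising as well.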
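To make the role of that listen() parameter concrete, here is a minimal sketch using Python's standard socket module rather than MySQL itself. The helper name open_listener and the value 50 are only illustrative; the point is that connections complete and wait in the kernel queue even though the server never calls accept().

```python
import socket


def open_listener(backlog: int, host: str = "127.0.0.1") -> socket.socket:
    """Create a TCP listener; `backlog` plays the role of MySQL's back_log."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))   # port 0: let the OS pick a free port
    srv.listen(backlog)   # second argument of the listen() system call
    return srv


srv = open_listener(50)
port = srv.getsockname()[1]

# The server never calls accept() here, yet these connections succeed:
# the kernel parks them in the listen queue until the application is ready.
clients = [
    socket.create_connection(("127.0.0.1", port), timeout=2) for _ in range(5)
]
queued = len(clients)

for c in clients:
    c.close()
srv.close()
```

Once the queue is full, further attempts are dropped or refused depending on kernel settings, which is the failure mode described above.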
The simple solution in this case was to increase back_log to a value that could handle longer bursts. Increasing back_log to 1000 would give us enough of a queue to absorb a stall of up to 500ms, which is good enough for most cases, but it’s important to understand your application workload and tune specifically to your needs.
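Going the other way, from a stall you want to survive to a back_log value, is the same calculation inverted. This is only a sizing sketch: required_back_log is a hypothetical helper, and it assumes a steady arrival rate for the whole stall.

```python
import math


def required_back_log(connections_per_second: float, stall_seconds: float) -> int:
    """Smallest queue length that absorbs a stall of the given duration."""
    if connections_per_second < 0 or stall_seconds < 0:
        raise ValueError("inputs must be non-negative")
    return math.ceil(connections_per_second * stall_seconds)


# 2,000 connections/sec and a 500ms stall, as in the case described here
print(required_back_log(2000, 0.5))  # 1000
```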
This got me thinking about the disadvantages of setting this value equal to the OS limit (/proc/sys/net/ipv4/tcp_max_syn_backlog). The only drawback I can see is that you could potentially have a large number of clients waiting to connect instead of failing quickly and potentially connecting to a different server, but that can be fixed by setting the client connect timeout (which not a lot of people do).
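The fail-fast behavior can be illustrated at the socket level. This is a sketch of the idea, not of any particular MySQL client library: connect_or_give_up is a hypothetical helper, and real MySQL drivers expose their own connect timeout option that you would set instead.

```python
import socket
from typing import Optional


def connect_or_give_up(
    host: str, port: int, timeout_s: float
) -> Optional[socket.socket]:
    """Return a connected socket, or None if connecting fails or times out."""
    try:
        # The timeout bounds how long we wait for the connection to complete.
        return socket.create_connection((host, port), timeout=timeout_s)
    except OSError:  # covers timeouts and refused connections
        return None
```

A caller that gets None back can move on to another server right away instead of sitting in a long queue.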
Another important setting if you’re working with many connections per second is thread_cache_size, which dramatically affects the cost of connections. Set it high enough that no more than a couple of threads are created every second during normal operation.
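One way to check that rule of thumb is to sample the Threads_created status counter twice and look at the rate. The sketch below only does the arithmetic; the function name and the sample values are made up for illustration, and fetching the counter (for example with SHOW GLOBAL STATUS) is left out.

```python
def threads_created_per_second(
    first_sample: int, second_sample: int, interval_s: float
) -> float:
    """Rate of new thread creation between two Threads_created readings."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (second_sample - first_sample) / interval_s


# Hypothetical readings taken 10 seconds apart
rate = threads_created_per_second(120_000, 121_500, 10)
print(rate)  # 150.0
```

A rate like the one in this invented example would be far above "a couple per second" and would suggest the thread cache is too small for the workload.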
I would note that if you’re seeing more than 1,000 connections/sec, you’re getting pretty close to what could be the limit, and you should consider techniques to reduce the number of connections. Persistent connections and connection pools are a good solution for many applications.
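For readers who have not used a connection pool, here is a minimal sketch of the idea. SimplePool and fake_connect are hypothetical names; the fake factory stands in for a real driver's connect call, and a production pool would also validate and replace broken connections, which this one does not.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

T = TypeVar("T")


class SimplePool(Generic[T]):
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, factory: Callable[[], T], size: int) -> None:
        self._idle: "queue.Queue[T]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout_s: float = 1.0) -> Iterator[T]:
        # Raises queue.Empty if every connection stays busy for timeout_s.
        conn = self._idle.get(timeout=timeout_s)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse


opened = []


def fake_connect() -> object:
    """Stand-in for a real connect call; records each connection opened."""
    conn = object()
    opened.append(conn)
    return conn


pool = SimplePool(fake_connect, size=2)
for _ in range(100):
    with pool.connection() as conn:
        pass  # run queries here

print(len(opened))  # 2
```

The hundred units of work share two connections, so the server sees two connection handshakes instead of a hundred.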