I already wrote about roughly the same topic a while ago, and now an interesting real-life case makes me write about it again 🙂
Most web applications we’re working with have a single-tier web architecture, meaning there is just a single set of Apache servers serving requests and nothing else: no dedicated server for static content, no Squid in front, nothing. This architecture is frequently used even for medium-size web sites with millions of page views per day.
Typically a single Apache server in this configuration will have a rather high MaxClients setting (in the hundreds), and its owners would argue that web site performance suffers if the value is decreased. Only a few, however, understand why they need MaxClients set to such a high number.
First, let’s talk about performance and concurrency. It is often assumed that higher concurrency is better. In fact, however, systems typically perform best with limited concurrency: enough to saturate all resources, but not so much that scheduling and context-switching overhead becomes a serious problem. Depending on application and hardware this “optimal” number can differ, and it is best found by benchmarking. If you want a ballpark figure, it can be something like 2*(Num_CPUs+Num_disks), sometimes less, sometimes more.
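As a rough sketch, the rule of thumb and the "find it by benchmarking" step can be expressed like this. The function names are hypothetical, and the throughput numbers in the usage example are made up for illustration; you would plug in results from your own benchmark runs at several concurrency levels.

```python
from typing import Dict


def ballpark_concurrency(num_cpus: int, num_disks: int) -> int:
    # Rule of thumb from the text: 2 * (Num_CPUs + Num_disks).
    # Treat it only as a starting point for benchmarking.
    return 2 * (num_cpus + num_disks)


def best_concurrency(throughput_by_level: Dict[int, float]) -> int:
    # Given measured throughput (transactions/sec) keyed by concurrency
    # level, return the level that achieved the highest throughput.
    return max(throughput_by_level, key=throughput_by_level.get)


if __name__ == "__main__":
    print(ballpark_concurrency(8, 4))  # 24
    # Hypothetical benchmark results: throughput peaks, then declines.
    print(best_concurrency({8: 110.0, 16: 180.0, 32: 175.0, 64: 150.0}))  # 16
```

Picking the measured peak rather than the formula's value reflects the point above: the formula only tells you where to start looking.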
With optimal concurrency you get optimal throughput, that is, the number of transactions per second. In benchmarks which perform operations in a loop as fast as possible, response time might not be best at this point (at least the maximum and the 95th percentile), but in the real world it is usually OK: you’re not planning to run your web servers at the peak capacity they can handle, and staying below that peak is usually what keeps response time within limits.
Sometimes I see people use the following formula to compute the optimal number of children: a page is generated in 1.0 seconds on average, and I’d like to handle 100 req/sec, so I need 100 children to keep up with the load. This formula looks right at a glance, but it is really wrong, because page generation time depends on concurrency. Average generation time at a concurrency of 4 will be quite different from that at a concurrency of 100. You can use this formula, but you need to account for response time growing as concurrency grows rather than treating it as a constant.
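To see why the static formula misleads, here is a minimal sketch under an assumed, deliberately simple response-time model: response time stays flat up to some optimal concurrency and then grows linearly as requests queue for CPU and disk. The model and all names are hypothetical; real systems usually degrade worse than linearly once overloaded.

```python
def naive_children(rate_per_sec: float, avg_response_sec: float) -> float:
    # The formula from the text (Little's law, N = rate * response time),
    # using a response time measured at some *other* concurrency.
    return rate_per_sec * avg_response_sec


def response_time(concurrency: int, base_sec: float, optimal: int) -> float:
    # Hypothetical model: flat up to `optimal` concurrent requests,
    # then growing linearly because extra requests just wait their turn.
    return base_sec * max(1.0, concurrency / optimal)


def throughput(concurrency: int, base_sec: float, optimal: int) -> float:
    # Requests completed per second with `concurrency` busy workers.
    return concurrency / response_time(concurrency, base_sec, optimal)


if __name__ == "__main__":
    print(naive_children(100, 1.0))      # 100.0 children "needed"
    print(throughput(100, 1.0, 20))      # 20.0 req/sec actually delivered
    print(response_time(100, 1.0, 20))   # 5.0 sec per page
```

Under this model, 100 children do not deliver 100 req/sec: throughput stays capped at 20 req/sec and each page simply takes five times longer.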
So with web applications you need a limited number of workers (I’d guess 20-30) to get the best performance. This is, of course, assuming all your operations are local, meaning you only deal with your local network: database server, file servers, etc. If you’re querying external web services or doing any other kind of network IO, the situation is a bit different. So why do you need a large number of Apache children (assuming prefork mode) to get decent performance?
Network IO – This is actually the only valid one. If you’re querying Amazon Web Services, for example, to display affiliate goods, and you have a timeout of 3 seconds, you had better have enough Apache children (FastCGI processes, etc.) to handle this worst case. So if you have 10 page views/sec you need at least 30 workers; otherwise, if Amazon (or your network connection to Amazon) slows down, the site may become inaccessible. Whether you want your visitors to wait 3 seconds, or would rather use some caching or a lower timeout, is another story 🙂
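The 10 page views/sec times 3 seconds arithmetic is Little's law applied to the worst case, where every request waits the full remote timeout. A minimal sketch (the function name is hypothetical); note it is a lower bound, since local page generation time comes on top of the remote wait.

```python
import math


def worst_case_workers(page_views_per_sec: float, remote_timeout_sec: float) -> int:
    # If every request can block for the full remote timeout, then
    # workers needed = arrival rate * time each request holds a worker.
    # Local processing time would add to this; it is ignored here.
    return math.ceil(page_views_per_sec * remote_timeout_sec)


if __name__ == "__main__":
    print(worst_case_workers(10, 3))  # 30, matching the example above
    print(worst_case_workers(10, 1))  # 10 with a 1-second timeout
```

Lowering the timeout shrinks the required pool proportionally, which is why a lower timeout or caching is often the cheaper fix.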
Handling Keep Alive – A keep-alive connection in Apache keeps a child busy, and it can consume a lot of children, especially if KeepAliveTimeout is high. For dynamic pages you typically do not need keep alive enabled, and you’d better serve images from some other server anyway (possibly still Apache, but configured differently).
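A back-of-the-envelope sketch of how many prefork children keep-alive ties up, assuming each client leaves its connection open after its last request, so the child sits idle for the full KeepAliveTimeout before it is freed. The function name and the traffic numbers are hypothetical.

```python
def children_held_by_keepalive(new_conns_per_sec: int,
                               busy_sec_per_conn: int,
                               keepalive_timeout_sec: int) -> int:
    # Little's law: children held = connection rate * time each connection
    # occupies a child (actual work + idle wait for a follow-up request).
    # Assumes clients never close early, which is the worst case.
    return new_conns_per_sec * (busy_sec_per_conn + keepalive_timeout_sec)


if __name__ == "__main__":
    print(children_held_by_keepalive(20, 1, 15))  # 320
    print(children_held_by_keepalive(20, 1, 5))   # 120
    print(children_held_by_keepalive(20, 1, 0))   # 20, roughly KeepAlive off
```

With these made-up numbers most of the children are simply waiting, not working, which is the cost described above.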
Spoon Feeding Clients – Slow clients may take a long time to receive the page, and the Apache child stays busy until the content is fully sent. This can take a lot of time if the client is slow, and it also makes it pretty easy to DoS the site by pretending to be a very slow client and starting a number of downloads. To avoid this problem, keep Timeout low, and better yet, avoid having Apache talk to web clients directly: use something that will do the buffering and can spoon feed clients without much overhead. The danger of spoon feeding lies in the ever-changing nature of the Internet. A slowdown or packet loss may happen somewhere, and if you have a large number of users in that segment, the amount of spoon feeding needed may skyrocket.
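The effect of slow clients can be estimated the same way: each child is held for generation time plus transfer time. This sketch is a simplification with hypothetical names and numbers. It ignores kernel socket send buffers (which can absorb a small page entirely and free the child early), and it assumes a buffering proxy on the LAN accepts the whole response at LAN speed.

```python
import math


def transfer_seconds(page_bytes: float, client_bytes_per_sec: float) -> float:
    # Time the child spends pushing the page to whoever it talks to.
    return page_bytes / client_bytes_per_sec


def busy_children(requests_per_sec: float, generation_sec: float,
                  page_bytes: float, client_bytes_per_sec: float) -> float:
    # Little's law: children busy = rate * (generate + send) time per request.
    return requests_per_sec * (
        generation_sec + transfer_seconds(page_bytes, client_bytes_per_sec))


if __name__ == "__main__":
    # Apache talking directly to 5 KB/sec clients: 110 children busy.
    print(busy_children(10, 1.0, 50_000, 5_000))
    # Same traffic behind a buffering proxy on a ~100 MB/sec LAN: ~10 children.
    print(math.isclose(busy_children(10, 1.0, 50_000, 100_000_000), 10.005))
```

In this model almost all of the 110 children are spent waiting on the slow network, which is exactly the work a buffering front end takes off Apache.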
Why are Apache children a problem at all?
Well, because they are fat and ugly. Seriously, with modern PHP applications each Apache child may require 64MB of memory, sometimes even more. Part of this memory is kept between requests, so a considerable amount of memory is used even if the child is serving a static page hit at the moment. Besides excessive memory use (an inefficient use of resources), there are other issues, such as requiring a lot of connections to the database. If the database slows down, all these hundreds of connections may start running queries at the same time, which does not help the database recover from the slowdown quickly. Large numbers of children also require more work from the OS and are generally inefficient.
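Simple arithmetic shows how quickly fat children add up. This sketch uses hypothetical names; the 64MB figure is from the text, while the 8GB of available RAM is an assumed example. It is a pessimistic estimate, because pages shared between forked children (copy-on-write) mean per-child RSS overstates the real total.

```python
def apache_memory_mb(max_clients: int, per_child_mb: int) -> int:
    # Worst case: every child alive and at its full footprint.
    return max_clients * per_child_mb


def max_clients_that_fit(available_mb: int, per_child_mb: int) -> int:
    # Largest MaxClients that keeps all children in RAM (no swapping).
    return available_mb // per_child_mb


if __name__ == "__main__":
    print(apache_memory_mb(300, 64))        # 19200 MB, about 19 GB
    print(max_clients_that_fit(8192, 64))   # 128 children in 8 GB
```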
Now a couple of war stories.
I decided to post this because over the weekend one Chinese bot almost took down a site we’re working with. That site is still low traffic, and we thought 300 Apache children was fine for now, until we found time to configure things properly. But the bot came and started spidering the site. It was not aggressive, waiting a few seconds between requests, and at any other time we would not have noticed it, but it was extremely slow at getting the data back. There were page downloads taking 10+ minutes in the Apache status. As it was accepting the data, just very slowly, the Apache Timeout was not triggered. Disabling the bot is easy; however, the fact that a single bot on a slow connection can affect things so dramatically is unsettling.
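A simplified model of why the bot slipped through and how many children it could hold. It assumes Apache's Timeout, while sending, limits how long a single send may wait on the client rather than the total transfer time, so a client that keeps accepting data in small pieces never trips it. The 3-second request interval, the 2-second gaps, and the 30-second Timeout are assumed values; "10 minutes" is taken as 600 seconds.

```python
from typing import List


def timeout_fires(gaps_sec: List[float], timeout_sec: float) -> bool:
    # Simplified: the timeout applies to each individual wait on the
    # client, so only a single gap longer than the limit triggers it.
    return any(gap > timeout_sec for gap in gaps_sec)


def children_held_by_bot(download_sec: float, request_interval_sec: float) -> float:
    # One new request every interval, each holding a child for download_sec.
    return download_sec / request_interval_sec


if __name__ == "__main__":
    gaps = [2.0] * 300                   # 600 seconds total, 2 s per chunk
    print(sum(gaps))                     # 600.0
    print(timeout_fires(gaps, 30))       # False: no single wait is long
    print(children_held_by_bot(600, 3))  # 200.0 of the 300 children
```

Under these assumptions one polite but slow spider occupies two thirds of the pool, which matches the scale of trouble described.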
This reminded me of a problem we had years ago when I was with SpyLOG. Our data center connection went bad at the last mile, giving packet loss of about 1%. That looks like a small number, but it was enough to make the number of connections in all the different states skyrocket. The main problem at that time was different: it was Linux kernel 2.2.x, which had a problem with the SYN backlog being scanned linearly rather than hashed (so a lot of outstanding connections in SYN_RECV state caused system overload), but other nasty things were also happening.
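The difference between a linearly scanned and a hashed backlog can be illustrated generically. This is not kernel code; it is a hypothetical sketch where a half-open connection is identified by a (client IP, client port) pair. With a list, every lookup may touch every entry, so the per-packet cost grows with the number of SYN_RECV connections; a hash lookup stays roughly constant.

```python
from typing import Dict, List, Tuple

Conn = Tuple[str, int]  # hypothetical key: (client_ip, client_port)


def find_linear(backlog: List[Conn], key: Conn) -> Tuple[bool, int]:
    # Scan entries one by one; return (found, entries examined).
    steps = 0
    for entry in backlog:
        steps += 1
        if entry == key:
            return True, steps
    return False, steps


def find_hashed(backlog: Dict[Conn, str], key: Conn) -> bool:
    # Hash lookup: cost does not grow with backlog size (on average).
    return key in backlog


if __name__ == "__main__":
    conns = [("10.0.0.1", port) for port in range(10_000)]
    print(find_linear(conns, ("10.0.0.1", 9_999)))   # (True, 10000)
    table = dict.fromkeys(conns, "SYN_RECV")
    print(find_hashed(table, ("10.0.0.1", 9_999)))   # True
```

When packet loss inflates the number of half-open connections, the linear version's cost per incoming packet grows with it, which is how a modest 1% loss could turn into system overload.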