Content delivery system design mistakes
Peter Zaitsev
This week I helped deal with performance problems (partly MySQL related and partly related to LAMP in general) in a system which does quite a bit of content delivery, serving file downloads and images, something a lot of web sites need to do these days. There were quite a few design mistakes in this one which I thought worth noting, along with some issues I have seen in other systems.
Note that this list applies to static content distribution; dynamic content has issues of its own which need different treatment.
DNS TTL Settings
The system was using DNS based load balancing, with host names such as img23.domain.com serving some of the images. I’m not a big fan of purely DNS based load balancing and HA, but it works if configured well. In this case, however, the problem was a zero TTL set in the DNS configuration. A zero TTL means resolvers may not cache the answer, so every lookup goes back to the DNS server. This obviously adds latency, especially for “aggregate” pages which may require images to be pulled from 10 different image servers.
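To see why a zero TTL hurts, here is a minimal Python sketch of what a caching resolver does with the TTL. The upstream lookup is a hypothetical stand-in function (no real DNS traffic is generated), and the clock is injectable so the behaviour is easy to check. It is an illustration of TTL-based caching, not how any particular resolver is implemented.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

class TTLResolverCache:
    """Caches name -> address answers for `ttl` seconds, like a caching resolver."""

    def __init__(self, resolve: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve          # hypothetical upstream lookup function
        self._ttl = ttl                  # seconds an answer may be reused
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0        # lookups that had to go to the DNS server

    def lookup(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]             # fresh answer: no extra round trip
        self.upstream_queries += 1
        address = self._resolve(name)
        self._cache[name] = (address, now + self._ttl)   # with TTL 0 it is stale at once
        return address

def load_page(cache: TTLResolverCache, hosts: Iterable[str]) -> None:
    """An "aggregate" page pulling images from several image hosts."""
    for host in hosts:
        cache.lookup(host)

if __name__ == "__main__":
    hosts = [f"img{i}.domain.com" for i in range(1, 11)]   # 10 image servers
    fake_upstream = lambda name: "192.0.2.1"               # documentation address, no real DNS
    for ttl in (0, 300):
        cache = TTLResolverCache(fake_upstream, ttl)
        for _ in range(5):                                 # five page views in a row
            load_page(cache, hosts)
        print(f"ttl={ttl}: {cache.upstream_queries} upstream queries")
```

With ten image hosts and five page views, a TTL of 0 results in 50 upstream queries, while a 300 second TTL (an arbitrary value chosen for illustration) results in 10.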
Keep Alive
In my previous post I wrote that you often do not need keep-alive for dynamic pages (there are exceptions), but you really should have Keep-Alive enabled when serving images. Not having it hurts especially when 30 thumbnails are loaded per page, as every one of them then pays for setting up a new TCP connection.
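The effect is easy to observe with Python’s standard library alone. The sketch below, which assumes nothing beyond `http.server` and `http.client` running on localhost, starts a tiny thumbnail server and counts how many TCP connections it accepts while a client fetches 30 thumbnails with and without keep-alive. The paths and the 1KB fake image are hypothetical; only the connection count matters here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class ThumbHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"    # HTTP/1.1 keeps connections open by default
    connections = 0
    lock = threading.Lock()

    def setup(self):
        super().setup()
        with ThumbHandler.lock:      # one handler instance per accepted TCP connection
            ThumbHandler.connections += 1

    def do_GET(self):
        body = b"\xff" * 1024        # fake 1KB thumbnail
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(body)))  # lets the client reuse the connection
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):    # keep the demo quiet
        pass

def fetch_thumbnails(port: int, count: int, keep_alive: bool) -> None:
    conn = None
    for i in range(count):
        if conn is None:
            conn = http.client.HTTPConnection("127.0.0.1", port)
        headers = {} if keep_alive else {"Connection": "close"}
        conn.request("GET", f"/thumbs/{i}.jpg", headers=headers)
        conn.getresponse().read()    # drain the body before the next request
        if not keep_alive:
            conn.close()             # next thumbnail needs a brand new connection
            conn = None
    if conn is not None:
        conn.close()

def count_connections(count: int, keep_alive: bool) -> int:
    server = ThreadingHTTPServer(("127.0.0.1", 0), ThumbHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        ThumbHandler.connections = 0
        fetch_thumbnails(server.server_address[1], count, keep_alive)
        return ThumbHandler.connections
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print("keep-alive:", count_connections(30, keep_alive=True))
    print("no keep-alive:", count_connections(30, keep_alive=False))
```

With keep-alive the 30 thumbnails travel over a single connection; without it the server accepts 30 connections, each costing a TCP handshake round trip before the request can even be sent. Real browsers open a few parallel connections per host, so the real-world ratio is smaller, but the direction is the same.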
Use the Proper Web Server
This one is pretty interesting. Many have learned that Apache is not good at serving static content, especially with many thousands of keep-alive connections. Lighttpd is often named as a faster alternative. That is surely true if you serve static content from memory or serve large files (in which case read-ahead helps), but if you’re serving many millions of thumbnails it may not work well. Lighttpd 1.4 is single threaded, and that single thread handles both network and disk IO, so while it waits for a disk read no other request makes progress. This is not going to scale, especially if your storage is built from a large number of hard drives, since a single thread blocked on one read keeps only one drive busy at a time. This problem is fixed in Lighttpd 1.5, which however is still in pre-release (though we have it running in production). There are other solutions, such as nginx, which do not seem to have this problem. My point with this item is not to advise you which exact web server to use, but to point out that serving dynamic content, serving in-memory static content and serving static content from disk are different tasks and should not be mixed.
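The single-thread problem can be illustrated with Python’s `asyncio` (3.9 or newer, for `asyncio.to_thread`). This is a sketch of the general event-loop pattern, not of how lighttpd or nginx are implemented: `time.sleep` stands in for a blocking read that misses the OS cache, and the assumed 50ms latency models independent drives that can serve reads concurrently. Doing the read on the event-loop thread serializes every request; handing it to a thread pool lets several reads be in flight at once.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Tuple

DISK_LATENCY = 0.05   # assumed cost of one uncached thumbnail read, in seconds

def read_thumbnail(name: str) -> bytes:
    """Stand-in for a blocking read() that has to go to disk."""
    time.sleep(DISK_LATENCY)   # the calling thread is stuck here
    return name.encode()

async def handle_inline(name: str) -> bytes:
    # Single-threaded server style: the disk read runs on the event-loop
    # thread, so no other request (or network IO) progresses meanwhile.
    return read_thumbnail(name)

async def handle_offloaded(name: str) -> bytes:
    # Disk reads go to a thread pool; the loop stays free and several
    # reads (on different drives) can be outstanding at the same time.
    return await asyncio.to_thread(read_thumbnail, name)

async def serve_all(handler: Callable[[str], Awaitable[bytes]], names: List[str]) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(handler(n) for n in names))
    return time.perf_counter() - start

def compare(n_requests: int = 8) -> Tuple[float, float]:
    names = [f"thumb{i}.jpg" for i in range(n_requests)]
    inline = asyncio.run(serve_all(handle_inline, names))
    offloaded = asyncio.run(serve_all(handle_offloaded, names))
    return inline, offloaded

if __name__ == "__main__":
    inline, offloaded = compare()
    print(f"reads on the loop thread: {inline:.2f}s, reads in a thread pool: {offloaded:.2f}s")
```

With eight requests the inline version takes roughly eight times the assumed disk latency, while the offloaded version finishes in about one or two latencies, depending on the size of the default thread pool.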
Use the noatime Mount Option
When serving images you rarely need their last access time tracked, especially as this tracking can be quite expensive if a lot of different files are accessed concurrently. It is especially worth noting that updating the access time causes write IO (the inode is dirtied and has to be written back) even when the content itself is in the OS cache. The simple solution is to use the noatime mount option for your static content partition. Some web servers also support the O_NOATIME file open flag available on newer Linux versions, which can be used as well.
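For a server or helper script you write yourself, opening files with `O_NOATIME` looks roughly like this minimal Python sketch. `os.O_NOATIME` is only defined on Linux, so the code falls back to a plain open elsewhere. Linux also honours the flag only for the file’s owner (or a process with `CAP_FOWNER`) and otherwise fails with `EPERM`, so the sketch retries without the flag in that case. The function name is hypothetical.

```python
import errno
import os

# os.O_NOATIME only exists on Linux; elsewhere this becomes a no-op flag.
O_NOATIME = getattr(os, "O_NOATIME", 0)

def read_static_file(path: str) -> bytes:
    """Read a whole file without updating its atime where the kernel allows it."""
    try:
        fd = os.open(path, os.O_RDONLY | O_NOATIME)
    except OSError as exc:
        # EPERM means we are not the owner, so O_NOATIME is not permitted;
        # retry with a normal open (atime will be updated unless mounted noatime).
        if O_NOATIME and exc.errno == errno.EPERM:
            fd = os.open(path, os.O_RDONLY)
        else:
            raise
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

The mount option remains the simpler fix, since it covers every process reading from the partition without any code changes.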
Using PHP to Serve Files
This usually comes into play if access control for files or per-user traffic limits need to be enforced. Serving a file by simply reading it from PHP (or any other heavy programming language) and sending it back is the worst thing you can do. The optimal solution would be to use server modules for access control, or to hack one if you need some special functionality; few people, however, have the skills for this kind of job. The other solution is to use X-send-file or similar custom headers, which let the PHP script just check the restrictions and have the web server send the file. This technique was actually used in the project we’re speaking about, with the exception of resumed file downloads. In lighttpd 1.4 you could not tell the server to send only part of a file, so this had to be implemented in PHP. Even a small portion of such downloads caused a lot of trouble, especially as lighttpd buffers all the content it gets from the PHP script, which means many megabytes for large files. Sending partial files was happily added in lighttpd 1.5, which was one of the reasons to move to it quickly.
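The pattern is not PHP specific. Below is a minimal sketch of it as a Python WSGI application: the script only decides whether the download is allowed and names the file in a response header, and the web server does the actual sending. The directory, the `user_may_download` check and the use of `REMOTE_USER` are hypothetical. The header name is an assumption too, because it differs between servers (for example `X-Sendfile` for Apache with mod_xsendfile, or `X-Accel-Redirect` for nginx, which expects an internal URI rather than a file path), so check your server’s documentation.

```python
import os
from typing import Iterable

FILE_ROOT = "/var/www/downloads"   # hypothetical directory holding the files
SENDFILE_HEADER = "X-Sendfile"     # assumption: the name depends on the web server

def user_may_download(user: str, filename: str) -> bool:
    """Hypothetical access check (permissions, per-user traffic limits, ...)."""
    return user == "alice"

def download_app(environ, start_response) -> Iterable[bytes]:
    # basename() strips directories, so "/../../etc/passwd" cannot escape FILE_ROOT.
    filename = os.path.basename(environ.get("PATH_INFO", ""))
    user = environ.get("REMOTE_USER", "")
    if not filename or not user_may_download(user, filename):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden\n"]
    # Only the decision is made here. The web server sees the header and
    # streams the file itself, so no file data passes through this process.
    start_response("200 OK", [(SENDFILE_HEADER, os.path.join(FILE_ROOT, filename))])
    return [b""]
```

Whether resumed downloads (Range requests) work then depends entirely on the web server, which is exactly the limitation described above for lighttpd 1.4.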
Thumbnail Size
In many cases you need multiple sizes of thumbnails, such as a very small size to show in list mode, a standard size to show in preview mode and maybe something else. It looks very attractive to keep only one thumbnail and simply set the image size in the browser when you need to fit it into a smaller space. It is not so good in practice, however. First, browsers may not resize images with optimal quality, so the result may look ugly compared to properly created thumbnails. It also makes things a lot slower. For example, I have seen an 8KB thumbnail used where the smallest one should have been less than 1KB in size. With 30 images on the page that is 240KB worth of download versus 30KB, which makes quite a difference, especially on slower connections.
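Generating each variant once, at upload time, is cheap compared to shipping oversized images on every page view. This Python sketch only computes aspect-preserving target dimensions for a hypothetical set of variants and redoes the page-weight arithmetic from the text; the actual resampling would be done by an imaging library (Pillow’s `Image.thumbnail`, for instance, also preserves the aspect ratio and never enlarges).

```python
from typing import Dict, Tuple

# Hypothetical thumbnail variants; pick the bounding boxes your layout needs.
THUMB_SIZES: Dict[str, Tuple[int, int]] = {
    "list": (48, 48),
    "preview": (160, 160),
}

def fit_within(width: int, height: int, box: Tuple[int, int]) -> Tuple[int, int]:
    """Scale (width, height) down to fit inside box, keeping the aspect ratio.
    Images already smaller than the box are left as they are (never upscaled)."""
    max_w, max_h = box
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))

def variant_dimensions(width: int, height: int) -> Dict[str, Tuple[int, int]]:
    """Dimensions of every variant to generate when the original is uploaded."""
    return {name: fit_within(width, height, box) for name, box in THUMB_SIZES.items()}

def page_weight_kb(images_per_page: int, kb_per_image: float) -> float:
    """Total image download for one page view."""
    return images_per_page * kb_per_image

if __name__ == "__main__":
    print(variant_dimensions(800, 600))                  # 800x600 original
    print(page_weight_kb(30, 8), page_weight_kb(30, 1))  # the 8KB vs 1KB case
```

For an 800x600 original this yields a 48x36 list thumbnail and a 160x120 preview, and 30 images per page come to 240KB with 8KB thumbnails versus 30KB with 1KB ones.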
Forgetting to Set Expires
Static content rarely changes; in many implementations it may never change, as publishing a new version of the same object results in a different URL. This means you really need to set Expires headers for it; otherwise you will be getting a lot of cache revalidation requests, which still require stat() calls on the web server side and which can be avoided. How far in the future the expiry should be depends on the application, of course. It can range from a few hours, for objects whose changes you want to become visible quickly, to effectively forever, for objects which only change together with their URLs.
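Here is a minimal Python sketch of building the caching headers for a static response, using only `email.utils.formatdate` from the standard library. It sends both `Expires` (understood by HTTP/1.0 caches) and `Cache-Control: max-age` (HTTP/1.1), and caps the lifetime at one year, since HTTP/1.1 advises against sending Expires dates more than a year in the future. The function name is hypothetical; in practice you would usually set this in the web server configuration rather than in code.

```python
import time
from email.utils import formatdate
from typing import List, Optional, Tuple

ONE_YEAR = 365 * 24 * 3600   # practical "forever" for objects that change only with their URL

def cache_headers(max_age: int, now: Optional[float] = None) -> List[Tuple[str, str]]:
    """Expires plus Cache-Control headers telling clients not to revalidate for max_age seconds."""
    if now is None:
        now = time.time()
    max_age = max(0, min(max_age, ONE_YEAR))
    return [
        ("Expires", formatdate(now + max_age, usegmt=True)),   # RFC 1123 date in GMT
        ("Cache-Control", f"public, max-age={max_age}"),
    ]

if __name__ == "__main__":
    print(cache_headers(4 * 3600))   # a few hours: changes become visible quickly
    print(cache_headers(ONE_YEAR))   # versioned URLs: effectively never revalidated
```

The few-hours setting suits objects whose changes you want visible quickly; the one-year setting suits objects that get a new URL whenever they change.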
Server Side Caching
The benefit of server side caching is not so obvious for serving static content, or rather, it depends on the situation. If you serve a lot of small files which fit into the in-memory cache of squid (for example), it may work quite well and in fact give better memory utilization, as the OS cache has single-page granularity (meaning a 100 byte file still takes 4KB in cache). When serving a large set of files which does not fit in memory, performance can go down, as requests will frequently have to be made to the main server to get the file. It also frequently does not make sense to use a disk cache for static content, as getting the file from the main server may be close in speed. It also depends, of course, on the server you’re using: Apache in prefork mode (i.e. the same server used for static and dynamic content) would likely benefit a lot from one.
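The page-granularity argument is simple arithmetic, sketched here in Python. `mmap.PAGESIZE` reports the system page size (4096 bytes on typical x86 Linux). The "exact" figure is an idealized lower bound for an object cache that stores bodies back to back; a real cache such as squid also keeps per-object metadata, so its actual advantage is smaller than this comparison suggests.

```python
import mmap
from typing import Iterable

PAGE_SIZE = mmap.PAGESIZE   # usually 4096 on x86 Linux

def page_cache_bytes(file_sizes: Iterable[int], page_size: int = PAGE_SIZE) -> int:
    """Memory the OS page cache needs: every file is rounded up to whole pages."""
    return sum(-(-size // page_size) * page_size for size in file_sizes)

def exact_bytes(file_sizes: Iterable[int]) -> int:
    """Idealized object cache storing bodies back to back (metadata overhead ignored)."""
    return sum(file_sizes)

if __name__ == "__main__":
    small_files = [100] * 1000   # a thousand 100 byte files
    print(page_cache_bytes(small_files, 4096), exact_bytes(small_files))
```

For a thousand 100 byte files this gives 4,096,000 bytes of page cache versus 100,000 bytes of actual content, which is why a memory cache can pay off for many tiny files while bringing little for large ones.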
Using Different Servers
As I already mentioned, serving different kinds of content places different demands on web servers or requires different configuration, so it is a good idea to use a dedicated server name, such as static.domain.com, for serving static content. Even if you do not need it now, this gives you flexibility in the future. Even on the same server, it makes it easy to configure a separate virtual host with different settings.
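One way to keep that flexibility on the application side is to build every static URL through a single helper, so templates never hard-code a host. This Python sketch assumes a hypothetical `STATIC_HOST` setting; today it may resolve to the same machine as the main site, and moving static content to dedicated servers later becomes a configuration and DNS change rather than a search through every template.

```python
from urllib.parse import quote

# Hypothetical setting: may point at the same box as the main site today.
STATIC_HOST = "static.domain.com"

def static_url(path: str, host: str = STATIC_HOST, scheme: str = "http") -> str:
    """URL of a static asset on the dedicated static host name."""
    # quote() escapes spaces and other unsafe characters but keeps "/".
    return f"{scheme}://{host}/{quote(path.lstrip('/'))}"

if __name__ == "__main__":
    print(static_url("/thumbs/small/42.jpg"))
```

On the server side the same name lets you give static content its own virtual host, with keep-alive, Expires headers and the other settings discussed above, independent of the dynamic site.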