I read very nice post by Matt today and it has many good insights though I can’t say I agree on all points.
First there is a lot of people out where which put it as disk is everything. Remember Paul Tuckfield saying “You should ask how many disks they have instead of how many systems they have” on MySQL UC2008 Scalability Panel ? Indeed disks MAY be the most important part in your system performance or it may not be. Different people get to deal with different systems and so acquire different feeling about percentage of cases when disk would be the problem.
However it is not always the case. There are whole classes of systems where Disk performance is not that important – consider for example systems where most of the database fits in memory. These days we can get 64G of memory for pretty commodity prices and this allows you to get a lot of data in. Many Web 2.0 sites in particular design for having everything they need in memory, which is not that hard after you got VC funding as soon as you can shard your data properly so you would not have to replicate everything
There are also cases when storage size becomes the limit rather than number of IOs it can handle. Imagine for example data archive storage. On BoardReader for example we’re limited by space and how much data we can comfortably put in MySQL Instance without getting in too much trouble with backups etc – because Sphinx is doing all search heavy lifting we only need to insert new data and fetch few rows to show search results which does not require too much IO capacity or high in memory fit.
For large number of applications optimizing IO performance will be number one problem do not get me wrong, I just it is not always the case.
Another mistake is to measure IO capacity in spindles – So your hard drive can do 200 random reads per second and your SSD drive can do 10000 does it mean if you have array with 50 hard drives it would perform as good as SSD ? Not really. Leaving all potential serialization issues along you need enough concurrency to utilize multiple hard drives and with 50 drives you would need at least 50 outstanding requests all the time to fully utilize them. So for example these 50 drives will unlikely be helpful solving replication delay or speeding up this 3 hours reporting query or 5 hours ALTER TABLE
Let me also comment on the memory fit diagram – understanding your working set is paramount but taking number from other application is one of the worst mistakes you can make. One application can become CPU bound even with 5% of total data set in memory while over may require full 100%. Performance gains graphs in relation to fit in memory will also be different.
Now let me come to filesystem layout issues This is where a lot of roads meet and you have to consider a lot of topics – security, manageability, performance. I tend to advice different things depending on what is the most important. Assuming you have fixed amount of disk at your disposal and if you have BBU most likely data files is where all IO will go. If you dedicate RAID1 volume for Operating system and another one for Transactional Logs you will often wast 4 hard drives from performance standpoint without a good reason.
Having OS and tmp away of the data is a good idea as Matt says – you do not want your runaway logs or extreme temporary space usage to stop the database, but they usually can keep the same RAID volume.
I tend to keep Innodb data files (and whole datadir) and log files on the same partition as well because this makes it very easy to use LVM for backups while assuming you have BBU on the raid volume impact on transactional log write speed is usually minimal. At the same time if reliability is desired it might be good idea to keep Binary Logs on the separate volume – you may need them to do point in time recovery if you want to recover to last committed transaction in case your data RAID volume was trashed.
Regarding SWAP – I tend to have it but not optimize for it, putting it on shared drive (or OS+Stuff drive if there is one) – you do not want your database box to swap actively – if it does your’re in trouble anyway so it is better to fix the things rather than optimize swap performance.
I also fully support Matt in his views of black boxes. If you want to be able to resolve your performance problems you need transparency, you need to have X-Ray vision, you need to understand what is happening and why. A lot of pieces – filesystem, disk scheduling, database buffer management can be rather complex but you need to have at least basic understanding of the processes to be able to reliably identify problems.
At the same time I would be very careful with your assumptions – in many instances I’ve seen different parts of the system working not as I would expect them to and not as it would make sense in my opinion but quite differently.