So lets imagine you have the server handling 100.000 user accounts. You can see the CPU,IO and Network usage is below 10% of capacity – does it mean you can count on server being able to
handle 1.000.000 of accounts ? Not really, and there are few reasons why, I’ll name most important of them:
Contention – This is probably the most obvious one. MySQL (and systems in general) do not scale perfectly with numbers of CPUs and number of concurrent requests. Reduced efficiency of CPU cache, Mutex contention and database lock contention all come here. Some of them are preventable and can be reduced by code changes, such as there have been a lot of advanced in scalability of MySQL by improving locking code design, others, such as row level locks would require application changes to allow more concurrent process. The scalability numbers depend a lot on the system scale, software and workload.
Data size impact There are different type of applications out of there. Some (minority?) vary the load independently or almost independently of data size. You can think about Google search engine – the data size on which search operation is performed is constant, no matter if you’re serving 10 queries a day or a billion. True you probably would need to hold many copies of data to support high load but this is a scaling through copying not the amount of data you have in the system. Wikipedia is a similar case – the data size does not depend on the amount of readers, though writers contribute to the data size by creating new articles and increasing amount of versions in system. Applications such as Facebook, Flickr or Twitter have a very clear correlation between traffic and users.
Each registered user will in average have N MB of data stored in database, and traffic system is getting is somewhat proportional (though often not linear) to amount of users.
For systems of the first type the data size grows independently of traffic so it is fine to measure system capacity in Transactions per second. If system can handle twice amount of transactions per second it may be able to handle double the load. For systems of the second type you better use Transactions/Second/User or Transactions/Second/MB (which is similar measures as users in average have certain amount of data each). Doubling traffic for system of such type means handling twice amount of transactions on the twice amount of data.
Increasing amount of data is very serious implication for system performance. Some queries have relatively small impact (having LOG(N) scalability), others may have linear or even square complexity which
means increasing data size puts a very serious strain on the system. What is also very important and often forgotten is caching. Having twice amount of data means having half the cache – if you previously had 20% of data fits in memory, now it is only 10%. The impact of cache on performance is very application dependent as well and may vary from insignificant to dramatic.
You’re in the highest danger if you have very high portion of your database (or working) set fits in memory, hence having CPU bound workload. As your data growths you may frequently find load becoming IO bound and hence things becoming 10x slower (or more) sometimes with very modest size increase. I’ve seen things slowing down about 10x from less than 50% increase in the data size.
I see the data size impact often omitted in “consolidation” tests – when you would get a new server and see it can handle 5x of the load of the old one, so you would consider you can put 5 “shards” on it.
5 shards surely come with 5x more data which you need to carefully take into account.
Design Limits This is the brother of contention but I decided to put it separately. There are more things than contention which can limit the perfect scalability. The Replication is perfect example in MySQL world. Slave executes replication stream in single thread which means it replication can’t scale for large amount of writes. The lack of parallel query execution is similar issue – you may have a lot of resources in terms of CPUs and disk but it can’t help to reduce response time of the single query.
Response Time Do not forget you do not only be able to handle number of users in terms of capacity you also need to have response time to be within certain range in majority of cases. Some may look at 99% response time, some 95% but neither the less you want users to get response fast. This means you can’t plan on loading system 100%. There is a nice paper by Cary Millsap explaining this in more details. Depending on the system and workload you may want to keep your system loaded no more than 80% in peak times, though applications which need to accommodate for larger traffic variance need to have a lot more spare room.
So in the end the math to scale your system may not be as straightforward as you think – you need to take number of things into account and I’d always suggest to confirm your modeling with benchmarks/performance evaluation if have a chance.