MySQL Sizing questionsPeter Zaitsev
I frequently get questions along the lines of “how many transactions per second MySQL can do” or “how many servers I need to handle 100.000 users” or “which hardware would be enough to handle my 40GB” database.
There are two things which are very interesting about these questions which make them funny and annoying at the same time
1) People asking these questions demand the answer. It could be internal Sales and Marketing people or even customers saying they typically get the answers from other vendors.
2) The best answer to this question will not be a number but something like “it depends”. Seriously you need much more information than given in this example to give estimate which is correct at least to the order of magnitude.
Seriously in MySQL Consulting practice it would not be exception to have 1-2 orders of magnitude performance increase just by proper tuning – this means even application implementation details may give huge difference, not to mention such aspecs as data details, load distribution, what exactly application is doing and a lot of these little substle variables which add up to completely different picture.
Because of so huge use case differences it hard even to give sizing recomendations even if application is defined. For example if you would ask me what hardware you need for ecommerce web site with 100.000 visitors per day I still would not have direct answers. This is because there are still a lot of seriously impacting variables. For example are you selling 100 items or millions of items as amazon.com. What search services are you going to offer ? Is there any extra services such as live advicing of items based on your previous purchase history etc. To illustrate it better – take a look at Dell DVD Store Whitepaper This paper is using “real world” kind of application benchmark which emulates internet shop activit. According to this paper you can get some 2000 orders/min from single box (including search etc). Actually with proper optimization you can even get it to 5000 orders/min from the same single server. Even with 10$ per average order it means $26 billions per year. This is more than Amazon.com does. So would they be able to run their whole orders from single MySQL server. Of course not, because their application is a lot different even though it is doing about the same thing.
So why people are asking questions which have no answers, and why are they demanding answers for them ? I guess there are two reasons:
1) Even if it does not make it is used by Sales and Marketing forces in many companies because it looks good. Also Big bosses (frequenly barely technical) need estimates for their plans and they would rather take wrong estimates than no estimates. Now than these estimates turn out to be wrong they can blame software vendor for providing these, while vendor typically would not accept any responsibilty for estimates being wrong. On the contrary – underestimates would turn extra license sales, while overestimates would typically mean more than needed licenses and hardware is already purchased.
2) There are certain cases where quick estimates actually make sense. This typically applies to partcular end user applications with with very particular usage scenario – quite typical in the enterprise, telecom words etc. For example purchasing retail system you can count number of sellers you have, number of items plus few more variables and very well estimate hardware requirements. These would of course apply to particular system in particular configuration but it will be rather accurate. Many people however would assume it is the same for database server handling custom application – no it is not.
So am I saying we do not need any sizing at all and should just pray we purchased enough hardware ? Of course note. We need estimates, however we need to spend time doing serious capacity planning, probably look at real data try tunning sample queries etc. This should be serious effort if you want to get usable results. Trying to guess from just couple of numbers could be extremely dangerous as it can be very wrong. You may have good intuition and you may guess right but it still should be treated as a guess, making sure there is not catastrophy if you are wrong.
I should write about some techniques and formulas which you can use for capacity planning in one of the next articles.