You might be familiar with the Six Sigma business management strategy, which is employed by a variety of companies to manage the quality of their products. Six Sigma refers to the number of defects: when you have reached six sigma quality in your production, 99.99966% of the products are manufactured with no defects, or in other words there are fewer than 3.4 defects per million.
One of the principles of Six Sigma is that customers tend to be concerned about variance a lot more than about the average. For example, if you produce tomato soup and the average deviation from the declared weight is 0.1 gram or 0.5 gram, probably nobody would notice the difference. What would worry people, however, is a significant number of very large deviations, such as a half-empty tomato soup can.
The Apdex standard follows similar lines, seeking to classify user experiences as good, tolerable, and unacceptable.
So how are we doing with this in MySQL (and in Web performance management in general)? Not so well. In most cases organizations focus on performance rather than performance stability, even though it is the latter that their users really care about. Consider Adaptive Checkpointing in InnoDB, for example: until it was implemented we had very poor performance stability under a large number of workloads. Or look at the industry-standard TPC-C benchmark: it defines its response time guidance as the 95th percentile response time, which is only about two sigma in sigma notation. A lot of popular tools, including Sysbench and Maatkit, also use the 95th percentile response time (probably inherited from these industry standards).
Sometimes we use the 99th percentile response time as a guide, but even this is not quite 3 sigma, which would be 99.7%.
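To make the sigma-to-percentile mapping concrete, here is a small sketch using only the standard library. The 1.5 sigma process drift is the conventional Six Sigma assumption for why "six sigma" corresponds to 3.4 defects per million rather than the raw normal tail:

```python
# Relate sigma levels to percentile coverage with the standard normal CDF.
import math

def two_sided_coverage(k):
    """Fraction of a normal distribution falling within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))

print(f"2 sigma: {two_sided_coverage(2):.4%}")  # ~95.45%
print(f"3 sigma: {two_sided_coverage(3):.4%}")  # ~99.73%

# The classic Six Sigma defect rate assumes a 1.5 sigma process drift,
# so defects are counted beyond 4.5 sigma on one side:
tail = 0.5 * (1 - math.erf(4.5 / math.sqrt(2)))
print(f"Six Sigma defects per million: {tail * 1e6:.1f}")  # ~3.4
```

This is why a 95th percentile target is roughly two sigma and a 99th percentile target still falls short of three.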
One reason we use rather weak confidence intervals in performance optimization is that we need a lot of transactions to get stable measurements for stronger ones. If we're looking at the 95th percentile, something like 1,000 transactions will give more or less stable results in most cases. For the 99th percentile we would need at least 10,000, and I would look for at least 100,000 transactions to get a stable 99.9th percentile response time.
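You can see this effect in a quick simulation. The workload below is hypothetical: response times are drawn from a lognormal distribution, a common stand-in for latency data, and each percentile is estimated repeatedly at the sample sizes mentioned above to see how much the estimate wobbles:

```python
# Sketch: how percentile estimates stabilize as sample size grows.
import random
import statistics

def percentile(samples, p):
    """Simple nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(p / 100.0 * len(ordered)))]

random.seed(42)

for p, n in [(95, 1000), (99, 10000), (99.9, 100000)]:
    # Repeat the measurement 20 times and see how much the estimate varies.
    estimates = [percentile([random.lognormvariate(0, 1) for _ in range(n)], p)
                 for _ in range(20)]
    spread = statistics.stdev(estimates) / statistics.mean(estimates)
    print(f"p{p} with {n} samples: relative spread ~{spread:.1%}")
```

The further out in the tail you go, the more samples you need before repeated measurements agree with each other.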
In a number of my talks about performance measurement I have suggested measuring the 95th percentile over short periods of time, for example every minute, which is great for graphing and also gives you somewhat higher confidence when you look at the whole day. I believe meeting a 95th percentile response time target every minute of the day is a lot harder than meeting a 99th percentile target over 24 hours.
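The intuition can be sketched with simulated data (all numbers below are hypothetical, not measurements): a short stall, such as a checkpoint spike, can blow the per-minute 95th percentile while the aggregate 99th percentile over the whole period barely moves:

```python
# Sketch: per-minute p95 vs. aggregate p99 on a simulated hour of traffic.
import random

def percentile(samples, p):
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(p / 100.0 * len(ordered)))]

random.seed(1)
minutes = []
for m in range(60):  # one hour of per-minute buckets, 600 queries/minute
    bucket = [random.lognormvariate(-1, 0.3) for _ in range(600)]
    if m == 30:  # inject a brief stall hitting ~40% of that minute's queries
        bucket = [t + 2.0 if random.random() < 0.4 else t for t in bucket]
    minutes.append(bucket)

worst_minute_p95 = max(percentile(b, 95) for b in minutes)
all_samples = [t for b in minutes for t in b]
hourly_p99 = percentile(all_samples, 99)
print(f"worst per-minute p95: {worst_minute_p95:.2f}s, hourly p99: {hourly_p99:.2f}s")
```

The stall pushes the worst per-minute p95 well above 2 seconds, yet the hourly p99 stays under a second, because the slow queries make up less than 1% of the hour's traffic.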
It is worth mentioning that modern applications have become a lot more demanding of stable performance. In the past you would have only one request to fetch the HTML, which was responsible for most of the information on the page. Now, with technologies like AJAX, there can be a number of requests during page generation and interaction, and each of them being slow gives the user a sluggish experience.
So what can we do in terms of performance management? Start measuring much higher percentile response times and allow for less variance. Why not strive for at least a 99.9th percentile daily response time (one request per thousand being slow) and use the 99th percentile over 5 minutes to graph response time stability? This is still very far from the confidence levels manufacturing uses, but it is probably practical for most Internet applications. Many applications like Gmail have 99.9 to 99.99 percent uptime, and from the user's point of view the percentile number for good response time is bounded by the availability figure.
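This policy can be expressed as two checks over a stream of (timestamp, response time) events. The helper below is a minimal sketch, with hypothetical names and an arbitrary 0.5-second target chosen purely for illustration:

```python
# Sketch: a daily p99.9 target plus a per-5-minute p99 series for graphing.
from collections import defaultdict

WINDOW = 5 * 60  # seconds per graphing window

def percentile(samples, p):
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(p / 100.0 * len(ordered)))]

def stability_report(events):
    """events: iterable of (unix_timestamp, response_time_seconds) pairs."""
    windows = defaultdict(list)
    day = []
    for ts, rt in events:
        windows[int(ts) // WINDOW].append(rt)
        day.append(rt)
    series = {w: percentile(v, 99) for w, v in sorted(windows.items())}
    return percentile(day, 99.9), series  # graph `series`, alert on the p99.9

# Usage: one fast query per second for a day, plus a single slow outlier.
events = [(i, 0.05) for i in range(86400)]
events[1000] = (1000, 3.0)
p999, series = stability_report(events)
print(f"daily p99.9: {p999:.2f}s over {len(series)} windows")
```

A single outlier is well within the "one per thousand" budget, so the daily p99.9 stays at the baseline response time; a sustained stall would show up both in the p99.9 and as a spike in the 5-minute p99 series.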
I would also like to see technology vendors focus more on performance stability in their benchmarks. Way too often you just see a throughput number reported, which gives zero information about performance stability and so is completely irrelevant to the product's ability to provide a great user experience.