Sometimes the question is framed as “are you looking for Performance OR Stability?”, which I believe is a strange way to put it. In real-life systems you care about both Performance AND Stability. I would even say Stability is not the best word here; in most cases what you really care about is your minimal performance.

If a system can handle 5000 q/sec for one minute and then 20,000 q/sec for the next, how much can I count on in terms of capacity planning? If this is a typical OLTP system I have to use the 5000 q/sec number, as I need the system to always be able to meet its performance requirements. If the system is doing batch processing, maybe I can count on the average, which is 12,500 q/sec in this case.
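To make this concrete, here is a tiny sketch using the hypothetical per-minute samples above. It only shows how the same measurements lead to different planning numbers depending on whether you size against the worst minute or the average:

```python
# Capacity planning sketch using the hypothetical per-minute samples above.
# OLTP planning sizes against the worst observed minute; batch processing
# can often be sized against the average.
samples_qps = [5000, 20000]  # throughput observed in two consecutive minutes

oltp_capacity = min(samples_qps)                      # 5000 q/sec: must always be achievable
batch_capacity = sum(samples_qps) / len(samples_qps)  # 12500 q/sec: average is enough

print(f"OLTP planning number:  {oltp_capacity} q/sec")
print(f"Batch planning number: {batch_capacity:.0f} q/sec")
```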

The difference between stability and minimal performance is important because I can be quite OK with “unstable” performance if the instability comes from performance bursts rather than stalls. For example, if my system performs 7000 q/sec and sometimes bursts to 15,000 q/sec, I will prefer it to a system with a stable 6000 q/sec.

Finally, we have to understand the difference between benchmarks and real life. Most benchmarks perform “stress testing”, throwing load at the system and seeing how much it can handle. In the real world, however, you typically have a given load which falls into a certain range; for example, you may say “we have 300-500 queries/sec at our peak time.” Because load in most systems follows “random arrivals” rather than a uniform pace of requests, the more finely you slice the time, the more variance you will see: the same case could correspond to 20-100 queries per 100ms interval. In real applications you do not drive your system at its complete saturation point, partly to accommodate such micro spikes, and you care a lot about response time, since response time is what users will use to judge whether your system is fast or slow.
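A quick way to see this effect is to simulate random (Poisson) arrivals and count them over different bucket sizes. The 400 q/sec rate and 60-second duration below are made-up illustration values:

```python
# Sketch of how random (Poisson) arrivals look at different time slices:
# a rate that looks steady per second is much spikier per 100 ms.
import random

random.seed(42)
rate_qps = 400     # assumed average arrival rate
duration_s = 60    # assumed observation window

# Generate arrival timestamps with exponential inter-arrival times.
arrivals, t = [], 0.0
while t < duration_s:
    t += random.expovariate(rate_qps)
    arrivals.append(t)

def counts(bucket_s):
    """Count arrivals falling into each bucket of bucket_s seconds."""
    n_buckets = int(duration_s / bucket_s)
    buckets = [0] * n_buckets
    for ts in arrivals:
        idx = int(ts / bucket_s)
        if idx < n_buckets:
            buckets[idx] += 1
    return buckets

per_second = counts(1.0)
per_100ms = counts(0.1)
print("per 1 s   :", min(per_second), "-", max(per_second), "queries")
print("per 100 ms:", min(per_100ms), "-", max(per_100ms), "queries")
```

On a typical run the per-second counts stay within roughly 10-15% of 400, while the per-100ms counts can easily swing 50% or more around their mean of 40.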

You will always see some response time distribution rather than all queries of the same type having the same response time, and this distribution can vary a lot. Typically, the fewer outliers you have at the same average response time, the better.
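As a small illustration, here are two invented latency samples with the same 10ms average; the summary numbers show how the average hides the outlier:

```python
import statistics

# Two hypothetical response time samples (ms) with identical averages.
tight    = [9, 10, 10, 10, 10, 10, 10, 10, 10, 11]   # avg 10 ms, tight spread
outliers = [5,  5,  5,  5,  5,  5,  5,  5,  5, 55]   # avg 10 ms, one big outlier

for name, sample in [("tight", tight), ("with outliers", outliers)]:
    print(f"{name:14s} avg={statistics.mean(sample):.1f} ms  "
          f"max={max(sample)} ms  stdev={statistics.stdev(sample):.1f} ms")
```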

The relationship between throughput and response time is complicated, and we can’t always say better throughput comes with better response times, but throughput still tells you something important. If I know my system peaks at 1000 q/sec for a 10-second stretch and I have to serve traffic of 2000 q/sec, I can’t do that: a lot of queries will have to be queued for at least 10 seconds until performance recovers, which means their response time will be at least 10 seconds, and that is likely not acceptable to me.
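The arithmetic behind that example looks roughly like this, as a back-of-the-envelope sketch rather than a proper queueing model:

```python
# Back-of-the-envelope backlog calculation for the example above.
arrival_rate = 2000    # q/sec offered by the application
service_rate = 1000    # q/sec the system can actually sustain during the bad stretch
dip_seconds = 10       # how long the bad stretch lasts

backlog = (arrival_rate - service_rate) * dip_seconds   # 10,000 queries waiting
wait_seconds = backlog / service_rate                   # ~10 s wait at the degraded rate

print(f"Queries queued up by the end of the stretch: {backlog}")
print(f"A query arriving at that point waits roughly {wait_seconds:.0f} s")
```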

Micro stalls, though, can be acceptable. If my system serves 5000 q/sec on average but there are some 10ms intervals in which it stalls and processes 0 queries, or only the equivalent of 1000 q/sec, while my query inflow rate is 2000 q/sec and my required response time is within 50ms, that may well be acceptable. Note that if you drill down to small enough time intervals you will find such micro stalls in basically any system.
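You can sanity-check the micro-stall scenario with the same kind of arithmetic. The numbers below are the ones from the paragraph above, and the calculation ignores the base service time:

```python
# Rough tolerance check for a complete 10 ms micro stall.
inflow = 2000          # q/sec arriving
normal_rate = 5000     # q/sec served outside the stall
stall_ms = 10          # length of the stall
budget_ms = 50         # response time the application can tolerate

queued = inflow * stall_ms / 1000                     # queries piled up during the stall
drain_ms = queued / (normal_rate - inflow) * 1000     # time to clear the pile afterwards
worst_case_ms = stall_ms + drain_ms                   # latency of the unluckiest query

verdict = "within" if worst_case_ms <= budget_ms else "over"
print(f"~{queued:.0f} queries queue up; worst-case latency ~{worst_case_ms:.0f} ms "
      f"({verdict} the {budget_ms} ms budget)")
```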

In summary, you most likely do care about your performance, and specifically about your minimal performance. The interval at which you should measure this minimal performance depends on the response time your application can tolerate. In the MySQL benchmarks Vadim posted, we see a lot of stalls lasting a minute or more, which is not acceptable for any interactive application.
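One practical way to apply this is to compute the worst sustained throughput over sliding windows sized to what your application can tolerate. The per-second series below is invented, with a short stall in the middle:

```python
# Worst sustained throughput over sliding windows of different sizes.
# A short window exposes the stall; a longer window averages it away.
throughput = [5000, 5200, 4900, 800, 900, 5100, 5000, 4800, 5200, 5100]  # q/sec per second

def worst_window(series, window):
    """Minimum of the window-averaged throughput for a given window size (in samples)."""
    return min(sum(series[i:i + window]) / window
               for i in range(len(series) - window + 1))

for window in (1, 2, 5):
    print(f"{window}s window: worst sustained throughput = "
          f"{worst_window(throughput, window):.0f} q/sec")
```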

Comments
Dimitri

Personally I’d prefer the term “stable performance”, but everything is relative 🙂
Yes, testing is testing, and life is life.. What I’d expect from a database: if the testing workload is “stable” itself, there is not really any reason to see “unstable” database performance (if the OS, HW and IO levels are working as expected and don’t have any particular performance issues..) – so testing helps to reproduce the problem (when possible), as no customer will agree to play with their production :-))

Then, any observed TPS “drops” should be analyzed and fixed. And there is still a lot of work yet to be done with MySQL.. 😉

Rgds,
-Dimitri

Yzmir Ramirez

What about Capacity, for lack of a better word? If you take Stability, Performance, and Capacity and put them on a triangle, that pretty much sums up your choices between the three.

As you increase Capacity, both Stability and Performance degrade, and vice versa. Would you agree?

[img]http://img269.imageshack.us/img269/7130/servertriangleperforman.png[/img]

Dimitri

Peter,

Well, there are many reasons why performance may be unstable (and as you can see, in all your examples you already have an explanation ;-)).. – and most of the HW issues can be easily monitored and explained too (for ex. on Solaris you may check the ratio between sequential and random I/O operations, and every storage array has its own stats too, etc.)..

Then, once you observe performance issues on your production system, the first thing you try to do is align all “random” events with what you’re observing within a given period and look for any dependency, etc.. Then, if you really want to find the root cause of the problem, you start by reducing the “random” perimeter: you make sure your MySQL instance is alone in using its storage, its system, etc. Then you analyze more in depth and finally come up with a test case which gives you a way to reproduce your problem *every* time.. – And even then you may still split your problem into several test cases and analyze each one alone.. – like using bricks to build a wall.. 🙂

A “stable workload” test case is one such brick. And depending on what kind of workload you’re running, you know what kind of result you’re expecting, no?.. – and if it’s not so, then you analyze the problem in depth and look for solution(s) to fix it.. Once you’re sure you’re obtaining “stable performance” on a “stable workload”, you may try to involve other “random” events – to see what the impact will be of a batch started in the background, or a heavy report query scanning half of the tables, etc..

So before building a wall you first have to be sure about your bricks..
Still don’t agree?.. 😉

Such a discussion may be very long and take weeks or months (especially if the beer is good :-)). It’s a pity I’m only coming to the Collaborate-11 conference this year and will miss the UC, otherwise it would have been a good occasion to spend a night in the “perf bar” :-))

Rgds,
-Dimitri

Patrick Casey

I think I share the consensus here in that I’d prefer consistent performance over throughput, provided that my performance was “good enough”.

Put numerically, if my app requires 100 q/s to function, I’d rather have a database that delivers between 110-120 q/s over one that delivered anywhere between 90-1000 q/s.

Of course, nobody sets out to build database infrastructure that can’t keep up with minimum requirements; usually if I’m looking at something it’s because the lower bound has been breached and suddenly we have a performance problem.

Like Dimitri was saying though, there’s a mess of factors at work here, one of which is clearly going to be price. For *enough* money I can design for almost any practical performance level, but I’ve never, ever worked on a project where the design criteria were “make it as fast as you can, I don’t care about the price”. Usually the criteria are “it must have X performance, Y uptime, and should be as cheap as you can manage it”.

My mental model of the MySQL user base (which could be totally bonkers) probably separates it into three big tranches.

You have a lot of small users backing cookbook sites, probably running MyISAM on old single-disk servers or virtuals. This group probably values convenience above all else.

You have a middle group of users running on hardware designed to run one or more “real” database servers, almost definitely InnoDB on RAID, big-memory boxes, but with serious budget constraints. This group probably values performance because they’re trying to eke out the maximum throughput without buying another server.

You have a small group at the top running enterprise grade sites on farms of dedicated database boxes. This group values stability above all else; they can solve a performance problem with more hardware and the cost of a few dozen extra boxes is nothing compared to the business cost of going dark or degrading to the point where there is customer impact.

So I guess the answer to the first question is: it depends who you are :).
