Sometimes the question is framed as “are you looking for Performance OR Stability?”, which I believe is a strange way to put it. In real-life systems you care about both Performance AND Stability. I would even say Stability is not the best word here; in most cases what you really care about is your minimal performance.

If a system can handle 5000 q/sec for one minute and then 20,000 q/sec for the next, how much can I count on in terms of capacity planning? If this is a typical OLTP system I have to use the 5000 q/sec number, as I need the system to always be able to meet its performance requirements. If the system is doing batch processing, maybe I can count on the average, which is 12,500 q/sec in this case.
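To make this concrete, here is a tiny sketch using the hypothetical per-minute samples above. It only shows how the same measurements lead to different planning numbers depending on whether you size against the worst minute or the average:

```python
# Capacity planning sketch using the hypothetical per-minute samples above.
# OLTP planning sizes against the worst observed minute; batch processing
# can often be sized against the average.
samples_qps = [5000, 20000]  # throughput observed in two consecutive minutes

oltp_capacity = min(samples_qps)                      # 5000 q/sec: must always be achievable
batch_capacity = sum(samples_qps) / len(samples_qps)  # 12500 q/sec: average is enough

print(f"OLTP planning number:  {oltp_capacity} q/sec")
print(f"Batch planning number: {batch_capacity:.0f} q/sec")
```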

The difference between stability and minimal performance is important because I can be quite OK with “unstable” performance if the instability comes from performance bursts rather than stalls. For example, if my system performs 7000 q/sec and sometimes bursts to 15,000 q/sec, I will prefer it to a system with a stable 6000 q/sec.

Finally, we have to understand the difference between benchmarks and real life. Most benchmarks perform “stress testing”, throwing load at the system and seeing how much it can handle. In the real world, however, you typically have a given load which falls into a certain range; for example, you may say “we have 300-500 queries/sec at our peak time.” Because load in most systems follows “random arrivals” rather than a uniform pace of requests, the more finely you slice the time, the more variance you will see: the same case could correspond to 20-100 queries per 100ms interval. In real applications you do not drive your system at its complete saturation point, partly to accommodate such micro spikes, and you care a lot about response time, since response time is what users will use to judge whether your system is fast or slow.
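A quick way to see this effect is to simulate random (Poisson) arrivals and count them over different bucket sizes. The 400 q/sec rate and 60-second duration below are made-up illustration values:

```python
# Sketch of how random (Poisson) arrivals look at different time slices:
# a rate that looks steady per second is much spikier per 100 ms.
import random

random.seed(42)
rate_qps = 400     # assumed average arrival rate
duration_s = 60    # assumed observation window

# Generate arrival timestamps with exponential inter-arrival times.
arrivals, t = [], 0.0
while t < duration_s:
    t += random.expovariate(rate_qps)
    arrivals.append(t)

def counts(bucket_s):
    """Count arrivals falling into each bucket of bucket_s seconds."""
    n_buckets = int(duration_s / bucket_s)
    buckets = [0] * n_buckets
    for ts in arrivals:
        idx = int(ts / bucket_s)
        if idx < n_buckets:
            buckets[idx] += 1
    return buckets

per_second = counts(1.0)
per_100ms = counts(0.1)
print("per 1 s   :", min(per_second), "-", max(per_second), "queries")
print("per 100 ms:", min(per_100ms), "-", max(per_100ms), "queries")
```

On a typical run the per-second counts stay within roughly 10-15% of 400, while the per-100ms counts can easily swing 50% or more around their mean of 40.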

You will always see some response time distribution rather than all queries of the same type having the same response time, and this distribution can vary a lot. Typically, the fewer outliers you have at the same average response time, the better.
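As a small illustration, here are two invented latency samples with the same 10ms average; the summary numbers show how the average hides the outlier:

```python
import statistics

# Two hypothetical response time samples (ms) with identical averages.
tight    = [9, 10, 10, 10, 10, 10, 10, 10, 10, 11]   # avg 10 ms, tight spread
outliers = [5,  5,  5,  5,  5,  5,  5,  5,  5, 55]   # avg 10 ms, one big outlier

for name, sample in [("tight", tight), ("with outliers", outliers)]:
    print(f"{name:14s} avg={statistics.mean(sample):.1f} ms  "
          f"max={max(sample)} ms  stdev={statistics.stdev(sample):.1f} ms")
```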

The relationship between throughput and response time is complicated, and we can’t always say better throughput comes with better response times, but throughput still tells you something important. If I know my system peaks at 1000 q/sec for a 10-second stretch and I have to serve traffic of 2000 q/sec, I can’t do that: a lot of queries will have to be queued for at least 10 seconds until performance recovers, which means their response time will be at least 10 seconds, and that is likely not acceptable to me.
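The arithmetic behind that example looks roughly like this, as a back-of-the-envelope sketch rather than a proper queueing model:

```python
# Back-of-the-envelope backlog calculation for the example above.
arrival_rate = 2000    # q/sec offered by the application
service_rate = 1000    # q/sec the system can actually sustain during the bad stretch
dip_seconds = 10       # how long the bad stretch lasts

backlog = (arrival_rate - service_rate) * dip_seconds   # 10,000 queries waiting
wait_seconds = backlog / service_rate                   # ~10 s wait at the degraded rate

print(f"Queries queued up by the end of the stretch: {backlog}")
print(f"A query arriving at that point waits roughly {wait_seconds:.0f} s")
```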

Micro stalls, though, can be acceptable. If my system serves 5000 q/sec on average but there are some 10ms intervals in which it stalls and processes 0 queries, or only the equivalent of 1000 q/sec, while my query inflow rate is 2000 q/sec and my required response time is within 50ms, that may well be acceptable. Note that if you drill down to small enough time intervals you will find such micro stalls in basically any system.
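You can sanity-check the micro-stall scenario with the same kind of arithmetic. The numbers below are the ones from the paragraph above, and the calculation ignores the base service time:

```python
# Rough tolerance check for a complete 10 ms micro stall.
inflow = 2000          # q/sec arriving
normal_rate = 5000     # q/sec served outside the stall
stall_ms = 10          # length of the stall
budget_ms = 50         # response time the application can tolerate

queued = inflow * stall_ms / 1000                     # queries piled up during the stall
drain_ms = queued / (normal_rate - inflow) * 1000     # time to clear the pile afterwards
worst_case_ms = stall_ms + drain_ms                   # latency of the unluckiest query

verdict = "within" if worst_case_ms <= budget_ms else "over"
print(f"~{queued:.0f} queries queue up; worst-case latency ~{worst_case_ms:.0f} ms "
      f"({verdict} the {budget_ms} ms budget)")
```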

In summary, you most likely do care about your performance, and specifically about your minimal performance. The interval at which you should measure this minimal performance depends on the response time your application can tolerate. In the MySQL benchmarks Vadim posted, we see a lot of stalls lasting a minute or more, which is not acceptable for any interactive application.
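One practical way to apply this is to compute the worst sustained throughput over sliding windows sized to what your application can tolerate. The per-second series below is invented, with a short stall in the middle:

```python
# Worst sustained throughput over sliding windows of different sizes.
# A short window exposes the stall; a longer window averages it away.
throughput = [5000, 5200, 4900, 800, 900, 5100, 5000, 4800, 5200, 5100]  # q/sec per second

def worst_window(series, window):
    """Minimum of the window-averaged throughput for a given window size (in samples)."""
    return min(sum(series[i:i + window]) / window
               for i in range(len(series) - window + 1))

for window in (1, 2, 5):
    print(f"{window}s window: worst sustained throughput = "
          f"{worst_window(throughput, window):.0f} q/sec")
```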

Comments
Dimitri

Personally I’d prefer the term “stable performance”, but everything is relative 🙂
Yes, testing is testing, and life is life.. What I’d expect from a database: if the testing workload is “stable” itself, there is not really any reason to see “unstable” database performance (if the OS, HW and IO levels are working as expected and don’t have any particular performance issues..) – so testing helps to reproduce the problem (when possible), as no customer will agree to play with their production :-))

Then, any observed TPS “drops” should be analyzed and fixed. And there is still a lot of work yet to be done with MySQL.. 😉

Rgds,
-Dimitri

Yzmir Ramirez

What about Capacity, for lack of a better word? If you take Stability, Performance, and Capacity and put them on a triangle, that pretty much sums up your choices between the three.

As you increase Capacity, both Stability and Performance degrade, and vice versa. Would you agree?

[img]http://img269.imageshack.us/img269/7130/servertriangleperforman.png[/img]

Dimitri

Peter,

Well, there are many reasons why performance may be unstable (and as you can see, in all your examples you already have an explanation ;-)).. – and most of the HW issues can be easily monitored and explained too (for ex. on Solaris you may check the ratio between sequential and random I/O operations, and every storage array has its own stats too, etc.)..

Then, once you observe performance issues on your production system, the first thing you try to do is align all “random” events with what you’re observing within a given period and look for any dependency, etc.. Then, if you really want to find the root cause of the problem, you start by reducing the “random” perimeter: you make sure your MySQL instance is alone in using its storage, its system, etc. Then you analyze more in depth and finally come up with a test case which gives you a way to reproduce your problem *every* time.. – And even then you may still split your problem into several test cases and analyze each one alone.. – like using bricks to build a wall.. 🙂

A “stable workload” test case is one such brick. And depending on what kind of workload you’re running, you know what kind of result you’re expecting, no?.. – and if it’s not so, then you analyze the problem in depth and look for solution(s) to fix it.. Once you’re sure you’re obtaining “stable performance” on a “stable workload”, you may try to involve other “random” events – to see what the impact will be of a batch started in the background, or a heavy report query scanning half of the tables, etc..

So before building a wall you first have to be sure about your bricks..
Still don’t agree?.. 😉

Such a discussion may be very long and take weeks or months (especially if the beer is good :-)). It’s a pity I’m only coming to the Collaborate-11 conference this year and will miss the UC, otherwise it would have been a good occasion to spend a night in the “perf bar” :-))

Rgds,
-Dimitri

Patrick Casey

I think I share the consensus here in that I’d prefer consistent performance over throughput, provided that my performance was “good enough”.

Put numerically, if my app requires 100 q/s to function, I’d rather have a database that delivers between 110-120 q/s over one that delivered anywhere between 90-1000 q/s.

Of course, nobody sets out to build database infrastructure that can’t keep up with minimum requirements; usually if I’m looking at something it’s because the lower bound has been breached and suddenly we have a performance problem.

Like Dimitri was saying though, there’s a mess of factors at work here, one of which is clearly going to be price. For *enough* money I can design for almost any practical performance level, but I’ve never, ever worked on a project where the design criteria were “make it as fast as you can, I don’t care about the price”. Usually the criteria are “it must have X performance, Y uptime, and should be as cheap as you can manage it”.

My mental model of the MySQL user base (which could be totally bonkers) probably separates it into three big tranches.

You have a lot of small users backing cookbook sites, probably running MyISAM on old single-disk servers or virtuals. This group probably values convenience above all else.

You have a middle group of users running on hardware designed to run one or more “real” database servers, almost definitely InnoDB on RAID, big-memory boxes, but with serious budget constraints. This group probably values performance because they’re trying to eke out the maximum throughput without buying another server.

You have a small group at the top running enterprise grade sites on farms of dedicated database boxes. This group values stability above all else; they can solve a performance problem with more hardware and the cost of a few dozen extra boxes is nothing compared to the business cost of going dark or degrading to the point where there is customer impact.

So I guess the answer to the first question is: it depends who you are :).
