Choosing an appropriate benchmark length

March 21, 2011

Author

Baron Schwartz

Benchmarks

MySQL

The duration of a benchmark is an important factor in how meaningful its results are. Most systems have some “burstable capacity,” and this can influence the results a lot. You can see this in all areas of life: you can sprint much faster than you can run a 10k race. Stereo system components are usually advertised with both peak and sustained output ratings. Transducers can generally hit peaks that would melt them if sustained, because they cannot dissipate the heat fast enough. Database servers are no different. Many components in the system can absorb peaks, but buffers eventually fill under prolonged pressure.
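To make the buffer effect concrete, here is a minimal toy model in Python. It is only a sketch: the function name `accepted_per_tick` and all the numbers (offered load, drain rate, buffer capacity) are made up for illustration and do not come from any real server. It shows that a short run measures the offered load, while a long run converges on the rate at which the backend can actually drain the buffer.

```python
def accepted_per_tick(offered, drain, capacity, ticks):
    """Toy model of a write buffer; returns the work accepted in each tick.

    All parameters are hypothetical units of work per tick, not measurements.
    """
    level = 0          # units currently queued in the buffer
    accepted = []
    for _ in range(ticks):
        level = max(0, level - drain)   # backend drains part of the buffer
        room = capacity - level         # free space left in the buffer
        took = min(offered, room)       # accept only what still fits
        level += took
        accepted.append(took)
    return accepted


# Illustrative numbers: load arrives at 100/tick, the backend drains 40/tick.
run = accepted_per_tick(offered=100, drain=40, capacity=300, ticks=20)

short_mean = sum(run[:4]) / 4        # a "short benchmark" of 4 ticks
long_mean = sum(run) / len(run)      # the full 20-tick run

print(short_mean)  # 100.0 (the buffer is still absorbing the burst)
print(long_mean)   # 53.0  (dominated by the 40/tick drain rate)
```

In this model the buffer hides the slow backend for the first four ticks, so a benchmark that stops there reports nearly double the throughput the system can sustain.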

When designing a benchmark, you should think about what type of performance characteristics you are looking for in your production system. If you want a system that can handle peak loads that don’t last very long, then measuring burstable capacity with a short benchmark might be okay. But if you want to measure how the system will perform over a long time with a sustained load, then you need to run your benchmark for a long time.

This can be costly and time-consuming. If your cycle time is 8 hours or more, this can be frustrating, too. If you don’t time it right, you might only be able to fit in one or two benchmarks a day. Vadim runs a lot of long-term benchmarks on MySQL and Percona Server. He is a very patient man. Mark Callaghan has run benchmarks that last for months.

Sometimes you don’t know how long your benchmark should run until you try it. This was the case in a recent benchmark I ran. The following image shows the system’s IO behavior:

As you can see, the reads settled down after only 3 hours or so, but writes continued to climb until at least 8 hours. How long should I run this benchmark? In general, to understand long-term performance, a benchmark should run at least twice as long as it takes for the system to settle in and appear fully warmed up. At that point, examine the preliminary graphs. If there is some unexplained variation, keep running until you have determined that the system is following its long-term pattern, which might include cyclical variations. What is that notch near the right-hand side of the graph? Is that the beginning of a repeating pattern, a one-time event, or something else? There is only one way to tell: keep running the benchmark. I ended up running the above benchmark for 72 hours to ensure that it was exhibiting its typical long-term behavior.
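The "at least twice the warm-up time" rule of thumb can be turned into a rough check on the metrics you are collecting. The Python sketch below is one possible way to do it, not the method used for the benchmark above. The flatness test (a window of samples whose spread is within a tolerance of their mean), the function names, and the hourly sample data are all assumptions made for illustration; the synthetic series are only shaped loosely like the behavior described, with reads flattening around hour 3 and writes around hour 8.

```python
def settle_time(samples, window, tolerance):
    """Return the index of the first window of `window` consecutive samples
    whose spread (max - min) is at most `tolerance` times their mean.

    Returns None if no such window exists. This flatness test is an
    assumption for this sketch; other definitions of "settled" are possible.
    """
    for start in range(len(samples) - window + 1):
        chunk = samples[start:start + window]
        mean = sum(chunk) / window
        # Skip all-zero windows to avoid dividing by zero.
        if mean and (max(chunk) - min(chunk)) / mean <= tolerance:
            return start
    return None


def minimum_duration(metrics, window, tolerance):
    """Twice the settle time of the slowest-settling metric, or None if
    any metric has not settled yet (meaning: keep running)."""
    times = [settle_time(s, window, tolerance) for s in metrics.values()]
    if any(t is None for t in times):
        return None
    return 2 * max(times)


# Synthetic hourly samples (made-up units), one value per hour for 24 hours.
metrics = {
    "reads": [min(100, 10 + 30 * h) for h in range(24)],   # flat from hour 3
    "writes": [min(80, 10 * h) for h in range(24)],         # flat from hour 8
}

print(settle_time(metrics["reads"], window=3, tolerance=0.05))   # 3
print(settle_time(metrics["writes"], window=3, tolerance=0.05))  # 8
print(minimum_duration(metrics, window=3, tolerance=0.05))       # 16
```

The result is a lower bound only. A check like this cannot tell a one-time notch from the start of a cycle, so you still need to look at the graphs and keep running if anything is unexplained.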