October 24, 2014

Mystery Performance Variance with MySQL Restarts

Based on a lot of surprising comments about my MySQL 5.5 vs 5.6 performance post I decided to perform deeper investigation to see where my results could go possibly wrong. I had set up everything to be as simple as possible to get maximally repeatable results. I did Read Only ran which is typically a lot more repeatable (though also less relevant for production like workload). I had done number of iterations for benchmark run and I used dedicated physical hardware box so external environment impact often causing problems in Virtualized environments can be eliminated. Still I found there could be large variance between the runs.

I set up the benchmarks run to go over night in the loop, doing the benchmark run for 5 runs when restarting MySQL server and repeating the run. I did it on 2 identical boxes to eliminate faulty hardware as possible suspect. In both cases I’ve spotted something along those ways:

These would be very stable results repeatable in most cases, but sometimes you would see something like this instead:

As you can see the variance between individual (5 minute) run iteration is expectedly rather small – less than 1% especially if you remove the first run, which can be seen as warmup. However the difference between different starts of MySQL is staggering. It looks like sometimes you can just get MySQL server started a wrong way and you will get 10% worse performance which will be very stable at that lower level.

I can repeat it for both MySQL 5.5 and MySQL 5.6 both with single thread and 64 threads run, though the difference in performance can vary from 5% to 10%.

This does not seems to be the plan difference which could be one common cause of performance differences based on server restart it seems to be something else. For now I do not have a good answer what it could be. Why I’m writing about it ? Well there are two outcomes for me.

First – I now can see the confirmation for “magical slowdowns” some our customers reported when upon restart MySQL works significantly slower with no changes to the query load or query plans, which can be solved by MySQL restart. My previous approach have been – there must be some external changes in environment which we’re just not tracking.

Second – I understand we need to be a lot more careful with benchmarks. We need to do multiple runs with MySQL restarts in the end to get meaningful data.

I also can still confirm there is a significant performance difference between MySQL 5.5 and MySQL 5.6 in this workload with MySQL 5.6 being slower, yet the worse case performance difference looks lower than 26% I wrote about. I will update with corrected numbers as I will be able to look more into it.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Justin Swanhart says:

    I’d say it is unlikely to be MySQL itself, as obviously the code has not changed between restarts and we can consider that constant, and the way the code uses the CPU won’t have changed either, and everything else (total memory, OS settings, etc) are constant too.

    Perhaps some operating system or even hardware behavior is triggered when the old mysql process shuts down (or perhaps when the new one starts). I’m thinking something along the lines of what happens after you stop a write intensive workload on InnoDB. If one does not know about the insert buffer and the flushing of dirty pages, and the idle loop, it may seem counterintuitive that disk activity goes up when the DML is stopped.

    One thing that does come to mind is NUMA. Perhaps when memory is allocated unbalanced overall performance drops? But I think you said you tested this on an old box. Is it NUMA?

  2. Justin,

    Yes I was thinking about NUMA as potential cause. Though this CPU seems to be specifying only one NUMA node:

    root@dpe01:~# lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 4
    On-line CPU(s) list: 0-3
    Thread(s) per core: 2
    Core(s) per socket: 1
    Socket(s): 2
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 15
    Model: 4
    Stepping: 3
    CPU MHz: 2992.772
    BogoMIPS: 5985.43
    L1d cache: 16K
    L2 cache: 2048K
    NUMA node0 CPU(s): 0-3

    I agree to suspect kernel-hardware interaction here

  3. Justin Swanhart says:

    I think the best thing to do would be to use perf or oprofile against each run and see which system calls get slower, or if new system calls bubble up to the top.

  4. Looks like this is this old monster:
    http://ark.intel.com/products/27478/Intel-Pentium-4-Processor-630-supporting-HT-Technology-2M-Cache-3_00-GHz-800-MHz-FSB

    The HT might be the issue with stability on this one though it is strange to see with 64 threads when all cores should be rather busy. I’ll check if I can get the same behavior on the newer box

  5. Justin Swanhart says:

    I wonder… Did they just label P4 630’s as Xeons?

    model name : Intel(R) Xeon(TM) CPU 3.00GHz

  6. Justin Swanhart says:

    The model is probably irrelevant if it is netburst and has hyperthreading, and you suspect HT to be the bottleneck. I was just trying to find a Xeon CPU matching your specs and couldn’t seem to find one – the 630 was the closest I could find.

    Yesterday I was thinking HT too, but I discarded the idea as I would really expect that to be a problem over all runs. You are always running the same number of threads, right? Unless there is something really weird, I don’t see how the HT overhead wouldn’t be uniform across runs. Of course, we’re looking for something really weird so maybe I shouldn’t have tossed the idea out. I’m very curious to see how the tests on the new machine go.

  7. Justin Swanhart says:

    Could you have bad ECC ram? That would make some RAM slower than other, similar to unbalanced NUMA.

  8. Well… Might be this is the different name they decided to use later. The specs seems to identical and this is what Wikipedia lists for this family/model stepping. Note the problem happens with 2 different boxes which would be unlikely with ECC also there are no ECC errors reported. I started the test on other box with different CPU lets see what kind of variance we’ll get

  9. Justin Swanhart says:

    Oh, I missed that you had tested two machines to eliminate faulty hardware. My bad.

  10. Marc Alff says:

    This definitively adds more spice to performance tuning.

    Is the shutdown + restart part of the same script, in which case this happens when the box is under constant load, or does this happen also with a manual shutdown and manual restart (with presumably some idle time in between) ?

    Not an explanation by itself, but how about adding a sleep + sync before restarting the server, to see if this has some effect ?

    In any case, thanks for sharing this.

    Regards,
    — Marc

  11. marc castrovinci says:

    How was the shutdown/restart done? Was it with an init script or mysqladmin shutdown / mysqld_safe –defaults-file=xxxxx?

    It is concerning that there could be a degradation in performance from just restarting.

  12. Marc,

    In this case I’m just using “restart” script from MySQL Sandbox. There might be some warmup effect though I would expect to see it only from 1st out of 5 rounds which are done. Each of them is 5 minutes which is rather long time (considering we’re speaking about some 200MB total data size)

    I’ve now trying the same on couple of different hardware. The hardware I noticed on has the largest difference between MySQL restarts though different boxes also have quite a difference between different MySQL server “instances” I will write more about it as I will complete the test.

  13. UPDATE: I’ve done runs on 5 other different servers with range of different CPUs. The couple of older HT enabled ones have the largest variance. Also single thread performance seems to be impacted more than results of multi thread ones. Still in all cases there seems to be some “good starts” and “bad starts” where variance between different started MySQL server instances is significantly higher than between different benchmark run for same running server instance. The typical difference between worst and best “starts” can be somewhere around 2%. Also it might take some time for outliers to show up – sometimes you need 10+ restarts to get surprisingly “good” or “bad” instance

    As result If you’re doing performance analyses where 2-3% difference is important I believe you need to do runs with multiple MySQL restarts to get reliable picture.

  14. Jean-Francois Lagace says:

    I noticed a very similar problem. While investigating on something else, I created 4 identical schemas with the same data. In fact, I restored the same dump file in 4 different schemas on the same database instance, same server. I run the same batch of sql queries on all 4 schemas and get constant performance on each run. The batch run in about 7 seconds. But there is always 1 schema that is very slower. On this schema, the batch takes 41 seconds to run. And I’m saying that this is similar to your case because after I restart MySQL, it’s not the same schema that is slower. But there is always 1 schema slower than the 3 others. And it always takes the same time to run the batch on the faster schemas (7 sec) and on the slower one (41 sec). And as I said, if I restart MySQL, there is always one schema that is slower than the other, but it’s not the same one as before the restart.

    I hope this will give you new ideas on what can be causing this behaviour.

    Regards
    Jeff

  15. Jean-Francois,

    For such large difference I would much likely expect difference in computed statistics. Are these Innodb tables ?
    Try running ANALYZE on your tables to see if this changes the timing.

  16. Jean-Francois Lagace says:

    Does the computed statistics change when restarting MySQL? Because after a restart, the slow schema becomes fast and one that was fast becomes slow.

  17. Yes, MySQL 5.5 and below with Innodb will recompute the stats on restart. You can have persistent statistics in MySQL 5.6 and Percona Server which are preserved between restarts.

  18. Jean-Francois Lagace says:

    I tried running ANALYZE TABLE on all tables of the slow schema. Sometimes it works and the queries run fast. But sometimes I have to run ANALYZE TABLE multiple times for the queries to be fast. And each time I restart MySQL, one of my 4 schemas becomes slow again. It’s like if the statistics are on the line between good and bad for my queries. And running ANALYZE makes the stats shift from one side to the other randomly.

    In my case, maybe to have the statistics persistent would help as long as it’s the “fast” statistics that are kept.

    Thank you
    Jeff

Speak Your Mind

*