Seemingly Random Slowdowns in CPU-bound Workload


  • Seemingly Random Slowdowns in CPU-bound Workload

    Server specs:

    24 cores (E5670)
    128GB mem
    10 x X25E RAID 10 for data
    2 x X25E RAID 1 for OS/logs

    According to our monitoring, when things slow down, they slow down across the board: all queries take longer. The slowdowns come in groups, as you can see here: https://skitch.com/jamesgolick/8djnw/fetlife-rails3-rubby-1.9.3-new-relic

    Edit: I should add that disk array utilization and await remain constant during the slowdowns, as does the buffer pool hit rate.

    Also, our MySQL version is: "Server version: 5.0.91-50-log Percona SQL Server, Revision 73 (GPL)"

    My mind immediately jumped to mutex contention, but I don't see much in the SEMAPHORES section of SHOW INNODB STATUS. Example output here:

    I'm not graphing OS waits yet, but I'm working on getting that into Ganglia now. Any other thoughts on where to look?
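
    In the meantime, here's roughly what I'm wiring up for Ganglia (a quick sketch, untested; it assumes passwordless local access for the mysql client and Ganglia's gmetric on the PATH; the counters are cumulative since startup, so the interesting signal is the delta between samples):

        # Sum the "OS waits" counters from the SEMAPHORES section and push them to Ganglia.
        waits=$(mysql -e 'SHOW INNODB STATUS\G' \
          | sed -n '/^SEMAPHORES$/,/^TRANSACTIONS$/p' \
          | grep -o 'OS waits [0-9]*' \
          | awk '{ sum += $3 } END { print sum }')
        gmetric --name innodb_os_waits --value "$waits" --type uint32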

  • #2
    I don't think you have enough information to diagnose the problem. I'd use pt-stalk to gather more information when it happens.
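
    As a starting point, something like this (the threshold is a guess; tune it to whatever "slow" looks like on your box):

        # Collect diagnostics when Threads_running spikes.
        pt-stalk --variable Threads_running --threshold 50 --cycles 3 --daemonize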


    • #3
      I used pmp to get the following when it happens: https://gist.github.com/ae48bf7797f78e718bca

      Note that this is after I disabled the query cache, since the first few profiles showed some contention on that.

      Is it normal to see this much waiting in open_and_lock_tables? I'm not very familiar with the MySQL codebase.
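
      (For anyone who hasn't seen it, pmp is essentially the gdb pipeline below. This is a rough transcription of the canonical one-liner, so treat it as a sketch and adjust the pid lookup to your setup.)

          # Snapshot every thread's stack, collapse each stack to one line of
          # frame names, then count how many threads share each stack.
          gdb -p $(pidof mysqld) -batch -ex 'set pagination 0' -ex 'thread apply all bt' 2>/dev/null \
            | awk 'BEGIN { s = "" } /^Thread/ { print s; s = "" } /^\#/ { if (s != "") s = s "," $4; else s = $4 } END { print s }' \
            | sort | uniq -c | sort -rn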


      • #4
        So after spending some time digging through those stack traces and the source, this is my conclusion:

        - most of those threads appear to be waiting on the LOCK_open mutex
        - that lock is held while openfrm() waits on open() or read()
        - the fix in http://bugs.mysql.com/bug.php?id=51557, which stops holding LOCK_open while reading from .frm files, appears to be the solution; it's in 5.5

        Am I missing something or does this sound accurate?

        Also, any guesses as to what conditions cause this contention to become problematic?
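
        To put a rough number on it, this pass over the pmp output counts threads stuck in the table-open path (pmp.txt is whatever you saved the profile as, and the function names to match are guesses based on the traces):

            awk '{ total += $1 }
                 /open_table|open_and_lock_tables/ { waiting += $1 }
                 END { printf "%d of %d threads in the table-open path\n", waiting, total }' pmp.txt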


        • #5
          Yeah, you're hitting LOCK_open contention. It's a serious bottleneck through 5.1, and although there are sometimes ways to avoid it, chasing workarounds can be a losing game. Upgrading might make sense if it's something you're considering anyway.
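
          One common aggravator, assuming the usual cause, is table cache churn: every cache miss re-opens the .frm file while holding LOCK_open. A quick way to check:

              # Print Opened_tables deltas every 10s; if the number climbs steadily,
              # the cache is churning and raising table_cache (table_open_cache in
              # 5.1+) may buy you some headroom.
              mysqladmin -r -i 10 extended-status | grep -w 'Opened_tables'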