
Sudden huge query speed degradation for a couple of minutes


  • Sudden huge query speed degradation for a couple of minutes

    Until 20:16:02 CET the slow-query log is almost empty. Then suddenly a query that normally takes up to 2 seconds takes 8 seconds. Over the next 9 minutes, slow-query.log fills with queries taking much, much longer than they normally do. At 20:16:22 the same query that normally takes up to 2 seconds takes 20 seconds (!).

    # Time: 100129 20:22:12
    # Query_time: 205 Lock_time: 102
    # Query_time: 307 Lock_time: 199
    # Query_time: 181 Lock_time: 0


    It's not always at the same time; the problem appears at random, so it's not, for example, a cron job. Does anyone have an idea what to check, or what this could be?

  • #2
    Gather more diagnostics. Get I/O, CPU, and process statistics, plus samples of SHOW PROCESSLIST, SHOW INNODB STATUS, and SHOW GLOBAL STATUS at 10-second intervals.
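
    A minimal sketch of such a sampler in Python (the unauthenticated mysql invocation and the /tmp paths are assumptions; add -u/-p options and adjust paths for your setup):

    import subprocess, time, datetime

    # Commands to sample; each snapshot goes to its own timestamped file.
    SAMPLES = {
        'processlist':   ['mysql', '-e', 'SHOW FULL PROCESSLIST'],
        'innodb_status': ['mysql', '-e', 'SHOW INNODB STATUS\\G'],
        'global_status': ['mysql', '-e', 'SHOW GLOBAL STATUS'],
        'ps':            ['ps', 'auxww'],
        'iostat':        ['iostat', '-x', '1', '2'],  # disk utilization
        'vmstat':        ['vmstat', '1', '2'],        # CPU / iowait
    }

    while True:
        stamp = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
        for name, cmd in SAMPLES.items():
            out = subprocess.check_output(cmd, universal_newlines=True)
            with open('/tmp/diag-%s-%s.txt' % (name, stamp), 'w') as f:
                f.write(out)
        time.sleep(10)  # the 10-second interval suggested above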



    • #3
      OK, today I gathered more diagnostics.
      Web server restarts were at:
      2010-02-06 11:09:08
      2010-02-06 12:49:09
      2010-02-06 13:36:09

      which indicate when the problem occurred. I then looked at the diagnostics gathered at those times.

      At 11:09 (the first restart), utilization of /dev/sdb, which holds the 3 most-used MySQL tables, was at 100% for some time. The same thing happened again at 12:49 and 13:36, around the next two restarts: those 3 tables pushed the whole disk to 100% utilization, where it normally runs at up to 20%.

      Next are the vmstat logs (from 11:09):

      procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
       r   b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
       1  75 436076 181108   4220 646940    0    0    15    74     0     0 19  3 76  2  0
       0  68 436076 181212   4268 648652    0    0  1820    92  1514  1576  2 12  0 87  0
       0  67 436076 180016   4272 650024    0    0  1328     8  1375   913  0  1  0 99  0
       0  64 436076 177796   4276 652220    0    0  2200     4  1533  1209  1  0  0 99  0
       0  63 436076 176272   4276 653800    0    0  1528     0  1464  1108  1  1  0 99  0
       1  59 436076 175176   4308 655748    0    0  1740   472  1627  1525  4  1  0 96  0
       0  60 436076 173800   4308 657076    0    0  1432     0  1611  1445  2  1  0 97  0
      33  60 436076 385480   4316 658388    0    0  1384     0  1628  1508  3  6  0 91  0
       0  64 385636 995628   4392 595104    0    0  1812     8  4329  8001 60 18  0 22  0
      70 105 385632 817292   4440 611452    4    0  2284     4  4511  5055 78 12  0 10  0


      As you can see, CPU I/O wait time is almost 100%, and it stays that high for the next minute as well. At 11:11-11:12, wa is around 20; then at 11:13 it is back to normal, between 3 and 11.



      Any of these chokes (HDD/CPU/MySQL) could be caused by one of the others: the HDD could block MySQL and the CPU, but the CPU could also be the reason the HDD and MySQL are blocked.

      How do I find out what is really causing this, and how do I fix it?



      • #4
        "how to fix it" is not a valid question until you find out what the problem is. Look again at vmstat and think more about the relationship between iowait in the CPUs and disk IO. The CPUs are not causing the disks to "block." The disks aren't blocking. Look -- they are reading tons of data. The CPUs are blocked waiting for that to complete.

        Something is asking the disks to read data. It could be mysqld or something else. Is there a backup running? Is there a long-running query? You need to write a script that gets 'ps' and SHOW PROCESSLIST to answer these questions.
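
        A hypothetical variant of such a script (the 50% threshold, file path, and unauthenticated mysql call are assumptions): poll vmstat's wa column and dump ps plus SHOW FULL PROCESSLIST only when iowait spikes, so the capture lands on the exact moment of the problem.

        import subprocess, time, datetime

        WA_THRESHOLD = 50  # percent iowait treated as a spike

        def iowait():
            # `vmstat 1 2` prints two samples; the last line reflects the most
            # recent second. wa is the 16th column in the layout shown above.
            out = subprocess.check_output(['vmstat', '1', '2'],
                                          universal_newlines=True)
            return int(out.splitlines()[-1].split()[15])

        while True:
            if iowait() >= WA_THRESHOLD:
                stamp = datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
                with open('/tmp/spike-%s.txt' % stamp, 'w') as f:
                    for cmd in (['ps', 'auxww'],
                                ['mysql', '-e', 'SHOW FULL PROCESSLIST']):
                        f.write(subprocess.check_output(cmd,
                                                        universal_newlines=True))
            time.sleep(5)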



        • #5
          I gathered SHOW FULL PROCESSLIST just now, when the problem occurred again.
          ps shows only the mysqld process working (I'm about 90% sure of this and will confirm it next time).
          There are no backups or anything else running at these random times; only the mysqld process is working.

          There is a long query that would fit your theory about reading tons of data, but the strange thing is that this query does not return a huge amount of data: there is no post_text field in it, only the user nickname, topic_name, and similar basic info, without the text of the posts, which is the biggest data in the DB.
          The problem could be the LIMIT 840,40, which makes the query heavy all by itself.

          Keep in mind we are talking about a MySQL slave server dedicated to the search function of a big phpBB forum (11,500,000 posts).


          Below are the top 2 queries from SHOW FULL PROCESSLIST: an UPDATE that is Locked, and a SELECT that is Sending data.

          LIMIT 840, 40


          I also looked in /etc/my.cnf and saw max_allowed_packet = 16M, where the default is 1M (16M was set earlier while hunting for the cause of a different problem).
          thread_cache_size = 190 is also quite big, so that MySQL doesn't create a lot of threads (I kept increasing it until all threads were served from the cache).
          No other strange settings ;].

          I will send you the rest of the SHOW FULL PROCESSLIST output in a private message.



          • #6
            Switch to dedicated search software, such as Sphinx.



            • #7
              Quote:

              Switch to dedicated search software, such as Sphinx.

              Who said I'm not on it already? ;] But the search results must still be pulled from the DB, as here.



              • #8
                qspy wrote on Sun, 07 February 2010 01:14

                Quote:

                Switch to dedicated search software, such as Sphinx.

                Who said I'm not on it already? ;] But the search results must still be pulled from the DB, as here.

                Then why don't you do the pagination in Sphinx too?
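
                For illustration, a sketch with the Python client bundled with Sphinx (sphinxapi.py); the index name, search phrase, and port are assumptions:

                import sphinxapi

                page, per_page = 21, 40        # offset 840 = page 21 at 40 per page
                cl = sphinxapi.SphinxClient()
                cl.SetServer('localhost', 9312)
                cl.SetLimits(page * per_page, per_page)  # paging happens in Sphinx
                res = cl.Query('some search phrase', 'posts_index')

                ids = [m['id'] for m in res['matches']] if res else []
                # MySQL is then asked only for the rows actually displayed:
                #   SELECT ... FROM posts WHERE post_id IN (...)  -- no big OFFSET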



                • #9
                  How much data the query returns is immaterial. How much it reads from disk is what matters. Use EXPLAIN and the slow query log to understand your queries. And don't do pagination in MySQL :)
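
                  A sketch of that comparison (the table, column, and database names are assumptions; only the LIMIT 840, 40 comes from this thread). The second query is a "deferred join" that satisfies the offset on a narrow index before touching the wide rows, so MySQL reads far less from disk:

                  import subprocess

                  deep_offset = (
                      "SELECT p.post_id, u.username, t.topic_title"
                      "  FROM posts p"
                      "  JOIN users u  USING (user_id)"
                      "  JOIN topics t USING (topic_id)"
                      " ORDER BY p.post_time LIMIT 840, 40")

                  deferred_join = (
                      "SELECT p.post_id, u.username, t.topic_title"
                      "  FROM (SELECT post_id FROM posts"
                      "         ORDER BY post_time LIMIT 840, 40) AS q"
                      "  JOIN posts p  USING (post_id)"
                      "  JOIN users u  USING (user_id)"
                      "  JOIN topics t USING (topic_id)"
                      " ORDER BY p.post_time")

                  for sql in (deep_offset, deferred_join):
                      # Compare the `rows` column of the two plans: the inner
                      # query of the rewrite can walk a (post_time) index alone.
                      print(subprocess.check_output(
                          ['mysql', '-t', '-e', 'EXPLAIN ' + sql, 'phpbb'],
                          universal_newlines=True))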



                  • #10
                    We were forced to cut page viewing from, for example, 3000 pages down to the first 25 pages, because performance degraded with every further page.

                    I will look into this Sphinx pagination thing, but Sphinx has issues of its own (sometimes returning false matches, etc.), so I'm afraid its pagination will not be 100% exact, as it should be.

                    Another thing: yesterday (07.02.2010) the problem was much reduced, maybe because traffic was lower that day, or because of the changes that were made (max_allowed_packet from 16M back to 1M, and thread_cache_size from 190 to 150, with restarts of both mysqld's).

                    The web server didn't break the whole day, and the chokes reached 95% and at most 97% of the disk's capacity.
                    17:59 - for only 4 seconds, disk usage was 60%, 90%, 71%, 76%
                    20:26 - for at least 4 seconds, 75%, 95%, 97%, 79%; I don't know more...

                    Only 1 query from those times in the slow query log :)
                    17:59 - similar to the query described above, LIMIT 784, 16;
                    20:26 - similar to the query described above, LIMIT 784, 16;

                    So here we go: our problem is probably disk seek time caused by big LIMITs.

                    :) Quite happy now; maybe some advice on Sphinx pagination and its reliability?
                    Did you ever encounter Sphinx returning false matches? For example, exactly as described here:
                    http://www.sphinxsearch.com/forum/view.html?id=4922
                    We too search with a phrase like a big number, for example 2256797652, and results are found in posts which don't contain this number at all.

                    I know I should probably use SetLimits(xxx/xx), somehow involving these 864 IDs (from the query above), but I'm not sure yet.

                    UPDATE: confirmed it 100% by tests :) A big LIMIT causes disk usage to jump from 0% to 99%, and under high traffic this causes massive disk blocking, which results in queries queueing for minutes.

