Announcement

Announcement Module
Collapse
No announcement yet.

can someone EXPLAIN this :)

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • can someone EXPLAIN this :)




    explain SELECT s.*, d.NAME as DOMAIN_NAME, d.ID as DOMAIN_ID, d.CANONICAL as CANONICAL, COUNT(d.CANONICAL) as SUMMA, l.TWO_LETTER_CODE as CODE, c.ATTEMPTED_AT, TIME_TO_SEC(TIMEDIFF(NOW(),c.ATTEMPTED_AT)) AS DIFFA FROM SEEDS s LEFT JOIN DOMAINS d ON s.domain_id = d.idLEFT JOIN DOMAINS c ON c.id = d.canonicalLEFT JOIN LANGUAGES l ON s.language_id = l.id WHERE s.STATUS = -6 AND s.ID NOT IN (SELECT p.SEED_ID FROM PAGES p) AND ( (c.ATTEMPTED_AT = 0) OR (TIME_TO_SEC(TIMEDIFF(NOW(),c.ATTEMPTED_AT))>5) ) GROUP BY d.CANONICAL ORDER BY c.ATTEMPTED_ATLIMIT 100;


    It's not very slow (about 6/7 seconds) but I would like it less than 1 second.

    +--+------------------+-----+--------------+------------------+----------+-------+---------------------------+----+-------------------------------+|id|select_type.......|table|type..........|possi ble_keys.....|key.......|key_len|ref.............. ..........|rows|Extra..........................|+--+------------------+-----+--------------+------------------+----------+-------+---------------------------+----+-------------------------------+|.1|PRIMARY...........|.d...|index.........|PRIMA RY,CANONICAL.|CANONICAL.|.5.....|NULL............. ..........|.22.|Using.temporary;.Using.filesort||. 1|PRIMARY...........|.c...|eq_ref........|PRIMARY. ..........|PRIMARY...|.4.....|social_search.d.CANO NICAL..|..1.|Using.where....................||.1|P RIMARY...........|.s...|ref...........|URL_IDX,DOM AIN_IDX|DOMAIN_IDX|.4.....|social_search.d.ID..... ....|418.|Using.where....................||.1|PRIM ARY...........|.l...|eq_ref........|PRIMARY....... ....|PRIMARY...|.4.....|social_search.s.LANGUAGE_I D|..1.|...............................||.2|DEPENDE NT.SUBQUERY|.p...|index_subquery|SEED_IDX......... .|SEED_IDX..|.4.....|func.......................|. .1.|Using.index....................|+--+------------------+-----+--------------+------------------+----------+-------+---------------------------+----+-------------------------------+


    Thanks in advance!!

  • #2
    The last line of your EXPLAIN results, the DEPENDENT SUBQUERY, is most likely your problem. The NOT IN sub-query is referencing tables which are outside the sub-query, which causes the sub-query to be re-executed for every row examined by your main select. So while you think you're executing one query, you're actually running about a thousand. You can probably convert this to a left join of the form:

    SELECT A.*
    FROM A
    LEFT JOIN B
    ON A.foo = B.foo
    WHERE
    B.foo IS NULL

    Which captures rows from A which don't match B (basically, a NOT IN).

    Comment


    • #3
      Hi there,

      I removed the NOT IN ... and it sped it up a small bit.

      I am worried about the temporary tables (Using temp; using filesort) created because of the ORDER BY but I can't seem them optimize them away.

      It'd be good to speed this
      (TIME_TO_SEC(TIMEDIFF(NOW(),c.ATTEMPTED_AT))>5)
      up maybe?

      I'm out of ideas...

      Comment


      • #4
        What's the new explain look like?

        And why are you selecting SEEDS.*? Your group by will make these columns a little unpredictable in a one-to-many case.

        Also, if you have an index on ATTEMPTED_AT, it won't be used until you re-write your restriction to make a comparison against the column value without altering it, for example:

        c.ATTEMPTED_AT < DATE_ADD(NOW(), INTERVAL -5 SECOND)

        This doesn't mean the optimizer will use that index, but it will have the option.

        Comment


        • #5
          Hi there again,

          I don't really understand your last reply. Could you elaborate?

          We seemed to get a decent speed up by changing the WHERE to a HAVING...

          SELECT ... WHERE ... AND (X) GROUP BY ... ORDER BY ...
          ->
          SELECT ... WHERE ... GROUP BY ... HAVING (X) ORDER BY ...

          (X) is the complex calculation, I hope it hasn't changed the logic of the SQL

          Does this make the GROUP BY more predictable/stable?

          Thanks!

          Comment


          • #6
            Consider the following:


            mysql> SELECT a, b, c FROM examples;+------+------+------+| a | b | c |+------+------+------+| 1 | 2 | 3 | | 1 | 2 | 4 | | 1 | 3 | 3 | | 1 | 3 | 4 | +------+------+------+


            Adding a WHERE clause filters the output of a select:


            mysql> SELECT a, b, c FROM examples WHERE c = 4;+------+------+------+| a | b | c |+------+------+------+| 1 | 2 | 4 | | 1 | 3 | 4 | +------+------+------+


            Now when you group by, normally you only select the columns you're grouping on or some aggregate function of the remaining columns. The reason is that if you select non grouped columns, the result is unpredictable:


            mysql> SELECT a, b, c FROM examples GROUP BY b;+------+------+------+| a | b | c |+------+------+------+| 1 | 2 | 3 | | 1 | 3 | 3 | +------+------+------+


            Why is the value of column c above 3 and not 4? There's no logic behind that, it actually has to do with the order in which the rows are processed, which could be affected by insertion order and the use of indexes. My point is if you try to replicate this on your computer you might get 4 instead of 3. SQL Server, for example, considers the selection of a and b in this case to be illegal, to prevent you from shooting yourself in the foot. MySQL allows it because its convenient if you have a one-to-one relationship, because there is only one possible value, so selecting an arbitrary value for c from a set of rows that all have the same value gives you a stable result.

            HAVING is like WHERE except that it is evaluated after the GROUP BY. Consider what happens if you turn my WHERE clause from above into a HAVING:


            mysql> SELECT a, b, c FROM examples GROUP BY b HAVING c = 4;Empty set (0.00 sec)


            That's clearly not the same as:


            mysql> SELECT a, b, c FROM examples WHERE c = 4 GROUP BY b;+------+------+------+| a | b | c |+------+------+------+| 1 | 2 | 4 | | 1 | 3 | 4 | +------+------+------+


            So if you're using a HAVING, it must refer to an aggregate function, and not to an ungrouped column, or your results will be unpredictable. Does that make sense?

            Comment


            • #7
              Got you. I see what what you mean.

              Okay: I have s.* (and l.two_letter_code which has a one to one relation with s so you may as well consider it in s, it's just a lookup)

              There are many s for d but I don't care which i get i just want to process one s at a time for d and then mark it as processed...

              Moving the complex calculation from the where to the having sped it up more than anything else from about 15-20 seconds depending on the volume of data (100,000s of rows) to 5 seconds or less...

              Comment

              Working...
              X