GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

LEFT JOIN Optimization

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • LEFT JOIN Optimization

    Hi all,

    I have been reading mysql 5.6 ref man; in section 7.13.6. LEFT JOIN and RIGHT JOIN Optimization there is:

    Quote:
    "The table read order forced by LEFT JOIN or STRAIGHT_JOIN helps the join optimizer do its work much more quickly, because there are fewer table permutations to check. Note that this means that if you do a query of the following type, MySQL does a full scan on b because the LEFT JOIN forces it to be read before d:

    SELECT *
    FROM a JOIN b LEFT JOIN c ON (c.key=a.key)
    LEFT JOIN d ON (d.key=a.key)
    WHERE b.key=d.key;
    so I have stupid example for simulate this:


    create table a ( k int(10) unsigned NOT NULL auto_increment, v int(10) unsigned default NULL, primary key (k));create table b ( k int(10) unsigned NOT NULL auto_increment, v int(10) unsigned default NULL, primary key (k));create table c ( k int(10) unsigned NOT NULL auto_increment, v int(10) unsigned default NULL, primary key (k));create table d ( k int(10) unsigned NOT NULL auto_increment, v int(10) unsigned default NULL, primary key (k));insert into a (k,v) values (1,10), (2,20), (3,30), (4,40);insert into b (k,v) values (2,200), (4,400), (5,500), (6,600);insert into c (k,v) values (1,1000), (3,3000), (7,7000), (8,8000);insert into d (k,v) values (3,30000), (4,40000);


    If I run explain for select from refman example, I get different result:


    mysql> explain select * from a join b left join c on (c.k = a.k) left join d on (d.k=a.k) where b.k=d.k;+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------+| 1 | SIMPLE | d | ALL | PRIMARY | NULL | NULL | NULL | 2 | || 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | ljoin.d.k | 1 | || 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | ljoin.d.k | 1 | || 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 4 | ljoin.b.k | 1 | |+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------+


    where table D is scanned as first which is different to refman. Where is a bug? Tx

    j.

  • #2
    You should read on:
    Quote:
    For a LEFT JOIN, if the WHERE condition is always false for the generated NULL row, the LEFT JOIN is changed to a normal join.
    The WHERE condition referred to in this quote is "b.k=d.k".

    Comment


    • #3
      thanks you for reply, ... so if values of b,d table are

      insert into b (k,v) values (2,200), (4,400), (5,500), (6,600);
      insert into d (k,v) values (3,30000), (4,40000);

      it means that

      Quote:
      WHERE condition is always false for the generated NULL row
      ?

      condition "b.k=d.k" is not affected by b.k[4]==d.k[4]?

      Comment


      • #4
        d.k[4] is not a NULL row.

        Please read http://dev.mysql.com/doc/refman/5.0/en/working-with-null.htm l

        It means that if d.k is NULL, the condition b.k=d.k is always false, hence the LEFT JOIN is changed to a normal join. This does not depend on your data. The documentation quoted in your first message is wrong.

        Comment


        • #5
          (note.: d.k[4] is not 4-th index but value 4)

          sorry for my stupid question: why it does not depend on my data? I thought that condition b.k=d.k depend on my data.

          If you write "The documentation quoted in your first message is wrong." It means that LEFT JOIN doesn't force b to be read before d?

          Comment


          • #6
            In the query from the documentation, LEFT JOIN is always treated as a cross join.

            >> sorry for my stupid question: why it does not depend on my data? I thought that condition b.k=d.k depend on my data.

            true, but you should look at the case d.k is NULL (resulting from the LEFT JOIN).

            Comment


            • #7
              >> true, but you should look at the case d.k is NULL (resulting from the LEFT JOIN).
              ... but how can optimizer know that result from LEFT JOIN has any d.k value equal NULL? I think that optimizer is rule/cost based. => I understand less and less .

              Can you, please, write me (in detail [or show me any command/tool]) how query from documentation example "select * from a join b left join c on (c.k = a.k) left join d on (d.k=a.k) where b.k=d.k" is processed by planner/optimizer and why "b.k=d.k" tell to optimizer that LEFT JOIN should be converted to CROSS JOIN? Thank you very much.

              Comment


              • #8
                It's not converted to a CROSS join, but an INNER join. That is because, if either b.k or d.k is NULL, then this will not be true[1]:

                ... where b.k=d.k

                Therefore, the join between b and d is converted to an INNER join, because the WHERE clause is going to eliminate any rows in b that have no matching rows in d, which is the same thing as an INNER join.

                Try thinking of this visually with a simpler query: SELECT b.k, d.k FROM b LEFT JOIN d USING(k);

                Now imagine that you have the values 1, 2, 3 in b, and 1, 2 in d. The result will be:


                b.k d.k 1 1 2 2 3 NULL


                Now add a WHERE clause: SELECT b.k, d.k FROM b LEFT JOIN d USING(k) WHERE b.k = d.k;

                The result will be:


                b.k d.k 1 1 2 2


                That's the same thing as an INNER join. The MySQL optimizer is smart enough to recognize the logical equivalence and rewrite the query so that it'll consider joining the tables in the order d, b if there is a better cost estimate for that plan.

                [1] it won't be false either, it will be NULL.

                Comment


                • #9
                  thanks a lot

                  P.S.: man ref. 5.1-5.6 version "...MySQL does a full scan on b because the LEFT JOIN forces it to be read before d" will be changed?
                  P.S2.: I found very interesting 'toy' called OPTIMIZER_TRACE in version 5.6

                  Comment


                  • #10
                    xaprb, a cross join is an inner join

                    Comment


                    • #11
                      No, a cross join is the combination of ALL rows from the left table with ALL of the rows from the right table (not just matching rows). If you do a cross join of the two tables I showed, you will get 3x2 rows in the output. Try it. The cross join has no WHERE clause or USING or ON:

                      select * from b cross join d;

                      select * from b, d;

                      Comment


                      • #12
                        Do you know any DB server that treats a cross join with a where differently than an inner join with an on clause?

                        Comment


                        • #13
                          gmouse wrote on Tue, 20 December 2011 13:15
                          Do you know any DB server that treats a cross join with a where differently than an inner join with an on clause?
                          Since the definition of a cross join is "the Cartesian product of rows from tables in the join", it can't have either a where or an on condition.

                          An expression like:

                          SELECT a,b WHERE a.x = b.y

                          is not a cross join, it's an inner join in an older SQL syntax (which results in a cross join if the where condition is dropped, and that's the difference ).

                          Comment

                          Working...
                          X