September 16, 2014

Is there a performance difference between JOIN and WHERE?

I’ve heard this question a lot, but never thought to blog about the answer. “Is there a performance difference between putting the JOIN conditions in the ON clause or the WHERE clause in MySQL?”

No, there’s no difference. The following queries are algebraically equivalent inside MySQL and will have the same execution plan.

Personally, I prefer to stay away from “comma joins” because a) the ANSI join syntax is more expressive and you’re going to use it anyway for LEFT JOIN, and mixing styles is asking for trouble, so you might as well just use one style; b) I think ANSI style is clearer. But that’s just personal preference.

About Baron Schwartz

Baron is the lead author of High Performance MySQL.
He is a former Percona employee.

Comments

  1. Jestep says:

    I have to agree on the readability. I can’t tell you how many times I’ve gotten the “help me debug my query” to see 20 lines of SQL on 10 tables and every join is specified in the WHERE clause. Even if you know what you’re looking for it is messy and difficult to understand at a glance. Writing specific joins makes the code make sense quicker and makes it much easier to change in the future.

  2. Scott says:

    Don’t forget the difference between “ON A.ID = B.ID” and “USING(ID)” – the first will give you the columns from both tables and the second will give you only the coalesced result of the two (as of 5.0.12, anyway). The execution plan is the same, but the extra bytes going over the wire might add up to some difference if the result set is large enough.

  3. I’ve been advising people to adopt the SQL-92 JOIN syntax since… about 1994. It used to be a huge uphill battle to get people even to understand the syntax, and they remained unwilling to use it. Now, everyone recognizes it, and the few people who are still skeptical seem like dinosaurs. But every so often, I’m surprised by someone who says they actually prefer the Oracle proprietary outer join syntax.

  4. kas says:

    Bill: Well, count me as one. If you start your career using Oracle and then work mostly on your own person projects… Now that I’m on MySql I still have to look up that join syntax up on wikipedia! :) (+) seemed really natural :)

  5. popov130 says:

    You’d better publish some time score on a simple DB test ?!

  6. Jeffrey Gilbert says:

    Simple db or complex db. I’d think the only place you’d see a difference would be where the isolated logic in the ANSI syntax saved mysql from doing something you didn’t intend for it to do, like join in the wrong place.

  7. If you think there is a difference, then benchmark it, but I’ve read the source code and I assure you there will be no difference. You can save some time by running EXPLAIN EXTENDED, then SHOW WARNINGS, and examine the reconstructed query. It will be the same.

  8. peter says:

    I would also note the “coma join” is equivalent to JOIN not “LEFT JOIN” I see a lot of people use LEFT JOIN with no reason and this does restrict execution plan choices.

  9. Salle says:

    Being one of the dinosaurs who prefer comma syntax I’d like to say few words on that.

    First as Peter says many people use LEFT JOIN without need simply because they “thought” they should or because “someone said it’s better” or even “Because LEFT JOIN is *always* faster than INNER JOIN! Period!” (I argued once with someone who was claiming that and his “proof” was also “they told me”). Using outer joins without need is rare mistake among people who start learning joins with comma syntax. When comma syntax is the first you are introduced to your brain establishes clear distinction between it and other types of joins. They become too different and its difficult to forget that difference. It’s opposite with people who learn Everything-About-SQL-Within-Two-Hours and their first touch with the term “join” is full list of all possible join types before even they come to the idea why joins are necessary at all. (Inner, outer, natural .. who cares? They are all the same aren’t they?)

    Next – English is not my native language. Therefore the SQL reserved word JOIN doesn’t map to any natural language word as it would be the case if I was using “join” in everyday language. To me JOIN is strictly a term from Relational Algebra, Relational Model, SQL, you name it. So is , the comma though. In my eyes “A JOIN B” and “A,B” are nothing more than notations in particular language and both notations are equivalent.

    Yes it’s true that comma syntax makes it easier to write unreadable code, but the issue about readability is bit exaggerated. Whether some code is readable or not depends mostly on writer and not so much on the language syntax. I have seen a lot of easy to read and understand perl scripts for example and lot of completely unreadable sources in languages supposedly much easier to read. Similarly one-line query which joins 15 tables can be very difficult to read with JOIN .. ON .. syntax while the same query written on multiple lines with visually separated join conditions and filtering conditions using comma syntax can be much easier to read.

    To me (and don’t forget I am dinosaur) JOIN .. ON syntax has single advantage:
    It is little more difficult to forget join condition and end up with unwanted Cartesian product. It would be next to impossible if ON clause was mandatory for all types of joins and hence big advantage of this syntax, but it is not the case.

    Here are perfectly valid syntax examples:

    SELECT * FROM A JOIN B INNER JOIN C INNER JOIN D JOIN E;

    SELECT * FROM A JOIN B JOIN C JOIN D JOIN E ON (A.id = C.id) WHERE D.id = B.id;

  10. @Salle: You can write unclear code in almost any programming languages by formatting it all on a single line. This is not a recommended habit in any language.

  11. Salle says:

    Exactly my point Bill. You can write unclear code no matter whether it’s on single or multiple lines. If you don’t care about readability the language per se doesn’t help. Valid for human languages too not only programming ones.

  12. Rob Wultsch says:

    Salle:
    The issue about readability is not exaggerated. I spend a lot of my time looking at other peoples queries and I have regex to add white space for readability. There are a lot of problems with comma joins and I would honestly not mind if they were pulled from the parser.

    1. The behavior has changed significantly between 4.X and when 5.0. Read “Join Processing Changes in MySQL 5.0.12″ on http://dev.mysql.com/doc/refman/5.0/en/join.html . This makes queries written with “comma joins” quite fragile.

    2. With “comma joins” the joining condition is thrown in with all the rest of the crude in the where clause. This means that on a complicated query with lots of table it is much more difficult to find the joining condition.

  13. Salle says:

    Rob:
    You are missing the fact that JOIN .. ON .. syntax does *not* prevent having join conditions buried somewhere in WHERE clause in case of inner joins.

    My point is that if you don’t care about readability this syntax does not help.

    Consider this deliberately bad, but syntactically valid example:

    SELECT *
    FROM
    A JOIN
    B JOIN
    C ON (B.id = C.id) JOIN
    D ON (A.id = C.id AND D.id = B.id) JOIN
    E
    WHERE
    E.id = A.id AND
    A.x = ‘foo’ AND
    C.id = D.id AND
    D.y = ‘bar’;

    Is it more readable than equally bad style below?

    SELECT *
    FROM A, B, C, D, E
    WHERE
    B.id = C.id AND
    A.id = C.id AND
    D.id = B.id AND
    E.id = A.id AND
    A.x = ‘foo’ AND
    C.id = D.id AND
    D.y = ‘bar’;

    One can even argue the latter is actually easier to understand on the grounds that it is shorter. Both examples are deliberately bad ones, but in practice we often see queries much worst than that.

  14. It is not just about readability. I judge the logical thinking of the programmer by looking at the style of writing joins or queries in general. Comma joins mean that the guy is not as good as I expect. There are times when I need to change a inner join to left join and it is very difficult to explain the change to “those” programmers.

  15. dalin says:

    @Salle,
    You can make both formats unreadable, no problem there. However if you want to make the most readable code possible, it’s hard to argue that comma syntax can be made to be more readable than ON syntax.

  16. Aingel says:

    To mix up more on the confusion, it’s basically not about readability or syntax construction but rather the goal of the process.

    The comma syntax comes easy on such:
    SELECT * FROM A,B WHERE A.id = B.id and A.x=123;

    But, the join syntax will help it to be more flexible when necessary. Consider this for instance:
    SELECT * FROM A INNER JOIN B ON A.id = B.id WHERE A.x=123

    Ok, as Baron said, the two should be basically the same – in execution. So, if you need to adjust the query such that limitations on either sides of the tables should be in-place, the JOIN is more preferred:
    SELECT * FROM A LEFT OUTER JOIN B ON A.id=B.id WHERE A.x=123;

    So in turn, the comma syntax will have to re-code the whole structure and adopt the join syntax instead.

  17. Salle says:

    Sorry, but the point that INNER JOIN might need to be rewritten to OUTER JOIN at some point is the weakest argument possible. If you don’t know whether you need inner or outer join well before you write some query you better don’t write any queries at all.

  18. @Salle: I disagree. It’s common to write an SQL query for a report e.g. of customer actvity, and then your boss says, “okay now show all customers, including those who have no activity.” Or another example: “include all customers you had before, but restrict the totals to their activity during a certain time span.” Those are both realistic examples of when you’d change an inner join to an outer join for a given query.

  19. Salle says:

    If you have to do such changes dictated by your boss after the application is launched you failed to do your job at the time the specifications of the application were defined which only proves the point: If you don’t know whether you need inner or outer join at the very beginning you better don’t write any queries at all.

  20. Joshua says:

    @Salle, requirements/needs change all the time, post-launch. Welcome to the real world? Bill’s example is the kind of thing that happens ALL the time. But I guess in those cases you would just call your boss an ass for not figuring out what he needed at the planning stages of the project????

  21. Chris says:

    Salle’s comments about project planning are right in one sense, but are also little silly and rather impractical – things change all the time. Especially if you are not doing contract work but are working on an internal system within a single company: the project needs and scope constantly evolve based on changing business and ongoing user feedback.

    That said, I also agree with him about comma syntax. I prefer it myself. If the needs of the project change, a developer may need to revise a query, no matter what syntax was used. If one prefers comma syntax, forcing an unfamiliar or non-preferred syntax on your code will in fact make it harder for that given developer to change it in the future. Given my own preference for comma syntax, it is easier for -me- to read, understand, and change such a query than one using JOIN. It isn’t that I don’t understand JOIN queries, or that I don’t know how to use them; comma syntax comes more naturally to me, and is more readily parsed by my logic. So that’s what I write.

    Do what you’re comfortable with, as long as there isn’t a compelling reason (performance or otherwise) to do it a different way. If the code accomplishes it’s aims and is able to be maintained, then you’re doing it “right”.

  22. Robert says:

    I want to update a table with a value that has no apparent relationship with the table containing that value (no foreign key relationship)

    e.g.
    UPDATE table_1 a, table_2 b
    SET a.value = b.value
    WHERE a.id = and b.id =

    In mysql there are three ways to do this, but which one would be performing best considering the first table to be huge (100 thousands of records), the second table to be small (a few hundreds of records)

    1) no join, and both ids in where clause

    as stated above

    2) join, one id in ON clause, the other id in where clause

    UPDATE table_1 a INNER JOIN table_2 b ON b.id =
    SET a.value = b.value
    WHERE a.id =

    3) join, both ids in ON clause

    UPDATE table_1 a INNER JOIN table_2 b ON a.id = AND b.id =
    SET a.value = b.value

  23. Robert says:

    I want to update a table with a value that has no apparent relationship with the table containing that value (no foreign key relationship)

    e.g.
    UPDATE table_1 a, table_2 b
    SET a.value = b.value
    WHERE a.id = and b.id =

    In mysql there are three ways to do this, but which one would be performing best considering the first table to be huge (100 thousands of records), the second table to be small (hundreds of records)

    1) no join, and both ids in where clause

    as stated above

    2) join, id of table to be updated in ON clause, the other id in where clause

    UPDATE table_1 a INNER JOIN table_2 b ON b.id =
    SET a.value = b.value
    WHERE a.id =

    3) join, both ids in ON clause

    UPDATE table_1 a INNER JOIN table_2 b ON a.id = AND b.id =
    SET a.value = b.value

  24. dasmeer says:

    Salle is right – join in where rules!
    +1!

  25. Jerome says:

    Hi there,
    I am using a LEFT JOIN and i was wondering if having many condition –
    ie. SELECT * FROM table1 LEFT JOIN table2 ON table2.x=table1.y AND table2.b=table1.a

    can i add many conditions or rather leaving it to max one?

    Please help or advise.
    Thanks a bunch
    Jerome

  26. Jerome, yes, you can use complex expressions in the ON clause. Virtually any expression that would work in a WHERE clause is okay for an ON clause.

    Tip: you can also use tuple comparison. For example, the query you give could be equivalently stated:

    SELECT * FROM table1 LEFT JOIN table2 ON (table2.x, table2.b) = (table1.y, table1.a)

    This can help readability, and can help indicate good places for compound indexes.

  27. Jerome says:

    Hi Bill,
    Thanks a lot for helping me.
    I have been checking and trying to optimize my queries and when it comes to JOIN more than a 2 tables, i am not sure what to do.
    I thought having too much in the ON clause might slow down the query.
    Do you have any other resource you can point me.
    Thanks again for your help.

  28. Jerome,

    You can optimizing joins by choosing proper indexes, just like searching in the WHERE clause.
    Check out some recent Percona webinars on designing indexes:

    http://www.percona.com/webinars/tools-and-techniques-index-design
    http://www.percona.com/webinars/2012-08-15-mysql-indexing-best-practices

  29. S says:

    Joshua,

    The point is partially that the boss will, if they’re competent, know the business requirements during the planning phase — and if you’re competent you’ll be able to articulate those requirements into code.

    If your boss is wildly deviating what he needs on a continual basis, he’s either incompetent, lazy, abusive, or just plain stupid.

    Yes, there are exceptions to that rule — extenuating circumstances and all that — but it shouldn’t be a daily (or even weekly) occurrence.

  30. Rob Cook says:

    sql joins

Speak Your Mind

*