EXPLAIN FORMAT=JSON: nested_loop makes JOIN hierarchy transparent

EXPLAIN FORMAT=JSONOnce again it’s time for another EXPLAIN FORMAT=JSON is cool! post. This post will discuss how EXPLAIN FORMAT=JSON allows the nested_loop command to make the JOIN operation hierarchy transparent.

The regular EXPLAIN  command lists each table that participates in a  JOIN  operation on a single row. This works perfectly for simple queries:

You can see that the first accessed table was employees, then titles  and finally  salaries. Everything is clear.

EXPLAIN FORMAT=JSON in this case puts everything into the  nested_loop array (even if “MySQL isn’t limited to nested-loop joins”):

For a simple query this output does not add much. Except cost info and information about used columns and efficiency of composite indexes.

But what if you not only join tables, but use the other SQL language options? For example, for the query below, which has two JOIN operations and two subqueries, a regular EXPLAIN returns this plan:

It’s pretty hard to understand which part is a subquery and which is not. It’s also it is hard to find out if DERIVED belongs to the first JOIN or to the second. And I am not quite sure why <subquery2>  was marked as PRIMARY, which is supposed to indicate “Outermost SELECT”.

The real issue here is that the internal representation of JOIN is hierarchical, and MySQL Server (like in the case for UNION) has trouble representing an object as a “flat” table. EXPLAIN FORMAT=JSON, with its hierarchical nature, can help us in this case.

At first we see that all our tables, JOIN operations and subqueries are in the nested_loop array:

Then we see that the first table, <subquery2>, was materialized_from_subquery:

Which, in its turn, was materialized_from_subquery too:

This last subquery performs grouping_operation  on the other  nested_loop  ( JOIN) of tables titles  and salaries:

Now we have a better picture of how the query was optimized: tables titles and salaries  were joined first, then GROUP BY was executed on the result, then the result was materialized and queried. The result of the query select emp_no from <materialized> t  was materialized again as <subquery2>, and only after it joined with two other tables.

Conclusion: EXPLAIN FORMAT=JSON  helps to understand how complex queries are optimized.

Share this post