Once again it’s time for another EXPLAIN FORMAT=JSON is cool! post. This post will discuss how EXPLAIN FORMAT=JSON allows the nested_loop command to make the JOIN operation hierarchy transparent.
The regular EXPLAIN command lists each table that participates in a JOIN operation on a single row. This works perfectly for simple queries:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
mysql> explain select * from employees join titles join salariesG *************************** 1. row *************************** id: 1 select_type: SIMPLE table: employees partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 299379 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: SIMPLE table: titles partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 442724 filtered: 100.00 Extra: Using join buffer (Block Nested Loop) *************************** 3. row *************************** id: 1 select_type: SIMPLE table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2745434 filtered: 100.00 Extra: Using join buffer (Block Nested Loop) 3 rows in set, 1 warning (0.00 sec) |
You can see that the first accessed table was employees, then titles and finally salaries. Everything is clear.
EXPLAIN FORMAT=JSON in this case puts everything into the nested_loop array (even if “MySQL isn’t limited to nested-loop joins”):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 |
mysql> explain format=json select * from employees join titles join salariesG *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "7.277755124e16" }, "nested_loop": [ { "table": { "table_name": "employees", "access_type": "ALL", "rows_examined_per_scan": 299379, "rows_produced_per_join": 299379, "filtered": "100.00", "cost_info": { "read_cost": "929.00", "eval_cost": "59875.80", "prefix_cost": "60804.80", "data_read_per_join": "13M" }, "used_columns": [ "emp_no", "birth_date", "first_name", "last_name", "gender", "hire_date" ] } }, { "table": { "table_name": "titles", "access_type": "ALL", "rows_examined_per_scan": 442724, "rows_produced_per_join": 132542268396, "filtered": "100.00", "using_join_buffer": "Block Nested Loop", "cost_info": { "read_cost": "62734.88", "eval_cost": "26508453679.20", "prefix_cost": "26508577218.88", "data_read_per_join": "7T" }, "used_columns": [ "emp_no", "title", "from_date", "to_date" ] } }, { "table": { "table_name": "salaries", "access_type": "ALL", "rows_examined_per_scan": 2745434, "rows_produced_per_join": 363886050091503872, "filtered": "100.00", "using_join_buffer": "Block Nested Loop", "cost_info": { "read_cost": "314711040856.92", "eval_cost": "7.277721002e16", "prefix_cost": "7.277755124e16", "data_read_per_join": "5171P" }, "used_columns": [ "emp_no", "salary", "from_date", "to_date" ] } } ] } } 1 row in set, 1 warning (0.00 sec) |
For a simple query this output does not add much. Except cost info and information about used columns and efficiency of composite indexes.
But what if you not only join tables, but use the other SQL language options? For example, for the query below, which has two JOIN operations and two subqueries, a regular EXPLAIN returns this plan:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 |
mysql> explain select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: <subquery2> partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 1 select_type: PRIMARY table: dept_manager partitions: NULL type: ref possible_keys: PRIMARY,emp_no key: PRIMARY key_len: 4 ref: <subquery2>.emp_no rows: 1 filtered: 100.00 Extra: NULL *************************** 3. row *************************** id: 1 select_type: PRIMARY table: employees partitions: NULL type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: <subquery2>.emp_no rows: 1 filtered: 100.00 Extra: NULL *************************** 4. row *************************** id: 2 select_type: MATERIALIZED table: <derived3> partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 9 filtered: 100.00 Extra: NULL *************************** 5. row *************************** id: 3 select_type: DERIVED table: titles partitions: NULL type: index possible_keys: PRIMARY,emp_no key: emp_no key_len: 4 ref: NULL rows: 442724 filtered: 7.51 Extra: Using where; Using index; Using temporary; Using filesort; LooseScan *************************** 6. row *************************** id: 3 select_type: DERIVED table: salaries partitions: NULL type: ref possible_keys: PRIMARY,emp_no key: PRIMARY key_len: 4 ref: employees.titles.emp_no rows: 9 filtered: 100.00 Extra: NULL 6 rows in set, 1 warning (0.00 sec) |
It’s pretty hard to understand which part is a subquery and which is not. It’s also it is hard to find out if DERIVED belongs to the first JOIN or to the second. And I am not quite sure why <subquery2> was marked as PRIMARY, which is supposed to indicate “Outermost SELECT”.
The real issue here is that the internal representation of JOIN is hierarchical, and MySQL Server (like in the case for UNION) has trouble representing an object as a “flat” table. EXPLAIN FORMAT=JSON, with its hierarchical nature, can help us in this case.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 |
mysql> explain format=json select * from employees join dept_manager using (emp_no) where emp_no in (select emp_no from (select emp_no, salary from salaries where emp_no in (select emp_no from titles where title like '%manager%') group by emp_no, salary having salary > avg(salary) ) t )G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "39.45" }, "nested_loop": [ { "table": { "table_name": "<subquery2>", "access_type": "ALL", "materialized_from_subquery": { "using_temporary_table": true, "query_block": { "table": { "table_name": "t", "access_type": "ALL", "rows_examined_per_scan": 9, "rows_produced_per_join": 9, "filtered": "100.00", "cost_info": { "read_cost": "10.45", "eval_cost": "1.80", "prefix_cost": "12.25", "data_read_per_join": "144" }, "used_columns": [ "emp_no", "salary" ], "materialized_from_subquery": { "using_temporary_table": true, "dependent": false, "cacheable": true, "query_block": { "select_id": 3, "cost_info": { "query_cost": "176246.11" }, "grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "9.54" }, "nested_loop": [ { "table": { "table_name": "titles", "access_type": "index", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "emp_no", "used_key_parts": [ "emp_no" ], "key_length": "4", "rows_examined_per_scan": 442724, "rows_produced_per_join": 33229, "filtered": "7.51", "using_index": true, "loosescan": true, "cost_info": { "read_cost": "3380.56", "eval_cost": "6645.94", "prefix_cost": "63199.96", "data_read_per_join": "2M" }, "used_columns": [ "emp_no", "title", "from_date" ], "attached_condition": "(`employees`.`titles`.`title` like '%manager%')" } }, { "table": { "table_name": "salaries", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "employees.titles.emp_no" ], "rows_examined_per_scan": 9, "rows_produced_per_join": 9, "filtered": "100.00", "cost_info": { "read_cost": "49622.62", "eval_cost": "1.91", "prefix_cost": "176236.57", "data_read_per_join": "152" }, "used_columns": [ "emp_no", "salary", "from_date" ] } } ] } } } } } } } }, { "table": { "table_name": "dept_manager", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "<subquery2>.emp_no" ], "rows_examined_per_scan": 1, "rows_produced_per_join": 9, "filtered": "100.00", "cost_info": { "read_cost": "9.00", "eval_cost": "1.80", "prefix_cost": "23.05", "data_read_per_join": "144" }, "used_columns": [ "dept_no", "emp_no", "from_date", "to_date" ] } }, { "table": { "table_name": "employees", "access_type": "eq_ref", "possible_keys": [ "PRIMARY" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "<subquery2>.emp_no" ], "rows_examined_per_scan": 1, "rows_produced_per_join": 1, "filtered": "100.00", "cost_info": { "read_cost": "1.00", "eval_cost": "0.20", "prefix_cost": "39.45", "data_read_per_join": "48" }, "used_columns": [ "emp_no", "birth_date", "first_name", "last_name", "gender", "hire_date" ] } } ] } } 1 row in set, 1 warning (0.01 sec) |
At first we see that all our tables, JOIN operations and subqueries are in the nested_loop array:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
"nested_loop": [ { "table": { "table_name": "<subquery2>", ... { "table": { "table_name": "dept_manager", ... { "table": { "table_name": "employees", ... } ] |
Then we see that the first table, <subquery2>, was materialized_from_subquery:
1 2 3 4 5 |
"table": { "table_name": "<subquery2>", "access_type": "ALL", "materialized_from_subquery": { ... |
Which, in its turn, was materialized_from_subquery too:
1 2 3 4 5 6 7 |
"table": { "table_name": "<subquery2>", "access_type": "ALL", "materialized_from_subquery": { ... "materialized_from_subquery": { ... |
This last subquery performs grouping_operation on the other nested_loop ( JOIN) of tables titles and salaries:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
"grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "9.54" }, "nested_loop": [ { "table": { "table_name": "titles", ... }, { "table": { "table_name": "salaries", ... |
Now we have a better picture of how the query was optimized: tables titles and salaries were joined first, then GROUP BY was executed on the result, then the result was materialized and queried. The result of the query select emp_no from <materialized> t was materialized again as <subquery2>, and only after it joined with two other tables.
Conclusion: EXPLAIN FORMAT=JSON helps to understand how complex queries are optimized.