The previous post in the EXPLAIN FORMAT=JSON is Cool! series showed an example of the query
select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null), where the subquery was materialized into a temporary table and then joined with the outer query. This is known as a semi-join optimization. But what happens if we turn off this optimization?
EXPLAIN FORMAT=JSON can help us with this investigation too.
First lets look at the original output again:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 |
mysql> explain format=json select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null)G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "16.72" }, "nested_loop": [ { "table": { "table_name": "departments", <skipped> }, { "table": { "table_name": "<subquery2>", "access_type": "eq_ref", "key": "<auto_key>", "key_length": "4", "ref": [ "employees.departments.dept_no" ], "rows_examined_per_scan": 1, "materialized_from_subquery": { "using_temporary_table": true, "query_block": { "table": { "table_name": "dept_manager", "access_type": "ALL", "possible_keys": [ "dept_no" ], "rows_examined_per_scan": 24, "rows_produced_per_join": 21, "filtered": "90.00", "cost_info": { "read_cost": "1.48", "eval_cost": "4.32", "prefix_cost": "5.80", "data_read_per_join": "345" }, "used_columns": [ "dept_no", "to_date" ], "attached_condition": "(`employees`.`dept_manager`.`to_date` is not null)" } } } } } ] } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`departments`.`dept_name` AS `dept_name` from `employees`.`departments` semi join (`employees`.`dept_manager`) where ((`<subquery2>`.`dept_no` = `employees`.`departments`.`dept_no`) and (`employees`.`dept_manager`.`to_date` is not null)) |
To repeat what happened here: the subquery was materialized into a temporary table, then joined with the departments table. Semi-join optimization is ON by default (as would be most likely without intervention).
What happens if we temporarily turn semi-join optimization OFF?
1 2 |
mysql> set optimizer_switch="semijoin=off"; Query OK, 0 rows affected (0.00 sec) |
And then execute EXPLAIN one more time:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 |
mysql> explain format=json select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null) G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "2.80" }, "table": { "table_name": "departments", "access_type": "index", "key": "dept_name", "used_key_parts": [ "dept_name" ], "key_length": "42", "rows_examined_per_scan": 9, "rows_produced_per_join": 9, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "1.00", "eval_cost": "1.80", "prefix_cost": "2.80", "data_read_per_join": "432" }, "used_columns": [ "dept_no", "dept_name" ], "attached_condition": "<in_optimizer>(`employees`.`departments`.`dept_no`,`employees`.`departments`.`dept_no` in ( <materialize> (/* select#2 */ select `employees`.`dept_manager`.`dept_no` from `employees`.`dept_manager` where (`employees`.`dept_manager`.`to_date` is not null) ), <primary_index_lookup>(`employees`.`departments`.`dept_no` in <temporary table> on <auto_key> where ((`employees`.`departments`.`dept_no` = `materialized-subquery`.`dept_no`)))))", "attached_subqueries": [ { "table": { "table_name": "<materialized_subquery>", "access_type": "eq_ref", "key": "<auto_key>", "key_length": "4", "rows_examined_per_scan": 1, "materialized_from_subquery": { "using_temporary_table": true, "dependent": true, "cacheable": false, "query_block": { "select_id": 2, "cost_info": { "query_cost": "5.80" }, "table": { "table_name": "dept_manager", "access_type": "ALL", "possible_keys": [ "dept_no" ], "rows_examined_per_scan": 24, "rows_produced_per_join": 21, "filtered": "90.00", "cost_info": { "read_cost": "1.48", "eval_cost": "4.32", "prefix_cost": "5.80", "data_read_per_join": "345" }, "used_columns": [ "dept_no", "to_date" ], "attached_condition": "(`employees`.`dept_manager`.`to_date` is not null)" } } } } } ] } } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`departments`.`dept_name` AS `dept_name` from `employees`.`departments` where <in_optimizer>(`employees`.`departments`.`dept_no`,`employees`.`departments`.`dept_no` in ( <materialize> (/* select#2 */ select `employees`.`dept_manager`.`dept_no` from `employees`.`dept_manager` where (`employees`.`dept_manager`.`to_date` is not null) ), <primary_index_lookup>(`employees`.`departments`.`dept_no` in <temporary table> on <auto_key> where ((`employees`.`departments`.`dept_no` = `materialized-subquery`.`dept_no`))))) |
Now the picture is completely different. There is no nested_loop member, and instead there is an attached_subqueries array containing a single member: the temporary table materialized from the subquery select dept_no from dept_manager where to_date is not null (including all the details of this materialization).
Conclusion: We can experiment with the value of optimizer_switch and use EXPLAIN FORMAT=JSON to examine how a particular optimization affects our queries.