The previous post in the EXPLAIN FORMAT=JSON is Cool! series showed an example of the query
select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null), where the subquery was materialized into a temporary table and then joined with the outer query. This is known as a semi-join optimization. But what happens if we turn off this optimization?
EXPLAIN FORMAT=JSON can help us with this investigation too.
First lets look at the original output again:
|
1 |
mysql> explain format=json select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null)G<br>*************************** 1. row ***************************<br>EXPLAIN: {<br> "query_block": {<br> "select_id": 1,<br> "cost_info": {<br> "query_cost": "16.72"<br> },<br> "nested_loop": [<br> {<br> "table": {<br> "table_name": "departments",<br> <skipped><br> },<br> {<br> "table": {<br> "table_name": "<subquery2>",<br> "access_type": "eq_ref",<br> "key": "<auto_key>",<br> "key_length": "4",<br> "ref": [<br> "employees.departments.dept_no"<br> ],<br> "rows_examined_per_scan": 1,<br> "materialized_from_subquery": {<br> "using_temporary_table": true,<br> "query_block": {<br> "table": {<br> "table_name": "dept_manager",<br> "access_type": "ALL",<br> "possible_keys": [<br> "dept_no"<br> ],<br> "rows_examined_per_scan": 24,<br> "rows_produced_per_join": 21,<br> "filtered": "90.00",<br> "cost_info": {<br> "read_cost": "1.48",<br> "eval_cost": "4.32",<br> "prefix_cost": "5.80",<br> "data_read_per_join": "345"<br> },<br> "used_columns": [<br> "dept_no",<br> "to_date"<br> ],<br> "attached_condition": "(`employees`.`dept_manager`.`to_date` is not null)"<br> }<br> }<br> }<br> }<br> }<br> ]<br> }<br>}<br>1 row in set, 1 warning (0.00 sec)<br>Note (Code 1003): /* select#1 */ select `employees`.`departments`.`dept_name` AS `dept_name` from `employees`.`departments` semi join (`employees`.`dept_manager`) where ((`<subquery2>`.`dept_no` = `employees`.`departments`.`dept_no`) and (`employees`.`dept_manager`.`to_date` is not null)) |
To repeat what happened here: the subquery was materialized into a temporary table, then joined with the departments table. Semi-join optimization is ON by default (as would be most likely without intervention).
What happens if we temporarily turn semi-join optimization OFF?
|
1 |
mysql> set optimizer_switch="semijoin=off";<br>Query OK, 0 rows affected (0.00 sec) |
And then execute EXPLAIN one more time:
|
1 |
mysql> explain format=json select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null) G<br>*************************** 1. row ***************************<br>EXPLAIN: {<br> "query_block": {<br> "select_id": 1,<br> "cost_info": {<br> "query_cost": "2.80"<br> },<br> "table": {<br> "table_name": "departments",<br> "access_type": "index",<br> "key": "dept_name",<br> "used_key_parts": [<br> "dept_name"<br> ],<br> "key_length": "42",<br> "rows_examined_per_scan": 9,<br> "rows_produced_per_join": 9,<br> "filtered": "100.00",<br> "using_index": true,<br> "cost_info": {<br> "read_cost": "1.00",<br> "eval_cost": "1.80",<br> "prefix_cost": "2.80",<br> "data_read_per_join": "432"<br> },<br> "used_columns": [<br> "dept_no",<br> "dept_name"<br> ],<br> "attached_condition": "<in_optimizer>(`employees`.`departments`.`dept_no`,`employees`.`departments`.`dept_no` in ( <materialize> (/* select#2 */ select `employees`.`dept_manager`.`dept_no` from `employees`.`dept_manager` where (`employees`.`dept_manager`.`to_date` is not null) ), <primary_index_lookup>(`employees`.`departments`.`dept_no` in <temporary table> on <auto_key> where ((`employees`.`departments`.`dept_no` = `materialized-subquery`.`dept_no`)))))",<br> "attached_subqueries": [<br> {<br> "table": {<br> "table_name": "<materialized_subquery>",<br> "access_type": "eq_ref",<br> "key": "<auto_key>",<br> "key_length": "4",<br> "rows_examined_per_scan": 1,<br> "materialized_from_subquery": {<br> "using_temporary_table": true,<br> "dependent": true,<br> "cacheable": false,<br> "query_block": {<br> "select_id": 2,<br> "cost_info": {<br> "query_cost": "5.80"<br> },<br> "table": {<br> "table_name": "dept_manager",<br> "access_type": "ALL",<br> "possible_keys": [<br> "dept_no"<br> ],<br> "rows_examined_per_scan": 24,<br> "rows_produced_per_join": 21,<br> "filtered": "90.00",<br> "cost_info": {<br> "read_cost": "1.48",<br> "eval_cost": "4.32",<br> "prefix_cost": "5.80",<br> "data_read_per_join": "345"<br> },<br> "used_columns": [<br> "dept_no",<br> "to_date"<br> ],<br> "attached_condition": "(`employees`.`dept_manager`.`to_date` is not null)"<br> }<br> }<br> }<br> }<br> }<br> ]<br> }<br> }<br>}<br>1 row in set, 1 warning (0.00 sec)<br><br>Note (Code 1003): /* select#1 */ select `employees`.`departments`.`dept_name` AS `dept_name` from `employees`.`departments` where <in_optimizer>(`employees`.`departments`.`dept_no`,`employees`.`departments`.`dept_no` in ( <materialize> (/* select#2 */ select `employees`.`dept_manager`.`dept_no` from `employees`.`dept_manager` where (`employees`.`dept_manager`.`to_date` is not null) ), <primary_index_lookup>(`employees`.`departments`.`dept_no` in <temporary table> on <auto_key> where ((`employees`.`departments`.`dept_no` = `materialized-subquery`.`dept_no`))))) |
Now the picture is completely different. There is no nested_loop member, and instead there is an attached_subqueries array containing a single member: the temporary table materialized from the subquery select dept_no from dept_manager where to_date is not null (including all the details of this materialization).
Conclusion: We can experiment with the value of optimizer_switch and use EXPLAIN FORMAT=JSON to examine how a particular optimization affects our queries.
Resources
RELATED POSTS