EXPLAIN FORMAT=JSON
The regular EXPLAIN command already provides some information about subquery optimization. For example, you can find out if the subquery is dependent or not, and (since version 5.6) if it was materialized:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
mysql> explain select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null)G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: departments partitions: NULL type: index possible_keys: PRIMARY key: dept_name key_len: 42 ref: NULL rows: 9 filtered: 100.00 Extra: Using where; Using index *************************** 2. row *************************** id: 1 select_type: SIMPLE table: <subquery2> partitions: NULL type: eq_ref possible_keys: <auto_key> key: <auto_key> key_len: 4 ref: employees.departments.dept_no rows: 1 filtered: 100.00 Extra: NULL *************************** 3. row *************************** id: 2 select_type: MATERIALIZED table: dept_manager partitions: NULL type: ALL possible_keys: dept_no key: NULL key_len: NULL ref: NULL rows: 24 filtered: 90.00 Extra: Using where 3 rows in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`departments`.`dept_name` AS `dept_name` from `employees`.`departments` semi join (`employees`.`dept_manager`) where ((`<subquery2>`.`dept_no` = `employees`.`departments`.`dept_no`) and (`employees`.`dept_manager`.`to_date` is not null)) |
However, you can’t find details on exactly how this subquery was materialized. To find out more, use EXPLAIN FORMAT=JSON:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 |
mysql> explain format=json select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null)G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "16.72" }, "nested_loop": [ { "table": { "table_name": "departments", <skipped> }, { "table": { "table_name": "<subquery2>", "access_type": "eq_ref", "key": "<auto_key>", "key_length": "4", "ref": [ "employees.departments.dept_no" ], "rows_examined_per_scan": 1, "materialized_from_subquery": { "using_temporary_table": true, "query_block": { "table": { "table_name": "dept_manager", "access_type": "ALL", "possible_keys": [ "dept_no" ], "rows_examined_per_scan": 24, "rows_produced_per_join": 21, "filtered": "90.00", "cost_info": { "read_cost": "1.48", "eval_cost": "4.32", "prefix_cost": "5.80", "data_read_per_join": "345" }, "used_columns": [ "dept_no", "to_date" ], "attached_condition": "(`employees`.`dept_manager`.`to_date` is not null)" } } } } } ] } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`departments`.`dept_name` AS `dept_name` from `employees`.`departments` semi join (`employees`.`dept_manager`) where ((`<subquery2>`.`dept_no` = `employees`.`departments`.`dept_no`) and (`employees`.`dept_manager`.`to_date` is not null)) |
We can see here that the new temporary table ( "using_temporary_table": true) with internal name <subquery2> was materilized from the subquery ( materialized_from_subquery). We also see how this new temporary table was accessed.
We also can see that the member that contains information about the materialized subquery is part of the nested_loop array. Nested loop optimization applies to a regular JOIN command. In this case, it means that the MySQL Server would join the temporary table with an outer query.
Another interesting case is optimized_away_subqueries: this command shows subqueries that were executed only once and were replaced by their result. A regular EXPLAIN command provides no information about such optimization:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
mysql> explain select emp_no, salary from salaries order by (select max(salary) from salaries)G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2838525 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 2 select_type: SUBQUERY table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2838525 filtered: 100.00 Extra: NULL 2 rows in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary` from `employees`.`salaries` order by (/* select#2 */ select max(`employees`.`salaries`.`salary`) from `employees`.`salaries`) |
We can see that in select_type: SUBQUERY, the output of SHOW WARNINGS doesn’t give us much insight either.
However, EXPLAIN FORMAT=JSON tells us how this subquery was optimized:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 |
mysql> explain format=json select emp_no, salary from salaries order by (select max(salary) from salaries)G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "573505.00" }, "ordering_operation": { "using_filesort": false, "table": { "table_name": "salaries", <skipped> "optimized_away_subqueries": [ { "dependent": false, "cacheable": true, "query_block": { "select_id": 2, "cost_info": { "query_cost": "573505.00" }, "table": { "table_name": "salaries", "access_type": "ALL", "rows_examined_per_scan": 2838525, "rows_produced_per_join": 2838525, "filtered": "100.00", "cost_info": { "read_cost": "5800.00", "eval_cost": "567705.00", "prefix_cost": "573505.00", "data_read_per_join": "43M" }, "used_columns": [ "salary" ] } } } ] } } } 1 row in set, 1 warning (0.00 sec) |
After seeing optimized_away_subqueries and "cacheable": true, we know that the subquery was executed only once, and the result was cached and used to resolve the outer query.
Another subquery type that I want to mention in this post is attached_subqueries – which covers subqueries that are not converted to JOIN.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 |
mysql> explain format=json select emp_no from salaries where salary > (select avg(salary) from salaries)G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "250601.60" }, "table": { "table_name": "salaries", "access_type": "ALL", "rows_examined_per_scan": 1240668, "rows_produced_per_join": 413514, "filtered": "33.33", "cost_info": { "read_cost": "167898.67", "eval_cost": "82702.92", "prefix_cost": "250601.60", "data_read_per_join": "6M" }, "used_columns": [ "emp_no", "salary" ], "attached_condition": "(`employees`.`salaries`.`salary` > (/* select#2 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))", "attached_subqueries": [ { "dependent": false, "cacheable": true, "query_block": { "select_id": 2, "cost_info": { "query_cost": "250601.80" }, "table": { "table_name": "salaries", "access_type": "ALL", "rows_examined_per_scan": 1240669, "rows_produced_per_join": 1240669, "filtered": "100.00", "cost_info": { "read_cost": "2468.00", "eval_cost": "248133.80", "prefix_cost": "250601.80", "data_read_per_join": "18M" }, "used_columns": [ "salary" ] } } } ] } } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no` from `employees`.`salaries` where (`employees`.`salaries`.`salary` > (/* select#2 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`)) |
For these subqueries, using this option shows if they are dependent and/or cacheable. There is also a query_block member inside the attached_subqueries object which provides all the details on how the query was optimized.
Conclusion: EXPLAIN FORMAT=JSON provides some great details on subqueries optimization.
Sveta, could you point to a location where I can read a bit more about json format.
Other question – are there tools which helps to visualise json plan. I came from SQL Server background and there are number of great free tools to visualise and provides stats about execution plan.
One more – probably very ignorant question :). Am I right that produced plan is actually estimated and not the actual plan?
In SQL Server you can get both but actual is much more useful because you get the real and not estimated numbers
— I’m answering on behalf of Sveta while we sort out commenting on these older posts.
— Bear with us!
Hi Leonid,
> Sveta, could you point to a location where I can read a bit more about json format.
JSON format is defined at json.org
> Other question – are there tools which helps to visualise json plan. I came from SQL Server background and there are number of great free tools to visualise and provides stats about execution plan.
MySQL Workbench (https://www.mysql.com/products/workbench/) has visual EXPLAIN feature and uses EXPLAIN FORMAT=JSON as underlying source for it.
> One more – probably very ignorant question :). Am I right that produced plan is actually estimated and not the actual plan?
> In SQL Server you can get both but actual is much more useful because you get the real and not estimated numbers
Exactly: only estimates. If you want to find information about actual produced plan you have few options:
– Use table optimizer_trace in the information_schema: https://dev.mysql.com/doc/refman/5.7/en/optimizer-trace-table.html
– Use EXPLAIN FOR CONNECTION: https://dev.mysql.com/doc/refman/5.7/en/explain.html While this is still “normal” EXPLAIN once applied to the running query it may show actually executed plan
– Use ANALYZE in MariaDB: https://mariadb.com/kb/en/library/analyze-statement/
You can also examine values of status variables those names are starting with ‘Handler_’.
You may find some information in this webinar: https://www.percona.com/blog/2016/05/20/introduction-troubleshooting-performance-troubleshooting-slow-queries-webinar-q/