Another post in the EXPLAIN FORMAT=JSON is Cool! series! In this post, we’ll discuss how the EXPLAIN FORMAT=JSON provides optimization details for
ORDER BY and
GROUP BY operations in conjunction with
order_by_subqueries and
group_by_subqueries.
EXPLAIN FORMAT=JSON can print details on how a subquery in ORDER BY is optimized:
|
1 |
mysql> explain format=json select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select emp_no limit 1)G<br>*************************** 1. row ***************************<br>EXPLAIN: {<br> "query_block": {<br> "select_id": 1,<br> "cost_info": {<br> "query_cost": "60833.60"<br> },<br> "ordering_operation": {<br> "using_filesort": true,<br> "table": {<br> "table_name": "employees",<br> "access_type": "ALL",<br> "rows_examined_per_scan": 299843,<br> "rows_produced_per_join": 299843,<br> "filtered": "100.00",<br> "cost_info": {<br> "read_cost": "865.00",<br> "eval_cost": "59968.60",<br> "prefix_cost": "60833.60",<br> "data_read_per_join": "13M"<br> },<br> "used_columns": [<br> "emp_no",<br> "first_name",<br> "last_name"<br> ]<br> },<br> "order_by_subqueries": [<br> {<br> "dependent": true,<br> "cacheable": false,<br> "query_block": {<br> "select_id": 2,<br> "message": "No tables used"<br> }<br> }<br> ]<br> }<br> }<br>}<br>1 row in set, 2 warnings (0.00 sec)<br><br>Note (Code 1276): Field or reference 'employees.employees.emp_no' of SELECT #2 was resolved in SELECT #1<br>Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`employees`.`emp_no` limit 1) |
The above code shows member ordering_operation of query_block (which includes the order_by_subqueries array) with information on how the subquery in ORDER BY was optimized.
This is a simple example. In real life you can have larger subqueries in the ORDER BY clause. For example, take this more complicated and slightly crazy query:
|
1 |
select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no) |
Run a regular EXPLAIN on it. If we imagine this is a regular subquery, we won’t know if it can be cached or would be executed for each row sorted.
|
1 |
mysql> explain select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)G<br>*************************** 1. row ***************************<br> id: 1<br> select_type: PRIMARY<br> table: employees<br> partitions: NULL<br> type: ALL<br>possible_keys: NULL<br> key: NULL<br> key_len: NULL<br> ref: NULL<br> rows: 299843<br> filtered: 100.00<br> Extra: NULL<br>*************************** 2. row ***************************<br> id: 2<br> select_type: SUBQUERY<br> table: dept_emp<br> partitions: NULL<br> type: index<br>possible_keys: PRIMARY,emp_no,dept_no<br> key: dept_no<br> key_len: 4<br> ref: NULL<br> rows: 331215<br> filtered: 100.00<br> Extra: Using index<br>*************************** 3. row ***************************<br> id: 2<br> select_type: SUBQUERY<br> table: salaries<br> partitions: NULL<br> type: ref<br>possible_keys: PRIMARY,emp_no<br> key: emp_no<br> key_len: 4<br> ref: employees.dept_emp.emp_no<br> rows: 10<br> filtered: 100.00<br> Extra: Using index<br>3 rows in set, 1 warning (0.00 sec)<br><br>Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`dept_emp`.`dept_no` AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) |
EXPLAIN FORMAT=JSON provides a completely different picture:
|
1 |
mysql> explain format=json select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)G<br>*************************** 1. row ***************************<br>EXPLAIN: {<br> "query_block": {<br> "select_id": 1,<br> "cost_info": {<br> "query_cost": "60833.60"<br> },<br> "ordering_operation": {<br> "using_filesort": false,<br> "table": {<br> "table_name": "employees",<br> "access_type": "ALL",<br> "rows_examined_per_scan": 299843,<br> "rows_produced_per_join": 299843,<br> "filtered": "100.00",<br> "cost_info": {<br> "read_cost": "865.00",<br> "eval_cost": "59968.60",<br> "prefix_cost": "60833.60",<br> "data_read_per_join": "13M"<br> },<br> "used_columns": [<br> "emp_no",<br> "first_name",<br> "last_name"<br> ]<br> },<br> "optimized_away_subqueries": [<br> {<br> "dependent": false,<br> "cacheable": true,<br> "query_block": {<br> "select_id": 2,<br> "cost_info": {<br> "query_cost": "1082124.21"<br> },<br> "grouping_operation": {<br> "using_filesort": false,<br> "nested_loop": [<br> {<br> "table": {<br> "table_name": "dept_emp",<br> "access_type": "index",<br> "possible_keys": [<br> "PRIMARY",<br> "emp_no",<br> "dept_no"<br> ],<br> "key": "dept_no",<br> "used_key_parts": [<br> "dept_no"<br> ],<br> "key_length": "4",<br> "rows_examined_per_scan": 331215,<br> "rows_produced_per_join": 331215,<br> "filtered": "100.00",<br> "using_index": true,<br> "cost_info": {<br> "read_cost": "673.00",<br> "eval_cost": "66243.00",<br> "prefix_cost": "66916.00",<br> "data_read_per_join": "5M"<br> },<br> "used_columns": [<br> "emp_no",<br> "dept_no"<br> ]<br> }<br> },<br> {<br> "table": {<br> "table_name": "salaries",<br> "access_type": "ref",<br> "possible_keys": [<br> "PRIMARY",<br> "emp_no"<br> ],<br> "key": "emp_no",<br> "used_key_parts": [<br> "emp_no"<br> ],<br> "key_length": "4",<br> "ref": [<br> "employees.dept_emp.emp_no"<br> ],<br> "rows_examined_per_scan": 10,<br> "rows_produced_per_join": 3399374,<br> "filtered": "100.00",<br> "using_index": true,<br> "cost_info": {<br> "read_cost": "335333.33",<br> "eval_cost": "679874.87",<br> "prefix_cost": "1082124.21",<br> "data_read_per_join": "51M"<br> },<br> "used_columns": [<br> "emp_no",<br> "from_date"<br> ]<br> }<br> }<br> ]<br> }<br> }<br> }<br> ]<br> }<br> }<br>}<br>1 row in set, 1 warning (0.00 sec)<br><br>Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`dept_emp`.`dept_no` AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) |
We see that the subquery was optimized away: member optimized_away_subqueries exists, but there is no order_by_subqueries in the ordering_operation object. We can also see that the subquery was cached: "cacheable": true.
EXPLAIN FORMAT=JSON also provides information about subqueries in the GROUP BY clause. It uses the group_by_subqueries array in the grouping_operation member for this purpose.
|
1 |
mysql> explain format=json select count(emp_no) from salaries group by salary > ALL (select s/c as avg_salary from (select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no) t)G<br>*************************** 1. row ***************************<br>EXPLAIN: {<br> "query_block": {<br> "select_id": 1,<br> "cost_info": {<br> "query_cost": "3412037.60"<br> },<br> "grouping_operation": {<br> "using_temporary_table": true,<br> "using_filesort": true,<br> "cost_info": {<br> "sort_cost": "2838638.00"<br> },<br> "table": {<br> "table_name": "salaries",<br> "access_type": "ALL",<br> "rows_examined_per_scan": 2838638,<br> "rows_produced_per_join": 2838638,<br> "filtered": "100.00",<br> "cost_info": {<br> "read_cost": "5672.00",<br> "eval_cost": "567727.60",<br> "prefix_cost": "573399.60",<br> "data_read_per_join": "43M"<br> },<br> "used_columns": [<br> "emp_no",<br> "salary",<br> "from_date"<br> ]<br> },<br> "group_by_subqueries": [<br> {<br> "dependent": true,<br> "cacheable": false,<br> "query_block": {<br> "select_id": 2,<br> "cost_info": {<br> "query_cost": "881731.00"<br> },<br> "table": {<br> "table_name": "t",<br> "access_type": "ALL",<br> "rows_examined_per_scan": 3526884,<br> "rows_produced_per_join": 3526884,<br> "filtered": "100.00",<br> "cost_info": {<br> "read_cost": "176354.20",<br> "eval_cost": "705376.80",<br> "prefix_cost": "881731.00",<br> "data_read_per_join": "134M"<br> },<br> "used_columns": [<br> "dept_no",<br> "s",<br> "c"<br> ],<br> "attached_condition": "((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`)))",<br> "materialized_from_subquery": {<br> "using_temporary_table": true,<br> "dependent": false,<br> "cacheable": true,<br> "query_block": {<br> "select_id": 3,<br> "cost_info": {<br> "query_cost": "1106758.94"<br> },<br> "grouping_operation": {<br> "using_filesort": false,<br> "nested_loop": [<br> {<br> "table": {<br> "table_name": "dept_emp",<br> "access_type": "index",<br> "possible_keys": [<br> "PRIMARY",<br> "emp_no",<br> "dept_no"<br> ],<br> "key": "dept_no",<br> "used_key_parts": [<br> "dept_no"<br> ],<br> "key_length": "4",<br> "rows_examined_per_scan": 331215,<br> "rows_produced_per_join": 331215,<br> "filtered": "100.00",<br> "using_index": true,<br> "cost_info": {<br> "read_cost": "673.00",<br> "eval_cost": "66243.00",<br> "prefix_cost": "66916.00",<br> "data_read_per_join": "5M"<br> },<br> "used_columns": [<br> "emp_no",<br> "dept_no"<br> ]<br> }<br> },<br> {<br> "table": {<br> "table_name": "salaries",<br> "access_type": "ref",<br> "possible_keys": [<br> "PRIMARY",<br> "emp_no"<br> ],<br> "key": "PRIMARY",<br> "used_key_parts": [<br> "emp_no"<br> ],<br> "key_length": "4",<br> "ref": [<br> "employees.dept_emp.emp_no"<br> ],<br> "rows_examined_per_scan": 10,<br> "rows_produced_per_join": 3526884,<br> "filtered": "100.00",<br> "cost_info": {<br> "read_cost": "334466.14",<br> "eval_cost": "705376.80",<br> "prefix_cost": "1106758.95",<br> "data_read_per_join": "53M"<br> },<br> "used_columns": [<br> "emp_no",<br> "salary",<br> "from_date"<br> ]<br> }<br> }<br> ]<br> }<br> }<br> }<br> }<br> }<br> }<br> ]<br> }<br> }<br>}<br>1 row in set, 1 warning (0.01 sec)<br><br>Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)` from `employees`.`salaries` group by <not>(<in_optimizer>(`employees`.`salaries`.`salary`,<exists>(/* select#2 */ select 1 from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s`,count(`employees`.`salaries`.`emp_no`) AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t` where ((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`))) having <is_not_null_test>((`t`.`s` / `t`.`c`))))) |
Again, this output gives a clear view of query optimization: subquery in GROUP BY itself cannot be optimized, cached or converted into temporary table, but the subquery inside the subquery ( select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no) could be materialized into a temporary table and cached.
A regular EXPLAIN command does not provide such details:
|
1 |
mysql> explain select count(emp_no) from salaries group by salary > ALL (select s/c as avg_salary from (select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no) t)G<br>*************************** 1. row ***************************<br> id: 1<br> select_type: PRIMARY<br> table: salaries<br> partitions: NULL<br> type: ALL<br>possible_keys: NULL<br> key: NULL<br> key_len: NULL<br> ref: NULL<br> rows: 2838638<br> filtered: 100.00<br> Extra: Using temporary; Using filesort<br>*************************** 2. row ***************************<br> id: 2<br> select_type: DEPENDENT SUBQUERY<br> table: <derived3><br> partitions: NULL<br> type: ALL<br>possible_keys: NULL<br> key: NULL<br> key_len: NULL<br> ref: NULL<br> rows: 3526884<br> filtered: 100.00<br> Extra: Using where<br>*************************** 3. row ***************************<br> id: 3<br> select_type: DERIVED<br> table: dept_emp<br> partitions: NULL<br> type: index<br>possible_keys: PRIMARY,emp_no,dept_no<br> key: dept_no<br> key_len: 4<br> ref: NULL<br> rows: 331215<br> filtered: 100.00<br> Extra: Using index<br>*************************** 4. row ***************************<br> id: 3<br> select_type: DERIVED<br> table: salaries<br> partitions: NULL<br> type: ref<br>possible_keys: PRIMARY,emp_no<br> key: PRIMARY<br> key_len: 4<br> ref: employees.dept_emp.emp_no<br> rows: 10<br> filtered: 100.00<br> Extra: NULL<br>4 rows in set, 1 warning (0.01 sec)<br><br>Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)` from `employees`.`salaries` group by <not>(<in_optimizer>(`employees`.`salaries`.`salary`,<exists>(/* select#2 */ select 1 from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s`,count(`employees`.`salaries`.`emp_no`) AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t` where ((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`))) having <is_not_null_test>((`t`.`s` / `t`.`c`))))) |
Most importantly, we cannot guess from the output if the DERIVED subquery can be cached.
Conlcusion: EXPLAIN FORMAT=JSON provides details on how subqueries in ORDER BY and GROUP BY clauses are optimized.
Resources
RELATED POSTS