EXPLAIN FORMAT=JSON: order_by_subqueries, group_by_subqueries details on subqueries in ORDER BY and GROUP BY
Another post in the EXPLAIN FORMAT=JSON is Cool! series! In this post, we’ll discuss how the EXPLAIN FORMAT=JSON provides optimization details for
ORDER BY and
GROUP BY operations in conjunction with
order_by_subqueries and
group_by_subqueries.
EXPLAIN FORMAT=JSON can print details on how a subquery in ORDER BY is optimized:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
mysql> explain format=json select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select emp_no limit 1)G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "60833.60" }, "ordering_operation": { "using_filesort": true, "table": { "table_name": "employees", "access_type": "ALL", "rows_examined_per_scan": 299843, "rows_produced_per_join": 299843, "filtered": "100.00", "cost_info": { "read_cost": "865.00", "eval_cost": "59968.60", "prefix_cost": "60833.60", "data_read_per_join": "13M" }, "used_columns": [ "emp_no", "first_name", "last_name" ] }, "order_by_subqueries": [ { "dependent": true, "cacheable": false, "query_block": { "select_id": 2, "message": "No tables used" } } ] } } } 1 row in set, 2 warnings (0.00 sec) Note (Code 1276): Field or reference 'employees.employees.emp_no' of SELECT #2 was resolved in SELECT #1 Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`employees`.`emp_no` limit 1) |
The above code shows member ordering_operation of query_block (which includes the order_by_subqueries array) with information on how the subquery in ORDER BY was optimized.
This is a simple example. In real life you can have larger subqueries in the ORDER BY clause. For example, take this more complicated and slightly crazy query:
1 |
select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no) |
Run a regular EXPLAIN on it. If we imagine this is a regular subquery, we won’t know if it can be cached or would be executed for each row sorted.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
mysql> explain select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: employees partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 299843 filtered: 100.00 Extra: NULL *************************** 2. row *************************** id: 2 select_type: SUBQUERY table: dept_emp partitions: NULL type: index possible_keys: PRIMARY,emp_no,dept_no key: dept_no key_len: 4 ref: NULL rows: 331215 filtered: 100.00 Extra: Using index *************************** 3. row *************************** id: 2 select_type: SUBQUERY table: salaries partitions: NULL type: ref possible_keys: PRIMARY,emp_no key: emp_no key_len: 4 ref: employees.dept_emp.emp_no rows: 10 filtered: 100.00 Extra: Using index 3 rows in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`dept_emp`.`dept_no` AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) |
EXPLAIN FORMAT=JSON provides a completely different picture:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 |
mysql> explain format=json select emp_no, concat(first_name, ' ', last_name) f2 from employees order by (select dept_no as c from salaries join dept_emp using (emp_no) group by dept_no)G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "60833.60" }, "ordering_operation": { "using_filesort": false, "table": { "table_name": "employees", "access_type": "ALL", "rows_examined_per_scan": 299843, "rows_produced_per_join": 299843, "filtered": "100.00", "cost_info": { "read_cost": "865.00", "eval_cost": "59968.60", "prefix_cost": "60833.60", "data_read_per_join": "13M" }, "used_columns": [ "emp_no", "first_name", "last_name" ] }, "optimized_away_subqueries": [ { "dependent": false, "cacheable": true, "query_block": { "select_id": 2, "cost_info": { "query_cost": "1082124.21" }, "grouping_operation": { "using_filesort": false, "nested_loop": [ { "table": { "table_name": "dept_emp", "access_type": "index", "possible_keys": [ "PRIMARY", "emp_no", "dept_no" ], "key": "dept_no", "used_key_parts": [ "dept_no" ], "key_length": "4", "rows_examined_per_scan": 331215, "rows_produced_per_join": 331215, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "673.00", "eval_cost": "66243.00", "prefix_cost": "66916.00", "data_read_per_join": "5M" }, "used_columns": [ "emp_no", "dept_no" ] } }, { "table": { "table_name": "salaries", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "emp_no", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "employees.dept_emp.emp_no" ], "rows_examined_per_scan": 10, "rows_produced_per_join": 3399374, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "335333.33", "eval_cost": "679874.87", "prefix_cost": "1082124.21", "data_read_per_join": "51M" }, "used_columns": [ "emp_no", "from_date" ] } } ] } } } ] } } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`employees`.`emp_no` AS `emp_no`,concat(`employees`.`employees`.`first_name`,' ',`employees`.`employees`.`last_name`) AS `f2` from `employees`.`employees` order by (/* select#2 */ select `employees`.`dept_emp`.`dept_no` AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) |
We see that the subquery was optimized away: member optimized_away_subqueries exists, but there is no order_by_subqueries in the ordering_operation object. We can also see that the subquery was cached: "cacheable": true.
EXPLAIN FORMAT=JSON also provides information about subqueries in the GROUP BY clause. It uses the group_by_subqueries array in the grouping_operation member for this purpose.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 |
mysql> explain format=json select count(emp_no) from salaries group by salary > ALL (select s/c as avg_salary from (select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no) t)G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "3412037.60" }, "grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "2838638.00" }, "table": { "table_name": "salaries", "access_type": "ALL", "rows_examined_per_scan": 2838638, "rows_produced_per_join": 2838638, "filtered": "100.00", "cost_info": { "read_cost": "5672.00", "eval_cost": "567727.60", "prefix_cost": "573399.60", "data_read_per_join": "43M" }, "used_columns": [ "emp_no", "salary", "from_date" ] }, "group_by_subqueries": [ { "dependent": true, "cacheable": false, "query_block": { "select_id": 2, "cost_info": { "query_cost": "881731.00" }, "table": { "table_name": "t", "access_type": "ALL", "rows_examined_per_scan": 3526884, "rows_produced_per_join": 3526884, "filtered": "100.00", "cost_info": { "read_cost": "176354.20", "eval_cost": "705376.80", "prefix_cost": "881731.00", "data_read_per_join": "134M" }, "used_columns": [ "dept_no", "s", "c" ], "attached_condition": "((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`)))", "materialized_from_subquery": { "using_temporary_table": true, "dependent": false, "cacheable": true, "query_block": { "select_id": 3, "cost_info": { "query_cost": "1106758.94" }, "grouping_operation": { "using_filesort": false, "nested_loop": [ { "table": { "table_name": "dept_emp", "access_type": "index", "possible_keys": [ "PRIMARY", "emp_no", "dept_no" ], "key": "dept_no", "used_key_parts": [ "dept_no" ], "key_length": "4", "rows_examined_per_scan": 331215, "rows_produced_per_join": 331215, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "673.00", "eval_cost": "66243.00", "prefix_cost": "66916.00", "data_read_per_join": "5M" }, "used_columns": [ "emp_no", "dept_no" ] } }, { "table": { "table_name": "salaries", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "employees.dept_emp.emp_no" ], "rows_examined_per_scan": 10, "rows_produced_per_join": 3526884, "filtered": "100.00", "cost_info": { "read_cost": "334466.14", "eval_cost": "705376.80", "prefix_cost": "1106758.95", "data_read_per_join": "53M" }, "used_columns": [ "emp_no", "salary", "from_date" ] } } ] } } } } } } ] } } } 1 row in set, 1 warning (0.01 sec) Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)` from `employees`.`salaries` group by <not>(<in_optimizer>(`employees`.`salaries`.`salary`,<exists>(/* select#2 */ select 1 from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s`,count(`employees`.`salaries`.`emp_no`) AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t` where ((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`))) having <is_not_null_test>((`t`.`s` / `t`.`c`))))) |
Again, this output gives a clear view of query optimization: subquery in GROUP BY itself cannot be optimized, cached or converted into temporary table, but the subquery inside the subquery ( select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no) could be materialized into a temporary table and cached.
A regular EXPLAIN command does not provide such details:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 |
mysql> explain select count(emp_no) from salaries group by salary > ALL (select s/c as avg_salary from (select dept_no, sum(salary) as s, count(emp_no) as c from salaries join dept_emp using (emp_no) group by dept_no) t)G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: salaries partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 2838638 filtered: 100.00 Extra: Using temporary; Using filesort *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: <derived3> partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 3526884 filtered: 100.00 Extra: Using where *************************** 3. row *************************** id: 3 select_type: DERIVED table: dept_emp partitions: NULL type: index possible_keys: PRIMARY,emp_no,dept_no key: dept_no key_len: 4 ref: NULL rows: 331215 filtered: 100.00 Extra: Using index *************************** 4. row *************************** id: 3 select_type: DERIVED table: salaries partitions: NULL type: ref possible_keys: PRIMARY,emp_no key: PRIMARY key_len: 4 ref: employees.dept_emp.emp_no rows: 10 filtered: 100.00 Extra: NULL 4 rows in set, 1 warning (0.01 sec) Note (Code 1003): /* select#1 */ select count(`employees`.`salaries`.`emp_no`) AS `count(emp_no)` from `employees`.`salaries` group by <not>(<in_optimizer>(`employees`.`salaries`.`salary`,<exists>(/* select#2 */ select 1 from (/* select#3 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no`,sum(`employees`.`salaries`.`salary`) AS `s`,count(`employees`.`salaries`.`emp_no`) AS `c` from `employees`.`salaries` join `employees`.`dept_emp` where (`employees`.`salaries`.`emp_no` = `employees`.`dept_emp`.`emp_no`) group by `employees`.`dept_emp`.`dept_no`) `t` where ((<cache>(`employees`.`salaries`.`salary`) <= (`t`.`s` / `t`.`c`)) or isnull((`t`.`s` / `t`.`c`))) having <is_not_null_test>((`t`.`s` / `t`.`c`))))) |
Most importantly, we cannot guess from the output if the DERIVED subquery can be cached.
Conlcusion: EXPLAIN FORMAT=JSON provides details on how subqueries in ORDER BY and GROUP BY clauses are optimized.
Leave a Reply