In the previous EXPLAIN FORMAT=JSON is Cool! series blog post, we discussed the
group_by_subqueries member (which is child of
grouping_operation). Let’s now focus on the
grouping_operation and other details of
GROUP BY processing.
grouping_operation simply shows the details of what happens when the GROUP BY clause is run:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
mysql> explain format=json select dept_no from dept_emp group by dept_noG *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "14.40" }, "grouping_operation": { "using_filesort": false, "table": { "table_name": "dept_emp", "access_type": "range", "possible_keys": [ "PRIMARY", "emp_no", "dept_no" ], "key": "dept_no", "used_key_parts": [ "dept_no" ], "key_length": "4", "rows_examined_per_scan": 9, "rows_produced_per_join": 9, "filtered": "100.00", "using_index_for_group_by": true, "cost_info": { "read_cost": "12.60", "eval_cost": "1.80", "prefix_cost": "14.40", "data_read_per_join": "144" }, "used_columns": [ "emp_no", "dept_no" ] } } } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`dept_emp`.`dept_no` AS `dept_no` from `employees`.`dept_emp` group by `employees`.`dept_emp`.`dept_no` |
In the listing above, you can see which table was accessed by the GROUP BY operation, the access type, and if an index for GROUP BY was used.
In case of a simple JOIN of two tables, grouping_operation is usually a parent for the nested_loop object (which provides details on how the JOIN proceeded):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 |
mysql> explain format=json select de.dept_no, count(dm.emp_no) from dept_emp de join dept_manager dm using(emp_no) group by de.dept_noG *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "61.50" }, "grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "26.41" }, "nested_loop": [ { "table": { "table_name": "dm", "access_type": "index", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "emp_no", "used_key_parts": [ "emp_no" ], "key_length": "4", "rows_examined_per_scan": 24, "rows_produced_per_join": 24, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "1.00", "eval_cost": "4.80", "prefix_cost": "5.80", "data_read_per_join": "384" }, "used_columns": [ "dept_no", "emp_no" ] } }, { "table": { "table_name": "de", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no", "dept_no" ], "key": "emp_no", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "employees.dm.emp_no" ], "rows_examined_per_scan": 1, "rows_produced_per_join": 26, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "24.00", "eval_cost": "5.28", "prefix_cost": "35.09", "data_read_per_join": "422" }, "used_columns": [ "emp_no", "dept_no" ] } } ] } } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select `employees`.`de`.`dept_no` AS `dept_no`,count(`employees`.`dm`.`emp_no`) AS `count(dm.emp_no)` from `employees`.`dept_emp` `de` join `employees`.`dept_manager` `dm` where (`employees`.`de`.`emp_no` = `employees`.`dm`.`emp_no`) group by `employees`.`de`.`dept_no` |
Surprisingly, while many DISTINCT queries can be converted into equivalent queries with the GROUP BY clause, there is separate member ( duplicates_removal) for processing it. Let’s see how it works with a simple query that performs the same job as the first one in this blog post:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
mysql> explain format=json select distinct dept_no from dept_empG *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "14.40" }, "duplicates_removal": { "using_filesort": false, "table": { "table_name": "dept_emp", "access_type": "range", "possible_keys": [ "PRIMARY", "emp_no", "dept_no" ], "key": "dept_no", "used_key_parts": [ "dept_no" ], "key_length": "4", "rows_examined_per_scan": 9, "rows_produced_per_join": 9, "filtered": "100.00", "using_index_for_group_by": true, "cost_info": { "read_cost": "12.60", "eval_cost": "1.80", "prefix_cost": "14.40", "data_read_per_join": "144" }, "used_columns": [ "emp_no", "dept_no" ] } } } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select distinct `employees`.`dept_emp`.`dept_no` AS `dept_no` from `employees`.`dept_emp` |
You can see that the plan is almost same, but parent element for the plan is duplicates_removal.
The reason there are differences between these members can be seen if we change the second, more complicated query to use DISTINCT in place of GROUP BY:
1 2 |
mysql> explain format=json select distinct de.dept_no, count(dm.emp_no) from dept_emp de join dept_manager dm using(emp_no)G ERROR 1140 (42000): In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'employees.de.dept_no'; this is incompatible with sql_mode=only_full_group_by |
This example shows that DISTINCT is not exactly same as GROUP BY, and can be used together if we want to count the number of managers in each department (grouped by the year when the manager started working in the department). In this case, however, we are interested only in unique pairs of such dates and don’t want to see duplicates. Duplicates will appear if one person managed same department more than two years.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 |
mysql> explain format=json select distinct de.dept_no, count(dm.emp_no) from dept_emp de join dept_manager dm using(emp_no) group by de.dept_no, year(de.from_date)G *************************** 1. row *************************** EXPLAIN: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "61.63" }, "duplicates_removal": { "using_temporary_table": true, "using_filesort": false, "grouping_operation": { "using_temporary_table": true, "using_filesort": true, "cost_info": { "sort_cost": "26.53" }, "nested_loop": [ { "table": { "table_name": "dm", "access_type": "index", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "emp_no", "used_key_parts": [ "emp_no" ], "key_length": "4", "rows_examined_per_scan": 24, "rows_produced_per_join": 24, "filtered": "100.00", "using_index": true, "cost_info": { "read_cost": "1.00", "eval_cost": "4.80", "prefix_cost": "5.80", "data_read_per_join": "384" }, "used_columns": [ "dept_no", "emp_no" ] } }, { "table": { "table_name": "de", "access_type": "ref", "possible_keys": [ "PRIMARY", "emp_no" ], "key": "PRIMARY", "used_key_parts": [ "emp_no" ], "key_length": "4", "ref": [ "employees.dm.emp_no" ], "rows_examined_per_scan": 1, "rows_produced_per_join": 26, "filtered": "100.00", "cost_info": { "read_cost": "24.00", "eval_cost": "5.31", "prefix_cost": "35.11", "data_read_per_join": "424" }, "used_columns": [ "emp_no", "dept_no", "from_date" ] } } ] } } } } 1 row in set, 1 warning (0.00 sec) Note (Code 1003): /* select#1 */ select distinct `employees`.`de`.`dept_no` AS `dept_no`,count(`employees`.`dm`.`emp_no`) AS `count(dm.emp_no)` from `employees`.`dept_emp` `de` join `employees`.`dept_manager` `dm` where (`employees`.`de`.`emp_no` = `employees`.`dm`.`emp_no`) group by `employees`.`de`.`dept_no`,year(`employees`.`de`.`from_date`) |
In this case, the member grouping_operation is a child of duplicates_removal and the temporary table used to store the result of GROUP BY before removing the duplicates. A temporary table was also used to perform a filesort for the grouping operation itself.
Compare this with regular EXPLAIN output. EXPLAIN only shows that a temporary table was used, but does not provide insights on the operations for which it was used:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
mysql> explain select distinct de.dept_no, count(dm.emp_no) from dept_emp de join dept_manager dm using(emp_no) group by de.dept_no, year(de.from_date)G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: dm partitions: NULL type: index possible_keys: PRIMARY,emp_no key: emp_no key_len: 4 ref: NULL rows: 24 filtered: 100.00 Extra: Using index; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: de partitions: NULL type: ref possible_keys: PRIMARY,emp_no key: PRIMARY key_len: 4 ref: employees.dm.emp_no rows: 1 filtered: 100.00 Extra: NULL 2 rows in set, 1 warning (0.01 sec) Note (Code 1003): /* select#1 */ select distinct `employees`.`de`.`dept_no` AS `dept_no`,count(`employees`.`dm`.`emp_no`) AS `count(dm.emp_no)` from `employees`.`dept_emp` `de` join `employees`.`dept_manager` `dm` where (`employees`.`de`.`emp_no` = `employees`.`dm`.`emp_no`) group by `employees`.`de`.`dept_no`,year(`employees`.`de`.`from_date`) |
Conclusion: EXPLAIN FORMAT=JSON contains all the details about the GROUP BY and DISTINCT optimizations.