grouping_operation, duplicates_removal: EXPLAIN FORMAT=JSON has all details about GROUP BY

EXPLAIN FORMAT=JSONIn the previous EXPLAIN FORMAT=JSON is Cool! series blog post, we discussed the   group_by_subqueries  member (which is child of grouping_operation). Let’s now focus on the  grouping_operation  and other details of  GROUP BY  processing.

grouping_operation simply shows the details of what happens when the  GROUP BY clause is run:

In the listing above, you can see which table was accessed by the  GROUP BY operation, the access type, and if an index for GROUP BY was used.

In case of a simple JOIN  of two tables,  grouping_operation is usually a parent for the  nested_loop  object (which provides details on how the  JOIN  proceeded):

Surprisingly, while many DISTINCT queries can be converted into equivalent queries with the  GROUP BY clause, there is separate member ( duplicates_removal) for processing it. Let’s see how it works with a simple query that performs the same job as the first one in this blog post:

You can see that the plan is almost same, but parent element for the plan is duplicates_removal.

The reason there are differences between these members can be seen if we change the second, more complicated query to use DISTINCT in place of GROUP BY:

This example shows that  DISTINCT is not exactly same as GROUP BY, and can be used together  if we want to count the number of managers in each department (grouped by the year when the manager started working in the department). In this case, however, we are interested only in unique pairs of such dates and don’t want to see duplicates. Duplicates will appear if one person managed same department more than two years.

In this case, the member grouping_operation is a child of duplicates_removal and the temporary table used to store the result of GROUP BY  before removing the duplicates. A temporary table was also used to perform a filesort for the grouping operation itself.

Compare this with regular EXPLAIN output. EXPLAIN only shows that a temporary table was used, but does not provide insights on the operations for which it was used:

Conclusion:  EXPLAIN FORMAT=JSON contains all the details about the  GROUP BY and DISTINCT  optimizations.

Share this post