Should MySQL Extend GROUP BY Syntax ?

Jan has a good article about finding the row matching some value in the group:

This is one illustration of group by limitations in SQL language which is not offset by any MySQL specific extensions,yet
As you can see if you want to get one row from the group which is sorted some way you have to use ugly hacks. This is because SQL does not have a notion of sorting the data within the groups and in fact ANSI SQL even forbids you to select columns which are not aggregates or part of group by because result in this case is not defined.

What would be quite helpful is to have GROUPORDER (pick the name) clause which defines which element is selected for non aggregate columns, something like

SELECT MAX(Population),  City, Country  FROM City GROUP BY Country GROUPORDER BY Population DESC;

1	SELECT MAX(Population), City, Country FROM City GROUP BY Country GROUPORDER BY Population DESC;

This could give you row which has highest population for which country, and really you would not even need to query MAX(population) if you do not need that because you’re soring by population you would see MAX(Population) in the population column:

SELECT City, Population City FROM City GROUP BY Country GROUPORDER BY  Population DESC

1	SELECT City, Population City FROM City GROUP BY Country GROUPORDER BY Population DESC

would simply return city with highest population for each country.

In fact we had similar problem in Sphinx when we were working on BoardReader project – we wanted to group search results by thread and return only one best matching post from each thread while matches themselve could be sorted by relevance or freshness. The benefit of not following any standards is what you can always add what you want and this is just what we did with sphinx in similar case.

SQL Standard might be good for enterprise applications but huge amount of Web applications do not care – they are stuck enough in MySQL they would have rewrite a lot going to other database anyway while they would love to get features which allow you to do stuff more efficient or more beautiful way.

On other hand what I’m talking about if probably the most commonly requested extension which is already supported by number of vendors – CONNECT BY is still not implemented.

11 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

18 years ago

Great post. Lucky for me I can use “ALTER TABLE tbl ORDER BY col;” because my data is nearly static and just one GROUP BY query needs a “GROUPORDER”.

The issue as I see it thought, is that those queries aren’t actually any GROUP BY queries, but queries where you want the best matching row for a certain WHERE.

Basically MySQL should handle those queries like this:

-Sort by Population
-Scan through the table and
a. if we already took a row with that city, trash it
b. take the city

This can even fully run on indices!

Something like

SELECT DISTINCT( Country ) City, Population, Country FROM City ORDER BY Country, Population DESC

This would pick all all cities with highest population, and sort them by country.

**Sadly enough DISTINCT doesn’t take any arguments.**

I never understood why it doesn’t. Makes no sense to me.

Mark Callaghan

18 years ago

Would window functions provide what you want? Oracle has them (http://orafaq.com/node/55). I thought they were part of a standard.

Olexandr Melnyk

18 years ago

> in fact ANSI SQL even forbids you to select columns
> which are not aggregates or part of group by

ANSI SQL doesn’t require not aggregated selected columns to be present in GROUP BY in all cases. Here is some elaboration on this:

http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html

Peter Zaitsev

Author

18 years ago

Olexandr,

Yes this is simplification there are other bits in pieces which are allowed but generally if selected column is not dependent on the group by column, like in this case City is not Dependent on Country it either would not be allowed or you could see surprising results in the city column.

That exact article gives good examples and a bit more explanations.

Peter Zaitsev

Author

18 years ago

No,

Sorting table is dangerous trap as if you change query a bit, ie to add range which will be using index to access the rows you will get different plan and not result you’re expecting.

Regarding DISTINCT – applies to the Row rather than the column – it says basically remove duplicate rows.

If you would make it to take argument and read like remove duplicate values from this column you would have same problem as group by – what values you should see for other columns ? This would be hard to define independently on query execution plan.

Peter Zaitsev

Author

18 years ago

Mark,

Analytical functions are quite helpful but they are generally designed for the different purpose – as you can see they do aggregates but they run on result set not on the grouped result set (which ensures only one row for each group is left)

Or may be I do not see how you can get the same result set as I’m looking for using these functions without too much complications 🙂

It surely would be helpful to get them in MySQL.

David Grant

18 years ago

I was just wishing for this feature the other day.

Roland Bouman

18 years ago

Hi Peter,

“In fact ANSI SQL even forbids you to select columns which are not aggregates or part of group by because result in this case is not defined.”

Well, it depends which version of ANSI SQL you are referring too. SQL92 says:

“If T is a grouped table, then each in each that references a column of T shall reference a grouping column or be specified within a .”

Which indeed wants us to always include all non-aggregated columns in the GROUP BY clause.

But SQL1999 and SQL2003 have a more sophisticated view on the matter:

“If T is a grouped table, then let G be the set of grouping columns of T. In each contained in , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a whose aggregation query is QS.”

Which just says that you are allowed to include non-aggregated columns in the GROUP BY clause as long as they are completely determined by the GROUP BY clause, in other words, when there is exactly one distinct value per combination of values in the GROUP BY clause.

(http://rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html)

kind regards,

Roland Bouman

18 years ago

mmm, seems all terms enclosed in angle brackets got removed….quotes again:

First Quote, GROUP BY according to SQL92:

“If T is a grouped table, then each [column reference] in each [value expression] that references a column of T shall reference a grouping column or be specified within a [set function specification].”

Second Quote, GROUP BY according to SQL1999 and SQL2003:

“If T is a grouped table, then let G be the set of grouping columns of T. In each [value expression] contained in [select list], each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a [set function specification] whose aggregation query is QS.”

Peter Zaitsev

Author

18 years ago

Ronald,

You article was already quoted and I replied 🙂

“Functionally Dependent” is really more elaborate explanation of columns in group by – the basics remain the same – you can have the column in the list if there is only one value of that column in the group.

I’m speaking about different case when there are many column value for each group and we just want to define which one is picked.

For example in our example grouping by Country we may also include CountryCode because there is only one for each country, however there are multiple Cities for each Country and group by does not allow to pick which one is selected (or even select it at all in more strict databases)

Clem

16 years ago

I believe this is a semi-elegant solution to this ugly problem:

SELECT City, Country, Population FROM
(SELECT City, Country, Population
ORDER BY Country, /* Artificial GROUP BY Country */
Population DESC /* Artificial MAX of Population within Country */
) temptable
GROUP BY Country; /* Select first row of each country grouping */

MySQL 5.7
Support

Compare Percona to Leading Database Solutions

Software
Downloads

Valkey Contribution

Product Documentation

Resource Hub

Why Percona for MongoDB?

Why Percona for PostgreSQL?

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Should MySQL Extend GROUP BY Syntax ?

Related Blog Articles

RECOMMENDED ARTICLES

Semantic Caching for LLM Apps: Reduce Costs by 40-80% and Speed up by 250x

Importance of Tuning Checkpoint in PostgreSQL

Databases, Data Lakes, And Encryption

MOST POPULAR ARTICLES

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL Performance Tuning: Maximizing Database Efficiency and Speed

The Ultimate Guide to Open Source Databases

MySQL 5.7 Support

Compare Percona to Leading Database Solutions

Software Downloads

Valkey Contribution

Product Documentation

Resource Hub

Why Percona for MongoDB?

Why Percona for PostgreSQL?

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Should MySQL Extend GROUP BY Syntax ?

About the Author

Share This Post!

Stay up to date with the Percona Blog

Related Blog Articles

RECOMMENDED ARTICLES

Semantic Caching for LLM Apps: Reduce Costs by 40-80% and Speed up by 250x

Importance of Tuning Checkpoint in PostgreSQL

Databases, Data Lakes, And Encryption

MOST POPULAR ARTICLES

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL Performance Tuning: Maximizing Database Efficiency and Speed

The Ultimate Guide to Open Source Databases

MySQL 5.7
Support

Software
Downloads