Jan has a good article about finding the row matching some value in the group:

This is one illustration of group by limitations in SQL language which is not offset by any MySQL specific extensions,yet
As you can see if you want to get one row from the group which is sorted some way you have to use ugly hacks. This is because SQL does not have a notion of sorting the data within the groups and in fact ANSI SQL even forbids you to select columns which are not aggregates or part of group by because result in this case is not defined.

What would be quite helpful is to have GROUPORDER (pick the name) clause which defines which element is selected for non aggregate columns, something like

This could give you row which has highest population for which country, and really you would not even need to query MAX(population) if you do not need that because you’re soring by population you would see MAX(Population) in the population column:

would simply return city with highest population for each country.

In fact we had similar problem in Sphinx when we were working on BoardReader project – we wanted to group search results by thread and return only one best matching post from each thread while matches themselve could be sorted by relevance or freshness. The benefit of not following any standards is what you can always add what you want and this is just what we did with sphinx in similar case.

SQL Standard might be good for enterprise applications but huge amount of Web applications do not care – they are stuck enough in MySQL they would have rewrite a lot going to other database anyway while they would love to get features which allow you to do stuff more efficient or more beautiful way.

On other hand what I’m talking about if probably the most commonly requested extension which is already supported by number of vendors – CONNECT BY is still not implemented.

11 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
No

Great post. Lucky for me I can use “ALTER TABLE tbl ORDER BY col;” because my data is nearly static and just one GROUP BY query needs a “GROUPORDER”.

The issue as I see it thought, is that those queries aren’t actually any GROUP BY queries, but queries where you want the best matching row for a certain WHERE.

Basically MySQL should handle those queries like this:

-Sort by Population
-Scan through the table and
a. if we already took a row with that city, trash it
b. take the city

This can even fully run on indices!

Something like

SELECT DISTINCT( Country ) City, Population, Country FROM City ORDER BY Country, Population DESC

This would pick all all cities with highest population, and sort them by country.

**Sadly enough DISTINCT doesn’t take any arguments.**

I never understood why it doesn’t. Makes no sense to me.

Mark Callaghan

Would window functions provide what you want? Oracle has them (http://orafaq.com/node/55). I thought they were part of a standard.

Olexandr Melnyk

> in fact ANSI SQL even forbids you to select columns
> which are not aggregates or part of group by

ANSI SQL doesn’t require not aggregated selected columns to be present in GROUP BY in all cases. Here is some elaboration on this:

http://www.oreillynet.com/databases/blog/2007/05/debunking_group_by_myths.html

David Grant

I was just wishing for this feature the other day.

Roland Bouman

Hi Peter,

“In fact ANSI SQL even forbids you to select columns which are not aggregates or part of group by because result in this case is not defined.”

Well, it depends which version of ANSI SQL you are referring too. SQL92 says:

“If T is a grouped table, then each in each that references a column of T shall reference a grouping column or be specified within a .”

Which indeed wants us to always include all non-aggregated columns in the GROUP BY clause.

But SQL1999 and SQL2003 have a more sophisticated view on the matter:

“If T is a grouped table, then let G be the set of grouping columns of T. In each contained in , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a whose aggregation query is QS.”

Which just says that you are allowed to include non-aggregated columns in the GROUP BY clause as long as they are completely determined by the GROUP BY clause, in other words, when there is exactly one distinct value per combination of values in the GROUP BY clause.

(http://rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html)

kind regards,

Roland Bouman

Roland Bouman

mmm, seems all terms enclosed in angle brackets got removed….quotes again:

First Quote, GROUP BY according to SQL92:

“If T is a grouped table, then each [column reference] in each [value expression] that references a column of T shall reference a grouping column or be specified within a [set function specification].”

Second Quote, GROUP BY according to SQL1999 and SQL2003:

“If T is a grouped table, then let G be the set of grouping columns of T. In each [value expression] contained in [select list], each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a [set function specification] whose aggregation query is QS.”

Clem

I believe this is a semi-elegant solution to this ugly problem:

SELECT City, Country, Population FROM
(SELECT City, Country, Population
ORDER BY Country, /* Artificial GROUP BY Country */
Population DESC /* Artificial MAX of Population within Country */
) temptable
GROUP BY Country; /* Select first row of each country grouping */