Suppose the following three conditions are met:
- DISTINCT is used together with ORDER BY
- an index can be used for ORDER BY
- the query contains LIMIT x with x relatively small compared to the full result set
I think this is a very common situation.
I've tried this in a quite recent version, and DISTINCT is still performed by ordering the results first, then removing duplicates, then ordering on the columns specified after ORDER BY, then returning the first x rows.
It seems more sense to fetch the rows in the order specified by ORDER BY if an index is available, and checking if it is not equal to a row that has been returned already. Especially when x is small, say 10 or 50, and the full result set is in the millions, the performance gain is substantial. The check for uniqueness could be based on full row comparison or by checksum.
- DISTINCT is used together with ORDER BY
- an index can be used for ORDER BY
- the query contains LIMIT x with x relatively small compared to the full result set
I think this is a very common situation.
I've tried this in a quite recent version, and DISTINCT is still performed by ordering the results first, then removing duplicates, then ordering on the columns specified after ORDER BY, then returning the first x rows.
It seems more sense to fetch the rows in the order specified by ORDER BY if an index is available, and checking if it is not equal to a row that has been returned already. Especially when x is small, say 10 or 50, and the full result set is in the millions, the performance gain is substantial. The check for uniqueness could be based on full row comparison or by checksum.
Comment