Announcement

Announcement Module
Collapse
No announcement yet.

MySQL DISTINCT with LIMIT optimization

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • MySQL DISTINCT with LIMIT optimization

    Suppose the following three conditions are met:
    - DISTINCT is used together with ORDER BY
    - an index can be used for ORDER BY
    - the query contains LIMIT x with x relatively small compared to the full result set

    I think this is a very common situation.

    I've tried this in a quite recent version, and DISTINCT is still performed by ordering the results first, then removing duplicates, then ordering on the columns specified after ORDER BY, then returning the first x rows.

    It seems more sense to fetch the rows in the order specified by ORDER BY if an index is available, and checking if it is not equal to a row that has been returned already. Especially when x is small, say 10 or 50, and the full result set is in the millions, the performance gain is substantial. The check for uniqueness could be based on full row comparison or by checksum.

  • #2
    does anyone have useful thoughts?

    Comment


    • #3
      Sorry I didn't really understand that it was a question the first time I read the post.

      Do you have a join in the query is it on only one table?

      The first thought that comes from the top of my head is that you create a derived table with say 2 times the amount of rows that you need to ensure that your DISTINCT will result in the LIMIT amount of rows.

      SELECT DISTINCT ...FROM ( SELECT ... FROM yourBigTable ORDER BY x LIMIT y ) AS tempORDER BY zLIMIT n


      That way you can use the order by index limit x optimization on the inner query and only have to sort/condense the smaller derived table.

      Comment


      • #4
        That's a nice workaround, but still causes disk i/o if the result set has a text table. I was hoping to hear that either MySQL has the functionality which can be triggered somehow or why it does not have it

        Comment


        • #5
          I think this is going to be improved in MySQL 5.6, but I haven't tested this exact scenario.

          Comment


          • #6
            http://dev.mysql.com/tech-resources/articles/whats-new-in-my sql-5.6.html

            It's not listed there. The only thing that comes close, is:
            Quote:
            File Sort Optimization

            For queries that combine ORDER BY non_indexed_column and a LIMIT x clause, this feature speeds up the sort when the contents of X rows can fit into the sort buffer. Works with all storage engines.
            This can speed up DISTINCT, but not when DISTINCT is combined with ORDER BY.

            Comment


            • #7
              Now that I think about it, it works for a slightly modified version of sterins query:

              SELECT DISTINCT ...FROM ( SELECT ... FROM yourBigTable ORDER BY x ) AS tempLIMIT n

              Comment

              Working...
              X