GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

Optimizing top N from large aggregates

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Optimizing top N from large aggregates

    i have a table with 25+ million rows and i have a query that needs to get the top n ids for varying criteria,
    the basic problem is optimizing something like:

    SELECT id, SUM(x) xsum
    WHERE ....
    GROUP BY id
    ORDER BY xsum
    LIMIT n

    i have a clustered index on id and the fields in the where clause

    without the ORDER BY, this is very fast, due to id being the first index in the cluster (i think), the problem is when the ORDER BY is added and the result is quite large (1-2 million), ordering such a set takes ~10-15 seconds

    ive tried partitioning the query by turning id into MD5(id) and then doing
    SELECT id, SUM(x) xsum FROM (

    (SELECT id, SUM(x) xsum
    WHERE ....
    AND id LIKE '0%'
    GROUP BY id
    order by xsum limit n)
    UNION ALL
    (SELECT id, SUM(x) xsum
    WHERE ....
    AND id LIKE '1%'
    GROUP BY id
    order by xsum limit n)
    UNION ALL
    .
    .
    .
    )
    ORDER BY xsum
    LIMIT n

    but though each query in the union is much faster, once i union all 16 (0-9a-f) it still takes about the same time

    is there anyway to optimize such a query given i only need the top n?

  • #2
    Have you read the comments on http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization .html ?

    Some of the techniques may apply to the WHERE clauses you are using (whatever they may be).

    Comment

    Working...
    X