Announcement

Announcement Module
Collapse
No announcement yet.

find some random recently added items in a large table

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • find some random recently added items in a large table

    This is an easy problem to solve, but seems to be harder to do 'nicely', we cant seem to find a nice solution yet.

    The Problem
    Select a small number of random rows from those recently added to a table. (this is for a photo upload site, and want to showcase a selection of recently submitted photos)

    Our Solution
    We maintain a second 'recent' table, which holds just the id's for the most recent 250 images (maintained by a cronjob - quite well we think).

    This is the select statement we come up with so far to pick some recent images:





    (I havnt included the schema but it should be clear on the columns, gridimage_id is a primary key of both tables)

    This of results in a full table scan (almost) of recent_gridimage but it should never contain more than 250 rows anyway.

    Other attempts

    1) A quick select count(*) and then limit 486643,5; but of course that is very slow too.

    2)
    select * from recent_gridimage inner join gridimage_search using (gridimage_id) order by rand() limit 5


    3) Both of these do quite well (the second slightly better)

    select * from gridimage where gridimage_id between 486643 and 486647 limit 5select * from gridimage where gridimage_id in (486643,486644,486645,486646,486647) limit 5

    the problem beeing that not all ids are used, (about 1%), so could jsut select 6 to be safe, I don't think it critical that we would occaisaonly only get 4.


    .... so any neat tricks to tackle this?
    (the last one actully sounds quite good, and only thought of as writing this, but I'll post anyway incase it helps others!)

  • #2
    Create dedicated table like recent_items which has auto-increment key with no holes and just pick rows where generating random IDs in your application. This will make it very fast.

    Comment


    • #3
      Thanks for the reply,

      I guess you meaning to store all columns in the 'recent table' as opposed to just the primary key column (like we did, but that had sub-optimal performance as it had to join with the large table)

      In the end have gone with a count(*) then pick (in php) somewhere in the last few rows, and then do a "id in(x,x,x)" which from some crude testing gives marginally faster than other methods (apart from your suggestion which we haven't tried).

      Comment

      Working...
      X