Announcement

Announcement Module
Collapse
No announcement yet.

Help order by rand before group by

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help order by rand before group by

    Hello everybody,
    I need an advice for query which must Select rows from table grouped by position but for each position get each time random rows. I'm not sure that I'm explaining clearly so I will do some example. Sorry for my bad English. So the example is:
    This table could be several thousand rows, position will be only from 1 - 6. The big problem is that each day this select will be executed approximately 100 million times. So I need better optimazed query.
    table 'myTable'
    id, position, name, userid

    1 , 1 , foo, 20
    2 , 1 , bar, 322
    3 , 1 , foo, 322
    4 , 2 , bar, 20
    5 , 2 , foo, 45
    6 , 2 , bar, 45
    7 , 3 , bar, 20
    8 , 3 , foo, 45
    9 , 3 , bar, 45
    10 , 4 , bar, 20
    11 , 4 , foo, 45
    12 , 4 , bar, 45
    13 , 5 , bar, 20
    14 , 5 , foo, 45
    15 , 5 , bar, 45
    16 , 6 , bar, 20
    17 , 6 , foo, 45
    18 , 6 , bar, 45

    I need to SELECT each time only six rows group by position order by random.
    The query :
    SELECT id, position, name FROM myTable GROUP BY position ORDER BY RAND();
    will be extremely slow for my needs
    I have an idea to use some kind of UNION but I need best solution because load of the database is heavy. I'll appreciate anybody who can help me! Thanks in advance.

  • #2
    Add column with random double value (let it be 'random_value'). So, You'll have:

    1 , 1 , foo, 20, 0.1231
    2 , 1 , bar, 322, 0.524
    3 , 1 , foo, 322, 0.0138
    4 , 2 , bar, 20, 0.275
    5 , 2 , foo, 45, 0.9943
    6 , 2 , bar, 45, 0.8783
    7 , 3 , bar, 20, 0.13754
    etc.

    Now, in Your program (php/java/etc), before executing query get some random value, for example in java:

    Double javaRandomValue = Math.random();

    and construct Your query in following way:


    SELECT id,position,nameFROM myTableWHERE myTable.randomValue > javaRandomValueGROUP BY positionORDER BY myTable.randomValue


    By the way, Your statement does not return truly random rows. Rand() is executed only once, before statement is executed, so doing ORDER BY RAND() is equal to ORDER BY constant.

    Comment


    • #3
      10x for advice. It's fairly nice approach. I will use it lucek, but I have a little concern for that:
      If it happens: there is not enought (myTable.randomValue > javaRandomValue) returned rows I will not get 6 rows (one per position)
      Your example
      1 , 1 , foo, 20, 0.1231
      2 , 1 , bar, 322, 0.524
      3 , 1 , foo, 322, 0.0138
      4 , 2 , bar, 20, 0.275
      5 , 2 , foo, 45, 0.9943
      6 , 2 , bar, 45, 0.8783
      7 , 3 , bar, 20, 0.13754

      So if more of the random double value will less then javaRandValue I will return empty or less row then I needed. If I understand correctly this could happen sometimes. There is no option for me to return less thens 6 rows -> one row per position.
      10x in advance

      Comment


      • #4
        You really want to get rid of the order by if you're concerned about performance. Adding a column (and probably an index to help your sort) seems very wasteful to me.

        Instead, find the max auto incrementing ID of your table, and use it to generate six different random numbers, none larger than the current maximum ID of your table (which you can get from SELECT MAX(id), SHOW TABLE STATUS LIKE, SHOW CREATE TABLE). Then you can do this:

        (SELECT * FROM mytable WHERE id >= x1 LIMIT 1)
        UNION
        (SELECT * FROM mytable WHERE id >= x2 LIMIT 1)
        UNION
        ...

        The downside here is that it is possible that some of these queries may return the same row, especially if you have large gaps in your ID range. The upside is that these queries are index range scans, which should be very fast and you completely eliminate the sort.

        Comment


        • #5
          That was the first idea which I had to use UNION for each position but for generate random value per position with php/java ( run away from ORDER BY RAND() ) I need to know each id because there is no auto increment id. There is no matter if I put auto increment id because I must know which id on which position correspond. Unique key is a pair (user id, position). The main concern to me is more than 10,000 query per second to this table. In this situation select all rows to get ids will be heavy. I really don't know how to figure that out. May be some test of all variant which I can get and see performance, would be the key to resolve this problem. If that was a normal web page there would be no matter what I can do and these concerns wouldn't be so important. But this heavy traffic makes our server crazy. I know there is a lot of people around the world, run into this problem and I hope someone can share his knowledge. Thanks to everybody who are trying to help.

          Comment


          • #6
            I am Vietnamese, I don't use E well. I am sorry if....

            I have the same problem with you, i want order before group. Afther i find out this topic, i find out something at:
            http://dev.mysql.com/doc/refman/5.0/en/select.html

            , you can Ctrl+F to find row ORDER BY NULL

            "If you use GROUP BY, output rows are sorted according to the GROUP BY columns as if you had an ORDER BY for the same columns. To avoid the overhead of sorting that GROUP BY produces, add ORDER BY NULL: "


            SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;

            I solved my problem completely, very happy. And I hope it usefull for you.

            If you use GROUP BY, output rows are sorted according to the GROUP BY columns as if you had an ORDER BY for the same columns. To avoid the overhead of sorting that GROUP BY produces, add ORDER BY NULL:

            SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;

            Comment


            • #7
              I solved my problem in other way. Generating temporary table(permanent but replaced by script) with templates, which replace in 10 minutes. This helps me to know just how many are they. And then I use script language rand() to find random number between 1 and MAX_TEMPLATES_NUMBER. Than I put that value into LIMIT and get one of this template. That increased performance enourmous. For more performance I duplicate this temporary table and UPDATE/INSERT into table that I don't SELECT current 10 minutes. Hope that can help someone.

              Comment

              Working...
              X