Announcement

Announcement Module
Collapse
No announcement yet.

Full table scan query on table with 3mm+ rows, a better way?

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Full table scan query on table with 3mm+ rows, a better way?

    Can anyone recommend a better way to do this query?

    This query executes about once an hour right now, we may run it more frequently in the future and the amount of rows in the table will continue to grow (currently at around 3mm +).

    # Query_time: 3.202836 Lock_time: 0.000091 Rows_sent: 1 Rows_examined: 3444018 Rows_affected: 0 Rows_read: 3444018

    select count(*) from phplist_usermessage where date_add(entered,interval 3000 second) > now() and status = "sent";

  • #2
    You will need to use an index to speed this up. Your predicates are

    date_add(entered,interval 3000 second) > now()
    status = "sent";

    The first one can't use an index because the 'entered' column is inside a function. Place the calculation on the now(), not the column.

    The second one is unlikely to be very selective.

    See what happens when you get 'entered' by itself on one side of the operator.

    Comment


    • #3
      Try this:
      Add an index on (entered,sent) and change the query so that you aren't manipulating the column side of the expression:

      select count(*)
      from phplist_usermessage
      where entered > now() - interval 3000 second
      and status = "sent";

      Comment


      • #4
        Thanks for the quick response, I'm not exactly sure what you mean in terms of getting the "entered" field by itself? You're right that indexing won't work on the calculated field and that indexing on "status" (which exists) is not selective, especially since all records eventually have a "sent" status.

        Comment


        • #5
          Thank you so much, this worked in 0.065 seconds!

          Huge improvement. I'm looking at the modification and it looks OK. Is there any reason to think that:

          your updated query:

          -- entered > now() - interval 3000 second

          is NOT the same as the original:

          -- date_add(entered,interval 3000 second) > now()

          I'm looking for double confirmation that the change does not affect anything.

          Comment


          • #6
            Yes, like an algebra equation what you add to one side can be subtracted from the other. Remember that now() gets turned into a constant.

            If you think about it numerically:

            x={
            1
            2
            3
            4
            5
            6
            7
            8
            9
            10
            }

            where x + 3 > 10 would return 8,9,10

            where x > 10 - 3 would also return 8,9,10

            Comment


            • #7
              If you want confirmation, do this:

              mysql> pager md5sum -
              mysql> select * from.....

              (Notice that I changed the count(*) to *)

              Compare the results of both queries.

              Comment

              Working...
              X