GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

Inconsistent use of time index

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Inconsistent use of time index

    I am experiencing strange behavior out of MySQL 5.1.30. I have a table with (among others) a datetime field. The field has an index on it. My query does a join of two tables: image that contains time data, and image_change that is associated with image through a foreign key constraint. The image_change table contains about 2M records.

    When I execute this query:

    explain select ic.* from image_change ic, image iwhere ic.image_id=i.id and i.camera_id=2 and (i.time > '2009-03-01' and i.time <'2009-04-15')

    I get the following response:

    1, 'SIMPLE', 'i', 'range', 'PRIMARY,FK5FAA95B637C4FBA,Index_Time', 'Index_Time', '8', '', 746, 'Using where'1, 'SIMPLE', 'ic', 'ref', 'FK10B73B14D0FB3FBA,Index_Image', 'FK10B73B14D0FB3FBA', '8', 'reboard.i.id', 558, ''


    Notice that 'Index_Time' is being used. When I run the same query with a slightly different date:


    explain select ic.* from image_change ic, image iwhere ic.image_id=i.id and i.camera_id=2 and (i.time > '2009-03-01' and i.time <'2009-04-16')


    I get a response that indicates that the index is not being used.

    1, 'SIMPLE', 'ic', 'ALL', 'FK10B73B14D0FB3FBA,Index_Image', '', '', '', 2065041, ''1, 'SIMPLE', 'i', 'eq_ref', 'PRIMARY,FK5FAA95B637C4FBA,Index_Time', 'PRIMARY', '8', 'reboard.ic.image_id', 1, 'Using where'


    If I actually run the query that doesn't use the index, it takes about two orders of magnitude slower to run.

    Any thoughts on what to check?

    Thanks,

    Gene

  • #2
    Could you post the result of SHOW INDEXES for the two tables? Also, try running the following:


    explain select i.id from image i where i.camera_id=2 and (i.time > '2009-03-01' and i.time <'2009-04-15')



    explain select i.id from image i where i.camera_id=2 and (i.time > '2009-03-01' and i.time <'2009-04-16')


    I'm guessing the results of the second EXPLAIN will estimate very large number of rows. This could actually be correct if you have a lot of rows with a time value between 4/15 and 4/16, but more than likely there's a problem with the index statistics. Try running ANALYZE TABLE on image and image_change and see if that helps.

    If that doesn't help, you can use STRAIGHT_JOIN to force the optimizer to join the tables in a particular order. That should avoid the nasty table scan of image_change you're doing. In general its better to let the optimizer make the decision by itself, but if its just being stupid, then you can force it.

    Comment


    • #3
      Here are the results of the queries:


      explain select i.id from image i where i.camera_id=2 and (i.time > '2009-03-01' and i.time <'2009-04-15');1, 'SIMPLE', 'i', 'range', 'FK5FAA95B637C4FBA,Index_Time', 'Index_Time', '8', '', 746, 'Using where'explain select i.id from image i where i.camera_id=2 and (i.time > '2009-03-01' and i.time <'2009-04-16')1, 'SIMPLE', 'i', 'ref', 'FK5FAA95B637C4FBA,Index_Time', 'FK5FAA95B637C4FBA', '8', 'const', 4484, 'Using where'


      As you suspected, one estimates a somewhat larger number of records, 4484 vs. 746 out of 27252 total in the table.
      But the query


      select count(1) from image i where i.camera_id=2 and (i.time > '2009-03-01' and i.time <'2009-04-16')

      returns 70, and the corresponding query ending on '2009-14-15' returns 57 records.

      So it seems like something is fouling up the optimizer. ANALYZE TABLE produces nothing surprising:

      'reboard.image', 'analyze', 'status', 'OK''reboard.image_change', 'analyze', 'status', 'OK'


      Unfortunately, I do not have direct access to this query, as it comes from the Hibernate layer. So I am not sure I know how to force it to do a straight join. I will look into that, however.

      Thanks again,

      Gene

      Comment


      • #4
        The estimates in the results of EXPLAIN are meant to capture the number of rows examined by the database engine, not the number of rows in the result set. So its actually more fair to compare

        explain select i.id from image i where i.camera_id=2 and (i.time > '2009-03-01' and i.time <'2009-04-15');

        with

        select count(*) from image where (i.time > '2009-03-01' and i.time <'2009-04-15');

        and

        explain select i.id from image i where i.camera_id=2 and (i.time > '2009-03-01' and i.time <'2009-04-16')

        with

        select count(*) from image i where i.camera_id=2


        Is the output of EXPLAIN for your two original queries the same both before and after you ran ANALZYE? That command actually recomputes the index stats, so if those where somehow stale, running this command could affect the query plan.

        What's the output of SHOW INDEXES for your two tables, by the way?

        Comment


        • #5
          The indexes are:


          'image', 0, 'PRIMARY', 1, 'id', 'A', 27075, , '', '', 'BTREE', '''image', 0, 'external_id', 1, 'external_id', 'A', 27075, , '', '', 'BTREE', '''image', 1, 'FK5FAA95B637C4FBA', 1, 'camera_id', 'A', 12, , '', '', 'BTREE', '''image', 1, 'FK5FAA95BF1B06135', 1, 'grouping_id', 'A', 27075, , '', 'YES', 'BTREE', '''image', 1, 'Index_Time', 1, 'time', 'A', 27075, , '', '', 'BTREE', '''image_change', 0, 'PRIMARY', 1, 'id', 'A', 2070227, , '', '', 'BTREE', '''image_change', 1, 'FK10B73B14D0FB3FBA', 1, 'image_id', 'A', 4442, , '', '', 'BTREE', '''image_change', 1, 'Index_Image', 1, 'image_id', 'A', 754, , '', '', 'BTREE', ''


          The output of EXPLAIN was not affected by running ANALYZE.


          explain select i.id from image i where i.camera_id=2 and (i.time > '2009-03-01' and i.time <'2009-04-15');

          produces 746 rows as the estimate; the corresponding select count(*) query returns 747. Pretty good! With the longer date range, the value goes to 4500 for the explain, and 2097 for the select(*).

          If I run

          select count(*) from image i where (i.time > '2009-03-01' and i.time <'2009-04-16')


          the query returns 843, that is the total number of images in that range regardless of which camera was involved. 70 of these are associated with camera_id = 2. Reducing the time range to end on '2009-04-15' reduces the numbers to 747 and 57, respectively.

          So we're not talking about a lot of data difference between these points. I am also not sure how significant the difference is between the 4500 estimate and the 2097 actual counts.

          Thanks again for all your help!

          Gene

          Comment


          • #6
            Try this:

            ALTER TABLE image ADD INDEX image_ix_cameraid_time_id( camera_id, time, id);


            Since you have a:

            ...WHERE i.camera_id=2 AND ( i.time > '2009-03-01' AND i.time <'2009-04-16' )

            Adding a index on the columns (camera_id, time) makes it possible for the optimizer to use it for both the conditions so it doesn't have to choose between on or the other index.
            And by adding the id as the last column, which is needed for the join, it doesn't even have to read anything from the table since all information exists in the index. This saves us extra seek times.

            2.
            How many rows does the larger date range return in the result set?

            SELECT COUNT(*) from image_change ic, image iwhere ic.image_id=i.id and i.camera_id=2 and (i.time > '2009-03-01' and i.time <'2009-04-16')

            And compared to the smaller date range?

            Since if there are a lot of matching records in the image_change table (your 2M table) and you have selected ic.* it could be faster to read all these rows sequentially and then throw away the non matching ones.
            Not using indexes aren't always a bad thing since todays disks can pump a lot of data sequentially but show a pretty poor performance when reading data randomly.



            BTW I suggest using the INNER JOIN syntax instead:

            SELECT ic.*FROM image_change icINNER JOIN image i ON ic.image_id=i.id AND i.camera_id=2 AND ( i.time > '2009-03-01' AND i.time <'2009-04-16' )

            It makes the query much easier to read and the risk of forgetting a join condition is virtually impossible.
            And it integrates seamlessly (both visually and logically) with LEFT JOIN's.

            Comment


            • #7
              Thanks for the great suggestion about how to modify the index -- that seems to have done the trick. I am not sure I will be able to change the query that's produced, however, as I am using Hibernate. Also, I will need to poke around a bit to find out how to get the three-field index to be generated automatically by Hibernate (as opposed to my mucking with the MySQL tools).

              Thanks again for your help!

              Gene

              Comment

              Working...
              X