GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

Apparently bizarre query plan joins all rows of huge table to tiny table

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Apparently bizarre query plan joins all rows of huge table to tiny table

    Hello,

    I would love some help understanding what I believe to be a bizarre query plan, and how I can avoid it. The table alerts_ad_staging1 is huge (~420M rows), and the table ads_page_group is joined in, but it only has 4 rows. If I remove the join, the query uses the primary key against the alerts_ad_staging1 table, as I would expect. This query should select about 1.8M rows, so the estimated 47807 rows in the second step of the query plan makes no sense to me.


    mysql version: mysql Ver 14.14 Distrib 5.1.31, for debian-linux-gnu (x86_64) using EditLine wrapperquery:explain extended select aa.id, ads_page_group.name from alerts_ad_staging1 aa join ads_page_group on aa.page_group_id = ads_page_group.id where aa.id between 418536945 and 420368655\Gquery plan:*************************** 1. row *************************** id: 1 select_type: SIMPLE table: ads_page_group type: ALLpossible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 4 filtered: 100.00 Extra:*************************** 2. row *************************** id: 1 select_type: SIMPLE table: aa type: refpossible_keys: PRIMARY,alerts_ad_page_group_id key: alerts_ad_page_group_id key_len: 1 ref: bv.ads_page_group.id rows: 47807 filtered: 100.00 Extra: Using where; Using index*************************** 1. row ***************************table definitions:mysql> show create table ads_page_group\G*************************** 1. row *************************** Table: ads_page_groupCreate Table: CREATE TABLE `ads_page_group` ( `id` tinyint(3) unsigned NOT NULL AUTO_INCREMENT, `name` varchar(6) NOT NULL, PRIMARY KEY (`id`)) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin11 row in set (0.00 sec)mysql> select * from ads_page_group;+----+--------+| id | name |+----+--------+| 1 | Top | | 2 | Right | | 3 | Left | | 4 | Bottom | +----+--------+4 rows in set (0.00 sec)CREATE TABLE `alerts_ad_staging1` ( `id` int(11) NOT NULL AUTO_INCREMENT,...other columns not referenced by this query... `page_group_id` tinyint(3) unsigned NOT NULL, PRIMARY KEY (`id`,`date`),...other keys not referenced by this query... KEY `alerts_ad_page_group_id` (`page_group_id`),) ENGINE=InnoDB AUTO_INCREMENT=426916572 DEFAULT CHARSET=utf8 COLLATE=utf8_bin/*!50100 PARTITION BY RANGE (TO_DAYS(date))(PARTITION All2008 VALUES LESS THAN (733773) ENGINE = InnoDB, PARTITION Jan09 VALUES LESS THAN (733804) ENGINE = InnoDB, PARTITION Feb09 VALUES LESS THAN (733832) ENGINE = InnoDB, PARTITION Mar09 VALUES LESS THAN (733863) ENGINE = InnoDB, PARTITION Apr09 VALUES LESS THAN (733893) ENGINE = InnoDB, PARTITION May09 VALUES LESS THAN (733924) ENGINE = InnoDB, PARTITION Jun09 VALUES LESS THAN (733954) ENGINE = InnoDB, PARTITION Jul09 VALUES LESS THAN (733985) ENGINE = InnoDB, PARTITION Aug09 VALUES LESS THAN (734016) ENGINE = InnoDB, PARTITION Sep09 VALUES LESS THAN (734046) ENGINE = InnoDB, PARTITION Oct09 VALUES LESS THAN (734077) ENGINE = InnoDB, PARTITION Nov09 VALUES LESS THAN (734107) ENGINE = InnoDB, PARTITION Dec09 VALUES LESS THAN (734138) ENGINE = InnoDB, PARTITION Jan10 VALUES LESS THAN (734169) ENGINE = InnoDB, PARTITION Feb10 VALUES LESS THAN (734197) ENGINE = InnoDB, PARTITION Mar10 VALUES LESS THAN (734228) ENGINE = InnoDB, PARTITION Apr10 VALUES LESS THAN (734258) ENGINE = InnoDB, PARTITION May10 VALUES LESS THAN (734289) ENGINE = InnoDB, PARTITION Jun10 VALUES LESS THAN (734319) ENGINE = InnoDB, PARTITION Jul10 VALUES LESS THAN (734350) ENGINE = InnoDB, PARTITION Aug10 VALUES LESS THAN (734381) ENGINE = InnoDB, PARTITION Sep10 VALUES LESS THAN (734411) ENGINE = InnoDB, PARTITION Oct10 VALUES LESS THAN (734442) ENGINE = InnoDB, PARTITION Nov10 VALUES LESS THAN (734472) ENGINE = InnoDB, PARTITION Dec10 VALUES LESS THAN (734503) ENGINE = InnoDB, PARTITION After10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */


    Thanks in advance for any help. I'm more than happy to provide other information that is helpful.

    Andy

  • #2
    What are the execution times?

    Suppose the query would return only ten rows, then scanning all rows from the table aa is too expensive. In that case the current plan could be much better, given that this index is present.
    ALTER TABLE aa ADD KEY `alerts_ad_page_group_id_id` (`page_group_id`, `id`)

    If all or almost all rows in aa will be returned, as is the case in your situation, then indeed the execution plan is bad. This query would run better:

    select aa.id, ads_page_group.name
    from alerts_ad_staging1 aa
    straight_join ads_page_group on aa.page_group_id = ads_page_group.id
    where aa.id between 418536945 and 420368655

    Comment


    • #3
      good questions,

      The query should return ~1.8M rows, which is about 1/233rd of the table. All of these rows should be contiguous on disk.

      execution times:
      (edited to reflect recent test)
      query with no join: 4 seconds
      query with straight_join: 4 seconds
      query with join: 58 minutes (wow - big difference vs straight_join!)

      Indeed, using straight_join produces the query plan I expected to see in the first place (below). But I am dying to know why the optimizer would choose the plan it did?


      mysql> explain extended select aa.id, ads_page_group.name from alerts_ad_staging1 aa straight_join ads_page_group on aa.page_group_id = ads_page_group.id where aa.id between 418536945 and 420368655\G*************************** 1. row *************************** id: 1 select_type: SIMPLE table: aa type: rangepossible_keys: PRIMARY,alerts_ad_page_group_id key: PRIMARY key_len: 4 ref: NULL rows: 5123229 filtered: 100.00 Extra: Using where*************************** 2. row *************************** id: 1 select_type: SIMPLE table: ads_page_group type: ALLpossible_keys: PRIMARY key: NULL key_len: NULL ref: NULL rows: 4 filtered: 75.00 Extra: Using where; Using join buffer2 rows in set, 1 warning (0.01 sec)


      Thanks,

      Andy

      Comment


      • #4
        http://www.mysqlperformanceblog.com/2008/04/28/the-mysql-opt imizer-the-os-cache-and-sequential-versus-random-io/

        Comment


        • #5
          Thanks Baron, your post is extremely helpful, and I read it carefully so I do understand it.

          However, I am still missing two small pieces:

          1. Would my first query plan cause the optimizer to estimate the aa (big) table reads as sequential or random?

          2. Why the strangely low number in step two of that plan (47807) when the query needs to return 1.8M rows?

          Much appreciated to everyone - this discussion has been very enlightening for me so far.

          Comment


          • #6
            1) I am not sure. I have forgotten or never knew the details of that.

            2) The query plan the optimizer wants to choose is something like this:

            table: ads_page_group
            type: ALL
            possible_keys: PRIMARY
            rows: 4

            "Table-scan ads_page_group using the primary key. 4 rows will be found. For each row, ..."

            table: aa
            type: ref
            key: alerts_ad_page_group_id
            ref: bv.ads_page_group.id
            rows: 47807

            "... use the current value of id look up all matching rows in table 'aa'; on average, 47807 rows should match."

            So indeed we can see that it estimates the total rows will be just under 200k, not 1.8M, but this is "only" a factor of 10 off. For InnoDB, especially with a low-cardinality index such as this, that's not shocking. InnoDB statistics are based on sampling, and it's pretty common to get bad samples, although they are usually somewhat more accurate than this. (Yeah, this is one of the places InnoDB doesn't do a very good job.)

            Comment


            • #7
              Thank you again Baron, that is very helpful.

              If anyone else knows about the following I would be very grateful:

              1. Would my first query plan cause the optimizer to estimate the aa (big) table reads as sequential or random?

              Comment


              • #8
                Anytime a query is the 2nd or subsequent table in a join, the reads to it are almost certainly random, not sequential.

                Comment


                • #9
                  That makes sense as well Baron. Thanks again.

                  Comment

                  Working...
                  X