Announcement

Announcement Module
Collapse
No announcement yet.

Any suggestions to tackle a slow query on a large table with composite PK?

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Any suggestions to tackle a slow query on a large table with composite PK?

    Actual query:SELECT SQL_NO_CACHE country_abbreviation, country_name, region, city, isp, latitude, longitude FROM ip2location WHERE ip_first <= 3335238065 AND ip_last >= 3335238065; 4.6 secondsProfile:Status Time(initialization) 0.000004checking query cache for query 0.000073Opening tables 0.000014System lock 0.000022Table lock 0.00001init 0.000032optimizing 0.000015statistics 0.000213preparing 0.000131executing 0.000011Sending data 4.651696 <--- What? it's 1 recordend 0.000047query end 0.000008freeing items 0.000019closing tables 0.000011logging slow query 0.000004Explain:1 SIMPLE ip2location range PRIMARY,ip_last,ip_first PRIMARY 4 NULL 1886656 Using whereResearch:SELECT SQL_NO_CACHE ip_first FROM ip2location WHERE ip_last >= 3335238065; .0007 secondsSELECT SQL_NO_CACHE ip_last FROM ip2location WHERE ip_first <= 3335238065; .0007 secondsSELECT SQL_NO_CACHE count(ip_last) FROM ip2location WHERE ip_first <=3335238065; 2,251,823 rows (1.3 seconds)SELECT SQL_NO_CACHE count(ip_first) FROM ip2location WHERE ip_last >= 3335238065; 1,406,113 rows (.65 seconds)SELECT SQL_NO_CACHE max(ip_first) AS first_max FROM ip2location WHERE ip_last >= 3335238065; 4278190080 (1 second)SELECT SQL_NO_CACHE min(ip_first) AS first_min FROM ip2location WHERE ip_last >= 3335238065; 3335237632 (.69 seconds)SELECT SQL_NO_CACHE max(ip_last) AS last_max FROM ip2location WHERE ip_first <= 3335238065; 3335239167 (1.4 seconds)SELECT SQL_NO_CACHE min(ip_last) AS last_min FROM ip2location WHERE ip_first <= 3335238065; 33996343 (1.37 seconds)


    The table itself has a composite PK on ip_first and ip_last. I wasn't sure if it was not correctly using the keys because it was a composite, so i added a normal index on both ip_last and ip_first separately. Initially I thought I saw a 2-4 second improvement in query performance, however, that was months ago. to my knowledge queries like this have not ever run under 4 seconds though. This is a query which runs when someone first visits the site which causes a 5-10 second delay in some cases. It would be great if I could find a way to tune this down somehow to around or below 1 second.

    Thanks in advance for any help!

  • #2
    The schema, for reference:

    CREATE TABLE IF NOT EXISTS `ip2location` (
    `ip_first` int(10) unsigned NOT NULL,
    `ip_last` int(10) unsigned NOT NULL,
    `country_abbreviation` varchar(2) NOT NULL default '',
    `country_name` varchar(42) NOT NULL default '',
    `region` varchar(45) NOT NULL default '',
    `city` varchar(37) NOT NULL default '',
    `latitude` float default NULL,
    `longitude` float default NULL,
    `isp` varchar(103) NOT NULL default '',
    PRIMARY KEY (`ip_first`,`ip_last`),
    KEY `ip_last` (`ip_last`),
    KEY `ip_first` (`ip_first`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    Comment


    • #3
      The problem is the "WHERE first key part ... AND second key part ..." which MySQL handles by performing a range scan on the entire index which in your case is the primary key which in InnoDB basically means a table scan since data is stored in leaves.

      Some info needed to try to solve your problem:
      Do you have ip_first,ip_last combinations that overlap anywhere?
      If you don't then by tweaking your data a bit (if you have "holes" in you sequence) you can for example write your query like this:

      ...WHERE ip_first >= 123456ORDER BY ip_first DESCLIMIT 1;

      Which MySQL will be able to optimize to use only the ip_first index and return a result very fast.

      Comment


      • #4
        Try to add a key: first_last(first, last) and force mysql to use it. Then show us an explain results along with timing.

        Notice: to force mysql to use some key, try USE INDEX(keyname) after the table name.

        Comment


        • #5
          That would be essentially the PK since it is a composite between ip_first and ip_last. I deleted the two other indexes for this set, but left the PK

          Status Time
          (initialization) 0.000005
          checking query cache for query 0.000195
          Opening tables 0.000032
          System lock 0.000025
          Table lock 0.000016
          init 0.000043
          optimizing 0.000019
          statistics 0.000138
          preparing 0.000087
          executing 0.000015
          Sending data 6.511645
          end 0.000044
          query end 0.000007
          freeing items 0.000017
          closing tables 0.000009
          logging slow query 0.001079


          1 SIMPLE ip2location range PRIMARY PRIMARY 4 NULL 1864658 Using where


          (1 total, Query took 4.7419 sec)

          Comment


          • #6
            Did you read my post three posts up?

            Comment


            • #7
              JGilbert wrote on Sun, 07 October 2007 03:54

              That would be essentially the PK since it is a composite between ip_first and ip_last.



              Oops! Sorry - i missed this fact that you already have such index.

              Comment


              • #8
                sterin wrote on Fri, 05 October 2007 17:52


                Some info needed to try to solve your problem:
                Do you have ip_first,ip_last combinations that overlap anywhere?



                SELECT COUNT(*) AS total_rows, ip_first
                FROM ip2location
                GROUP BY ip_first
                HAVING total_rows >1;
                0 results

                SELECT COUNT(*) AS total_rows, ip_last
                FROM ip2location
                GROUP BY ip_last
                HAVING total_rows >1;
                0 results

                SELECT COUNT(*) AS total_rows, CONCAT_WS('_',ip_first,ip_last) as first_last
                FROM ip2location_disk
                GROUP BY first_last
                HAVING total_rows > 1;
                0 results (as you'd expect from a pk)

                I didn't realize before this that ip_first and ip_last were also unique on their own. Need to test more queries.

                Comment


                • #9
                  JGilbert wrote on Sun, 07 October 2007 10:48

                  sterin wrote on Fri, 05 October 2007 17:52


                  Some info needed to try to solve your problem:
                  Do you have ip_first,ip_last combinations that overlap anywhere?



                  I didn't realize before this that ip_first and ip_last were also unique on their own. Need to test more queries.


                  AFAIU, they should be unique, because in generic GeoIP database one given IP should be assigned to one ISP/location only.

                  Comment


                  • #10
                    sterin wrote on Sun, 07 October 2007 08:47

                    Did you read my post three posts up?


                    I've run several queries with different ips and the results are identical to ip_first / last combo WHERE only you're right, mysql optimized out the difference.

                    .00007 seconds is much improved. Thanks for helping me tackle this )

                    Comment

                    Working...
                    X