GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

optimize IP range join: postgresql is 496 times faster

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • optimize IP range join: postgresql is 496 times faster

    I am very curious to know how to write the following join query for MySQL so it will perform as well as PostgreSQL. I never used PostgreSQL before now. I like MySQL and I always use it. However I was disappointed by its performance on this query and I installed PostgreSQL to compare. The purpose of this query is to map ip addresses to countries. On my CentOS 5 machine, MySQL 5.0.45 takes 10 mins 45 seconds, PostgreSQL 8.1.11 takes 0 mins 1.3 seconds. Yes MySQL is 496 times slower. I'm sure there must be a way to hint MySQL to run faster. Here is the query:


    mysql> select range.id_country from address join range on address.address between range.begin_num and range.end_num;mysql> describe address;+---------+------------------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+---------+------------------+------+-----+---------+-------+| address | int(10) unsigned | YES | | NULL | |+---------+------------------+------+-----+---------+-------+mysql> describe range;+-------------+---------------------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------------+---------------------+------+-----+---------+-------+| begin_num | int(10) unsigned | NO | PRI | | || end_num | int(10) unsigned | YES | UNI | NULL | || id_country | tinyint(3) unsigned | YES | MUL | NULL | |+-------------+---------------------+------+-----+---------+-------+

    Both tables are MyISAM type. Table `address` is 2124 rows (all distinct). Table `range` is 105920 rows (all distinct).

    The best answer I found so far is here. Steinbrink gives a way to write it as a join on subquery that my MySQL finished in 6 min 42 sec (63% the time of the simple join version). That is still 310 times slower than PostgreSQL. Too slow!

    Actually there is a small error in the SQL at that url:

    ORDER BY ip_address DESC
    should be:

    ORDER BY start DESC

    Here is MySQL's explanation of the original query:

    mysql> explain select range.id_country from address join range on address.address between range.begin_num and range.end_num;+----+-------------+---------+------+-----------------+------+---------+------+--------+------------------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+---------+------+-----------------+------+---------+------+--------+------------------------------------------------+| 1 | SIMPLE | address | ALL | NULL | NULL | NULL | NULL | 2124 | || 1 | SIMPLE | range | ALL | PRIMARY,end_num | NULL | NULL | NULL | 105920 | Range checked for each record (index map: 0x7) |+----+-------------+---------+------+-----------------+------+---------+------+--------+------------------------------------------------+

    Here is PostgreSQL's explanation of the original query:

    postgresql# explain select range.id_country from address join range on address.address between range.begin_num and range.end_num; QUERY PLAN----------------------------------------------------------------------------------------------------------------- Nested Loop (cost=5.72..7061709.90 rows=83990942 width=2) -> Seq Scan on range (cost=0.00..3316.47 rows=185547 width=18) -> Bitmap Heap Scan on address (cost=5.72..31.25 rows=453 width=8) Recheck Cond: ((address.address >= "outer".begin_num) AND (address.address <= "outer".end_num)) -> Bitmap Index Scan on addresses_pkey (cost=0.00..5.72 rows=453 width=0) Index Cond: ((address.address >= "outer".begin_num) AND (address.address <= "outer".end_num))

    Here is MySQL's explanation showing that a simple query can use the index on begin_num in a "range" type query plan:

    mysql> explain select id_country from range where 123123123 between begin_num and end_num;+----+-------------+---------+-------+-----------------+---------+---------+------+------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+---------+-------+-----------------+---------+---------+------+------+-------------+| 1 | SIMPLE | range | range | PRIMARY,end_num | PRIMARY | 4 | NULL | 35 | Using where |+----+-------------+---------+-------+-----------------+---------+---------+------+------+-------------+

    Others encountered this problem before.
    References:

  • #2
    Hi,
    You can try > and < instead of between.

    ON address.address > ranges.begin_num AND address < ranges.end_num

    Comment


    • #3
      MySQL chooses the wrong join order. Try this:

      SELECT range.id_country
      FROM range
      STRAIGHT_JOIN address ON (address.address between range.begin_num and range.end_num);

      Note that the time this query takes, depends on the number of ranges and not so much on the number of addresses.


      The range scan on an index that you give is nice, but not very useful. It is simply not selective enough.

      Comment


      • #4
        Old thread alert )

        Comment


        • #5
          Thanks for the replies. I'll try it.

          Comment


          • #6
            Let me know your findings.

            Comment

            Working...
            X