GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

ORDER by datetime - still using filesort :(

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • ORDER by datetime - still using filesort :(

    I am working on a tough mySQL problem, and need some help optimizing an ORDER BY query. Here is a table description:

    CREATE TABLE `urls` (
    `urls_id` bigint(32) NOT NULL auto_increment,
    `urls_url` varchar(96) collate utf8_unicode_ci NOT NULL default '',
    `urls_last_updated` datetime NOT NULL default '0000-00-00 00:00:00',
    `urls_added` datetime NOT NULL default '0000-00-00 00:00:00',
    `urls_active` tinyint(1) NOT NULL default '0',
    `urls_type` enum('MAIN','CATEGORY','LISTING') collate utf8_unicode_ci NOT NULL default 'LISTING',
    `urls_name` varchar(255) collate utf8_unicode_ci NOT NULL default '',
    `urls_update_count` int(16) NOT NULL default '0',
    `urls_hold_until` datetime NOT NULL default '0000-00-00 00:00:00',
    `urls_last_discovered` datetime NOT NULL default '0000-00-00 00:00:00',
    `urls_dropdown_name` varchar(64) collate utf8_unicode_ci NOT NULL default '',
    `urls_flag` enum('NONE','EXPIRED','FLAGGED','REMOVED') collate utf8_unicode_ci NOT NULL default 'NONE',
    PRIMARY KEY (`urls_id`),
    UNIQUE KEY `urls_unique` (`urls_url`),
    KEY `get_urls` (`urls_active`,`urls_hold_until`,`urls_flag`,`urls _type`),
    KEY `last_updated` (`urls_last_updated`),
    KEY `get_urls2` (`urls_active`,`urls_hold_until`,`urls_flag`,`urls _type`,`ur ls_last_updated`)
    ) ENGINE=InnoDB AUTO_INCREMENT=7621125 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

    select count(*) from urls;

    +----------+
    | count(*) |
    +----------+
    | 7620415 |
    +----------+
    1 row in set (28.24 sec)

    The indexes are experimental to do the query I am trying to optimize below, and are not permanent to the schema.

    here is the query I am trying to optimize:

    SELECT urls_id, urls_url, urls_type, urls_last_updated FROM urls WHERE urls_active = '1' and urls_flag = 'NONE' and urls_hold_until < now() and urls_type = 'CATEGORY' ORDER BY urls_last_updated LIMIT 100;

    Explain on this query:
    +----+-------------+-------+------+--------------------+---- ------+---------+-------+--------+-------------------------- ---+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +----+-------------+-------+------+--------------------+---- ------+---------+-------+--------+-------------------------- ---+
    | 1 | SIMPLE | urls | ref | get_urls,get_urls2 | get_urls | 1 | const | 418179 | Using where; Using filesort |
    +----+-------------+-------+------+--------------------+---- ------+---------+-------+--------+-------------------------- ---+

    Filesort is a big problem, so if I remove ORDER BY the "Using filesort" goes away. Here is the explain on this query (dropping the ORDER BY)

    SELECT urls_id, urls_url, urls_type, urls_last_updated FROM urls WHERE urls_active = '1' and urls_flag = 'NONE' and urls_hold_until < now() and urls_type = 'CATEGORY' LIMIT 100;

    +----+-------------+-------+-------+---------------+-------- ---+---------+------+---------+-------------+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +----+-------------+-------+-------+---------------+-------- ---+---------+------+---------+-------------+
    | 1 | SIMPLE | urls | range | get_urls2 | get_urls2 | 9 | NULL | 3815726 | Using where |
    +----+-------------+-------+-------+---------------+-------- ---+---------+------+---------+-------------+
    1 row in set (0.00 sec)

    The first query with the ORDER BY takes 1 minute 38 seconds over 7 million rows. The second query takes 0.01 seconds (with query cache turned off)

    How do I get this query optimized, understanding that I MUST have the order by in there, or the equivalent.

    Building another table, running periodic summary queries into another table are not acceptable solutions, unfortunately (

  • #2
    Every part of an index after an inequality is ignored (unless for covering indices).

    So the "urls_hold_until < now()" is spoiling your performance. If this is not a very restrictive condition, you should just remove the urls_hold_until part from your index.

    Your get_urls index is redundant btw since any prefix of an index can also be used as an index.

    Comment


    • #3
      Also by using BIGINT all your secondary indexes are larger than they need to be. InnoDB stores the PK for each row in each of your secondary indexes, so by changing it to INT you could decrease your index size.

      Comment

      Working...
      X