Announcement

Announcement Module
Collapse
No announcement yet.

How to speed up a simple query?

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to speed up a simple query?

    Hello,

    We have the following 2 tables (and indexes) in our database:

    Table 1
    -------
    CREATE TABLE linkentityarticle(id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, article_id INT UNSIGNED NOT NULL, entity_id INT UNSIGNED NOT NULL, score TINYINT UNSIGNED NOT NULL) Engine = InnoDB;

    Indexes
    -------
    1. entity_id
    2. entity_id + article_id
    3. article_id + entity_id
    4. score

    Total # rows in table: 10 million
    Total space used by table: 1.5 GB

    Table 2
    -------
    CREATE TABLE entity(id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, node VARCHAR(255)) Engine = InnoDB;

    Indexes
    ------
    1. node

    Total # rows in table: 600,000
    Total space used by table: 88 MB

    When I run the following query against these tables, it runs for over 30 seconds:

    SELECT DISTINCT article_id
    FROM linkentityarticle d
    INNER JOIN
    (
    SELECT id FROM entity
    WHERE node IN('sports')

    ) AS T1
    ON T1.id = d.entity_id
    ORDER BY d.article_id DESC
    LIMIT 50;

    If I remove ORDER BY d.article_id DESC, it takes 0.3 seconds (i.e. 100 fold improvement).

    Below is the EXPLAIN PLAN WITH ORDER BY

    id: 1
    select_type: PRIMARY
    table:
    type: system
    possible_keys: NULL
    key: NULL
    key_len: NULL
    ref: NULL
    rows: 1
    Extra: Using temporary; Using filesort
    *************************** 2. row ***************************
    id: 1
    select_type: PRIMARY
    table: d
    type: ref
    possible_keys: linkentityarticle_entity_id,index_linkentityarticl e_entity_i d_article_id
    key: linkentityarticle_entity_id
    key_len: 4
    ref: const
    rows: 6708
    Extra: Using where
    *************************** 3. row ***************************
    id: 2
    select_type: DERIVED
    table: drift_entity
    type: const
    possible_keys: unique_entity_node
    key: unique_entity_node
    key_len: 257
    ref:
    rows: 1
    Extra: Using index


    Below is the EXPLAIN WITHOUT ORDER BY:

    id: 1
    select_type: PRIMARY
    table:
    type: system
    possible_keys: NULL
    key: NULL
    key_len: NULL
    ref: NULL
    rows: 1
    Extra: Using temporary
    *************************** 2. row ***************************
    id: 1
    select_type: PRIMARY
    table: d
    type: ref
    possible_keys: linkentityarticle_entity_id,index_linkentityarticl e_entity_i d_article_id
    key: linkentityarticle_entity_id
    key_len: 4
    ref: const
    rows: 6708
    Extra:
    *************************** 3. row ***************************
    id: 2
    select_type: DERIVED
    table: drift_entity
    type: const
    possible_keys: unique_entity_node
    key: unique_entity_node
    key_len: 257
    ref:
    rows: 1
    Extra: Using index


    We are using MySQL version 5.0.44 and our InnoDB Buffer Pool is set to 2GB.

    Is there anything that can be done to speed up this query when ORDER BY is used (as we need to use it to get the latest data).

    Thank you in advance!

  • #2
    If you omit the sort and limit, how many rows are in the result set? It should be quite a few if sorting takes that long. You should also be able to re-write that query to remove the inner select in favor of an inner join, though I'm not sure if that will help any, but you should get a different query plan out of EXPLAIN.

    Before doing anything else, seriously consider switching that BIGINT to INT. It will shrink the size of the table and all the indexes considerably, and make everything you do with that table faster.

    Comment


    • #3
      LIMIT makes your second query fast. Your first index is redundant since it is contained in your second index. I would change the lay-out of your first table to make use of a clustered index:

      Table 1
      -------
      CREATE TABLE linkentityarticle(
      article_id INT UNSIGNED NOT NULL,
      entity_id INT UNSIGNED NOT NULL,
      score TINYINT UNSIGNED NOT NULL)
      Engine = InnoDB;

      Indexes
      -------
      1. UNIQUE article_id+entity_id
      3. entity_id + article_id
      4. score


      Table 2
      -------
      CREATE TABLE entity(
      id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      node VARCHAR(255)
      ) Engine = InnoDB;

      Indexes
      ------
      1. node

      SELECT DISTINCT article_id
      FROM linkentityarticle d
      WHERE id IN(SELECT id FROM entity e WHERE node IN('sports') AND e.id = d.entity_id)
      ORDER BY d.article_id DESC
      LIMIT 50;

      To check whether this helps, try this query first:

      SELECT DISTINCT article_id
      FROM linkentityarticle d FORCE INDEX(name of article_id + entity_id index)
      WHERE id IN(SELECT id FROM entity e WHERE node IN('sports') AND e.id = d.entity_id)
      ORDER BY d.article_id DESC
      LIMIT 50;

      Comment


      • #4
        vgatto,

        - Ommitting LIMIT I get 94383 rows as a result
        - Rewriting the query to use the INNER JOIN I do get a slightly different explain plan, but the performance of the query did not improve
        - I don't prefer to change BIGINT to INT, as we might run out of integer space. We had it as INT in another system, and we ran out of space in ~ 2 years, and had to change to BIGINT. So although we could keep it as INT for some time, I prefer to come up with some permanent solution to this query optimization (i.e. not something which would only last until we run out of INT space)

        gmouse,

        - You are right. First index was redundant. I dropped it, but it did not help the speed of the query

        - I cannot change the layout of the table, as there are many stored procedures which use the id field we have in this table


        It seems to me that filesort is slowing down the query. Is there anything that can be done to make this sorting faster?


        Thanks a lot!

        Comment


        • #5
          What about

          SELECT DISTINCT article_id
          FROM linkentityarticle d FORCE INDEX(name of article_id + entity_id index)
          WHERE id IN(SELECT id FROM entity e WHERE node IN('sports') AND e.id = d.entity_id)
          ORDER BY d.article_id DESC
          LIMIT 50;

          +explain

          Comment


          • #6
            gmouse,

            I tried the suggestion, and performance got worse. It now ran for 1 min and 12 seconds. Rows field shows that it searched through the whole table now [Interestingly, it did not use filesort now]. Below is the explain:

            *************************** 1. row ***************************
            id: 1
            select_type: PRIMARY
            table: d
            type: index
            possible_keys: NULL
            key: unique_linkentityarticle_article_id_entity_id
            key_len: 8
            ref: NULL
            rows: 9913948
            Extra: Using where; Using index
            *************************** 2. row ***************************
            id: 2
            select_type: DEPENDENT SUBQUERY
            table: e
            type: const
            possible_keys: PRIMARY,unique_entity_node
            key: unique_entity_node
            key_len: 257
            ref: const
            rows: 1
            Extra: Using index

            Comment


            • #7
              I see. Actually it stops after 50 rows are found but appearantly this can take a very long time for seldomly used node names. The only way to improve running time is by having an index on (node,article_id), but then you must have a table that contains both these fields.

              Comment


              • #8
                gmouse,

                Thank you for the suggestions.

                I am currently creating such a table - just to test things out - if results are good I'll consider implementing it in production.

                Comment

                Working...
                X