poor performance ORDERing BY indexed column

  • poor performance ORDERing BY indexed column

    Hello!

    I need help with the performance of a particular SQL query (provided below). Could anyone please give me a clue why the execution time of the first query is so long, while the second one executes almost instantly? Maybe there is a way to optimize the first one, for example by using an index on one or several columns, performing some table optimization routine, or something else.

    All queries run against a single table.

    First query:
    select distinct [PRIMARY_KEY_BIGINT(20)] from [TABLE] where lower(concat_ws(' ', [VARCHAR(30)], [VARCHAR(60)])) like 'const_chars%' order by [BTREE_INDEXED_MEDIUMINT(9)] asc limit 20, 20

    Exec time: ~30 secs

    EXPLAIN statement on this query:
    id: 1
    select_type: SIMPLE
    table: [TABLE]
    type: index
    possible_keys: NULL
    key: [BTREE_INDEXED_MEDIUMINT(9)]
    key_len: 4
    ref: NULL
    rows: 202848
    Extra: Using where

    Second query:
    Identical, except the ORDER BY criterion is [UNINDEXED_BIGINT(14)]:

    select distinct [PRIMARY_KEY_BIGINT(20)] from [TABLE] where lower(concat_ws(' ', [VARCHAR(30)], [VARCHAR(60)])) like 'const_chars%' order by [UNINDEXED_BIGINT(14)] asc limit 20, 20

    Exec time: ~0.5 sec

    EXPLAIN statement on this query:
    id: 1
    select_type: SIMPLE
    table: [TABLE]
    type: ALL
    possible_keys: NULL
    key: NULL
    key_len: NULL
    ref: NULL
    rows: 202848
    Extra: Using where; Using filesort


    Some other facts:
    1) If the ORDER BY clause is removed from the first query, it also executes almost instantly (~0.5 sec). I will provide EXPLAIN results upon request.
    2) There are ~200,000 entries in the table.
    3) The MySQL server version is 5.0.41-community.
    4) Older databases (with lots of inserts/deletes performed) seem to take more time to process the first query (up to several minutes on a DB created several months ago), while younger ones take about 1.5 sec.
    5) OPTIMIZE/ANALYZE statements on this table did not help.


    Thank You very much!

  • #2
    First of all, one of your main problems is that you are using a ... WHERE concat_ws(col1, col2) LIKE 'something%'.
    This forces MySQL to scan the whole table every time, since it cannot use an index when a function is applied to the columns.
    So you should start by rewriting this query so that an index can be used for the LIKE condition.
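    A sketch of one possible rewrite, assuming the prefix only ever needs to match the start of the first column (table, column, and index names here are hypothetical, since the originals are redacted):

    ```sql
    -- Hypothetical names: t, first_col, second_col, sort_col, idx_first_col.
    -- If 'const_chars' can only match within the first column, the concat_ws()
    -- can be dropped, and a plain prefix match lets MySQL use an index:
    CREATE INDEX idx_first_col ON t (first_col);

    SELECT DISTINCT id
    FROM t
    WHERE first_col LIKE 'const_chars%'
    ORDER BY sort_col ASC
    LIMIT 20, 20;
    ```

    If the search string may span both columns, a precomputed concatenated column is the usual workaround.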

    Second, you are using a redundant function:
    lower() is not needed, because with the usual case-insensitive (_ci) collations LIKE comparisons are already case-insensitive.
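    For reference, a quick way to check this on your own server (the result below assumes a case-insensitive collation such as the default latin1_swedish_ci):

    ```sql
    -- Matches despite the case difference under a _ci collation,
    -- so the extra LOWER() adds nothing but prevents index use:
    SELECT 'Hello World' LIKE 'hello%';
    ```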


    Now to what probably takes time.
    The difference between the two queries is that query 1 uses the index to retrieve the rows in sorted order, while query 2 sorts the rows afterwards.
    Usually it is advantageous to avoid the post-sort, which is why the database uses an index to retrieve the rows in sorted order.

    This means that the steps to perform query 1 are:
    1. Get the first id from the index.
    2. Jump to that position in the table.
    3. Read the VARCHAR(30) and VARCHAR(60) fields.
    4. Concatenate these and test the condition.
    5. Throw away rows that do not match.



    While the execution plan for query 2 is:
    1. Read all rows sequentially from the table.
    2. Test the condition.
    3. Throw away rows that do not match.
    4. Sort the remaining rows.


    In this case, different things potentially take time in the two execution plans.
    The query 1 plan has a "jump to the position in the table and read the two columns" step, which is highly dependent on disk speed (unless you have enough RAM that the entire table fits in the OS file cache).
    A random seek is typically around 8-10 ms, which means you can generally only fetch about 125 rows/second from disk.
    Query 2 instead reads the entire table sequentially and then discards the rows that do not match.
    Sequential reads from a disk today typically run at 50-100 MB/s.
    This means we get a break-even point at about 50,000,000 / 125 = 400,000 bytes/row.
    So, as you can see, when many rows match, you only gain anything from the index if the rows are larger than about 400 KB.
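    The break-even figure is just the ratio of the two access rates, using the 50 MB/s sequential and 125 random reads/s assumptions from above:

    ```latex
    \text{break-even row size}
      = \frac{\text{sequential throughput}}{\text{random rows per second}}
      = \frac{50{,}000{,}000~\text{B/s}}{125~\text{rows/s}}
      = 400{,}000~\text{B/row} \approx 400~\text{KB}
    ```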

    And in your case, as you can see, reading the table sequentially is much faster than reading it randomly via the index.

    That was a long explanation for something that should take little time. :)

    Start by looking at how you can change your WHERE clause, especially how you can get rid of the concat_ws(). Once an index can be used for the WHERE, you can start looking at the rest.



    • #3
      Thank You very much, sterin, for such a nice explanation!
      It is really frustrating that there is no way to make the MySQL optimizer avoid an index in cases like this, where using it makes the query so slow.
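      For what it's worth, MySQL does support index hints that steer the optimizer away from a particular index on a per-query basis. A sketch, assuming the index on the ORDER BY column is named `idx_sort` (a hypothetical name):

      ```sql
      -- IGNORE INDEX tells the optimizer not to consider the named index,
      -- pushing it toward the full-scan + filesort plan that was faster here:
      SELECT DISTINCT id
      FROM t IGNORE INDEX (idx_sort)
      WHERE concat_ws(' ', col1, col2) LIKE 'const_chars%'
      ORDER BY sort_col ASC
      LIMIT 20, 20;
      ```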

      However, to solve this, our team has chosen to add an extra column to the [TABLE], in which the concatenated [VARCHAR(30)] and [VARCHAR(60)] are stored and kept up to date by triggers on table inserts and updates. We have then indexed this column. We really need these values to meet functional requirements.
      This solves the slowness for like 'const_chars%', but does not solve the problem for like '%const_chars%' (or '%const_chars'). And I do not see a way to do that right now, at least for MySQL DBMS.
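      A sketch of that trigger-maintained column approach (all names hypothetical; plain trigger syntax, since MySQL 5.0 has no generated columns):

      ```sql
      -- Add the precomputed search column (30 + 1 separator + 60) and index it:
      ALTER TABLE t ADD COLUMN search_text VARCHAR(91);
      CREATE INDEX idx_search_text ON t (search_text);

      -- Keep it up to date on inserts; an analogous BEFORE UPDATE
      -- trigger is needed as well:
      DELIMITER //
      CREATE TRIGGER t_search_bi BEFORE INSERT ON t
      FOR EACH ROW
      BEGIN
        SET NEW.search_text = CONCAT_WS(' ', NEW.col1, NEW.col2);
      END//
      DELIMITER ;

      -- The prefix search can now use the index:
      SELECT DISTINCT id FROM t
      WHERE search_text LIKE 'const_chars%';
      ```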



      • #4
        lemn wrote on Mon, 12 November 2007 06:48

        This solves the slowness for like 'const_chars%', but does not solve the problem for like '%const_chars%' (or '%const_chars'). And I do not see a way to do that right now, at least for MySQL DBMS.

        When you use LIKE '%something%' on those two columns, do you need the words to appear in order, or do you search on individual words?

        The reason I'm asking is that another solution would be to create a lookup table with just a VARCHAR column and an id INT column that points to the right row in the main table.
        That way you can join against this lookup table and always have a 'something%' pattern, so an index can be used instead of scanning any of the columns.
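        A sketch of that lookup-table idea (all names hypothetical): each word from the two columns gets its own row, so a search for any single word becomes a prefix match:

        ```sql
        -- One row per word, pointing back to the main table:
        CREATE TABLE word_lookup (
          word   VARCHAR(60) NOT NULL,
          row_id BIGINT      NOT NULL,
          INDEX idx_word (word)
        );

        -- A mid-string search on the original columns becomes a
        -- prefix search on the lookup table:
        SELECT DISTINCT t.id
        FROM word_lookup w
        JOIN t ON t.id = w.row_id
        WHERE w.word LIKE 'something%';
        ```

        Populating word_lookup (splitting the columns on spaces) would have to be done by the application or by triggers, since MySQL 5.0 has no built-in string splitting.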

        If you need the '%something' case, then no DB can solve that for you in a good way. It's not just a MySQL thing.
        To solve it, you have to find a boundary where you can say: "This is always going to be the first character of the search string, so I'm going to base my column and index on that."
