MySQL not using index for ORDER BY?

  • #1

    Hi all,

    I am having trouble with a query on a large table (39M rows): MySQL won't use the index for the ORDER BY clause, and creates a temporary table, which takes a lot of disk space and time.


    EXPLAIN SELECT * FROM histos_backtest ORDER BY id;
    +----+-------------+-----------------+------+---------------+------+---------+------+----------+----------------+
    | id | select_type | table           | type | possible_keys | key  | key_len | ref  | rows     | Extra          |
    +----+-------------+-----------------+------+---------------+------+---------+------+----------+----------------+
    |  1 | SIMPLE      | histos_backtest | ALL  | NULL          | NULL | NULL    | NULL | 39252003 | Using filesort |
    +----+-------------+-----------------+------+---------------+------+---------+------+----------+----------------+



    SHOW INDEX FROM histos_backtest;
    +-----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table           | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +-----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | histos_backtest |          0 | PRIMARY  |            1 | id          | A         |    39252003 |     NULL | NULL   |      | BTREE      | NULL    |
    | histos_backtest |          1 | id       |            1 | id          | A         |    39252003 |     NULL | NULL   |      | BTREE      | NULL    |
    +-----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+


    I created an index on the same column as the primary key just to make sure; I use Sybase at work, and there the primary key alone is sometimes not enough. Apparently that is not the case with MySQL.

    Any help would be greatly appreciated. :)

  • #2
    What's the schema for this table?

    You're right about MySQL, by the way: a PK in MySQL is a unique constraint and also an index.

    In MSSQL (and presumably Sybase), a PK is only a constraint, which is usually accompanied by an independent clustered index.



    • #3
      carpii wrote on Sun, 13 May 2007 19:41

      In MSSQL (and presumably Sybase), a PK is only a constraint, which is usually accompanied by an independent clustered index.


      Is this something with older versions?

      I'm not an MSSQL or Sybase expert, but as far as I know (or have read), defining a column as a primary key implicitly creates a unique index on it.

      And all the databases I have mostly worked with do this (MySQL, PostgreSQL, Oracle).

      This is due to:
      1. You basically always need an index to get decent performance for the unique constraint.
      2. You very often use the primary key for selecting, and then you benefit from an index anyway.

      Ref:
      http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.dc34982_1500/html/mig_gde/mig_gde73.htm



      • #4
        sterin wrote on Sun, 13 May 2007 17:15


        Is this something with older versions?



        Not that I know of. My experience is with MSSQL, but since it was based on the Sybase code, and arnoooooo came to the same conclusion, I assumed Sybase was the same.

        In MSSQL, if you create a primary key on a table, it will, as you say, automatically add a clustered index on the same fields. That is, if a clustered index does not already exist - you can obviously only have one clustered index.

        But this clustered index can be dropped, and you can add a clustered index to something else if you want. This does not affect the enforcement of the Primary Key constraint, although most likely it would affect performance of it.



        • #5
          OK

          As for the initial problem, that MySQL is not using an index:

          You are using a MyISAM table, right?

          Because if you used an InnoDB table, it would use the primary key exclusively, since in InnoDB the data is stored in the leaves of the primary key index.

          While playing around on my MySQL 5.1 I noticed that it wants to use the filesort, as you said.

          The only way I got it to use the index is with the FORCE INDEX() hint:

          mysql> explain select * from a order by id;
          +----+-------------+-------+------+---------------+------+---------+------+------+----------------+
          | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
          +----+-------------+-------+------+---------------+------+---------+------+------+----------------+
          |  1 | SIMPLE      | a     | ALL  | NULL          | NULL | NULL    | NULL |    4 | Using filesort |
          +----+-------------+-------+------+---------------+------+---------+------+------+----------------+
          1 row in set (0.00 sec)

          mysql> explain select * from a use index (primary) order by id;
          +----+-------------+-------+------+---------------+------+---------+------+------+----------------+
          | id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra          |
          +----+-------------+-------+------+---------------+------+---------+------+------+----------------+
          |  1 | SIMPLE      | a     | ALL  | NULL          | NULL | NULL    | NULL |    4 | Using filesort |
          +----+-------------+-------+------+---------------+------+---------+------+------+----------------+
          1 row in set (0.00 sec)

          mysql> explain select * from a force index (primary) order by id;
          +----+-------------+-------+-------+---------------+---------+---------+------+------+-------+
          | id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra |
          +----+-------------+-------+-------+---------------+---------+---------+------+------+-------+
          |  1 | SIMPLE      | a     | index | NULL          | PRIMARY | 4       | NULL |    4 |       |
          +----+-------------+-------+-------+---------------+---------+---------+------+------+-------+
          1 row in set (0.00 sec)

          This tells MySQL that a table scan is _very_ expensive.
          Hope it helps.



          • #6
            Thank you for your suggestions.

            carpii, here is the schema for the table:


            DESCRIBE histos_backtest;
            +-----------+------------+------+-----+-------------------+----------------+
            | Field     | Type       | Null | Key | Default           | Extra          |
            +-----------+------------+------+-----+-------------------+----------------+
            | id        | bigint(20) | NO   | PRI | NULL              | auto_increment |
            | isin      | char(12)   | NO   |     |                   |                |
            | exchange  | char(4)    | NO   |     |                   |                |
            | time      | timestamp  | NO   |     | CURRENT_TIMESTAMP |                |
            | field     | varchar(5) | NO   |     |                   |                |
            | value     | float      | NO   |     |                   |                |
            | validated | tinyint(1) | NO   |     |                   |                |
            +-----------+------------+------+-----+-------------------+----------------+
            7 rows in set (0.15 sec)


            sterin: Yes, I am using a MyISAM table. I tried FORCE INDEX and, apparently, MySQL no longer uses a filesort. However, it seems to spend hours preparing the query, until it apparently runs out of RAM and starts swapping so much that it renders the whole system unusable! I let it run for a few hours, then decided to shut it down.

            Is MySQL trying to cache all the rows before it returns them? If I select a few rows with arbitrary ids, it is very fast. I think I might resort to using a few million one-row selects?



            • #7
              What version of mysql are you using?

              Does it perform any better if you add a LIMIT clause?

              Also, on a completely unrelated note:
              Given the size of your table, and the fact that it's the only variable-length field there, I wonder if that varchar might be better off as a char field. Then all your rows would be fixed length, which makes it a little easier for MySQL to locate rows.

              This won't solve your problem, however; just an observation. :)



              • #8
                1.
                Just to make it clear: I assume that you actually want all rows returned, right?

                2.
                Yes, the default behaviour of the client library is to cache all rows.
                But if you, for example, run the mysql frontend, you can use it like this:

                mysql --quick [user and database entries here]

                The --quick option means that the client should not cache the whole result set, but instead start printing rows out directly as they arrive.
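                The same idea, consuming rows as they arrive instead of caching the whole result set on the client, can be sketched in Python. This is only an illustration: it uses sqlite3 as a stand-in for MySQL (a sqlite3 cursor yields rows lazily as you iterate it, roughly the behaviour --quick asks of the mysql client), and the table name t is made up.

```python
import sqlite3

# Illustration only: sqlite3 stands in for MySQL, and the table is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v REAL)")
conn.executemany("INSERT INTO t (id, v) VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 1001)])

# Iterating the cursor pulls rows incrementally instead of
# materialising the whole result set in client memory first.
cur = conn.execute("SELECT id, v FROM t ORDER BY id")
total = 0
for row_id, v in cur:   # rows are fetched one by one as the loop runs
    total += 1
print(total)  # 1000
```

The point is only the access pattern: process each row as it arrives, rather than loading millions of rows into memory before the first one is used.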



                • #9
                  carpii:

                  - I am using MySQL 5.0.27

                  - It does perform well with a LIMIT clause, although I have not tried a huge LIMIT, in which case I imagine the result would be the same?

                  - Using a char field might be a good idea indeed.


                  sterin:

                  - Yes, I want all the rows returned. Or at least I will use all the rows in my program, even if I split the query into several smaller queries.

                  Considering that I will use the rows one by one, is it a mistake to use only one SELECT statement?

                  I am currently doing the following to work around the problem temporarily:

                  I loop on "SELECT * FROM table WHERE id>XX ORDER BY id LIMIT 1",
                  where XX is initially set to -1 and then set to the latest value of id fetched...

                  - Do you know how to get the equivalent of --quick within a Perl program using DBD::MySQL?

                  [EDIT] Apparently I can do that with the RowCacheSize hint... assuming DBD::MySQL supports it.



                  • #10
                    When working with large databases, learn not to think in terms of LIMIT; think about using WHERE instead to limit the number of rows returned by a query.
                    The reason is that LIMIT is evaluated last in the query, while WHERE is evaluated first, and the sooner the DBMS can rule out unnecessary rows, the better.

                    In your case, you are running a Perl program that retrieves all rows and then processes them.
                    I would suggest that you retrieve maybe 10,000 rows at a time (or experiment to find a good figure); if your primary key is a normal auto_increment, just use it in a WHERE clause to limit the number of rows:

                    -- first select
                    SELECT * FROM yourTable WHERE id >= 0 AND id < 10000;
                    -- second select
                    SELECT * FROM yourTable WHERE id >= 10000 AND id < 20000;
                    -- etc.

                    Naturally, you would implement this with prepared statements to speed it up even further.

                    And even if you have deleted a lot of rows from the DB, it doesn't really matter that you don't get exactly 10,000 rows each time.
                    Since you keep looping through the table until you have seen all COUNT(*) rows, you will get every row in the end anyway.

                    These individual selects perform a range scan on the index, so the time to fetch them is not much different from selecting everything at once.
                    You incur a small overhead with each query, but that is pretty negligible if you select about 10,000 rows at a time.
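                    As an illustration of the chunked retrieval described above, here is a minimal sketch in Python. It uses sqlite3 as a stand-in for MySQL; the table name, column names, and tiny chunk size are made up for the example.

```python
import sqlite3

# Stand-in setup: an in-memory SQLite table with an integer primary key.
# Table and column names are made up for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE histos (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO histos (id, value) VALUES (?, ?)",
                 [(i, i * 0.5) for i in range(1, 25)])

CHUNK = 10  # in practice something like 10,000

def fetch_in_chunks(conn, chunk=CHUNK):
    """Yield every row in id order, one WHERE-bounded range at a time."""
    max_id = conn.execute("SELECT MAX(id) FROM histos").fetchone()[0]
    lo = 0
    while lo <= max_id:
        # The WHERE range lets the engine do an index range scan and
        # rule out all other rows up front, as described above.
        rows = conn.execute(
            "SELECT id, value FROM histos WHERE id > ? AND id <= ? ORDER BY id",
            (lo, lo + chunk)).fetchall()
        yield from rows
        lo += chunk

all_rows = list(fetch_in_chunks(conn))
print(len(all_rows))  # 24
```

Each iteration returns at most CHUNK rows; gaps from deleted ids just make some chunks smaller, and every row is still seen exactly once.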

