Optimisation of SELECT on a very large table


  • Optimisation of SELECT on a very large table

    Hi Everyone

    I have a large table (40m + records):

    CREATE TABLE `Stats_Web_Access_Raw` (
    `Time` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
    `Web_Group_ID` smallint(5) unsigned default NULL,
    `Web_User_ID` mediumint(8) unsigned default NULL,
    `Secure` enum('Y','N') NOT NULL default 'N',
    `URL` varchar(200) default NULL,
    `Category` int(11) default NULL,
    `filelocation` int(11) default '0',
    KEY `Time` (`Time`,`Web_Group_ID`,`Web_User_ID`),
    KEY `Time_2` (`Time`,`Web_User_ID`)
    );

    The queries I need to run have the following qualities:

    1. order will always be by Time or Time DESC.
    2. Groups (which are identified by Web_Group_ID) contain users (identified by Web_User_ID).
    3. I need to be able to query 1 or more Web_Group_IDs or Web_User_IDs, eg: 'Web_Group_ID in (1,32,97,101)'

    My problem is that the Time part of the index is the only part ever to be used. Here is a typical explain:

    mysql> explain select * from Stats_Web_Access_Raw where Time > "07-02-20 10:48:57" AND Web_User_ID = 37 order by Time desc limit 10;
    +----+-------------+----------------------+-------+---------------+--------+---------+------+---------+-------------+
    | id | select_type | table                | type  | possible_keys | key    | key_len | ref  | rows    | Extra       |
    +----+-------------+----------------------+-------+---------------+--------+---------+------+---------+-------------+
    |  1 | SIMPLE      | Stats_Web_Access_Raw | range | Time,Time_2   | Time_2 | 4       | NULL | 6019062 | Using where |
    +----+-------------+----------------------+-------+---------------+--------+---------+------+---------+-------------+

    Note the 'key_len=4' value, indicating that the Time column (a 4-byte timestamp) is the only index part being used. The result is a slow query (several tens of seconds) if the first 10 matching records are not well distributed (i.e. all at one end of the table!). I tried a range of different values for max_seeks_for_key but it made no difference.

    How do I get mysql to use the additional fields in the index?

    Any comments/tips/suggestions very gratefully received.

  • #2
    You need Time to go after Web_User_ID so both columns are used.

    If you have a column with a range condition ("<" etc.), all the following columns in the index can't be used.
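
    A sketch of the reordered indexes being described here, using the column names from the CREATE TABLE above (the index names are illustrative, not from the original post):

    ```sql
    -- Equality column first, range column last, so both index parts can be used.
    ALTER TABLE Stats_Web_Access_Raw
        ADD KEY User_Time  (Web_User_ID, Time),
        ADD KEY Group_Time (Web_Group_ID, Time);

    -- With User_Time in place, a query like
    --   SELECT * FROM Stats_Web_Access_Raw
    --   WHERE Web_User_ID = 37 AND Time > '2007-02-20 10:48:57'
    --   ORDER BY Time DESC LIMIT 10;
    -- can range-scan just that user's rows, already sorted by Time.
    ```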


    • #3
      Hi Peter,

      thanks very much for getting back to me so quickly.

      Peter said: "If you have a column with a range condition ("<" etc.), all the following columns in the index can't be used."

      Is there anywhere in the docs where these rules are explained fully? I don't recall seeing anything covering this particular issue before, and I thought I had read the docs pretty thoroughly. If not, which operators are OK: =, <=>, <>? Does changing the engine change these rules?

      OK, so I have already tried reversing the indexes as you suggest. They work brilliantly where we have one group or user in the query. As soon as we query on several (e.g. 'Web_Group_ID in (1,2,3,4)'), the DB engine does a filesort to get the ordering correct. If a large number of records are returned, the query can take minutes or hours to complete.
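
      One possible workaround for the IN-list case (a sketch only, assuming a reversed (Web_Group_ID, Time) index is in place) is to run one range scan per group, each of which can use the index on its own, then merge and re-sort only the few rows that come back:

      ```sql
      -- Hypothetical: top 10 rows across groups 1 and 2. Each branch uses the
      -- (Web_Group_ID, Time) index; the outer ORDER BY sorts at most 20 rows.
      (SELECT * FROM Stats_Web_Access_Raw
        WHERE Web_Group_ID = 1 AND Time > '2007-02-20 10:48:57'
        ORDER BY Time DESC LIMIT 10)
      UNION ALL
      (SELECT * FROM Stats_Web_Access_Raw
        WHERE Web_Group_ID = 2 AND Time > '2007-02-20 10:48:57'
        ORDER BY Time DESC LIMIT 10)
      ORDER BY Time DESC LIMIT 10;
      ```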

      Is there any way to harness the fact that InnoDB clusters on the primary key? I believe in my case it is a monotonic 6-byte ID allocated by InnoDB (as I don't specify a primary key in the CREATE TABLE), which essentially delivers an ordering by date, and could mean I could drop the Time field from the index ...
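
      If the table is InnoDB, one way to exploit that clustering explicitly (a sketch only; it rebuilds the whole 40m-row table, so test on a copy first) would be to declare the clustering order yourself:

      ```sql
      -- Hypothetical: cluster rows by user then time, so one user's recent rows
      -- are physically contiguous. A primary key requires Web_User_ID to be
      -- NOT NULL and the (Web_User_ID, Time) pair to be unique -- neither holds
      -- for the table as defined above, so both would need changing first.
      ALTER TABLE Stats_Web_Access_Raw
          MODIFY Web_User_ID mediumint(8) unsigned NOT NULL,
          ADD PRIMARY KEY (Web_User_ID, Time);
      ```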

      Otherwise I don't see how I can get this working any better, unless there is a solution using multiple table merging, say one table per Web_Group_ID ... ?

      Thanks again