Announcement

Announcement Module
Collapse
No announcement yet.

Query Performance Advice

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Query Performance Advice

    Hi,

    I have the following query:

    select messages.sid, messages.mid, messages.sender_addr, recipients.rcpt_addr, messages.subject,
    date_format(from_unixtime(messages.timestamp),"%M %d, %Y %H:%i:%s") as sent_date, ip_system.hostname
    from messages, recipients,ip_system
    where messages.mid = recipients.mid
    AND messages.sid = recipients.sid
    AND messages.sid = ip_system.sid
    AND sender_domain = "sendingcompany.com"
    AND subject like "Some Subject%"
    AND recipients.rcpt_domain = "receivingcompany.com"
    AND messages.timestamp >= unix_timestamp("2007-03-08 20:00:00")
    group by messages.mid
    order by sent_date LIMIT 0,10

    Using InnoDB tables on a Windows 2003 Server (not my choice) with 3.5GB of RAM. Dual Xeon 3.6GHz.

    messages table has about 45M rows
    recipients table has about 70M rows
    ip_system table has 4 rows (2 are logically deleted)

    Explain Extended:
    1,SIMPLE,messages,range,PRIMARY,msg_complete_idx,m sg_sender_ domain_idx,msg_timestamp_idx,msg_sid_domain_timest amp_idx,ms g_sender_domain_idx,14,,430,Using where; Using temporary; Using filesort
    1,SIMPLE,recipients,ref,PRIMARY,rcpt_domain_idx,rc pt_sid_mid _rid_domain,rcpt_sid_mid_rid_domain,6,message_trac ker.messag es.sid,message_tracker.messages.mid,1,Using where
    1,SIMPLE,ip_system,ALL,PRIMARY,,,,4,Using where

    Typically the query uses the index (sender_domain[8],timestamp) on the messages table. sender_domain is unique for these set of companies in 8 characters or less.

    Query times vary based on timeframe (the longer the worse) specified and the message volume of "sendingcompany.com". But can easily take 10-15 minutes. For small volume/timeframes it will be less than a minute.

    I am thinking the group by on the computed field could be changed to the source column.

    Can a sortfile be prevented by using both columns in the index? i.e. sender_domain, timestamp?

    The primary index on the recipients table is mid,sid,rid. This should facilitate the joining of rows from the messages table.

    I believe the ip_system table isn't a big player since it logically has only 2 rows.

    INNODB STATUS OUTPUT
    Per second averages calculated from the last 0 seconds
    ----------
    SEMAPHORES
    ----------
    OS WAIT ARRAY INFO: reservation count 566606, signal count 564804
    Mutex spin waits 15390019, rounds 21880070, OS waits 217206
    RW-shared spins 438734, OS waits 182672; RW-excl spins 448303, OS waits 125671
    ------------
    TRANSACTIONS
    ------------

    --------
    FILE I/O
    --------
    I/O thread 0 state: wait Windows aio (insert buffer thread)
    I/O thread 1 state: wait Windows aio (log thread)
    I/O thread 2 state: wait Windows aio (read thread)
    I/O thread 3 state: wait Windows aio (write thread)
    Pending normal aio reads: 0, aio writes: 0,
    ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
    Pending flushes (fsync) log: 0; buffer pool: 0
    9601252 OS file reads, 8643216 OS file writes, 371528 OS fsyncs
    0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
    -------------------------------------
    INSERT BUFFER AND ADAPTIVE HASH INDEX
    -------------------------------------
    Ibuf for space 0: size 1, free list len 74, seg size 76, is empty
    Ibuf for space 0: size 1, free list len 74, seg size 76,
    5461531 inserts, 5461531 merged recs, 845038 merges
    Hash table size 6607487, used cells 946582, node heap has 1003 buffer(s)
    0.00 hash searches/s, 0.00 non-hash searches/s
    ---
    LOG
    ---
    Log sequence number 404 3161598899
    Log flushed up to 404 3161598899
    Last checkpoint at 404 3161598899
    0 pending log writes, 0 pending chkp writes
    153464 log i/o's done, 0.00 log i/o's/second
    ----------------------
    BUFFER POOL AND MEMORY
    ----------------------
    Total memory allocated 1771661386; in additional pool allocated 490240
    Buffer pool size 101888
    Free buffers 0
    Database pages 100885
    Modified db pages 0
    Pending reads 0
    Pending writes: LRU 0, flush list 0, single page 0
    Pages read 9601242, created 2974457, written 8324174
    0.00 reads/s, 0.00 creates/s, 0.00 writes/s
    No buffer pool page gets since the last printout
    --------------
    ROW OPERATIONS
    --------------
    0 queries inside InnoDB, 0 queries in queue
    1 read views open inside InnoDB
    Main thread id 2760, state: waiting for server activity
    Number of rows inserted 201389704, updated 426, deleted 113685, read 314844469
    0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s

    Any help advice would be appreciated.
    David

  • #2
    Yes the sort on the date is bad from a performance point of view.

    Because of this sort the DB needs to first convert each date to the new format and then it has to compare them.

    Imagine if you have 2,000,000 messages that match your timeframe/company name and each date has to be converted and then you have to sort these 2,000,000 rows.

    But you have a lot of inconcistency in your query:
    1.
    You group on message.mid but you do _not_ use group functions in the SELECT part.
    Which means that you don't know which values you are going to get.
    So which values do you want to get?

    2.
    I also find the sort a bit strange since the output you are going to get is:
    April 4
    December 2
    January 30

    Which means that it is not sorted in date order but in string order which I feel is very strange when I look at a list of dates.
    If you drop this sort we can start to tweak your query.

    Comment

    Working...
    X