  • How to speed up a search on a table with 2million records?

    Hi All,

    I have the following table with 2 million records.
    Books (BIBID, TITLE, AUTHOR, ISBN, LOC, CALLNO)

    Users will be searching the table (via a PHP form) on either TITLE or AUTHOR.
    Two sample queries:
    -------------------------------------------------------------------
    SELECT * FROM (SELECT * FROM CATALOG2 WHERE TITLE like '%xml%') AS CATALOGRESULT LIMIT 60, 20
    -------------------------------------------------------------------
    SELECT * FROM (SELECT * FROM CATALOG2 WHERE AUTHOR like '%tom%') AS CATALOGRESULT LIMIT 40, 20
    -------------------------------------------------------------------
    (I use LIMIT because only 20 records are displayed per PHP page. When users click Prev or Next, a new query runs and retrieves the next 20 records.)

    Please point me in the right direction to increase performance.
    1. I already use indexes, but they don't seem to be enough. Is there a specific type of index I should focus on?
    2. What about stored procedures and functions? (What's the difference between the two?)
    3. Currently using MyISAM. Do we need to move to InnoDB or MEMORY?
    4. A good tutorial for creating a FULLTEXT index, if that's the best way to go.

    P.S.
    - MySQL client version: 5.0.51b
    - Server: Solaris 10 with 16GB RAM (the table is only 400MB, so we could even dedicate 1GB of RAM just to it)
    - This is a static table that doesn't get updated after we set up the system


    Thanks in advance.
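    A note on why these queries are slow: a normal B-tree index on TITLE or AUTHOR only helps when the pattern has a fixed prefix (e.g. 'xml%'); a leading wildcard such as '%xml%' forces a scan of all 2 million rows. A minimal sketch of the FULLTEXT alternative asked about in point 4 (index names are made up; FULLTEXT requires MyISAM in MySQL 5.0):

    ```sql
    -- A B-tree index cannot serve LIKE '%xml%': the leading wildcard
    -- defeats prefix matching, so every row is examined anyway.
    CREATE INDEX idx_title ON CATALOG2 (TITLE);

    -- A FULLTEXT index (MyISAM-only in MySQL 5.0) matches by word instead:
    CREATE FULLTEXT INDEX ft_title ON CATALOG2 (TITLE);

    SELECT * FROM CATALOG2
    WHERE MATCH(TITLE) AGAINST('xml')
    LIMIT 60, 20;
    ```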

  • #2
    You could move the authors into a second table:

    authors:
    author int primary key auto_increment
    f_name varchar( 20 ) -- check the longest name you need to store
    s_name varchar( 100 ) -- check the longest name you need to store

    This will also allow you to connect more than one author to one book:

    author_to_book:
    author int index
    bibid int index

    (int or bigint, whichever suits better)
    A search by author could then be made with (pseudocode):

    SELECT * FROM CATALOG2 WHERE AUTHOR = (int)$author

    If you need to find an author by part of the name (without using an HTML select) you could do it the same way as before. The authors table should contain far fewer records than the books table (almost every author has more than one book to their name).
    Or you could add a column "single_char" (+ index) holding the first letter of s_name, e.g. f_name 'JRR', s_name 'Tolkien', single_char 'T'.

    (somebody is looking for an author whose name starts with "Tolk")
    select * from catalog2 inner join authors using (author) where single_char = 'T' and s_name like '%Tolk%'
    As for the title, you could create a dictionary table:

    dictionary:
    word varchar( 50 ) index
    bibid int index

    (or a composite word+bibid index?)
    This table will map each used word to the books that have that word in their title. You will have to wait for this table to populate. After that you can look for books in this table:
    select bibid from dictionary where word = '$word'
    select bibid from dictionary where word in ('$words') -- or: (word = '$word1') and/or (word = '$word2') ...

    (the second one when the user looks for more than one word)
    If there are no books with that word, then you should search for them in the books table, obtain the bibids and insert them into the dictionary table.
    You should also think about something that will block searching the books table when a word was used before and didn't get any results (maybe you could put 0 as dictionary.bibid). And you should define a min and max word length (so no "a", "in", "do" will be remembered).


    With a static table, MyISAM should be faster than InnoDB.
    As for fulltext, you should find something in the MySQL documentation and, for example, in: "MySQL FULLTEXT Searching | O'Reilly Media", "Using Fulltext Indexes in MySQL - Part 1"
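    The schema sketched above, written out as concrete MySQL DDL (table, column names and lengths are the poster's; the ENGINE and NOT NULL choices are assumptions):

    ```sql
    CREATE TABLE authors (
        author INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        f_name VARCHAR(20),
        s_name VARCHAR(100),
        INDEX (s_name)
    ) ENGINE=MyISAM;

    CREATE TABLE author_to_book (
        author INT NOT NULL,
        bibid  INT NOT NULL,
        INDEX (author),
        INDEX (bibid)
    ) ENGINE=MyISAM;

    -- books by a given author, joined through the link table
    SELECT b.*
    FROM CATALOG2 AS b
    JOIN author_to_book AS ab ON ab.bibid = b.BIBID
    JOIN authors AS a ON a.author = ab.author
    WHERE a.s_name LIKE 'Tolk%';
    ```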



    • #3
I think you can use Sphinx.



      • #4
        Thanks for the replies.

        @manhnd: We are looking into Sphinx for a long-term solution.

        @januzi: Since we have the data in one table and we have to put out a temporary working version soon, it might not be feasible for us to split the columns into different tables.
        We were able to set it up as InnoDB and set innodb_buffer_pool_size to 1GB.

        Query -
        SELECT * FROM (SELECT * FROM CATALOG2 WHERE TITLE like '%xml%') AS CATALOGRESULT LIMIT 60, 20

        - When we ran the above query for the first time using phpMyAdmin it took about 10 seconds, but all subsequent requests with the same query take only about 1 second.
        - When using the PHP form, the same query takes 20-30 seconds.

        1) Any idea why the discrepancy?
        2) Is there any way to force the MySQL engine to load the whole DB into memory and keep it there?
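        On question 2: MySQL 5.0 has no command to pin an InnoDB table in memory, but if innodb_buffer_pool_size exceeds the table size (400MB here), a one-off full scan after startup leaves the whole table cached. MyISAM offers an explicit preload, for index blocks only. A sketch (variable and statement names are standard MySQL; the warm-up query is just an illustration):

        ```sql
        -- my.cnf: make the InnoDB buffer pool larger than the table
        -- [mysqld]
        -- innodb_buffer_pool_size = 1G

        -- warm the buffer pool once after startup with a full scan
        SELECT COUNT(*) FROM CATALOG2;

        -- MyISAM alternative: preload index blocks (not data rows)
        -- into the key cache
        LOAD INDEX INTO CACHE CATALOG2;
        ```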



        • #5
          query_cache_size ?



          • #6
            januzi wrote on Mon, 22 December 2008 11:52

            query_cache_size ?


            This is what we have currently:

            query_cache_size = 16777216

            Any suggestions on how much we should increase it to?

            There's a very slim chance that two users would search for the same book, so what we need is for the query to be fast the first time.
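            Worth noting: the query cache only returns results for byte-identical repeated queries, so enlarging it cannot make a first-time search faster. A quick way to see whether it helps at all (standard MySQL 5.0 status variables):

            ```sql
            SHOW VARIABLES LIKE 'query_cache_size';   -- current setting (16MB here)
            SHOW STATUS LIKE 'Qcache_hits';           -- how often the cache answers a query
            SHOW STATUS LIKE 'Qcache_lowmem_prunes';  -- entries evicted for lack of room
            ```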



            • #7
              dhanu wrote on Fri, 19 December 2008 17:26

              Hi All,

              I have the following table with 2 million records.
              Books (BIBID, TITLE, AUTHOR, ISBN, LOC, CALLNO)

              Users will be searching (via a PHP form) the table based on either TITLE or AUTHOR.
              Two sample queries.
              -------------------------------------------------------------------
              SELECT * FROM (SELECT * FROM CATALOG2 WHERE TITLE like '%xml%') AS CATALOGRESULT LIMIT 60, 20
              -------------------------------------------------------------------
              SELECT * FROM (SELECT * FROM CATALOG2 WHERE AUTHOR like '%tom%') AS CATALOGRESULT LIMIT 40, 20
              -------------------------------------------------------------------

              Why do you need those subqueries? Why not directly:

              SELECT *
              FROM CATALOG2
              WHERE TITLE like '%xml%'
              LIMIT 60, 20

              and

              SELECT *
              FROM CATALOG2
              WHERE AUTHOR like '%tom%'
              LIMIT 40, 20

              Compare the EXPLAIN output for your two queries and these two. MySQL is usually really bad with subqueries, so these two might be faster. BUT, as usual: "try, test and time"!
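              The suggested comparison, spelled out (in MySQL 5.0 the first plan shows a DERIVED row, meaning the subquery result is fully materialized into a temporary table before LIMIT can apply):

              ```sql
              EXPLAIN SELECT * FROM (SELECT * FROM CATALOG2 WHERE TITLE LIKE '%xml%') AS CATALOGRESULT LIMIT 60, 20;
              EXPLAIN SELECT * FROM CATALOG2 WHERE TITLE LIKE '%xml%' LIMIT 60, 20;
              ```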



              • #8
                bstjean wrote on Tue, 23 December 2008 09:39

                Why do you need those subqueries? Why not directly:

                SELECT *
                FROM CATALOG2
                WHERE TITLE like '%xml%'
                LIMIT 60, 20

                and

                SELECT *
                FROM CATALOG2
                WHERE AUTHOR like '%tom%'
                LIMIT 40, 20

                Compare the EXPLAIN output for your two queries and these two. MySQL is usually really bad with subqueries, so these two might be faster. BUT, as usual: "try, test and time"!

                Yeah, that was a bad idea. I don't know why I did that. Anyway, I removed the subquery, created a MyISAM DB, created a FULLTEXT index, and the search time went down drastically.

                Thanks to everyone who helped :)
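                For reference, the FULLTEXT setup described here probably looks something like this (the index name is made up; note that MySQL's default ft_min_word_len is 4, so a three-letter term like 'xml' is ignored unless the variable is lowered and the index rebuilt):

                ```sql
                -- FULLTEXT requires MyISAM in MySQL 5.0
                ALTER TABLE CATALOG2 ENGINE=MyISAM;
                ALTER TABLE CATALOG2 ADD FULLTEXT INDEX ft_title_author (TITLE, AUTHOR);

                -- word-based search replaces LIKE '%xml%'
                SELECT * FROM CATALOG2
                WHERE MATCH(TITLE, AUTHOR) AGAINST('xml')
                LIMIT 60, 20;

                -- my.cnf, to index short words like 'xml'
                -- (then rebuild with REPAIR TABLE CATALOG2 QUICK)
                -- [mysqld]
                -- ft_min_word_len = 3
                ```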

                Comment


                • #9
                  Keep an eye on the MySQL fulltext index; you may find that past a certain size threshold it will cease to work well or get very, very slow!

                  Sphinx is always a good option and, to be honest, bloody easy to install!
