GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

Count .... Group By... any other way?

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Count .... Group By... any other way?

    Hello all!

    I am glad i found this site and i have to say it is awesome! Helped me in a lot of ways using mysql.

    Here is my problem. I have two tables in which i store and index keywords for search engine purpose.

    Tables are:
    - keyword2 (230.000 rows)

    CREATE TABLE `keyword2` ( `id` int(10) unsigned NOT NULL auto_increment, `keyword` varchar(50) NOT NULL default '', PRIMARY KEY (`id`), UNIQUE KEY `keyword` (`keyword`)) ENGINE=MyISAM;


    - keyword2_index (almost 5.000.000 rows)

    CREATE TABLE `keyword2_index` ( `item_id` int(10) unsigned NOT NULL default '0', `keyword_id` int(10) unsigned NOT NULL default '0', `cat` enum('a','s','l') NOT NULL default 'a', `type` enum('a','t') NOT NULL default 'a', PRIMARY KEY (`item_id`,`keyword_id`,`cat`), KEY `keyword_id` (`keyword_id`)) ENGINE=MyISAM;


    in table keyword2 are unque keywords for songs, albums, lyrics
    in table keyword2_index are keyword id, song/lyric/album id, category and type of keyword (belonging to artist or title).

    now to the query...
    for instance i want to search for lyric with following keywords:
    Shakira feat. Wyclef Jean Hips Don't Lie

    I strip string with php so i get clean keywords:
    shakira feat wyclef jean hips don t lie

    query:

    SELECT keyword2_index.item_id AS id, COUNT(keyword2_index.item_id) AS numFROM keyword2_index,keyword2WHERE keyword2.id=keyword2_index.keyword_id AND keyword2_index.cat='l' AND ( keyword2.keyword='shakira' OR keyword2.keyword='feat' OR keyword2.keyword='wyclef' OR keyword2.keyword='jean' OR keyword2.keyword='hips' OR keyword2.keyword='don' OR keyword2.keyword='t' OR keyword2.keyword='lie')GROUP BY keyword2_index.item_id ORDER BY num DESC;


    this takes up to 50 seconds to execute. If i do it without COUNT .. GROUP BY it takes less then a second.
    I need the results to be grouped like that so i get the relevance of the search.. from items with most matching keywords to items with less keywords (usualy i don't display items with only one matching keyword)

    EXPLAIN query....

    id select_type table type possible_keys key key_len ref rows Extra1 SIMPLE keyword2 range PRIMARY,keyword keyword 150 NULL 8 Using where; Using temporary; Using filesort1 SIMPLE keyword2_index ref keyword_id keyword_id 4 keyword2.id 21 Using where


    i am using server with 1.5 Ghz and 512 RAM with Mysql 4.1.14

    so.. is there any other solution for this?

    I hope you can help me.
    Thank you in advance!

  • #2
    It would be helpful if you could provide both EXPLAIN statements and queries without COUNT and with count.

    I expect in your case there is large amount of matches (so it is not really 8x21 rows you could see in the stats) so temporary table is needed etc.

    I also should ask you why are you trying to implement search engine in SQL ? It is going to be slow as you will have to traverse very many rows for common keywords.

    Comment


    • #3
      Quote:

      I also should ask you why are you trying to implement search engine in SQL ? It is going to be slow as you will have to traverse very many rows for common keywords.

      What should i use then?

      Here is EXPLAIN with COUNT ... GROUP BY

      id select_type table type possible_keys key key_len ref rows Extra1 SIMPLE keyword2 range PRIMARY,keyword keyword 150 NULL 8 Using where; Using temporary; Using filesort1 SIMPLE keyword2_index ref keyword_id keyword_id 4 keyword2.id 21 Using where


      Here is EXPLAIN without COUNT ... GROUP BY

      id select_type table type possible_keys key key_len ref rows Extra1 SIMPLE keywordd range PRIMARY,keyword keyword 150 NULL 8 Using where1 SIMPLE keyword2_index ref keyword_id keyword_id 4 keyword2.id 21 Using where


      Thank you!

      Matej

      Comment


      • #4
        And if you do _repeated_ runs for these queries they take 50 and 1 sec appropriately ?

        repeated is important as first run could have been uncached and so very different.

        You can use ether MySQL Full Text search (which is also slow but should work for your sizes), Sphinx search (http://www.sphinxsearch.com), lucene or other solutions.

        Comment


        • #5
          yes repeated run is faster but still slow ... 1-10sec ... i think it depends on number of results.
          Problem is also that table will grow... i expect it will grow up to 10 mil by the end of the year.

          How would Full Text version look like then.. table desgin and query?

          Matej

          Comment


          • #6
            If you're using full text search you will simply create FULLTEXT index on columns you want to search.

            Check this page for details:

            http://dev.mysql.com/doc/refman/5.0/...xt-search.html

            Comment


            • #7
              Thank you for all your help!

              I have treid Mysql FULL TEXT index and i have to say i was impressed by speed! Queries that took 50 sec before now took only max 3 sec.
              But then i installed Sphinx and i think i will stick to it.

              It feels so great to make query load from almost a minute down to less then a second... and it's ordered by relevance too )

              Matej

              Comment


              • #8
                Good to hear you got your problem solved

                On medium data sizes both sphinx and MySQL FT Search work well, sphinx however scales much better

                Comment

                Working...
                X