GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

Tag schema

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tag schema

    What tag schema is in your opinon the most efficient?

    --------------
    1 table:
    --------------

    TABLE items
    id (int)
    ...
    tags (text) FULLTEXT search

    --------------
    2 tables:
    --------------

    TABLE items
    id (int)
    ...

    TABLE tags
    item_id (int)
    tag (varchar)

    --------------
    3 tables
    --------------

    TABLE items
    id(int)
    ...

    TABLE tags
    item_id (int)
    tag_id (int)

    TABLE tag_names
    id (int)
    name (varchar)

    Thanks in advance

  • #2
    It of course depends on what you really going to do with it as well as data distribution, such as number of unique tags etc.

    Generally I favor using FullText search approach but in case there are just few real tags using tag_id based approach is better.

    Comment


    • #3
      Funny that this post comes up 30 mins after I was reading this:
      http://www.pui.ch/phred/archives/2005/04/tags-database-schem as.html

      Hope it's useful!
      Toasty

      Comment


      • #4
        Yeah. I know that post does not really favors MySQL FullText Search solution.

        At large extent because MySQL FullText is slow by itself. But you should do good with other search engines for sure.

        Note in real world you would often need more complicated things, such as finding latest, most popular items by tags etc.

        If you do sorting inside of MySQL and it comes together with selfjoins etc it might become quite ugly.

        Comment


        • #5
          >If you do sorting inside of MySQL and it comes together with selfjoins etc it might become quite ugly.

          True it can - I've come across this a fair few times myself! Usually it's pretty workable though. One thing with self-joins like this is that I've really wanted the ability to say to MySQL "order by any tag table I don't mind it doesn't matter" when I'm trying to find the 'top' N objects that match a set of tags.

          Usually I've had 1 object table and X tag tables in a query (as a result of self-joins), and I've wanted to have an ORDER BY on the tag table that mysql looks at first so that I can skip a big filesort just to find the latest N that match. As a workaround you can run a bunch of EXPLAINs first to see which of the tag tables in the self join seems to match the fewest rows and then either order by that one and hope mysql picks it as the first table, or just force a STRAIGHT_JOIN yourself. Either way it's lots of "fun"
          When you guess wrong you can really suffer if you have a large dataset though.

          A pretty horrible alternative with MyISAM at least is to leave the ORDER BY clause off altogether as you can often get things back in the index order you want anyway but this is really really hacky and I wouldn't recommend it at all

          Another alternative if you're looking for say the latest N objects where all the tags you're searching by are very common is just to 'scan' through the object table using a time index rather than have the database search by tag first. (eg: USE INDEX (time_added)) This works much much more quickly than a large filesort in many cases.

          (BTW Peter I know you know all this stuff I just thought it might be of interest to Zanzi


          Actually I've been comparing the self join method with the fulltext method just today, and found the self join to be about twice as quick in most cases if you build your indexes and queries carefully. Both are pretty fast though. (150K objects, 500K tags, always under 0.1 secs for self join (even moderately complex ones), usually around 0.15 secs for boolean mode fulltext)

          The two approaches are not really like-for-like though: fulltext will make writing queries much easier (see: IN BOOLEAN MODE) but as you have to use MyISAM you lose referential integrity etc etc. I'm sure there are other plusses and minuses to each approach too.

          My current thinking for a project I'm currently working on is to use a combination of both, though the full text approach will probably be used just for things that I'm too lazy to write a proper query for (just kidding - I'm not 100% sure what I'm going to do yet)


          Also Zanzi, I've never really tried the normalised 3 table approach to this. I started doing these things a long time before MySQL got the greedy join plan optimiser so didn't want to have to join 13 tables for a 6 tag search as it would have taken ages to come up with the query plan.


          Also I've never really seriously looked at alternatives to fulltext for this because the projects I've been working on have been able to be 'non-fuzzy' in their approach. That is, they've not been search engine style 'something that matches pretty well is fine' type systems, they've needed to return quite specific results.

          Hmm, enough musing out loud I think

          Hope this helps,
          Toasty

          Comment


          • #6
            You're right. It is quite workable both ways and both of them have benefits and drawbacks.

            MySQL FullText search generally is expected to be slower, especially with few different tags as there is a lot of data redundancy.

            Also one does not have to exclude another, for example

            http://www.ljtags.com uses combined approach.

            For search, full text search via sphinx (http://www.sphinxsearch.com) is used.

            For some of related tags stuff working with tag_ids is done.

            In this particular database tags themselve constitute small portion of records anyway so using them instead of ids is fast.

            Also for internal search and sorting sphinx uses words id anyway which makes it very efficient.

            Comment


            • #7
              How about Freetag?
              http://code.google.com/p/freetag/


              -- MySQL dump 9.11-- Freetag Structure v2.02---- Table structure for table `freetags`--CREATE TABLE freetags ( id int(10) unsigned NOT NULL auto_increment, tag varchar(30) NOT NULL default '', raw_tag varchar(50) NOT NULL default '', PRIMARY KEY (id)) TYPE=MyISAM;---- Table structure for table `freetagged_objects`--CREATE TABLE freetagged_objects ( tag_id int(10) unsigned NOT NULL default '0', tagger_id int(10) unsigned NOT NULL default '0', object_id int(10) unsigned NOT NULL default '0', tagged_on datetime NOT NULL default '0000-00-00 00:00:00', PRIMARY KEY (`tag_id`,`tagger_id`,`object_id`), KEY `tag_id_index` (`tag_id`), KEY `tagger_id_index` (`tagger_id`), KEY `object_id_index` (`object_id`)) TYPE=MyISAM;

              Comment

              Working...
              X