which is fastest: a join or one big table?

  • which is fastest: a join or one big table?

    I seem to be facing two possible scenarios. Which of these would offer the fastest SELECT?

    Option 1) A master table that references 3 other tables via IDs. The nature of the SELECT will require that *every* row of the master table will be joined with a row from one or all of the 3 other tables. Any column in any affected table could be selectable, sortable or "countable" -- and so could benefit from an index; but will all the -joined- table indexes be usable?

    Option 2) All data is in one big table already, with indexes on any selectable column, so virtually any query would be directly indexed; no joins are made. The number of columns has been multiplied greatly (and with much empty data) because I have essentially already joined everything myself and inserted the data into a big, fully indexed table. I wonder if this is better because MySQL need not create a temporary (joined) table.

    Any advice is appreciated.
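To make the two options concrete, here is a minimal sketch (using SQLite via Python purely for illustration — MySQL storage behavior differs, and all table and column names here are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.executescript("""
-- Option 1: a master table referencing a lookup table by id (normalized).
CREATE TABLE colors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE master (id INTEGER PRIMARY KEY, color_id INTEGER REFERENCES colors(id));
INSERT INTO colors VALUES (1, 'red'), (2, 'blue');
INSERT INTO master VALUES (10, 1), (11, 2), (12, 1);

-- Option 2: everything denormalized into one wide table.
CREATE TABLE big (id INTEGER PRIMARY KEY, color_name TEXT);
INSERT INTO big VALUES (10, 'red'), (11, 'blue'), (12, 'red');
""")

# The same logical query, written both ways.
joined = cur.execute(
    "SELECT m.id, c.name FROM master m JOIN colors c ON c.id = m.color_id ORDER BY m.id"
).fetchall()
flat = cur.execute("SELECT id, color_name FROM big ORDER BY id").fetchall()

print(joined == flat)  # both forms return the same rows
```

Both queries answer the same question; the debate in this thread is only about which form the engine executes faster.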

  • #2
    As usual, the answer is: "It depends!"

    When you normalize your data into different tables you avoid storing duplicates.

    And this essentially compresses the data in the database since the reference id (used in the join) is usually only a few bytes.

    And this means that the DB size is less than if you store everything in one table.

    If that is the difference between MySQL being able to hold everything in RAM and having to read from disk, then you can benefit greatly.
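A rough back-of-the-envelope illustration of that "compression" (pure Python arithmetic with assumed sizes — the row count, the ~100-byte description, and the 4-byte id are my own made-up figures, and real storage overhead varies by engine):

```python
# 1,000,000 rows, each referencing one of 3 shared descriptions (~100 bytes each).
rows = 1_000_000
desc_len = 100  # assumed average description length in bytes
id_len = 4      # a 4-byte integer foreign key

# Denormalized: the description text is repeated in every row.
denormalized = rows * desc_len
# Normalized: each row stores only a small id; the 3 descriptions exist once.
normalized = rows * id_len + 3 * desc_len

print(denormalized)  # 100000000 (~100 MB for that one column)
print(normalized)    # 4000300   (~4 MB)
```

A 25x difference like this is exactly what can push a working set from "spills to disk" into "fits in RAM".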


    But if, on the other hand (due to your data's nature), it does not "compress" much, then you won't gain anything, since as you said the DBMS has to perform a join instead.


    Then there is also the issue that with a large DB, essentially everything is stored on disk.
    Disks are pretty fast at reading sequentially but very slow at reading randomly.

    On one of the projects I worked on we had a large table (80GB, 200,000,000 rows).
    And unless you knew that your SELECT would use less than 5% of the rows in the table, you always forced it to use a table scan.

    A table scan always took about 25 minutes (depending on load on system).
    While a query that used indexes might easily take about 8 hours due to disk seek times.
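The arithmetic behind those numbers works out roughly like this (the seek time and sequential throughput below are my own assumed figures, not measurements from that system):

```python
rows = 200_000_000
table_gb = 80

# Index path: 5% of the rows fetched via random disk seeks, one seek per row.
touched = int(rows * 0.05)  # 10,000,000 rows
seek_ms = 3                 # assumed average random seek + read
index_hours = touched * seek_ms / 1000 / 3600
print(round(index_hours, 1))  # 8.3 hours

# Table scan: read the whole 80 GB sequentially.
mb_per_s = 55               # assumed sequential throughput of the disks
scan_minutes = table_gb * 1024 / mb_per_s / 60
print(round(scan_minutes))   # 25 minutes
```

With plausible 2007-era disk figures, the anecdote's 25-minute scan versus 8-hour index lookup falls straight out of the seek-vs-stream gap.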


    So as you see there isn't a straight answer.
    You will have to test it if you want to be sure.



    • #3
      Thanks for the reply. Very good to know.

      If we assume that both scenarios take place in RAM -- will a join always be slower than reading from one table? I know, it probably "depends" )

      Quote:

      A table scan always took about 25 minutes (depending on load on system). While a query that used indexes might easily take about 8 hours due to disk seek times.


      That's crazy - I thought indexes were supposed to speed up select queries? (How do you force a table scan?)

      Final question: How do I know if mysql can hold everything in RAM? If I remember correctly, there are mysql settings for this.

      Thanks.



      • #4
        bluem wrote on Tue, 22 May 2007 18:19


        If we assume that both scenarios take place in RAM -- will a join always be slower than reading from one table?


        Usually you don't have to think about it: if both fit in RAM, the number of rows is small enough, and RAM is so much faster than disk that you don't really have the problem.

        Quote:

        A table scan always took about 25 minutes (depending on load on system). While a query that used indexes might easily take about 8 hours due to disk seek times.


        bluem wrote on Tue, 22 May 2007 18:19


        That's crazy - I thought indexes were supposed to speed up select queries? (How do you force a table scan?)


        They are, but using an index means random access, and if you are selecting just 1 row from a large table that makes a huge difference.
        But if you are selecting all rows, then using an index is just plain silly.
        And somewhere in between there is a breaking point where it is more advantageous to just run a table scan.
        For that particular database, the point was at about 1/20 of the rows.
        And I forced it with a hint in the query, but that particular database was Oracle.
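You can watch an optimizer make that choice yourself. This sketch uses SQLite (because it runs anywhere from Python) with its `NOT INDEXED` qualifier to force a scan; in MySQL the equivalent is `SELECT * FROM t IGNORE INDEX (idx_x) ...` (or `FORCE INDEX` for the opposite), and Oracle, as mentioned above, uses optimizer hints such as `/*+ FULL(t) */`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER);
CREATE INDEX idx_x ON t (x);
INSERT INTO t VALUES (1, 10), (2, 20), (3, 30);
""")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the access path.
    return [row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + sql)]

print(plan("SELECT * FROM t WHERE x = 20"))              # uses idx_x
print(plan("SELECT * FROM t NOT INDEXED WHERE x = 20"))  # forced full table scan
```

The first plan reports an index lookup, the second a scan of `t` — the same query, two access paths, and which one wins depends on how many rows it touches.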


        bluem wrote on Tue, 22 May 2007 18:19


        Final question: How do I know if mysql can hold everything in RAM? If I remember correctly, there are mysql settings for this.


        You can check the size of your DB and check how much RAM you have available on the server.
        If you are running MyISAM table types, then the important variable is key_buffer_size.
        If you are running InnoDB, then innodb_buffer_pool_size is the important one.
        Read about them in the manual:
        http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html
        http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html
