multiple versus one row

  • multiple versus one row

    From a purely speed-performance standpoint, would it be faster to read a set of data if each item had its own row (i.e., using a while loop), or if that dataset were packed into a single row and then parsed in my script?

    Assume updates are infrequent, and the table is very large and heavily trafficked.

  • #2
    I really can't answer such a vague question. In general, you should build a fully normalized schema, and then denormalize IF performance is a problem AND you can prove that this is needed to enhance performance.

    • #3
      What about the question is "vague"?

      • #4
        Everything that matters. What kind of data? What exactly? You have all kinds of unqualified terms in there like "very large" and "frequent." In some cases one way is faster, in some cases another is. Don't answer with details -- I don't want to know. It doesn't matter. Start with a normalized schema, populate it with realistic test data, run realistic test queries against it, and measure what happens.
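
        As a rough illustration of that measure-first loop (every name and number below is made up purely for the sketch):

        -- Throwaway normalized table; names are invented for illustration only.
        CREATE TABLE test_items (
            owner_id INT NOT NULL,
            item_val VARCHAR(100) NOT NULL,
            KEY idx_owner (owner_id)
        );

        -- Seed a handful of rows, then keep doubling until the table is
        -- realistically large (re-run the second INSERT as many times as needed).
        INSERT INTO test_items (owner_id, item_val)
        VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');

        INSERT INTO test_items (owner_id, item_val)
        SELECT FLOOR(RAND() * 100000), item_val FROM test_items;

        -- Run a realistic query and look at what the optimizer actually does.
        EXPLAIN SELECT item_val FROM test_items WHERE owner_id = 1234;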

        • #5
          I wrote a response yesterday but decided I really didn't understand your question either.

          Are you reading the entire table, multiple rows, or just a single row?

          "Packed" how? Multiple columns? Multiple details in a single text column? How do you plan on searching the data? If it's always by key it might be ok but who knows...

          Normalize first. Test. Test. Test some more. Check that your indexes are useful. Test. Test. Test some more. If you're experiencing slowness or if you're bored try stuffing it all into one table and test against your known benchmarks. Everyone's data is slightly different so what works in one case may not work so well in another.

          While a read from a single table could be fast and useful in one specific scenario, it rarely works out so well for all the other tasks the database is asked to handle. You can see this in data warehouses where the data is - at least partially - denormalized to speed up queries BUT that data is (typically) loaded from a normalized transactional database.

          Troy

          • #6
            Troy wrote on Wed, 28 April 2010 10:05:

                Are you reading the entire table, multiple rows, or just a single row?

                "Packed" how? Multiple columns? Multiple details in a single text column? How do you plan on searching the data? If it's always by key it might be ok but who knows...

            Say I want to read a list of favorite cars of a given user from a single table called "favcars", where each car has a made-in-year value. I have 100,000 users.

            I could structure the favcars table so that each car/year pair has its own row, each with the user's id as a key, using a while loop (or self-join) of SELECT ... WHERE userid = ##### to read out each car for that user,

            OR

            I could structure the favcars table so that the whole list of car/years for that user is packed into a single cell in one row keyed to that user, using separator/delimiter strings to pack the cars within the cell.
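
            In table terms, the two layouts I'm comparing would look roughly like this (column names and sizes are just placeholders):

            -- Option 1: one row per favorite car, keyed by the user's id.
            CREATE TABLE favcars (
                userid    INT NOT NULL,
                car_model VARCHAR(100) NOT NULL,
                car_year  SMALLINT NOT NULL,
                KEY idx_userid (userid)
            );

            -- Option 2: the whole list packed into one delimited column,
            -- one row per user (e.g. '1969 Mustang|1987 Camaro|...').
            CREATE TABLE favcars_packed (
                userid   INT NOT NULL PRIMARY KEY,
                car_list TEXT NOT NULL
            );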

            • #7
              Oh, that's horribly ugly. How big is that packed column going to be? I have about 100 favorite cars, so my record would need to contain about 4,000 characters. You'll need some sort of split or loop to parse them out, and possibly some code if you want to do something neat like link the name to a page about the car. If the next guy only likes 1 car, there's a ton of wasted space in your database (depending of course on the table type) and in temp tables.
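
              Just to make that concrete, pulling a single entry back out of a packed column in SQL ends up looking something like this (the '|' delimiter and the favcars_packed/car_list names are only assumptions carried over from the packed layout sketched above):

              -- Grab the 3rd car out of a '|'-delimited car_list column.
              SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(car_list, '|', 3), '|', -1) AS third_car
              FROM favcars_packed
              WHERE userid = 1001;

              And that's before you try to search, sort, or count by car, which is where a packed layout really starts to hurt.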

              I'm not sure why you'd need a loop or self join in your example. Let's say your table contained 3 columns (member_id, car_year, and car_model). To get the list of cars for a specific member:
              SELECT member_id, CONCAT(car_year, ' ', car_model) AS car
              FROM favcars
              WHERE member_id = 1001

              So now your code just has to loop through the records and comma separate them or whatever. You can't get a lot faster/simpler than that - especially if the member_id is indexed. You'll need to find one record in your packed table or 0 or more records in the multi-row table. A good index will even them out.
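
              If you want to check that the index is really being used, something like this will show it (the index name is just an example):

              -- Make sure member_id is indexed, then look at the query plan.
              ALTER TABLE favcars ADD INDEX idx_member_id (member_id);

              EXPLAIN
              SELECT member_id, CONCAT(car_year, ' ', car_model) AS car
              FROM favcars
              WHERE member_id = 1001;

              The key column of the EXPLAIN output should name idx_member_id rather than showing a full table scan.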

              If you're lazy (kidding) and your configuration is set up correctly, you can have MySQL stick it all together for you:
              SELECT member_id, GROUP_CONCAT(CONCAT(car_year, ' ', car_model) SEPARATOR ', ') AS car
              FROM favcars
              WHERE member_id = 1001
              GROUP BY member_id
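
              The configuration caveat is mostly about group_concat_max_len: GROUP_CONCAT truncates its result at that length (1024 bytes by default), so if the lists can get long you may want to raise it for the session:

              -- Allow longer GROUP_CONCAT results than the 1024-byte default.
              SET SESSION group_concat_max_len = 8192;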

              Assuming you have a "member" table you can get a list of members and their favorite cars:
              SELECT member.name, CONCAT(favcars.car_year, ' ', favcars.car_model) AS car
              FROM member
              INNER JOIN favcars ON member.id = favcars.member_id

              Your code just needs to watch for when the member name changes to start a new line. Pretty simple.

              Are the users entering their favorite cars in text or choosing from a list? If you're storing text it becomes more cumbersome because it's not as easy to find multiples of the same/similar cars (so finding other members who like the same car gets difficult).

              How about having a "cars" table (with or without the year) and a "favorite_cars" table with keys of the user_id and car_id (hopefully referencing a primary key in the user and car tables). A 3 way join would be very fast and you could then do neat stuff like finding members with similar interests, ranking favorite cars, and storing additional information about the cars or favorites (like ranking favorite brands or ranking favorites for each user).
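
              Sketched out, that might look like this (the table and column names are just one way to carve it up, and it assumes the "member" table from the earlier example):

              -- Cars are stored once; favorites are just (member, car) pairs.
              CREATE TABLE cars (
                  id        INT AUTO_INCREMENT PRIMARY KEY,
                  car_year  SMALLINT NOT NULL,
                  car_model VARCHAR(100) NOT NULL
              );

              CREATE TABLE favorite_cars (
                  member_id INT NOT NULL,   -- ideally a foreign key to member.id
                  car_id    INT NOT NULL,   -- ideally a foreign key to cars.id
                  PRIMARY KEY (member_id, car_id)
              );

              -- The 3 way join: each member with their favorite cars.
              SELECT member.name, CONCAT(cars.car_year, ' ', cars.car_model) AS car
              FROM member
              INNER JOIN favorite_cars ON member.id = favorite_cars.member_id
              INNER JOIN cars ON cars.id = favorite_cars.car_id;

              -- And the "neat stuff" becomes a plain query, e.g. ranking the most popular cars.
              SELECT cars.car_year, cars.car_model, COUNT(*) AS fans
              FROM favorite_cars
              INNER JOIN cars ON cars.id = favorite_cars.car_id
              GROUP BY cars.car_year, cars.car_model
              ORDER BY fans DESC
              LIMIT 10;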

              I run a couple of car web sites with forums so I run across stuff like this fairly often. If you stick it all in one table or one column you're hamstringing yourself when it comes to functionality and making a data model that's tough to modify. A good data model will be plenty fast. Don't bother denormalizing the data until you can prove there's a speed problem.

              Troy

              • #8
                Your advice sounds good. That's what I needed. Thanks very much, Troy!
