  • About 20 million rows, should I be worried?

    I am currently implementing a system that interfaces between retailers and wholesalers and needs logging capabilities. The problem is that there will be roughly 20,000 retailers accessing about 300 wholesalers.

    Obviously, a row will need to be entered into the table for each retailer accessing each wholesaler (or a potential 20,000 x 300 = 6,000,000 rows).

    The problem is that I need to log visits on a monthly basis for the last 12 months, meaning 6,000,000 x 12 = 72,000,000 rows! However, realistically not every retailer will visit every wholesaler, so I estimate I'm ultimately looking at about 20,000,000 rows.

    There are two tables:

    retailer_wholesaler_association_table:

    ------------------------------------------------
    association_id | retailer_id | wholesaler_id
    ------------------------------------------------


    retailer_monthly_visits_table:

    --------------------------------------
    association_id | month_id | visits
    --------------------------------------
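    As a concrete sketch, the two tables above might be declared like this (the column types, key names, and ENGINE choice are my assumptions, not something stated in this thread):

    ```sql
    -- Hypothetical DDL for the two tables described above
    CREATE TABLE retailer_wholesaler_association (
        association_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        retailer_id    INT UNSIGNED NOT NULL,
        wholesaler_id  INT UNSIGNED NOT NULL,
        KEY idx_retailer (retailer_id)          -- supports WHERE retailer_id = ?
    ) ENGINE=InnoDB;

    CREATE TABLE retailer_monthly_visits (
        association_id INT UNSIGNED NOT NULL,
        month_id       SMALLINT UNSIGNED NOT NULL,
        visits         INT UNSIGNED NOT NULL DEFAULT 0,
        PRIMARY KEY (association_id, month_id)  -- also serves the join lookup
    ) ENGINE=InnoDB;
    ```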


    My question is, should I be very worried about speed? A SELECT query will look something like:

    SELECT * FROM retailer_wholesaler_association
    LEFT JOIN retailer_monthly_visits
           ON retailer_monthly_visits.association_id = retailer_wholesaler_association.association_id
    WHERE retailer_id = %

    And an UPDATE will look very similar. Obviously, not very complex statements.

    Server is a Quad Core Intel Xeon Harpertown 3.0 GHz with 16GB DDR2 RAM.

  • #2
    If you have the proper indices, that query should execute very quickly.
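    You can verify the indices are actually being used with EXPLAIN. For example (table and column names follow your schema; the literal retailer_id is made up):

    ```sql
    -- Hypothetical check that the query is index-driven
    EXPLAIN
    SELECT *
    FROM retailer_wholesaler_association AS rwa
    LEFT JOIN retailer_monthly_visits AS rmv
           ON rmv.association_id = rwa.association_id
    WHERE rwa.retailer_id = 12345;
    -- An access type of "ref" with a key listed means an index lookup;
    -- type "ALL" means a full table scan, i.e. a missing index.
    ```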

    I run a moderately busy forum that routinely runs joins on a 1.2 GB table with 16,000,000 rows (smf_log_topics):


    SELECT
        t.numReplies, t.numViews, t.locked, ms.subject, t.isSticky,
        t.ID_POLL, t.ID_MEMBER_STARTED, t.ID_FIRST_MSG, t.ID_LAST_MSG,
        IFNULL(lt.ID_MSG, -1) + 1 AS new_from
    FROM (smf_topics AS t, smf_messages AS ms)
    LEFT JOIN smf_log_topics AS lt
           ON (lt.ID_TOPIC = 18105 AND lt.ID_MEMBER = 1373)
    WHERE t.ID_TOPIC = 18105
      AND ms.ID_MSG = t.ID_FIRST_MSG
    LIMIT 1

    (logged in /forum/Sources/Display.php line 160, which took 0.00165296 seconds)


    There is an index on smf_log_topics.ID_TOPIC. I'm using InnoDB. smf_log_topics suffers severely from lock contention if using MyISAM, so you may wish to keep that in mind for any large table that is frequently updated.

    Oh, and the machine is a 3.06 GHz Celeron with 1 GB of RAM, 400 MB of which is devoted to the InnoDB buffer pool. The machine handles roughly 1,000,000 queries per day, on top of running the nginx webserver and PHP. CPU load is around 25% and the single IDE disk is about 95% idle according to iostat.

    Your machine is overkill, but it's still critical to use the right database engine and tune it properly.



    • #3
      MarkRose,

      Thanks for your reply. Your experience gives me hope that my current model will have sufficient speed. I appreciate all the data you've given me about your setup and the speeds you're getting. It's very encouraging that you're getting such speeds on a relatively modest server, too. Based on what you're saying, my server definitely does seem like overkill! :)

      Also we are using InnoDB so hopefully we'll get the same results you are!



      • #4
        ylibrach wrote on Sun, 21 September 2008 18:24

        Also we are using InnoDB so hopefully we'll get the same results you are!


        Yeah, just make sure to give InnoDB a large buffer pool. The rule of thumb is roughly 75% of RAM on a DB-only machine, which leaves enough headroom for everything else.

        You may also want to look into the setting innodb_flush_method=O_DIRECT. This eliminates the double caching of I/O by InnoDB and the OS.
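        Putting both suggestions together, the relevant my.cnf lines might look something like this for your 16 GB machine (the values are illustrative, not tuned for your workload):

        ```
        [mysqld]
        # ~75% of 16 GB for the buffer pool, per the rule of thumb above
        innodb_buffer_pool_size = 12G
        # Bypass the OS page cache so data files are not buffered twice
        innodb_flush_method = O_DIRECT
        ```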
