GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

Help with really big tables

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with really big tables

    I have this project where we store keywords by the hour. Then at the end of the month we sum up the data and move it over to another table for research.

    In the past we just stored the keyword, and the datetime in a table labeled after the month that it was in, then just rolled them keywords and summed count over to another table for monthly research.

    The problem is that the keywords where being used as the keys for cross referencing them to other data that was collected, and this as we all know is not a good thing.

    While the first build works really well for what we had in mind, its just not going to cut it where we are trying to head with it.

    At this point we are looking at around 4 billion unique keywords per month. So I thought about using innodb for the daily data, then using another table with each keyword. That table would hold a years worth of unique keywords(all time) worth of data to assign keys for each keyword.

    The problem with this is that the system would be running a lot of inserts into the monthly table with the key, count, and date, however with the central table holding the keyword and key it would mean that the system and the users would all be touching that table and thus create a bottleneck in the system as millions of selects/inserts where run every minute on the one table with over 400 billion keywords, include fulltext searches.

    I have looked at using innob, Sphinx, Partitions, and Im just left stumped at what would be the best way to provide very fast search for users on the monthly totals while at the same time reduce the load from the keywords being used as their own keys.

    In a nutshell I have a data warehouse project, that Im trying to find a way to keep everyone in line with their keys without causing a bottleneck from the all the queries, updates, and inserts on the central table that would keep all the keys aligned for joins.

    So any ideas on the best way to store 400 billion keywords for search, with a key for reference, and still give very fast results?

  • #2
    Here are my new tables. I would think at 400 billion rows in keyword_lexicon it would become a heaping mess with some very very slow queries and the users expect to be able to do complex search as if they are on Google. ;p

    CREATE TABLE `keyword_Count` (
    `ID` int(15) NOT NULL default '0',
    `Count` int(15) NOT NULL default '0',
    `Date` datetime NOT NULL default '0000-00-00 00:00:00',
    PRIMARY KEY (`ID`,`Date`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

    -- --------------------------------------------------------

    --
    -- Table structure for table `keyword_Lexicon`
    --

    CREATE TABLE `keyword_Lexicon` (
    `ID` int(15) NOT NULL auto_increment,
    `Keyword` varchar(80) collate utf8_unicode_ci NOT NULL default '',
    `Status` enum('-1','0','1','2','3','4') collate utf8_unicode_ci NOT NULL default '0',
    PRIMARY KEY (`ID`),
    UNIQUE KEY `Keyword` (`Keyword`),
    FULLTEXT KEY `Keyword_2` (`Keyword`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

    -- --------------------------------------------------------
    --
    -- Table structure for table `keyword_Monthly`
    --
    CREATE TABLE `keyword_Monthly` (
    `ID` int(15) NOT NULL default '0',
    `Count` int(15) NOT NULL default '0',
    `MonthDate` date NOT NULL default '0000-00-00',
    PRIMARY KEY (`ID`,`MonthDate`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

    Comment

    Working...
    X