  • Break huge table into thousands of smaller ones?

    Hey,

    Quickie for you: in an app, picture a HUGE table that contains parameter values for subscribers, where each subscriber might have 3,000 parameters associated with them. We'd have a table like:

    CREATE TABLE my_large_table (
        id           BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        param_id     BIGINT NOT NULL,
        -- utf8 VARCHAR(64000) would exceed MySQL's 65,535-byte row limit
        -- (utf8 counts up to 3 bytes per character), so these columns
        -- have to be a TEXT-family type
        pvalue       MEDIUMTEXT,
        ptype_id     BIGINT NOT NULL,  -- a string default like 'unknown' is invalid on a BIGINT
        paccess      MEDIUMTEXT,
        subscriberid BIGINT NOT NULL,
        INDEX idx_subscriberid (subscriberid)
    )
    AUTO_INCREMENT = 1
    ENGINE = MyISAM
    DEFAULT CHARACTER SET utf8
    DEFAULT COLLATE utf8_bin;

    We could have anywhere from, e.g., 100,000 up to 1,000,000,000 subscribers.

    Now, this is MyISAM, and as it stands it doesn't cope: if you have 30,000 subscribers logging in and updating parameters, MyISAM locks the whole table on every update/insert, blocking everyone else. Everything crawls.

    OK, there are two solutions. I know the one I'd back, but because some guy here at work thinks the other solution is cool or something, we're going with it. I'd just like to get people's opinions.

    Solution 1: Change the table to InnoDB to enable row-level locking and get rid of the problem of multiple subscribers updating their parameters concurrently (not sure if it could be partitioned on subscriberid; I think I looked at this and was shot down because subscriberid isn't unique in the table).

    Solution 2: Split the table up into thousands (100,000 to 1,000,000,000) of smaller tables, so it works out at one per subscriber. It's still MyISAM, but table locking isn't an issue any more.
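
    To make the two options concrete, here's a rough sketch (the per-subscriber table naming is purely illustrative, not what anyone has specified):

    -- Solution 1: keep the one big table, convert it for row-level locking
    ALTER TABLE my_large_table ENGINE = InnoDB;

    -- Solution 2: one MyISAM table per subscriber, created at signup,
    -- e.g. params_12345 for subscriberid 12345 (naming hypothetical)
    CREATE TABLE params_12345 (
        id       BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        param_id BIGINT NOT NULL,
        pvalue   MEDIUMTEXT,
        ptype_id BIGINT NOT NULL,
        paccess  MEDIUMTEXT
    ) ENGINE = MyISAM DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;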

    IMHO it has to be 1. Solution 2 isn't completely unworkable, but if you want to do any queries that involve multiple subscribers, you're screwed. Imagine having to find all subscribers with a particular parameter value! Still, it solves the immediate problem; it's just very short-sighted. MySQL will handle a database with millions of tables, and if all you ever do is fetch one subscriber's details, update them, read one, etc., it's fine.

    Is there a limit on the number of tables you can throw into a UNION? Also, the number of tables would be dynamic: a new subscriber comes along, that's a new table. How can you write a query that UNIONs an ever-changing set of tables? (A sketch of what such a query ends up looking like is below.)
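
    For illustration, here is what "find all subscribers with a given parameter value" ends up looking like under each option (per-subscriber table names hypothetical; under Solution 2 the statement must be regenerated whenever a subscriber is added or removed):

    -- Solution 1: one indexed query against the single table
    SELECT subscriberid
    FROM my_large_table
    WHERE param_id = 42 AND pvalue = 'X';

    -- Solution 2: a machine-generated UNION over every per-subscriber table
    SELECT 1 AS subscriberid FROM params_1 WHERE param_id = 42 AND pvalue = 'X'
    UNION ALL
    SELECT 2 AS subscriberid FROM params_2 WHERE param_id = 42 AND pvalue = 'X'
    UNION ALL
    -- ... one SELECT per subscriber, tens of thousands in all ...
    SELECT 50000 AS subscriberid FROM params_50000 WHERE param_id = 42 AND pvalue = 'X';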

    I'd be very interested in other people's opinions. I don't want to colour anyone's view, so please look at this with an open mind.

    Bal

  • #2
    Whoever suggested solution #2 should benchmark how the intended filesystem copes with a directory with 100k files in it.

    I suggest getting a copy of our book (second edition of High Performance MySQL). These kinds of questions are answered thoroughly there. There's no need to reinvent the wheel and rediscover the acute misery of a bad design that is hard to change by the time you discover how bad it is.



    • #3
      Well put, listen to the Baron!

      Shoot down the other guy immediately!
      Databases are not built to handle 10 rows in 1,000,000 tables. They are built to handle 1,000,000 rows in 10 tables.

      Splitting related data into so many tables will just create a nightmare to work with and you will get bad performance.



      • #4
        In this particular case, there are 50,000 subscribers with about 3,000 parameter values each, but it should be able to scale into the millions of subscribers. The table file is about 160 GB, with about 150 million rows.

        The problem was that it was a MyISAM table, so locking was a serious issue: one subscriber updates data and the entire table is locked.

        The proper layout would have been a partitioned InnoDB table (with NDB Cluster as an option)... problem solved.

        But the guys here read something about big names like Google splitting things up into millions of tables (possibly something like geographical mapping software with per-x/y tables), so I was outvoted :/

        I'm still just pissed about the whole thing; one guy got it into his head about splitting them and wasn't really open to listening to anything else. A small bit of knowledge is a very dangerous thing :/

        Now the issue of running reports on particular parameters has cropped up. I've done the 'I told you so', but the response was: 'this is what we have and where we are now (i.e. split tables), so find a way to do a fast query on parameter values across all subscribers'. It's horrible: read all the required parameters from every table and dump them into a temp table, or run a SQL statement with thousands of UNIONs. The options are all disgusting (the temp-table version is sketched below).
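
        For the record, the least disgusting option looks something like this (table names hypothetical; the application has to generate one INSERT per subscriber):

        -- Gather one parameter from every per-subscriber table into a
        -- temp table, then report off that
        CREATE TEMPORARY TABLE report_tmp (
            subscriberid BIGINT NOT NULL,
            pvalue       MEDIUMTEXT
        ) ENGINE = MyISAM;

        INSERT INTO report_tmp SELECT 1, pvalue FROM params_1 WHERE param_id = 42;
        INSERT INTO report_tmp SELECT 2, pvalue FROM params_2 WHERE param_id = 42;
        -- ... 50,000 statements later ...

        SELECT subscriberid FROM report_tmp WHERE pvalue = 'X';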

        I feel I have pretty decent MySQL knowledge and am confident enough with it, but not having the power to implement things as I see fit is frustrating. It's a typical 'you're a code monkey, get back in your box'.

        Thanks for the replies, I needed a little reassurance I wasn't insane :)



        • #5
          balmark wrote on Thu, 02 September 2010 11:34:

          Thanks for the replies, I needed a little reassurance I wasn't insane :)


          No you are not!
          The structure that you have now is very much insane.
          Partitioning is one thing, but creating one table per user is quite another.

          The biggest problems you are facing now are:

          1. The SQL changes when you add or remove a _single_ user.
          As you noted yourself, working with this structure is a nightmare: add one user to the database and the queries change, so every query has to be generated dynamically.

          2. You can't use prepared statements to speed things up properly, since the table name is constantly changing (see the sketch after this list).

          3. The number of files in one directory (a filesystem limit).
          With 50,000 users you already have 150,000 files under MyISAM, which is usually where performance really starts to deteriorate. Most filesystems handle the files in a directory like a linked list, so finding a specific file gets slower as the number of files grows.

          4. File descriptors (an OS limit).
          MyISAM needs one file descriptor per .frm file and one per index file, and then one more per table per query for the data file. So for a query with 50,000 tables in the UNION, MySQL will open:
          50,000 (.frm) + 50,000 (.MYI) + nrOfSimultaneousQueries * 50,000 (.MYD) = at least 150,000 file descriptors, or 200,000 if there are two simultaneous queries.
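
          On point 2: placeholders in MySQL prepared statements can only stand in for data values, never for identifiers such as table names, so with one table per user there is nothing to prepare once and reuse. A minimal illustration (statement names arbitrary):

          -- One shared table: prepare once, execute cheaply many times
          PREPARE get_param FROM
              'SELECT pvalue FROM my_large_table WHERE subscriberid = ? AND param_id = ?';

          -- One table per user: a table name cannot be a placeholder,
          -- so this is simply a syntax error ...
          PREPARE get_param2 FROM 'SELECT pvalue FROM ? WHERE param_id = ?';
          -- ... and the only workaround is building a fresh statement
          -- string for every user.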

          balmark wrote:

          A small bit of knowledge is a very dangerous thing :/


          Yes it is!

          balmark wrote:

          Now the issue of running reports on particular parameters has cropped up. I've done the 'I told you so', but the response was: 'this is what we have and where we are now (i.e. split tables)


          The response should instead be that you change this as fast as possible, before you pour more work into this solution and have even more to change later.

          balmark wrote:

          ... so find a way to do a fast query on parameter values across all subscribers'.


          It will never happen with a structure like this.

          Seriously, if they don't want to switch to InnoDB (which does have some drawbacks, like a much larger footprint on disk, but also brings a lot of benefits), you should at least convince them to use partitioning the way it is supposed to be used. That is what it is there for; you shouldn't reinvent it yourself on a per-user, per-table basis. A sketch of what that could look like is below.
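
          For example, something along these lines (a sketch only; the partition count is illustrative, and note that MySQL requires the partitioning column to be part of every unique key, hence the composite primary key):

          CREATE TABLE my_large_table (
              id           BIGINT NOT NULL AUTO_INCREMENT,
              param_id     BIGINT NOT NULL,
              pvalue       MEDIUMTEXT,
              ptype_id     BIGINT NOT NULL,
              paccess      MEDIUMTEXT,
              subscriberid BIGINT NOT NULL,
              -- the partitioning column must appear in every unique key
              PRIMARY KEY (id, subscriberid),
              INDEX idx_subscriberid (subscriberid)
          ) ENGINE = InnoDB
            DEFAULT CHARACTER SET utf8 COLLATE utf8_bin
          PARTITION BY HASH (subscriberid)
          PARTITIONS 64;  -- illustrative count

          Row-level locking removes the writer-blocks-everyone problem, and partition pruning on subscriberid keeps single-subscriber reads and writes cheap.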

          At the very least, convince them to run a short performance/usability comparison. It wouldn't take many hours to prove that one table per user is a horrible way to go.

          Good luck!



          • #6
            I agree it's a very poor design as it stands.
            But splitting it into one table per subscriber just makes things worse.

            Personally, the first thing I'd do is switch it to InnoDB, without question.


            >> But the guys here read something about big names like Google splitting things up into millions of tables (possibly something like geographical mapping software with per-x/y tables), so I was outvoted :/

            Typical office problem. A developer reads a bit of info and then generalises it to everything :)

            You can either live with the fact you were outvoted, or spend a little time putting together a test case that compares their idea against switching to InnoDB.
            Sure, it's a hassle and will involve some coding, but this is the best way to deal with misinformation: people can't argue with facts, especially when you demonstrate them.

            Another idea, which you probably won't like, would be to store the documents in MongoDB. It is designed to let documents have arbitrary attributes, and might be well suited to the XML you're trying to store in a VARCHAR.
            It might not be that easy to integrate, but it's well worth considering, because MongoDB's performance is much better than you'd expect.

            It depends how you plan to query the data, I guess. Another alternative would be to wait until Sphinx gets live index updates for its text fields (coming soon, apparently), then query the textual data via SphinxQL (its MySQL-compatible query interface) and join the results to whatever query you need.

            Difficult to make any better suggestions without knowing more about the data and how you plan to interrogate it.
            Arbitrary attributes have always been the bane of RDBMSs :)
