No announcement yet.

Large table with huge rows vs. huge columns vs. multiple tables

  • Filter
  • Time
  • Show
Clear All
new posts

  • Large table with huge rows vs. huge columns vs. multiple tables


    I am organizing a database for all the experiment on microarray chips. There are about 230,000 probes that will have different reading and need to be stored in the database. We will perform multiple experiment, which will output a huge number of data. I wonder which set up would give me the best result when updating and querying data.

    1. Large table setup, with 230,000 rows and columns will be added on as we perform more and more experiments. I think this will be a pain every time we do an update.

    2. Large table, but the columns fixed, only the rows will be increasing, which means that we will have A LOT of rows by the end of the 20th experiment. This will help updating but I'm not sure it is recommended.

    3. Store each experiment on a separate table. This seems to be the easiest. But is it the optimized solution?

    Thank you for any help!

  • #2


    • #3
      More information are needed. 230.000 are not so many rows.
      What do you mean with many rows.
      What kind of hardware are you running?
      How does your schema look like?
      Which MySQL-Version do you use?


      • #4
        The schema right now is like this

        Raw data table
        Probe_ID | IP_1 | WCE_1 | norm_IP_1 | norm_WCE_1 | p-value_1 | IP_2 | WCE_2 | norm_IP_2 | norm_WCE_2...

        There are 230,000 probes per experiment, and updating are from .tsv file.

        Probe info table:
        probe_id | probe_location | gene_location .....

        Fragment info table:
        fragment_id | probe_id | Sequence

        (probe_id is fixed and identical for every experiment)

        Views are created to query from probe_info table and raw_data table and fragment_info table to analyze in R (another statistics analysis language) and might be inserted back to a table named Analysis with probe_id and other analyzed data.

        Right now, I just use a very basic set up to get the experiments up and running. MySQL 5.0 is set up on a Desktop Pentium 4, 2.8ghz, nothing fancy.

        The problem was that every time we update the database (new experiments come in), it takes 10-20 minutes to update the columns because the script has to match the probe_id from the table to the probe_id from data file (loaded onto a temp. table)

        Anyway, the loading time is not a problem for now. But I'm afraid that the number of columns will grow too fast to handle (presented was a truncated version, actual experiment has 10 columns each, and we have about 20 experiments right now). We plan to input more data in (looking at 70 experiments in the future), I'm trying to look for a better solution of storing data. That's where the question comes in, multiple tables, add new Rows or new Columns.

        I'm relatively new to SQL and just need to use it as a tool, so please bear with me.

        Thank you


        • #5
          Hi popophobia )

          1. The columntypes are missing.
          2. Memory is missing.
          3. Linux or Windows?
          4. What is this script doing. try the SQL-statement directly.
          5. What kind of StorageEngine are you using?