Load large data from file into innodb table

  • #1

    I need to import 100 million records from a file into a table (schema given below). The table uses the InnoDB engine and I am running MySQL Server 5.0.45.
    I tried using 'LOAD DATA INFILE' to import the data; however, its performance deteriorates as more rows are inserted. Is there any trick that can be used to complete this import in less than a couple of hours? I need a response urgently. Thanks.

    Table schema:

    CREATE TABLE `t` (
    `id` int(11) NOT NULL auto_increment,
    `pid` int(11) NOT NULL,
    `cname` varchar(255) NOT NULL,
    `dname` varchar(255) default NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `index_t_on_pid_and_cname` (`pid`,`cname`),
    KEY `index_tags_on_cname` (`cname`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8
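
For reference, the statement the poster describes would look roughly like this against the schema above (the file path, delimiters, and column list are assumptions for illustration, not details from the post):

```sql
-- Sketch only: '/tmp/records.csv' and the delimiters are placeholders.
LOAD DATA INFILE '/tmp/records.csv'
INTO TABLE t
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(pid, cname, dname);   -- `id` is AUTO_INCREMENT, so it is omitted
```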

  • #2
    Yeah, I had the same problem but with about a billion rows. You can imagine my frustration waiting for it!

    I found out you can make it go hundreds of times faster by splitting the file up. Go here:
    http://www.fxfisherman.com/forums/forex-metatrader/tools-utilities/75-csv-splitter-divide-large-csv-files.html

    and about halfway down is a program to split a CSV file up. I found that splitting it into groups of 500,000 rows gives the best trade-off between file size and performance. I then used Navicat to do a batch import - opening all of the files at once and importing them all into the same table.
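
The same chunking can also be done from the command line with `split` (a sketch; the poster used the GUI tool linked above, and the file names here are stand-ins with tiny demo data - for the real case you would pass `-l 500000`):

```shell
# Split a CSV into fixed-size chunks that can be imported separately.
# Tiny stand-in file; a real run would split the full export with -l 500000.
printf '1,a,x\n2,b,y\n3,c,z\n4,d,w\n5,e,v\n' > big.csv
split -l 2 big.csv chunk_   # produces chunk_aa, chunk_ab, chunk_ac
ls chunk_* | wc -l          # 3 chunks
```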



    • #3
      Hello,

      What I would advise is to try an ETL tool. It is one of the best ways to load data into InnoDB.

      Talend Open Studio is an open source ETL tool for data integration and migration experts. It's easy to learn for a non-technical user. What distinguishes Talend, when it comes to business users, is the tMap component. It allows the user to get a graphical and functional view of integration processes.

      For more information: http://www.talend.com/



      • #4
        ENGINE=InnoDB;

        One CSV data file of 100 million rows, sorted in primary key order (order is important for bulk loading times – remember, InnoDB clusters rows on the primary key).
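
The pre-sort by primary key can be done with `sort` before loading (a sketch; `unsorted.csv` is a placeholder name with tiny demo data):

```shell
# Sort a CSV numerically on its first column (the primary key).
# Tiny stand-in data; the real input would be the 100M-row export.
printf '3,c\n1,a\n2,b\n' > unsorted.csv
sort -t, -k1,1n unsorted.csv > sorted.csv
head -n 1 sorted.csv   # 1,a
```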



        truncate t;

        set autocommit = 0;

        load data infile '&lt;datafile&gt;' into table t
        ...

        commit;

        Runtime: ~15 minutes, depending on your hardware and MySQL config.



        typical import stats i have observed during bulk loads:

        3.5 – 6.5 million rows imported per minute
        210 – 390 million rows per hour

        other optimisations as mentioned above if applicable:

        set unique_checks = 0;
        set foreign_key_checks = 0;

        split the file into chunks
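
Putting those settings together, one load pass per chunk might look like this (the chunk path is a placeholder, and restoring the checks afterwards is an assumption the post does not spell out):

```sql
SET unique_checks = 0;       -- skip secondary unique-index checks
SET foreign_key_checks = 0;  -- no FK validation during the load
SET autocommit = 0;          -- one big transaction per chunk

LOAD DATA INFILE '/data/chunk_aa.csv' INTO TABLE t
FIELDS TERMINATED BY ',' (pid, cname, dname);

COMMIT;

SET unique_checks = 1;       -- restore normal checking afterwards
SET foreign_key_checks = 1;
```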



        • #5
          See also http://www.mysqlperformanceblog.com/2008/07/03/how-to-load-large-files-safely-into-innodb-with-load-data-infile/
