Slow bulk loads innodb


  • Slow bulk loads innodb

    Hi,

    I am running out of ideas for how to speed up bulk loads of my data (approx. 30M rows) into a single InnoDB table. The load is running at about 200 rows/s; the same bulk load into an otherwise identical MyISAM table runs at about 3k rows/s.

    Here is what I am running (I am using the mk-fifo-split script to load in chunks of various sizes, currently 10k rows per chunk, so I can see progress readily):

    ALTER TABLE cdr_test2 DISABLE KEYS;
    set foreign_key_checks=0;
    set sql_log_bin=0;
    set unique_checks=0;
    LOAD DATA LOCAL INFILE '/tmp/mk-fifo-split' INTO TABLE cdr_test2
        CHARACTER SET 'UTF8'
        FIELDS TERMINATED BY ',' ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
        (CallID, ParentCallID, SessionID, ParentSessionID, SipSessionID,
         AccountID, ApplicationID, PPID, StartTime, EndTime, Duration,
         Outbound, Status, Network, Channel, StartUrl, CalledID, CallerID,
         ServiceID, PhoneNumberSid, Disposition, RecordingDuration,
         DateCreated, BrowserIP, ScriptThrowable, applicationType)

    The box is on ec2 (ebs volume) with 15G ram and 1TB disk with 4 (HT) cores. The source data and database files are both on the same ebs volume.

    I will add that I have also made the following variable changes, which have not improved the load speed noticeably, if at all:

    innodb_buffer_pool_size = 5125M
    innodb_log_file_size = 1024M
    innodb_log_buffer_size = 8M

    I have also tried setting this prior to a load:

    set global innodb_flush_log_at_trx_commit=0;

    Any ideas or help would be much appreciated,
    Thanks!
    Aaron

  • #2
    What data type is the PK column?

    It appears that InnoDB requires a PK even for bulk loads (?), and if the clustered key is non-sequential, the engine could be performing a lot of maintenance work.
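
    The cost of a non-sequential clustered key is mostly about locality: sequential inserts always hit the same "hot" rightmost page, while random keys scatter writes across the whole tree. Here is a toy sketch of that effect (not InnoDB's actual algorithm; page size, pool size, and the LRU policy are all simplifications for illustration):

```python
import random

def buffer_misses(keys, rows_per_page=100, pool_pages=10):
    # Toy model: the row with key k lives on page k // rows_per_page.
    # Simulate a tiny LRU buffer pool and count page misses (i.e. disk reads).
    pool = []                      # LRU order, most recently used last
    misses = 0
    for k in keys:
        page = k // rows_per_page
        if page in pool:
            pool.remove(page)      # refresh LRU position
        else:
            misses += 1
            if len(pool) >= pool_pages:
                pool.pop(0)        # evict the least recently used page
        pool.append(page)
    return misses

n = 100_000
sequential = list(range(n))
shuffled = list(range(n))
random.shuffle(shuffled)
print(buffer_misses(sequential))   # each of the 1000 pages is missed exactly once
print(buffer_misses(shuffled))     # nearly every insert is a miss
```

    With sequential keys each page is touched once and stays hot until it is full; with shuffled keys almost every insert lands on a page that is no longer in the pool.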

    • #3
      Well, actually I did not specify a PK for this table (nor does it have any indexes). What I'm reading about InnoDB tells me I should be explicit about the PK. If I were, it would be a GUID, which is probably not ideal. I'm going to try another, more sensible column for a PK and see what happens.

      • #4
        I went ahead and altered the table to include a new auto-increment id column that is also the PK. Surprisingly, this did not help the load rate. I'm back to having no ideas.

        • #5
          Try with SET autocommit=0; ?
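
          A sketch of what this suggestion usually looks like in practice (file and table names taken from earlier in the thread); the point is one explicit COMMIT per chunk instead of an implicit commit after every statement:

```sql
SET autocommit = 0;
LOAD DATA LOCAL INFILE '/tmp/mk-fifo-split' INTO TABLE cdr_test2
  FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
COMMIT;
```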

          • #6
            Just tried that as well, committing every 50k rows (using the fifo). That did not lead to a significant speedup; it may have helped 5% or so.

            • #7
              Wonder if the console loader would be any more efficient: http://dev.mysql.com/doc/refman/5.5/en/mysqlimport.html
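
              For reference, an invocation roughly equivalent to the LOAD DATA statement above might look like this (the database name mydb is hypothetical; mysqlimport derives the table name from the file name, so the file would need to be named after the table):

```shell
# mysqlimport reads the target table from the file name (cdr_test2.csv -> cdr_test2)
mysqlimport --local \
  --fields-terminated-by=',' \
  --fields-enclosed-by='"' \
  --lines-terminated-by='\n' \
  mydb /tmp/cdr_test2.csv
```

              Since mysqlimport is documented as a command-line interface to LOAD DATA INFILE, it issues the same statement under the hood and is unlikely to change the engine-side cost.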

              • #8
                What is "200r/w"?

                If an InnoDB table has no primary key or unique index, it will add an (invisible) one itself. Either that or a self-defined auto-increment key is the best you can have, because all inserts then happen at the end of the table and not at random spots.

                Maybe you can find some hints at http://www.mysqlperformanceblog.com/2011/01/07/high-rate-insertion-with-mysql-and-innodb/

                You should measure what the bottleneck is.

                • #9
                  You didn't say how big the table or the database overall is. Does it fit fully into memory?

                  • #10
                    @xaprb: Size is 20 million rows. If I bulk load into MyISAM, the .MYD file is 170G.
                    @gmouse: that is 200 rows/sec; I've fixed that now, thanks.
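
                    A quick back-of-envelope on the figures in this post (assuming G means GiB; both numbers come from this thread):

```python
# Average row width implied by a 170G .MYD file holding 20M rows
rows = 20_000_000
myd_bytes = 170 * 1024**3
bytes_per_row = myd_bytes / rows
print(round(bytes_per_row))  # roughly 9 KB per row, unusually wide for a CDR record
```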

                    • #11
                      @gmouse here are four typical dstat samples taken during a load. I am interpreting that as: disk writes are not the bottleneck, nor is memory, for that matter.

                      ---load-avg--- ----total-cpu-usage---- -dsk/total- ----memory-usage-----
                       1m   5m  15m |usr sys idl wai hiq siq| read  writ| used  buff  cach  free
                       4.0  3.9  3.5| 31   1  60   7   0   0|2985k   30M|6891M  84M 8369M   23M
                       3.6  3.8  3.5| 32   1  51  15   0   0|1340k   36M|6891M  84M 8369M   24M
                       3.9  3.9  3.5| 32   1  54  13   0   0|1993k   39M|6892M  84M 8371M   20M
                       3.8  3.8  3.5| 30   1  60   8   0   0|  12M   24M|6883M  84M 7945M  455M

                      • #12
                        I'm a bit puzzled that creating an explicit auto-increment column didn't speed things up, because the default hidden primary key usually performs pretty poorly in bulk-insert situations.
                        But that said, you could still have an I/O bottleneck.
                        Right now you perform 36MB/s of writes with 16% iowait, and the 30-32% usr could still be iowait in disguise, because InnoDB uses spin waits, which are faster than going all the way out to OS waits.

                        At the same time, 200 rows/s sounds suspiciously close to what you would get if you were not using innodb_flush_log_at_trx_commit=0.

                        So there are a couple of things to look into:
                        1. Can you show us what your table looks like (the output of SHOW CREATE TABLE ...)?
                        2. Attach the output of SHOW ENGINE INNODB STATUS (taken during an import) to this thread so we can see a little more of what InnoDB is doing.
                        3. Attach the output of SHOW GLOBAL VARIABLES while the load is running.
                        4. And out of curiosity, what kind of speed do you get if you simply copy, say, a 50GB file within the same volume? It's not directly comparable, since a file copy is entirely sequential, but at least it gives us something to compare with.

                        • #13
                          Well, I have stumbled upon the answer, and now I am getting ~25k rows/s loads! It turns out my CSV data sources were space-padded and not trimmed, so the rows were about 10-20x larger than they should have been. A dumb mistake on my part, but it illustrates something interesting: apparently, the fatter the rows, the slower the loads. I'm sure that is not surprising to anyone. It was interesting, though, that all the performance tweaks I made only produced small single-digit-percentage speedups, because the rows were so huge.
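
                          A minimal sketch of the fix described above, as a preprocessing step (the sample file contents are made up for illustration; the csv module preserves quoting while each field is trimmed):

```python
import csv
import io

def trim_csv(src, dst):
    # Strip the per-field space padding that bloated the rows;
    # src and dst are text file objects.
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow([field.strip() for field in row])

padded = io.StringIO('"a   ","  b"\n"c ","d"\n')
out = io.StringIO()
trim_csv(padded, out)
print(out.getvalue())
```

                          Alternatively, LOAD DATA can do this server-side by reading each column into a user variable and applying TRIM() in a SET clause, which avoids rewriting the source files.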

                          Guess this explains the iowait @sterin

                          Thanks for all your help!

                          Aaron

                          • #14
                            Did you have an opportunity to compare InnoDB to MyISAM load rates with the new data sources? It would be interesting to see whether both engines see a similar rate increase.

                            • #15
                              I just ran 500k rows (the new, shrunk-down ones) into both an InnoDB and a MyISAM table with the same DDL (the MyISAM one does not have the auto-inc id column). The InnoDB load ran in 22 seconds and the MyISAM one in 11 seconds. MyISAM was running at about 3,500 rows/s with the bloated rows; now it's 50k/s with the trimmed rows. InnoDB went from 300 rows/s to 25k rows/s. So that is a 15x speedup for MyISAM and an 83x speedup for InnoDB.
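
                              The arithmetic behind those figures, for anyone checking (all numbers are from this thread; the quoted rates are rounded):

```python
# Throughput and speedup math from the 500k-row comparison
rows = 500_000
print(rows / 22)        # InnoDB: ~22.7k rows/s, quoted as ~25k
print(rows / 11)        # MyISAM: ~45.5k rows/s, quoted as ~50k
print(50_000 / 3_500)   # MyISAM speedup: ~14.3x, quoted as 15x
print(25_000 / 300)     # InnoDB speedup: ~83.3x, quoted as 83x
```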
