Announcement

Announcement Module
Collapse
No announcement yet.

Large number of inserts

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Large number of inserts

    I have a private website for the use of myself and a few friends. It has a database of stuff generated from XML files. For simplicity, every time something in the XML changes, I run a PHP script to regenerate the MySQL tables from scratch. There are 22 tables generated, and about 60,000 rows in total, most of which have at least one text field...

    This used to run in about 10 seconds. Then about a month ago for reasons of general hygiene I reformatted my hard drive, reinstalling Windows and so on, and redownloaded and installed PHP and MySQL, restored my backups...

    Now the PHP script takes something like 15-20 minutes to run. Or is it half an hour? There is never more than 1% CPU usage during this time, but the hard drive is basically sweating the whole time, clicking away. The parsing of the XML files is practically instantaneous. I took the opportunity to rewrite the queries in the script to use prepared statements, but I knew that wasn't the problem (and it didn't help). It's clearly the database queries themselves that are being executed slowly.

    I added innodb_file_per_table, thinking that somehow fragmentation of the ibdata0 file could be the problem. This didn't help, and I did see the files growing so I know I enabled the option correctly. Then I removed innodb_file_per_table, and ran the script several times in a row, thinking that it might burrow itself a nice big hole in the ibdata1 file which could be reused without having to expanding the file any further, but this didn't help performance at all either.

    I don't remember what version of MySQL I had installed earlier. There are also two versions of PHP (thread-safe and non-thread-safe), and possibly different ways of installing MySQL, but I also don't remember how I had them set up earlier. Could something here be causing such a large problem?

    What could it be? Perhaps there is something which wasn't an error (or warning) in a previous version, but is now, and 60,000 warnings are being inefficiently written to a log file somewhere? I don't know anything about how that works in MySQL, though. I don't consider myself at all an expert concerning MySQL.

    I've Googled all over the web for help with this problem, but nothing remotely relevant comes up. Apparently I'm the only one who does so much inserting. I could, of course, rewrite the insertion script to try to keep old stuff that doesn't change, but the script is already 80kb and it would take a ton of work to get something more complex working smoothly. And besides, there is something ugly afoot here, something which should be fixable.

    Any help is appreciated!

  • #2
    I will bet a dollar that you don't bundle your inserts into reasonable size transactions and end up with autocommiting each insert.
    This combined with that you probably have the default setting for innodb_flush_log_at_trx_commit means that each insert will force a flush of the transaction log.
    And on a simple HDD that can only happen about 100-150 times per second. Hence your inserts go very slow while no CPU is reported used.

    Add this to your my.ini:
    innodb_flush_log_at_trx_commit=2
    and things should probably be back to as they where before.

    The drawback is that you can loose the latest inserts with this setting, while with the default 1 it always waits until all data is securely written to the disk.
    But since you recreate that tables all the time that shouldn't be problem for you right?

    Comment


    • #3
      Thank you, that solved it!

      About this possibility of losing inserts... you mean only if there is a crash, right? I can live with that.

      There are a number of ways I can improve my script for incremental speed improvements, but I wanted to get to the bottom of this problem first. Perhaps I will butcher it to do all inserts in single, massive queries, or somehow figure out how to group all the queries into an atomic transaction, then put that setting back to 1.

      Comment


      • #4
        froht wrote on Sat, 27 November 2010 02:32


        About this possibility of losing inserts... you mean only if there is a crash, right? I can live with that.


        Yes that is what I mean, power failure or OS crash.

        froht wrote on Sat, 27 November 2010 02:32


        There are a number of ways I can improve my script for incremental speed improvements, but I wanted to get to the bottom of this problem first. Perhaps I will butcher it to do all inserts in single, massive queries, or somehow figure out how to group all the queries into an atomic transaction, then put that setting back to 1.

        You have two main options:
        1.
        Use MySQL extended insert statements:

        INSERT INTO yourTable (some,columns,here) VALUES (1,2,3),(4,5,6),(7,8,9), ... ;

        That way each insert will insert a lot of rows.
        You can start to try with about max 1000 rows each insert or something. That will probably solve most of your problems.

        2.
        Use "BEGIN" before your first insert to start a transaction (you don't have to change the auto_commit setting since you are now explicitly saying that you want to start a transaction).
        Then run about 1000 insert statements.
        Issue "COMMIT" to commit these 1000 rows and then run the next 1000 rows.
        I.e:

        1. BEGIN2. INSERT, INSERT, INSERT, ...3. COMMIT4. jump to 1 until finished.


        Either of these two alternatives had solved your problem that the server was slow since with these two alternatives you would have written 1000 rows per each flush instead of just 1 row.

        Comment


        • #5
          Hello,
          The drawback is that you can loose the latest inserts with this setting, while with the default 1 it always waits until all data is securely written to the disk...

          Comment


          • #6
            AnnabelleR wrote on Thu, 02 December 2010 13:08
            Hello,
            The drawback is that you can loose the latest inserts with this setting, while with the default 1 it always waits until all data is securely written to the disk...
            Thanks for copying verbatim what I wrote 4 posts up!

            Comment

            Working...
            X