Announcement Module
No announcement yet.

Parallel Import in MySQL not improving performance...

Page Title Module
Move Remove Collapse
Conversation Detail Module
  • Filter
  • Time
  • Show
Clear All
new posts

  • Parallel Import in MySQL not improving performance...

    Hi all,

    As suggested in many post and for instance also in the book 'High Performance MySQL' it is a good idea to restore different tables of a database in parallel. But, it seems like this is not giving a significant improvement in restore time at all.

    For instance. I first tried the following. Using an awk script, I split up one sequential mysql dump file (SQL format) into separate SQLs, taking into account the bits at the start and the end of the files. Then I imported those files in parallel using xargs with the -P option. In this I made sure to import the biggest files first to minimize total import time.

    On one system where I tried this (windows, 1 quad core with mysql 5.0.82-community-nt-log), using 4 parallel import, I did not achieve any increase in restore time at all. The total time was practically the same. Yet what I saw in 'show processlist' was that it was executing SQL statements for different tables in parallel.

    Then I though it could be related to windows and/or this old MySQL version so I tried it on one of the newer systems. This is a dual 6-core processor machine running Red Hat EL 6.2 and mysql 5.1.61. This time, after googling around some more I dumped the database in mysqlimport format with one txt file per table. Then I used mysqlimport with the --use-threads=4 option.

    In the latter case, I am seeing a performance improvement, going from 2 hours 45 minutes with a single SQL file imported in a single thread to 2 hours with 4 threads using mysqlimport.

    I am willing to ignore the results of the windows system (as that will be replaced anyway in the near future), but on the linux system, I would have expected a much higher performance increase.

    Any ideas as to what could be the problem? The mysql configuration on the windows and linux systems is similar (the linux my.cnf is based on the windows my.ini file). It looks like there is some bottleneck in the database which is preventing truly parallel insertion.

    Also, it is perhaps good to mention that we are using a single innodb data file and not file per table. I experimented also with using file per table on my laptop but got a much lower performance.

    Any ideas as to what could be going on?


  • #2
    Some more info.
    On both machines a Dell RAID card is used with write back caching and battery backup. Also the database on the second system (let's focus only on the linux system) is about 80GB in size. The largest table is approx. 7 GB.

    Also, I have attached a mysql config file with passwords and hostnames removed.

    The database is a slave to another database but during restore the slave is (obviously) turned off. Also, there is another system which is a slave to this database. The database is not being using during backup as the application connects only to the master.

    Also, the database uses innodb tables exclusively for the restored database.