Recently I had a customer ask me about loading two huge files into InnoDB with LOAD DATA INFILE. The goal was to load this data on many servers without putting it into the binary log. While this is generally a fast way to load data (especially if you disable unique key checks and foreign key checks), I recommended against this. There are several problems with the very large transaction caused by the single statement. We didn’t want to split the file into pieces for the load for various reasons. However, I found a way to load the single file in chunks as though it were many small files, which avoided splitting the file and let us load with many transactions instead of one huge transaction.
The smaller file was 4.1GB and had 260M lines in it; each row was just two bigints. The bigger file was about 20GB and had wider rows with textual data, and about 60M lines (as I recall).
When InnoDB loads the file, it creates one big transaction with a lot of undo log entries. This has a lot of costs. To name a few:
- the big LOAD DATA INFILE clogs the binary log and slows replication down. If the load takes 4 hours on the master, it will cause the slave to fall 4 hours behind.
- lots of undo log entries collect in the tablespace, not only from the load but from other transactions’ changes too. The purge thread cannot purge them, so everything gets bloated and slow. Even simple SELECT queries might have to scan through lots of obsolete, but not-yet-purged, row versions; later, the purge thread will have to clean these up. This is how you make InnoDB behave like PostgreSQL.
- If the undo log space grows really big, it won’t fit in the buffer pool and InnoDB essentially starts swapping between its buffer pool and the tablespace on disk.
Most seriously, if something should happen and the load needs to roll back, it will take a Very Long Time to do — I hate to think how long. I’m sure it would be faster to just shut everything down and re-clone the machine from another, which takes about 10 or 12 hours. InnoDB is not optimized for rollbacks, it’s optimized for transactions that succeed and commit. Rollback can take an order of magnitude longer to do.
For that reason, we decided to load the file in chunks of a million rows each. (InnoDB internally does operations such as ALTER TABLE in 10k-row chunks, by the way; I chose 1M because the rows were small.) But how to do this without splitting the file? The answer lies in the Unix fifo. I wrote a script that reads lines out of the huge file and prints them to a fifo. Then we could use LOAD DATA INFILE on the fifo. Every million lines, the script closes and removes the fifo, which makes the reader see an EOF; then it re-creates the fifo and keeps printing more lines into it. If you ‘cat’ the fifo file, you get a million lines at a time from it. The code is pretty simple and I’ve included it in Maatkit just for fun. (It’s unreleased as of yet, but you can get it with the following command: “wget http://www.maatkit.org/trunk/fifo”.)
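The heart of the trick can be sketched in a few lines of shell. This is my own toy illustration, not the actual Maatkit script (which is Perl and has real error handling): it holds one file descriptor open across the whole source file and peels off fixed-size chunks, so each `head` resumes where the previous one stopped. To keep the sketch testable it writes each chunk to a numbered file; the real script writes each chunk into the re-created fifo instead.

```shell
#!/bin/sh
# Toy sketch of the chunking idea. All file names and the chunk size are
# illustrative stand-ins (the real run used infile.txt and 1M-line chunks).
set -e
DIR=$(mktemp -d)
seq 1 10 > "$DIR/infile"   # stand-in for the 260M-line input file
LINES=3                    # stand-in for 1000000

i=0
exec 3< "$DIR/infile"      # one read handle, so the offset survives across chunks
while :; do
  # GNU head leaves the (seekable) fd positioned just past the lines it consumed
  chunk=$(head -n "$LINES" <&3)
  [ -n "$chunk" ] || break          # empty chunk: source exhausted
  i=$((i + 1))
  printf '%s\n' "$chunk" > "$DIR/chunk.$i"   # in real use: > /tmp/my-fifo
done
echo "wrote $i chunks"
```

Note this relies on `head` repositioning a seekable input after the lines it takes (GNU coreutils does this); with a pipe as input it would over-read and drop lines.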
So how did it work? Did it speed up the load?
Not appreciably. There actually was a tiny speedup, but it’s statistically insignificant IMO. I tested this first on an otherwise idle machine with the same hardware as the production machines. First, I did it in one big 4.1GB transaction, then I did it 1 million rows at a time. Here’s the CREATE TABLE:
CREATE TABLE load_test (
  col1 bigint(20) NOT NULL,
  col2 bigint(20) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here’s the result of loading the entire 4GB file in one chunk:
time mysql -e "set foreign_key_checks=0; set sql_log_bin=0; set unique_checks=0; load data local infile 'infile.txt' into table load_test fields terminated by '\t' lines terminated by '\n' (col1, col2);"
While this ran, I captured vmstat output every 5 seconds and logged it to a file; I also captured the output of “mysqladmin ext -ri5 | grep Handler_write” and logged that to a file.
To load the file in chunks, I split my screen session in two and then ran (approximately — edited for clarity) the following in one terminal:
perl mk-fifo-split infile.txt --fifo /tmp/my-fifo --lines 1000000
And this in the other terminal:
while [ -e /tmp/my-fifo ]; do
    time mysql -e "set foreign_key_checks=0; set sql_log_bin=0; set unique_checks=0; load data local infile '/tmp/my-fifo' into table load_test fields terminated by '\t' lines terminated by '\n' (col1, col2);"
done
Note that the file mentioned in LOAD DATA INFILE is /tmp/my-fifo, not infile.txt!
After I was done, I ran a quick Perl script on the vmstat and mysqladmin log files to grab out the disk activity and rows-per-second to see what the progress was. Here are some graphs. This one is the rows per second from mysqladmin, and the blocks written out per second from vmstat.
And this one is the bytes/sec from Cacti running against this machine. This is only the bytes out per second; for some reason Cacti didn’t seem to be capturing the bytes in per second.
You can see how the curves are roughly logarithmic, which is what you should expect for B-Tree indexes. The two curves on the Cacti graph actually show both files being loaded. It might seem counter-intuitive, but the second (smaller) curve is actually the larger file. It has fewer rows and that’s why it causes less I/O overall.
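The rows-per-second post-processing is a one-liner, by the way. This is a stand-in I'm reconstructing from the counter format, not the actual quick script I ran: with `-r` (relative), `mysqladmin ext` prints the change in each counter since the previous sample in a “| name | value |” table, so dividing the Handler_write delta by the 5-second interval gives rows per second. The two sample lines and the `status.log` name are fabricated for illustration.

```shell
# Fake sample lines mimicking `mysqladmin ext -ri5 | grep Handler_write` output
printf '| Handler_write | 100 |\n| Handler_write | 250 |\n' > status.log

# Field 3 (split on '|') is the counter delta over a 5-second interval;
# strip the padding spaces and divide by 5 to get rows per second
rates=$(awk -F'|' '/Handler_write/ { gsub(/ /, "", $3); print $3 / 5 }' status.log)
echo "$rates"
```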
I also used ‘time’ to run the Perl fifo script, and it used a few minutes of CPU time during the loads. So not very much at all.
Some interesting things to note: the load was probably mostly CPU-bound. vmstat showed from 1% to 3% I/O wait during this time. (I didn’t think to use iostat to see how much the device was actually used, so this isn’t a scientific measurement of how much the load was really waiting for I/O). The single-file load showed about 1 or 2 percent higher I/O wait, and you can see the single-file load uses more blocks per row; I can only speculate that this is the undo log entries being written to disk. (Peter arrived at the same guess independently.)
Unfortunately I didn’t think to log the “cool-down period” after the load ended. It would be fun to see that. Cacti seemed to show no cool-down period — as soon as the load was done it looked like things went back to normal. I suspect that’s not completely true, since the buffer pool must have been overly full with this table’s data.
Next time I do something like this I’ll try smaller chunks, such as 10k rows; and I’ll try to collect more stats. It would also be interesting to try this on an I/O-bound server and see what the performance impact is, especially on other transactions running at the same time.