I ran into pretty interesting behavior today. We needed to dump and reload a large database, and since we had a pretty good IO subsystem, we started a number of mysqldump processes in parallel. Unlike earlier cases when we loaded in parallel, dumping in parallel did not increase the IO rate significantly, and we could still see only about 1.5 outstanding IO requests to the disk on average.
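For illustration, a minimal sketch of the approach: start one mysqldump process per table, concurrently. The database name, table names, and options here are hypothetical, not the actual command lines we used.

```python
# Hypothetical sketch: build one mysqldump command line per table so the
# dumps can be started concurrently. Names and options are made up.
import subprocess

def build_dump_commands(database, tables, outdir="/tmp"):
    """Build one mysqldump command line per table."""
    commands = []
    for table in tables:
        commands.append([
            "mysqldump", "--single-transaction",
            database, table,
            "--result-file=%s/%s.sql" % (outdir, table),
        ])
    return commands

commands = build_dump_commands("mydb", ["orders", "customers", "items"])
# To actually run them in parallel (requires a live server):
# procs = [subprocess.Popen(cmd) for cmd in commands]
# for p in procs:
#     p.wait()
```

With a good IO subsystem you would expect the parallel processes to drive many concurrent outstanding reads, which is exactly what did not happen here.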
Let's look at SHOW INNODB STATUS:
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: doing file i/o (read thread) ev set
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 256, aio writes: 0,
ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
112084404 OS file reads, 29836003 OS file writes, 2038246 OS fsyncs
1 pending preads, 0 pending pwrites
286.27 reads/s, 82658 avg bytes/read, 2.96 writes/s, 0.71 fsyncs/s
INSERT BUFFER AND ADAPTIVE HASH INDEX
Ibuf: size 108931, free list len 64619, seg size 173551,
56290428 inserts, 59538912 merged recs, 3876269 merges
Hash table size 25499819, used cells 1303661, node heap has 1974 buffer(s)
867.08 hash searches/s, 532.42 non-hash searches/s
Log sequence number 586 10447960
Log flushed up to 586 10447960
Last checkpoint at 586 10264850
0 pending log writes, 0 pending chkp writes
1338030 log i/o's done, 0.47 log i/o's/second
BUFFER POOL AND MEMORY
Total memory allocated 13917841802; in additional pool allocated 1046272
Buffer pool size 786432
Free buffers 0
Database pages 784458
Modified db pages 361
Pending reads 261
Pending writes: LRU 0, flush list 0, single page 0
Pages read 344728059, created 2867101, written 37374308
1445.95 reads/s, 0.00 creates/s, 3.35 writes/s
Buffer pool hit rate 933 / 1000
8 queries inside InnoDB, 3 queries in queue
12 read views open inside InnoDB
Main thread process no. 3956, id 1157658976, state: sleeping
Number of rows inserted 60790248, updated 11571576, deleted 0, read 63850963520
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 24860.96 reads/s
The stats are rather interesting: there are 8 queries inside InnoDB, yet we can see only one pending pread alongside 256 pending normal aio reads.
Pending normal aio reads are in fact counted in pages rather than in distinct IO requests, so in this case these are likely 4 read-ahead requests of 64 pages each.
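The page math can be checked directly, assuming read-ahead operates on 64-page extents:

```python
# 256 pending "normal aio reads" are counted in pages; with read-ahead
# issuing 64-page extents, that works out to 4 requests in flight.
PENDING_AIO_READS_PAGES = 256
READ_AHEAD_EXTENT_PAGES = 64  # one extent per read-ahead request

requests = PENDING_AIO_READS_PAGES // READ_AHEAD_EXTENT_PAGES
print(requests)  # 4
```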
What seems to be happening (I say "seems" because no detailed enough stats are available) is that once a thread has issued a read-ahead request, it has to wait for it to complete. It can't simply go and read the next page it needs if that page is already being read via read-ahead. Since we're dumping tables which were inserted in PK order, the IO is relatively sequential, so read-ahead is constantly triggered, which makes all threads wait on a single read-ahead thread that is unable to keep up.
This does not explain the whole story to me: for example, I'd expect read-aheads to complete faster, giving much more than 30MB/sec from this RAID system. Also, as the average read size is some 80KB, a lot of non-read-ahead requests are happening as well, at least 10 times more than the number of read-ahead requests. Maybe it is poor read-ahead performance combined with multiple threads doing scattered O_DIRECT single-page reads (so no OS read-ahead) that hurts performance?
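A back-of-the-envelope check on the numbers above, assuming the default 16KB InnoDB page and 64-page (1MB) read-ahead requests:

```python
# Rough consistency check on the SHOW INNODB STATUS figures.
# Assumes 16KB InnoDB pages and 64-page (1MB) read-ahead requests.
PAGE = 16 * 1024
READS_PER_SEC = 286.27        # from the status output
AVG_BYTES_PER_READ = 82658    # from the status output

# Aggregate read throughput: nowhere near 30MB/sec.
throughput_mb = READS_PER_SEC * AVG_BYTES_PER_READ / (1024 * 1024)

# If reads are a mix of 1MB read-aheads and single 16KB page reads,
# solve for the mix that yields the observed ~80KB average read size:
#   (RA_SIZE * x + PAGE * y) / (x + y) = AVG_BYTES_PER_READ
RA_SIZE = 64 * PAGE
singles_per_readahead = (RA_SIZE - AVG_BYTES_PER_READ) / (AVG_BYTES_PER_READ - PAGE)

print("%.1f MB/sec" % throughput_mb)                          # ~22.6
print("%.1f single reads per read-ahead" % singles_per_readahead)  # ~14.6
```

So under these assumptions there are roughly 14-15 single-page reads per read-ahead request, consistent with the "at least 10 times more" estimate.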
I guess we really should try Mark Callaghan's patches to see what happens if we allow multiple read-ahead threads.
I also can't wait for Christoffer Hall's work on Linux AIO support to be merged into the mainline.
This also reminds me of other work Christoffer was doing (I was an advisor on this project from the MySQL AB side): improving how InnoDB read-ahead works by having more overlap between requests, so a thread would rarely stall, unable to perform any quick IO and unable to make progress until the scheduled read-ahead completes. Hopefully this will also be looked at one day.