Innodb Recovery – Is large buffer pool always better ?Peter Zaitsev
How does Buffer Pool size affects Innodb Performance ? I always expected the effect to be positive, Innodb with large buffer pool to performing better. Including Recovery of course. I even blogged about it.
It turns out it is not always the case. Last week I was called to help with Innodb crash recovery on box with large amount of memory (64GB) with Innodb Buffer pool configured to be just a bit over 20GB. Innodb recovery was taking long hours with reasonably sized Innodb log files (256M) gradually progressing slower and slower so it was just about 50% after 3 hours with very little disk IO and one CPU 100% busy.
I took oprofile to see what exactly CPU is being spent for and saw the following picture:
samples % app name symbol name
1345161 95.2609 mysqld buf_flush_insert_sorted_into_flush_list
37849 2.6804 no-vmlinux (no symbols)
8865 0.6278 mysql (no symbols)
7423 0.5257 libc-2.4.so (no symbols)
2623 0.1858 mysqld buf_calc_page_new_checksum
So it is inserting pages in the flush list which takes a lot of time…
Innodb is smart about a way it flushes pages, it tries to merge sequential pages together and generally flush things in order. It is not however overly smart inserting pages to the flush list – keeping simple sorted linked list, so being forced to scan good portion of it finding place to insert the page.
So I took a shot at restarting MySQL with much smaller buffer pool size and we got it recovered much faster.
Interesting enough the problem does not seems to happen with all recoveries out where as 20GB buffer pool is not way more than typical size – I keep buffer pool at 12GB on 16GB boxes for example. I guess the problem in this case was – there was enough memory for everything so flush list could grow unrestricted without Innodb bothering to flush anything (I only could see reads while recovery was going with large buffer pool) plus large distinct amount of pages modified so a lot of pages had to be added to the flush list.
I ran into this problem during Innodb recovery and I guess this is when the problem is most visible as Innodb is not doing anything else. I however would guess such inefficient flush list handling can be affecting performance on normal workloads as well.
Lets hope Heikki will find a time to assign this to someone to be fixed soon 🙂
Update: Heikki is telling me it only happens on recovery which is good, Also this function in fact sorts page not by position but by LSN this is why it is used only on recovery.