Battery Learning still problem many years after
Peter Zaitsev
The performance problems caused by battery auto-learning go back many years. We wrote about it, and so did other people from the MySQL community. The situation has not gotten better, at least not with Dell RAID controllers: the H700 and H800 have the same problem. At the same time the situation has gotten worse, because a lot more people now run InnoDB in full durability mode, which is dramatically affected by battery learning.
First, I wonder how common this problem is outside of the Dell product line (which uses LSI chips). Are there RAID controllers which do not have this problem? For many installations it would make sense to pay a few hundred dollars more per server just to avoid the nightmare of scheduling learning cycles.
I’m surprised it is taking so many years to fix. Can’t one use a capacitor instead and bundle the 512MB cache with 512MB of flash, so that when power goes down the cache is stored on flash? Or can’t one put in batteries which can be moved independently?
It looks like the H800 has “transportable non-volatile cache” (TNVC) as an option, but it does not look like it.
As the problem is still there, what can you do about it? First, test it. You can trigger a learn cycle by disabling auto-learn and then starting learning manually, either with MegaCLI or with the Dell OpenManage tools (see this for example). You will see how long the battery-backed cache stays disabled on your system (that is only part of the whole learn phase). You can also switch the cache mode to Write Through if you do not have much time for testing. I recommend doing this as part of comprehensive IO subsystem performance testing: if you have RAID, check what performance will be with a failed hard drive (and during the rebuild stage), what overhead LVM adds for backups, and so on. It may be that the performance drop is not such a bad issue for you, so you can just take it during the night, or you might need to do something such as taking a slave out of rotation while it goes through the process.
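To make such a test repeatable, the steps can be scripted. The sketch below only builds the command lines and prints them; nothing is executed. It assumes the LSI tool is on the PATH as `MegaCli` (many installs ship it as `MegaCli64` under /opt/MegaRAID) and that your version accepts the flags shown, so check them against your controller's documentation before running anything.

```python
import shlex

# Assumption: binary name. Adjust to e.g. /opt/MegaRAID/MegaCli/MegaCli64.
MEGACLI = "MegaCli"


def bbu_status_cmd(adapter="ALL"):
    """Command that reports battery state, including learn-cycle status."""
    return [MEGACLI, "-AdpBbuCmd", "-GetBbuStatus", f"-a{adapter}"]


def start_learn_cmd(adapter="ALL"):
    """Command that starts a battery learn cycle right away."""
    return [MEGACLI, "-AdpBbuCmd", "-BbuLearn", f"-a{adapter}"]


def set_write_through_cmd(adapter="ALL"):
    """Command that puts all logical drives in Write Through mode,
    for a shorter test with the write cache out of the picture."""
    return [MEGACLI, "-LDSetProp", "WT", "-LAll", f"-a{adapter}"]


def set_write_back_cmd(adapter="ALL"):
    """Command that returns all logical drives to Write Back mode."""
    return [MEGACLI, "-LDSetProp", "WB", "-LAll", f"-a{adapter}"]


def learn_test_plan(adapter="ALL"):
    """Status before, trigger learning, status after."""
    return [
        bbu_status_cmd(adapter),
        start_learn_cmd(adapter),
        bbu_status_cmd(adapter),
    ]


if __name__ == "__main__":
    # Print the plan so it can be reviewed before anything is run by hand.
    for cmd in learn_test_plan():
        print(shlex.join(cmd))
```

Run your usual benchmark between the two status checks and keep polling the status command to see how long the cache stays disabled.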
Second, schedule it. Most systems are much better off with learning scheduled for the night or weekend, when it can be done on different servers at different times and with the team informed about the slower performance, rather than catching everyone by surprise (at least the first time).
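Staggering the cycles across servers is simple to plan in code. This is a minimal sketch, assuming hypothetical host names and a 12-hour upper bound on how long a learn cycle degrades performance (measure your own); how each host actually starts its cycle at the planned time is left to your scheduler.

```python
from datetime import datetime, timedelta


def stagger_learn_cycles(hosts, first_slot, gap=timedelta(days=1)):
    """Give every host its own learn-cycle start time, `gap` apart."""
    return {host: first_slot + i * gap for i, host in enumerate(sorted(hosts))}


def never_overlap(schedule, max_cycle=timedelta(hours=12)):
    """True if consecutive start times are at least `max_cycle` apart.

    `max_cycle` is an assumed upper bound on the degraded period.
    """
    times = sorted(schedule.values())
    return all(later - earlier >= max_cycle
               for earlier, later in zip(times, times[1:]))


if __name__ == "__main__":
    # Hypothetical hosts; the first cycle starts at 02:00, then one per night.
    plan = stagger_learn_cycles(["db1", "db2", "db3"],
                                datetime(2011, 1, 8, 2, 0))
    for host, start in plan.items():
        print(host, start.isoformat())
    print("no overlap:", never_overlap(plan))
```

With one server per night, a master and its slaves never lose their write cache at the same time.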
Third, you may choose to compromise on ACID during such periods. The RAID controller gives you an option to force write-back even with no battery, which will likely trash your database if power goes down during the learning process. That may be fine for your data; if not, you may be able to reduce the penalty by going from innodb_flush_log_at_trx_commit=1 and sync_binlog=1 to the values 2 and 0 respectively. Both can be changed without a server restart with the InnoDB Plugin, Percona Server and MySQL 5.5. Note it might also be good to increase innodb_write_io_threads to get more outstanding requests; without the cache this matters a lot for writes. This is a lot better than forcing the write cache on without a battery, as the database should not get corrupted in case of bad crash timing, though you may lose some recently committed transactions and the binlog may get out of sync with the InnoDB transaction logs.
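The switch to relaxed durability and back can be kept as a pair of statement lists so that nobody forgets to restore full durability once learning is done. A minimal sketch that only generates the SQL; running it through your own client with a sufficiently privileged account is assumed.

```python
# Values from the text: full durability vs. relaxed during the learn cycle.
DURABLE = {"innodb_flush_log_at_trx_commit": 1, "sync_binlog": 1}
RELAXED = {"innodb_flush_log_at_trx_commit": 2, "sync_binlog": 0}


def set_global_statements(settings):
    """Turn a {variable: value} mapping into SET GLOBAL statements."""
    return [f"SET GLOBAL {name} = {value};" for name, value in settings.items()]


if __name__ == "__main__":
    print("-- before the learn cycle")
    print("\n".join(set_global_statements(RELAXED)))
    print("-- after the battery is charged and write-back is on again")
    print("\n".join(set_global_statements(DURABLE)))
```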
I’m also wondering whether this is something where Facebook Flash Cache could be helpful, if it can act in place of the hardware BBU cache. It would be interesting to test.