October 21, 2014

Magic Innodb Recovery self healing

We have certain type of the table corrupting with Innodb, as it is limited to only one particular index on one particular table type it is likely to be Innodb bug but Heikki currently could not find what could be causing it.

Happily as we have data stored in many tables of same format rather than one monster table these rare corruptions did not cause too much problems to us – as it was not clustered index we always could repair table by running ALTER TABLE with limited impact to production system (as only one table of about a hundred will be locked)

So everything kind of worked (of course we still hoped new MySQL release will have this Innodb bug fixed) until today we got MySQL to crash during recovery process right after 17% of log records were applied.

Such corruption is one of the worst onces, because you can’t really recover data on per table basics.

Also if MySQL crashes during recovery lower values of innodb_force_recovery normally would not work. Reading Documentation one may think you need to set it as far as 6 (SRV_FORCE_NO_LOG_REDO) if crash happens during redo stage of log recovery. As it is very nasty option I wanted to check if this is really required by trying lower recovery settings.

Happily I was able to restart MySQL with innodb_force_recovery=4 (SRV_FORCE_NO_IBUF_MERGE). So in my case probably Insert buffer merge was affecting corrupted page rather than log replay itself so I got lucky.

My initial plan was to now dump potentially corrupted tables (in this particular corruption instance Innodb did not print table name in question), drop tables, restart Innodb without innodb_force_recovery and load them back. But just yesterday one of the customers told me he had Innodb magically healing itself after he was able to complete recovery with innodb_force_recovery=4. So I decided to check if I’m to get lucky second time today.

And Indeed Restarting Innodb with innodb_force_recovery=0 allowed it to start normally and I could rebuild affected tables my
“normal way”.

So when dealing with Innodb recovery for bad corruptions you may want to:

– Start with lower innodb_force_recovery settings and see which one allows you to start.

– After recovery succeeds you can try restarting with lower recovery setting as it may allow you to recover more data or avoid full dump and restore.

Another trick which helped me in previous instances, but not this one is to do full system restart before attempting recovery. In few instances problems were caused by Kernel bug, inconsistence in OS cache or RAID cache or something else which made problem to disappear after system restart.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. peter says:

    This situation got interesting development today. Several hours later we have completed ALTER TABLE run for all potentially affected tables and almost going to put box in production I decided to wait until insert buffer merge is completed.

    This process took about 2 hours and as I was hoping it will complete successfully it crashed.

    This time it did not crash on applying log records to the database but at later stage as Insert buffer was merged, so corruption probably was in buffer pool.

    So now we’ve got to do dump and reload or restore from backup our little 1.3TB database which surely going to be a lot of fun.

    I wish there would be tools to examine insert-buffer content and possibly discard it. I’d rather have few tables with corrupted indexes which I can rebuild than this.

Speak Your Mind

*