September 2, 2014

Add an option to Fail on Innodb Initialize failure, Please ?

I already wrote about this issue but as I is third team I’m helping customers to resolve this “frm corruption” issue it is the time to return to it again.

During MySQL 5.0 release cycle the change was made so now MySQL does not stop if Innodb storage engine failed to initialize but starts properly… just having Innodb tables unavailable. I honestly do not know any case when this behavior change helps… of course you silently want to get good portion of your database unavailable right ? But I guess this change was made for reasons quite far from improving user experience.

It is worth to note MySQL 5.0 introduces STRICT option which will makes error control more strict, what is often expected by the Enterprise users – so instead of cutting the string lengths or converging NULLs to zeros MySQL can be tuned to abort such statements. In this, in my opinion much more serious case there is no option to only start MySQL if all enabled storage engines could be initialized. It should be pretty trivial to add one I think.

Note if we use typical comparison of Storage Engines to File Systems – you have an option to specify which of file systems are mandatory so system will not complete boot up without having these systems clearly mounted. The reason is same – you do not want to deal with all kind of gotchas which may happen when system is partially functional in such crazy way. It is much better to have it safely down so monitoring will scream and let you to get out quickly and fix things.

But it gets worse. Innodb failed initialization is not only very hidden – you will not see it unless you look in MySQL server Error Logs but the error message accessing Innodb tables is also very obfuscated:

How this supposed to tell you Innodb (or for that sake any other Storage Engine) failed to initialize ?

This error suppose to mean .frm file is corrupted, and indeed you still can get this error message in such case, but now it is used for completely different reason.

Why not to have error message “Storage Engine is Not Available” or something similar ?

Users are confused by this error message and think it is indeed .frm corruption – trying to restore .frm from backup or other servers and being extremely frustrated as it does not work.

Of course if something looks wrong with your server you should first take a look at MySQL Server error logs but few people do it or spend a time examining error messages accurately enough.

Again if you made your made on this behavior, please at least add option such as –strict-start which would require server to start without serious errors in order to start listening on the sockets.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Pawel says:

    > Honestly do not know any case when this behavior change helps…
    It may help when server acts as master and innodb crashes.
    In this case binlonlogs will be sent to slaves.
    However, this can be achived in other way.

  2. peter says:

    Pawel,

    I think this is a bit artificial case. If Innodb crashes and it restarts it normally would self recover anyway. If it is bad database corruption so recovery does not work Innodb will in most cases assert and crash, ie due to page read failed.

    Innodb fails to initialized and is disabled typically in case of configuration changes permission problems etc.

    Now even imagine if you have Innodb as a master – for most people using basic monitoring people will not detect master is down if it is just Innodb which failed to initialize. And if you’re good with things and use custom monitoring and fall back scripts you can handle it by starting system with –skip-innodb to perform log shipping unless you use some other way for it.

  3. I know too many people (including myself) who have wasted too much time on this issue. An easy way for this to happen is to change innodb_log_file_size in /etc/my.cnf so that it doesn’t match the real size. I filed a feature request for this just now: http://bugs.mysql.com/bug.php?id=31578

  4. peter says:

    Thanks Mark,

    Indeed changing innodb_log_file_size is most common case of this problem, or related issues such as restoring from backup when log file size was different etc.

  5. Peter Bach says:

    One other problem we have had is that as a MySQL client, you cannot tell when the database is going though recovery when starting up, other than reading the log file.

    Your connection will fail, but you have no idea why. We have an embedded appliance, that tries to connect serveral times with a rudimentary backoff algorithm. If we can’t connect, we just shutdown the server, or in the extreme case reboot the box.

    So we ran into a problem when scale testing that the recovery was taking quite a bit of time, as expected, but the box would get rebooted underneath it. So recovery would fail, and you go through this again the next time you start up the system / database.

    So what is the proper way to do this without user intervetion? Is there a way I can see the recovery is going on, maybe with a different error code, so I should at least know to retry, and the database is not really in a hung state.

  6. peter says:

    Peter,

    This is the good question. Actually a while ago MySQL would just start listening on the socket but connection will not succeed until recovery is completed. Right now it will not even start accepting connections until storage engines are initialized which is better.

    Accepting them but refusing queries to say Innodb tables with “Recovery In Progress” or staling them on table lock could be an option but for most of the applications it would complicate things you have to think about and monitor – right now you have either MySQL is down or it is up and fully operational which is easy to check.

    In your case I guess you need to check MySQL Error logs as well to see if you need to reboot the box. This requires some hacking for remote boxes but should not be that bad.

  7. This cause much of frustrating when moving database between computer.

Speak Your Mind

*