Over the last couple of years I have run into random MySQL crashes in production when multiple key caches were used. Unfortunately this was never a frequent or critical enough issue for me to spend time creating a repeatable test case, and searching the MySQL bug database did not turn up anything. Recently we had this problem again and discussed it with Monty’s team, and this time we found the bug behind it.
It is no surprise that I could not find the bug easily: it is not really related to multiple key caches but to online key cache resize. That code is simply exercised most actively when you’re using multiple key caches. It is very rare to resize a single key cache in production, and doing so only triggers the crash sometimes, while if you’re using multiple key caches there are often scripts in place which adjust their sizes or change which tables are mapped to which cache.
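To make the trigger concrete, here is a minimal sketch of the kind of maintenance script described above. It only builds the statements; the function name, the cache name `hot_cache` and the table names are made up for illustration. The statements themselves use the standard MySQL syntax for structured key cache variables (`SET GLOBAL name.key_buffer_size`) and for `CACHE INDEX`.

```python
def key_cache_statements(cache_name, size_bytes, tables):
    """Build the SQL a (hypothetical) maintenance script might send to
    resize one named MyISAM key cache and assign tables to it."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Changing key_buffer_size on a running server is the online resize.
    statements = [f"SET GLOBAL {cache_name}.key_buffer_size = {size_bytes}"]
    if tables:
        # CACHE INDEX changes which key cache the listed tables use.
        statements.append(f"CACHE INDEX {', '.join(tables)} IN {cache_name}")
    return statements


if __name__ == "__main__":
    # Illustrative values only: a 128 MB cache and two made-up tables.
    for stmt in key_cache_statements(
        "hot_cache", 128 * 1024 * 1024, ["db1.orders", "db1.customers"]
    ):
        print(stmt + ";")
```

A script like this run from cron against a busy server is exactly the situation where the resize code path gets exercised regularly, which would explain why the crash seems tied to multiple key caches.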
Now let me put my complaining hat on. Looking into this crash bug I see it was opened in early 2006. That is 3.5 years ago, when I was still with MySQL and living in the Seattle area... it really feels like a century ago.
As usual with such tricky bugs, Shane Bester stepped in and soon created a test case to crash it. Shane seems to be the best guy in the universe when it comes to making non-repeatable bugs repeatable.
It took over a year to complete the bug fix. For all this time the bug existed in the 4.1, 5.0 and 5.1 trees. The bug was then marked as fixed and closed (so you will not find it if you search for active bugs), but in reality it was only fixed in MySQL 5.1. Even though the bug is closed, MySQL 5.0 and MySQL 4.1 still do not have the fix.
The plan was to backport the fix to MySQL 5.0 after MySQL 5.1 had been in production for a few months, but I believe this was forgotten.
Now this looks like the nastiest type of bug to me. It is a bug in a relatively rarely used feature: I’d expect less than 1% of people to resize the key cache while the server is up or to use multiple key caches. The fix is not easy and may affect 100% of users if it introduces a new bug. It is also a race condition, so not everyone using the feature will run into it. In my practice, a number of people using multiple key caches under relatively light load never ran into this issue.
It is a hard choice what to do in such a case, but I think closing the bug and leaving it unfixed at the same time is not a good idea. At the least I’d like to see the bug “cloned” for versions 5.0 and 4.1 so it can be found as an active bug and it is clear these versions still have it. Or maybe there could be a special bug status to indicate it was fixed only in later MySQL versions but left unfixed in earlier ones?
I would also like to have no crash bugs in the server. If there are features which are known to be buggy and crashing, I’d like to see them disabled at run time and enabled only with an --enable-buggy-features switch (I’m sure marketing can come up with a better name for this one).