
Funniest bug ever

 | January 31, 2009 |  Posted In: Events and Announcements


Recently my attention was brought to this bug which is a nightmare bug for any consultant.

When working with production systems, we assume reads are reads: if we’re just reading, we can’t break anything. OK, maybe we can crash the server with some SELECT query that runs into a bug, but we can’t cause data loss.

This case teaches us things can be different – reads can in fact cause certain writes (updates) inside the server, which adds risk, as exposed by this bug.

This is why transparency is important – to understand how safe something is, it is not enough to know what it is logically; you also need to know what really happens inside, and therefore what can go wrong.

Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.


  • Hi,

    We in the cluster team actually consider it one of the worst bugs ever…
    Not exactly our proudest moment 🙁

    But I agree, it’s so bad that it’s actually funny…


  • At least it looks like it doesn’t represent the typical use case of what most people will be doing – but very serious indeed. I’ve always found this bug the funniest:

  • @baron, data recovery services for NDB disk data are not relevant.
    For this bug it is simply not an option; NDB just DROPped the tables from the cluster, so you’d have to restore from backup.
    If disk data files do get corrupt on a particular individual node, you simply restart that node with the --initial option and those files are restored from the peer in the node group.

  • Correction… “simply restart that node with the --initial option [after deleting the on-disk data and log files]” (--initial will not remove on-disk data files).
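    A minimal sketch of the node-level recovery procedure Matthew describes. The data-node paths and file names below are placeholders, not from this thread; NDB disk data file names are whatever was given in CREATE TABLESPACE / CREATE LOGFILE GROUP:

```shell
# On the data node with corrupt disk data files only.
# 1. Delete the on-disk data and undo log files first, since
#    --initial does not remove them (placeholder paths):
rm /var/lib/mysql-cluster/ndb_data/datafile_1.dat
rm /var/lib/mysql-cluster/ndb_data/undofile_1.log

# 2. Restart that single data node with --initial so it rebuilds
#    its state from the surviving peer in the same node group:
ndbd --initial
```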

  • Matthew, there is probably no system in existence (that uses on-disk storage) for which data recovery from on-disk files is not relevant. The point of our data recovery tools and services is to recover data that has been dropped, deleted, or corrupted when there is no backup. If it’s been dropped from the cluster, are the 1s and 0s on disk anywhere? If yes, then that’s exactly the type of scenario I’m thinking of. NDB can’t find the data anymore, but maybe something else can. And the customer might call us up, and we might write tools on the spot to do the recovery — that’s how our other tools got started 😉

    What if there’s a bug in NDB such that the disk data files get corrupt on every node and there is no peer in the node group with a good copy? If it hasn’t happened yet, it may someday, who knows.

    Disclaimer: I have not investigated the on-disk format of NDB at all.

  • Jonas, right. I meant “funny” in the sense that it is really kind of tragic.

    Though it is very nice to see you got the fix for it relatively quickly, and honestly publishing such a bug also earns you good credit.

  • Baron, Matthew,

    Right. If everyone had a good backup (with point-in-time recovery), we would not need any recovery tools. In practice, however, backups sometimes turn out to be broken and you have to recover the data. Our experience shows no one is immune – a number of companies you’d think would surely have a backup have contacted us for help (with InnoDB).

    With InnoDB it is easy – thanks to the page format it is possible to locate data even if the filesystem was totally ruined (like a RAID meltdown).

  • Hi Baron,

    Though I haven’t benchmarked the difference between ‘atime’ and ‘noatime’ myself, please see what Linus Torvalds has to say about it:

    He claims more than 10% savings, though he wasn’t testing MySQL.
    Do you have any benchmarks comparing with/without “noatime”?


  • Shlomi,

    Linus mentions “mail spool”, which tends to have a lot of tiny files… and this is where the overhead is significant. Unless you’re dealing with tens of thousands of tables in MySQL, you’re in a different situation. This also means any benchmarks you would like to do should be workload-specific – in your particular case it is possible you will see a significant gain.
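    For anyone running such a workload-specific benchmark, one way to toggle the option is a runtime remount. The device and mount point below are placeholders, not from this thread:

```shell
# Remount the filesystem holding the MySQL datadir with noatime
# (/var/lib/mysql is a placeholder mount point):
mount -o remount,noatime /var/lib/mysql

# To make the change persistent, the matching /etc/fstab line would
# carry the same option, e.g.:
#   /dev/sdb1  /var/lib/mysql  ext3  defaults,noatime  0 2
```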

  • No, I don’t, but anecdotally I can say that it matters if you have a lot of files. For example, suppose you have 100k tables, which is pretty common in certain types of apps. That’s at least 300k files if you’re using indexed MyISAM tables (also common for the same scenarios). Now suppose that you’re accessing them all randomly; you can do the math on how many atime/diratime writes you’ll be doing. In many cases you won’t access a file more than once a second, so each access suffers the hit.

    My anecdotal evidence is that I haven’t seen significant performance changes from adding noatime,nodiratime to the mount options on “normal” servers with a few hundred tables. You can change the mount options at runtime, so it’s pretty easy to test.
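    The file-count arithmetic above can be sketched as follows (three files per table is the standard MyISAM layout: .frm definition, .MYD data, .MYI index):

```python
# Back-of-envelope estimate of how many inodes can take atime writes
# in the scenario described above (all figures illustrative).
tables = 100_000           # "100k tables"
files_per_table = 3        # .frm + .MYD + .MYI per indexed MyISAM table
total_files = tables * files_per_table

print(total_files)         # 300000 files, each dirtying its inode on read
```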
