Funniest bug ever

Recently my attention was drawn to this bug, which is a nightmare for any consultant.

Working with production systems, we assume reads are just reads: if we’re only reading, we can’t break anything. OK, maybe we can crash the server with some SELECT query that runs into a bug, but we can’t cause data loss.

This case teaches us that things can be different: reads can in fact cause certain writes (updates) internally, and those writes add risk, as this bug shows.

This is why transparency is important: to understand how safe something is, it is not enough to know what it does logically. You also need to know what really happens inside, and therefore what can go wrong.
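The bug itself is specific to NDB, but the general point that a read can trigger a write is easy to observe elsewhere. One everyday analogue (not the mechanism of this bug) is the file access time: on many file systems, reading a file updates its atime in the inode. Below is a minimal Python sketch, assuming a file system that tracks access times. The function name `read_triggers_atime_update` is made up for illustration, and the result depends on mount options such as `noatime` or `relatime`, so it may legitimately report no change.

```python
import os
import tempfile
import time


def read_triggers_atime_update():
    """Return True if reading a file changed its recorded access time."""
    # Create a small scratch file to read from.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"just some data")
        path = tmp.name
    try:
        # Push atime and mtime two days into the past, so that a mount
        # which only refreshes stale access times (relatime) would
        # normally still update atime on the next read.
        old = time.time() - 2 * 24 * 3600
        os.utime(path, (old, old))

        before = os.stat(path).st_atime_ns
        with open(path, "rb") as handle:
            handle.read()  # a pure "read" from the application's view
        after = os.stat(path).st_atime_ns

        return after != before
    finally:
        os.remove(path)


if __name__ == "__main__":
    if read_triggers_atime_update():
        print("The read updated the access time: a write happened.")
    else:
        print("No atime change observed (noatime mount or similar).")
```

The application only asked to read, yet the file system may have modified metadata on disk as a side effect.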

Comments (19)

  • Pedro Melo

    Sorry Peter,

    that is not the funniest bug ever…

    This is:

    Best regards,

    January 31, 2009 at 1:43 am
  • Jonas


    We in the cluster team actually consider it one of the worst bugs ever…
    Not exactly our proudest moment 🙁

    But I agree, it’s so bad that it’s actually funny…


    January 31, 2009 at 4:29 am
  • Morgan Tocker

    At least it looks like it doesn’t represent the typical use case of what most people will be doing – but it is very serious indeed. I’ve always found this bug the funniest:

    January 31, 2009 at 5:00 am
  • Shlomi Noach

    Reminds me of the ‘noatime’ option on unix file systems.
    When I first learned that by reading any file or file property I commit a write – I was utterly surprised.

    January 31, 2009 at 5:39 am
  • Baron Schwartz

    Perhaps we should investigate the on-disk format of NDB so we can start providing data recovery services for it, too.

    January 31, 2009 at 6:03 am
  • Matthew Montgomery

    @baron, data recovery services for NDB disk data are not relevant.
    For this bug it is simply not an option: NDB just DROPped the tables from the cluster, so you’d have to restore from backup.
    If disk data files do get corrupted on a particular individual node, you simply restart that node with the --initial option and those files are restored from the peer in the node group.

    January 31, 2009 at 11:18 am
  • Matthew Montgomery

    Correction… “simply restart that node with the --initial option [after deleting the on-disk data and log files]” (--initial will not remove on-disk data files).

    January 31, 2009 at 11:25 am
  • Baron Schwartz

    Matthew, there is probably no system in existence (that uses on-disk storage) for which data recovery from on-disk files is not relevant. The point of our data recovery tools and services is to recover data that has been dropped, deleted, corrupted, etc. when there is no backup. If it’s been dropped from the cluster, are the 1s and 0s on disk anywhere? If yes, then that’s exactly the type of scenario I’m thinking of. NDB can’t find the data anymore, but maybe something else can. And the customer might call us up, and we might write tools on the spot to do the recovery; that’s how our other tools got started 😉

    What if there’s a bug in NDB such that the disk data files get corrupted on every node and there is no peer in the node group with a good copy? If it hasn’t happened yet, it may someday, who knows.

    Disclaimer: I have not investigated the on-disk format of NDB at all.

    January 31, 2009 at 3:16 pm
  • peter

    Jonas, right. I meant “funny” here in a sense that is really kind of tragic.

    Still, it is very nice to see that you got the fix out relatively quickly, and honestly publishing such a bug also does you credit.

    January 31, 2009 at 11:55 pm
  • peter

    Baron, Matthew,

    Right. If everyone always had a good backup (with point-in-time recovery), we would not have any recovery tools. In practice, however, backups are sometimes found to be broken and you have to recover the data. Our experience shows no one is immune: a number of companies you would think should have a backup have contacted us for help (with InnoDB).

    With InnoDB it is easy: thanks to the page format, it is possible to locate data even if the filesystem was totally ruined (like a RAID meltdown).

    January 31, 2009 at 11:58 pm
  • o.u.

    Wow, Shlomi Noach @ 4, re: the atime .. a write on every read, even from cache – I’m kind of shocked.

    February 1, 2009 at 8:19 pm
  • Baron Schwartz

    It’s not quite that bad. It’s only once per second. (There’s only a write if the atime has actually changed, which is only true once a second.)

    February 2, 2009 at 3:53 pm
  • johan

    Ha, it was a funny bug indeed. Unfortunately, I found it on a customer site 🙁

    February 3, 2009 at 1:33 am
  • Log Buffer

    “Peter Zaitsev shares the funniest bug ever.”

    Log Buffer #134

    February 6, 2009 at 1:37 pm
  • o.u.

    Thank you Baron – ok, not as crazy as it sounded then, though still a shock.

    February 9, 2009 at 3:25 pm
  • Shlomi Noach

    Hi Baron,

    Though I haven’t benchmarked myself the difference between ‘atime’ and noatime, please see what Linus Torvalds has to say about it:

    He claims more than 10% savings, though he wasn’t testing MySQL.
    Do you have any benchmarks comparing with/out “noatime”?


    February 10, 2009 at 1:07 am
  • peter


    Linus mentions a “mail spool”, which tends to have a lot of tiny files… and this is where the overhead is significant. Unless you’re dealing with tens of thousands of tables in MySQL, you’re in a different situation. This also means any benchmarks you would like to run should be workload-specific; in your particular case it is possible you will see a significant gain.

    February 10, 2009 at 8:37 am
  • Baron Schwartz

    No, I don’t, but anecdotally I can say that it matters if you have a lot of files. For example, suppose you have 100k tables, which is pretty common in certain types of apps. That’s at least 300k files if you’re using indexed MyISAM tables (also common for the same scenarios). Now suppose that you’re accessing them all randomly; you can do the math on how many times you’ll be doing an atime/diratime write. In many cases you won’t access a file more than once a second, so each access suffers the hit.

    My anecdotal evidence is that I haven’t seen significant performance changes from adding noatime,nodiratime to the mount options on “normal” servers with a few hundred tables. You can change the mount options at runtime so it’s pretty easy to see.

    February 10, 2009 at 8:39 am
  • Shlomi Noach

    @Peter, @Baron,

    Thanks for the information. It does sound a lot more reasonable in light of your explanation.

    February 10, 2009 at 10:40 pm
