Clarification on “Call me Maybe: MariaDB Galera Cluster”

Recently Aphyr (Kyle Kingsbury) published

The article is technically valid, I am not going to dispute a conclusion Aphyr made, but it is also quite technically involved, so users who just jump to conclusion may get the wrong impression and we’re left with more questions than ever.

So, let me state what is the real conclusion of this article:
“Galera cluster does not support SNAPSHOT ISOLATION LEVEL, in contract to what was stated in the documentation”.
Following that conclusion is using Galera cluster may result in “corrupted” data.

I do not quite like the usage of the word “corrupted” here. For me, the more correct word be to use is “inconsistent”.

So with this clarification, the Aphyr’s conclusion that Galera Cluster (and it affects both MariaDB Galera and Percona XtraDB Cluster products)
does not support SNAPSHOT ISOLATION and may leave data in inconsistent state is valid.
But there I need to add quite IMPORTANT addition: it may leave data in inconsistent state
if you use SPECIAL TYPE of transactions in default isolation levels that Aphyr uses in his test.
Moreover, if we test the same workload on a simple single instance InnoDB, we will get the same result.

Before getting too scary of “inconsistent data”, let’s review what kind of transactions are used and what are practical implications.

Aphyr uses following logic:
Assume we have a table

We have N rows in table accounts, and each row populated with initial balance 100.
That results in SUM(balance) FROM accounts == 100*N

Application logic: Execute following transactions concurrently:

As you see it includes some application logic, so on database side, the transactions looks like:
(assuming we move 25 from account 5 to 8)

Aphyr’s proves that these transactions executed concurrently should keep balances consistent (that is SUM(balances)==N*100) if database support SNAPSHOT ISOLATION or SERIALIZABLE isolation levels.
In his test he shows that running on Galera cluster these transactions executing concurrently results in inconsistent balance, therefore Galera cluster does not support SNAPSHOT ISOLATION level.

This however is totally expected.
Moreover, if you try this test on a single server against InnoDB in REPEATABLE-READ (default) mode,
you also will end up in inconsistent state (you can find my code here:

This is because how InnoDB handles REPEATABLE-READ mode (one may argue that InnoDB’s REPEATABLE-READ is weaker
than standard defined REPEATABLE-READ, and it is more closer to READ-COMMITED. This is a good opportunity for Asphyr to start another FUD “Call Me Maybe: InnoDB”). In simplified terms, InnoDB executes reads in repeatable-read mode, and writes or locked-read in read-committed mode.

What does it mean from practical standpoint?
From my opinion these transactions are little bit artificial (although are totally valid).

If you use this in a real life, the more obvious way to write these transactions is:

If you do this, it will NOT produce an inconsistent state.

Another way to handle this (for a single server InnoDB) is to use SERIALIZABLE isolation level (with an expected performance penalty).
Unfortunately Galera Cluster does not support SERIALIZABLE isolation level, as it does not pass read-set between nodes,
and node communication happens on COMMIT stage.

Third way, MySQL also provides an extension: SELECT .. FOR UPDATE statements to handle cases exactly like these.
So if you want to keep REPEATABLE-READ and original transactions, you will need to rewrite this as

This will result in a consistent state for table accounts and will work for both single InnoDB and multi-node Percona XtraDB Cluster deployments.

One thing to remember, that with Percona XtraDB Cluster you may get a DEADLOCK error trying to execute "COMMIT" statement, so your application should be ready to handle this error, rollback and repeat the transaction if needed.

So in conclusion of my post:

Using transactions described in may result in inconsistent state (not in data corrupted state!), for both Galera Cluster and single instance InnoDB. But this is because these transactions do not use properly InnoDB’s REPETABLE-READ isolation level. To reflect InnoDB’s requirement we need to use “SELECT ... FOR UPDATE” or to rewrite transactions in a described way.

UPDATE 18-Sep-2015.
Based on Twitter comments and comments from, I would like to add following.

Share this post

Comments (17)

  • Tom Santero

    Instead of getting defensive and nitpicking the language of Kyle’s post, the professional thing to do would have been to update the documentation on your site that claims Snapshot-Isolation [0] and be grateful that someone spent several hours of their life, independently verifying the behavior of your system in a transparent, repeatable manner (which also lead to the discovery of a corner case where the system fails in an interesting way).

    Distributed systems are hard; apparently taking criticism even more so.


    September 17, 2015 at 3:37 pm
  • Mao Geng

    Good point and examples. Thanks!
    Oracle (and Postgres) doesn’t support REPEATABLE_READ at all (Tom has a good explanation here –, why similar MVCC based InnoDB supports it? Even the consistency it provides is “weak”, and performance cost is high (
    I am curious if you can share a situation where using REPEATABLE_READ is more appropriate than using READ_COMMITTED with explicit locks?

    September 17, 2015 at 6:52 pm
  • Doug

    Nice post, Vadim. I appreciate the detailed distinction made concerning isolation levels and how to rewrite the transaction to avoid inconsistencies.

    September 17, 2015 at 8:36 pm
  • Mark Callaghan

    Excellent post. Thanks for doing the hard work to explain this. My first response on reading his post was the assumption that the problem existed for InnoDB independent of Galera, but I don’t want to credit as you did the work.

    While figuring out what to do for repeatable read for RocksDB, I documented InnoDB, Oracle and Postgres. I think MySQL+RocksDB will end up with Postres-style.

    September 17, 2015 at 8:53 pm
  • Vadim Tkachenko

    By some reason a comment from Tom Santero (@tsantero) did not come through (our webmasters are looking into this)
    I repost it here that there is no claims in censorship:

    His text:
    Instead of getting defensive and nitpicking the language of Kyle’s post, the professional thing to do would have been to update the documentation on your site that claims Snapshot-Isolation [0] and be grateful that someone spent several hours of their life, independently verifying the behavior of your system in a transparent, repeatable manner (which also lead to the discovery of a corner case where the system fails in an interesting way).

    Distributed systems are hard; apparently taking criticism even more so.


    September 18, 2015 at 12:12 am
  • Vadim Tkachenko

    @Mark, thank you!

    Good luck with adding isolations to RocksDB. It seriously is a quite complicated project, especially given MySQL API layer.

    September 18, 2015 at 12:14 am
  • Jonathan Oliver


    Thanks for clarifying this. I don’t see your post as defensive or nit-picky at all. Language matters. Inconsistent and corrupted are two very different words with very different meanings.

    September 18, 2015 at 9:33 am
  • R. Kyle Murphy

    As a developer I have to say that Aphyr’s example is more realistic and reflects more accurately how DBs are typically used. The usual flow from a application perspective is to start a new transaction, query for the records you need, perform some transformations of them in your application, write the updated records back, and then commit the transaction which is exactly what Aphyr did. Although the SQL you provided would work in this particular instance, it’s very problematic from the standpoint of an application because now you’ve moved part of the business logic down into the DB. Taken to the logic conclusion in order to maintain consistency you will end up having to use stored procedures for all application logic as anything that touches the DB records could potentially lead to an inconsistent state unless ALL of the logic is contained inside of the DB (and wrapped in a transaction).

    September 18, 2015 at 9:40 am
  • Mao Geng

    I left a comment yesterday however seems it did not make it. Let me retry.
    Very good examples and clarification, however I am curious why InnoDB supports REPEATABLE_READ, while with similar MVCC design Oracle/Postgres don’t support it at all (Tom has a good post about it – In what situations, REPEATABLE_READ is better than READ_COMMITTED plus explicit locking, and better than SERIALIZABLE – can you please give an example?

    September 18, 2015 at 1:03 pm
  • Vadim Tkachenko


    Let me provide you a quick answer, let’s see if it help, if not I will need to look into detailed example.

    InnoDB REPEATABLE-READ is better than READ-COMMITTED, as in READ-COMMITTED you are vulnerable to “phantom read” anomaly.
    This is interesting, as standard allows to have “phantom reads” in REPEATABLE-READ, but InnoDB’s implementation does not have it.

    SERIALIZABLE gives you a way to execute a transaction as it is totally isolated, and no other transaction may interfere. That makes possible for original transactions in this post to work.
    Unfortunately it comes with a serious performance and locking overhead. Your queries might be locked (while not locked in REPEATABLE-READ) and even result in more DEADLOCK and LOCK WAIT TIMEOUT errors under high concurrency.

    Please also check:

    September 18, 2015 at 1:50 pm
  • Mark Callaghan

    Mao Geng – Postgres supports repeatable read. See this for a comparison with InnoDB behavior –

    R Kyle Murphy – the community of InnoDB experts knows all about this. Alas, the community of InnoDB users is larger than that and sometimes they make the mistake of not using “FOR UPDATE” in the select statement as described above. The workaround doesn’t require embedding business logic. This is well documented behavior for InnoDB. I don’t think it will change. The alternative as implemented by Postgres isn’t without problems — there will be more rollbacks from write-write conflicts.

    lots of people supporting InnoDB in production know that users sometimes do the wrong thing with InnoDB (start transaction, SELECT … WHERE k, UPDATE … WHERE k). Adding ‘for update’ fixes that. This is well documented behavior. As much as I like most of the work

    September 18, 2015 at 2:43 pm
  • Mao Geng

    Thanks Mark. I just found Postgres added support for repeatable_read since 9.1. Also I found its document ( describes the difference between repeatable_read and read_committed pretty well, especially it points out “select for update” statement in repeatable_read transaction might fail – the repeatable read transaction will be rolled back with the message:
    “ERROR: could not serialize access due to concurrent update”.

    Valim, thanks for reply. I saw the differences. I have a long history on Oracle. It is a bit hard for me to think a case repeatable_read is required.
    In your example above:
    SELECT balance FROM accounts WHERE account_id=5
    SELECT balance FROM accounts WHERE account_id=8
    UPDATE account SET balance=balance-25 WHERE account_id=5
    UPDATE account SET balance=balance+25 WHERE account_id=8
    Is repeatable_read required here to keep consistency? The next “select for update” example can use read_committed, isn’t it?


    September 18, 2015 at 3:25 pm
  • Henrik Ingo

    Hi Tom

    I’m afraid you’re barking up the wrong tree here, as Vadim or Percona are not the creators of Galera. Nor is Vadim being defensive, it seems to me he is quite open about InnoDB’s limitations.

    Regarding Kyle, this example is quite similar to most of his posts in “Call me maybe” series. At least I can say this regarding the DB technologies I know well enough to have an opinion of. While Kyle’s method of investigating consistency levels provided by different databases is really advanced, it is impossible for a single person to be an expert in all databases, so often his analysis is also quite superficial. Combined with his entertaining but not that professional “you’re all shit anyway” attitude, it makes his posts hard to read. An average reader will get the conclusion that Galera (or even just InnoDB) is useless for transactional apps, when in reality the correct conclusion is simply that Kyle didn’t know how to use InnoDB correctly. I’m not even super-experienced with InnoDB myself, and even then my first comment when reading that post was “but shouldn’t you use FOR UPDATE to do that?”

    Even then, it must be said that Kyle usually succeeds in uncovering some genuine bug or flaw in the systems he tests. Which proves that this is a difficult topic for vendors to get right and on a general level Kyle’s methodology is very sound. In this case the genuine finding was that Galera documentation claimed to support snapshot isolation, but it turned out that is not true, and the Galera developers admitted that. It is indeed bad that they didn’t rush to fix their documentation yet.

    September 19, 2015 at 6:44 am