Recently Aphyr (Kyle Kingsbury) published https://aphyr.com/posts/327-call-me-maybe-mariadb-galera-cluster
The article is technically valid, I am not going to dispute a conclusion Aphyr made, but it is also quite technically involved, so users who just jump to conclusion may get the wrong impression and we’re left with more questions than ever.
So, let me state what is the real conclusion of this article:
“Galera cluster does not support SNAPSHOT ISOLATION LEVEL, in contract to what was stated in the documentation”.
Following that conclusion is using Galera cluster may result in “corrupted” data.
I do not quite like the usage of the word “corrupted” here. For me, the more correct word be to use is “inconsistent”.
So with this clarification, the Aphyr’s conclusion that Galera Cluster (and it affects both MariaDB Galera and Percona XtraDB Cluster products)
does not support SNAPSHOT ISOLATION and may leave data in inconsistent state is valid.
But there I need to add quite IMPORTANT addition: it may leave data in inconsistent state
if you use SPECIAL TYPE of transactions in default isolation levels that Aphyr uses in his test.
Moreover, if we test the same workload on a simple single instance InnoDB, we will get the same result.
Before getting too scary of “inconsistent data”, let’s review what kind of transactions are used and what are practical implications.
Aphyr uses following logic:
Assume we have a table
|
1 2 3 4 5 |
CREATE TABLE `accounts` ( `account_id` int(11) NOT NULL, `balance` int(11) NOT NULL, PRIMARY KEY (`account_id`) ) ENGINE=InnoDB |
We have N rows in table accounts, and each row populated with initial balance 100.
That results in SUM(balance) FROM accounts == 100*N
Application logic: Execute following transactions concurrently:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
BEGIN (start transactions) // in application: renerate random $fromAccount in a range [1;N] SELECT balance FROM accounts WHERE account_id=$fromAccount // in application read balance into $from_balance variable // in application: renerate random $toAccount in a range [1;N], but != $fromAccount SELECT balance FROM accounts WHERE account_id=$toAccount // in application read balance into $to_balance variable // in application, generate random amount to move from one account to another, as $moveAmt // in application, calculate new balance for account1: $newBalance1=$from_balance-$moveAmt // in application, calculate new balance for account2: $newBalance2=$to_balance+$moveAmt // execute queries: UPDATE account SET balance=$newBalance1 WHERE account_id=$fromAccount UPDATE account SET balance=$newBalance2 WHERE account_id=$toAccount COMMIT; |
As you see it includes some application logic, so on database side, the transactions looks like:
(assuming we move 25 from account 5 to 8)
|
1 2 3 4 5 6 |
BEGIN SELECT balance FROM accounts WHERE account_id=5 SELECT balance FROM accounts WHERE account_id=8 UPDATE account SET balance=75 WHERE account_id=5 UPDATE account SET balance=125 WHERE account_id=8 COMMIT; |
Aphyr’s proves that these transactions executed concurrently should keep balances consistent (that is SUM(balances)==N*100) if database support SNAPSHOT ISOLATION or SERIALIZABLE isolation levels.
In his test he shows that running on Galera cluster these transactions executing concurrently results in inconsistent balance, therefore Galera cluster does not support SNAPSHOT ISOLATION level.
This however is totally expected.
Moreover, if you try this test on a single server against InnoDB in REPEATABLE-READ (default) mode,
you also will end up in inconsistent state (you can find my code here: https://github.com/vadimtk/go-trx-iso-test).
This is because how InnoDB handles REPEATABLE-READ mode (one may argue that InnoDB’s REPEATABLE-READ is weaker
than standard defined REPEATABLE-READ, and it is more closer to READ-COMMITED. This is a good opportunity for Asphyr to start another FUD “Call Me Maybe: InnoDB”). In simplified terms, InnoDB executes reads in repeatable-read mode, and writes or locked-read in read-committed mode.
What does it mean from practical standpoint?
From my opinion these transactions are little bit artificial (although are totally valid).
If you use this in a real life, the more obvious way to write these transactions is:
|
1 2 3 4 5 6 |
BEGIN SELECT balance FROM accounts WHERE account_id=5 SELECT balance FROM accounts WHERE account_id=8 UPDATE account SET balance=balance-25 WHERE account_id=5 UPDATE account SET balance=balance+25 WHERE account_id=8 COMMIT; |
If you do this, it will NOT produce an inconsistent state.
Another way to handle this (for a single server InnoDB) is to use SERIALIZABLE isolation level (with an expected performance penalty).
Unfortunately Galera Cluster does not support SERIALIZABLE isolation level, as it does not pass read-set between nodes,
and node communication happens on COMMIT stage.
Third way, MySQL also provides an extension: SELECT .. FOR UPDATE statements to handle cases exactly like these.
So if you want to keep REPEATABLE-READ and original transactions, you will need to rewrite this as
|
1 2 3 4 5 6 |
BEGIN SELECT balance FROM accounts WHERE account_id=5 FOR UPDATE; SELECT balance FROM accounts WHERE account_id=8 FOR UPDATE; UPDATE account SET balance=75 WHERE account_id=5 UPDATE account SET balance=125 WHERE account_id=8 COMMIT; |
This will result in a consistent state for table accounts and will work for both single InnoDB and multi-node Percona XtraDB Cluster deployments.
One thing to remember, that with Percona XtraDB Cluster you may get a DEADLOCK error trying to execute "COMMIT" statement, so your application should be ready to handle this error, rollback and repeat the transaction if needed.
So in conclusion of my post:
Using transactions described in https://aphyr.com/posts/327-call-me-maybe-mariadb-galera-cluster may result in inconsistent state (not in data corrupted state!), for both Galera Cluster and single instance InnoDB. But this is because these transactions do not use properly InnoDB’s REPETABLE-READ isolation level. To reflect InnoDB’s requirement we need to use “SELECT ... FOR UPDATE” or to rewrite transactions in a described way.
UPDATE 18-Sep-2015.
Based on Twitter comments and comments from https://news.ycombinator.com/item?id=10238690, I would like to add following.
SNAPSHOT ISOLATION level. At the same time it states “It is recommended that you avoid using SERIALIZABLE in Galera Cluster.”