Typical misconceptions on Galera Replication for MySQLStephane Combaudon
Even if a Galera node looks like a regular MySQL server, the underlying replication mechanism is very different. This implies some changes in the way you have to configure the Galera nodes. Here are some of the most common misconceptions about Galera when using Percona XtraDB Cluster.
Q: Why should I enable binary logging as it is not needed by Galera replication?
Unlike for regular asynchronous MySQL replication, it is true that you don’t need to enable binary logging to use Galera replication. However what if someone runs an accidental
In this case, the statement will be replicated immediately on all nodes. Then your main option to recover lost data is to use a backup. But if binary logging is not enabled, all changes to the table after the backup will be lost.
What if you have an asynchronous slave that is intentionally delayed? This is another good option to quickly recover the lost data, but to be able to set up an async slave, you will also have to enable binary logging!
So don’t forget to add on all your nodes:
binlog_format = ROW
Q: If I set
innodb_flush_log_at_trx_commit = 2, I may lose data in some cases, right?
For a regular MySQL master, it is recommended to set innodb_flush_log_at_trx_commit = 1 because it is the only way to make sure that every committed transaction is stored permanently on disk. The main drawback is that it can slow down writes a lot because it involves one fsync at each commit.
With Galera, the difference is that commit is synchronous: this means that a transaction is committed on node #1, it has already been replicated to all the other nodes (it has not necessarily been executed on the remote nodes though).
As Galera implements cluster-wide durability, there is no need to have single server durability and you can safely use
innodb_flush_log_at_trx_commit = 2 on all nodes.
Actually this is not totally accurate… For instance if all nodes lose power at the same time, you may lose some transactions. The likeliness of such a failure is related to how nodes are hosted: with each node in a separate datacenter, this is very unlikely. But with 3 nodes being VMs on a single physical host, it could well happen from time to time.
Q: The network RTT between my servers is 100ms. I know that each commit takes at least as much time as a network RTT so can I only hope to run 10 writes/s?
This needs some clarification: commit is synchronous because the full transaction is replicated to all nodes when it is committed. However you pay the same price whether the transaction has a single statements or several statements.
So if all your writes are auto-committed transactions, each write will trigger a commit that will need at least one RTT to complete. If RTT is 100ms, that means 10 writes/s.
But if your transactions have 10 writes, you will only need a commit every 10 writes: with 10 commits/s, you can now run 100 writes/s.
And finally several threads can commit at the same time, increasing the write throughput. With 10 concurrent threads executing transactions with 10 statements, you get 1000 writes/s.
Of course, this is only theory. In the real world, you will probably not get so perfectly aligned numbers. You can look at this post (Comparing Percona XtraDB Cluster with Semi-Sync replication Cross-WAN) to see real numbers with 1 thread and 32 threads.
Conclusion: network latency is a limiting factor for write throughput, that’s true. But that may not be as bad as you can think. And remember Callaghan’s law: “In a Galera cluster a given row can’t be modified more than once per RTT”.
I want to write on all nodes to get write scalability. Is it a good idea?
Let me start by stating that Galera cannot be a true solution for scale writes: the reason is simply that all writes need to be applied on all nodes.
But Galera brings a limited form of write scalability when writing concurrently to several nodes because:
- Write sets can be applied in parallel on the remote nodes.
- Galera is using row-based replication, so applying the replicated events can be faster than executing the original write.
There is a caveat however when writing on several nodes: because of optimistic locking, concurrent transactions on different nodes can lead to write conflicts. In this case Galera will roll back one of the transactions, and it is up to the application to retry executing the transaction.
Galera Replication is a nice technology that can help solve challenges around high availability. But a basic understanding of how it works is useful as it can avoid frustration or unreasonable expectations.