I’ve recently worked with customers using replication rings with 4+ servers; several servers accepting writes. The idea behind this design is always the same: by having multiple servers, you get high availability and by having multiple writer nodes, you get write scalability. Alas, this is simply not true. Here is why.
Having several servers is a necessary condition to have high availability, but it’s far from sufficient. What happens if for instance C suddenly disappears?
- The replication ring is broken, so updates from A and B will never go to D. D will then quickly become so out-of-date that it’s not going to be usable. But wait! A will no longer receive the updates from B so A will quickly become non usable as well. Same thing for B. So unless you are very quick to configure a smaller ring with the remaining nodes, the whole chain will soon be non working.
- If an event from C is still being executed on one of the other servers, it will go into an infinite loop, simply because C is the only server being able to prevent an event originally coming from C to cycle through the ring.
Conclusion: each time a server goes down, the whole system goes down. In other words, availability is poorer than with a single server.
You can think that if you are able to run 1000 writes/s on a single server, writing on 4 servers in parallel will allow you to run 4000 writes/s over the whole cluster. However reality is quite different.
Don’t forget that ALL writes will be executed on ALL servers. So we have 2 separate scenarios:
- Scenario #1: 1000 writes/s is the point where you’re hitting a bottleneck (for instance disk saturation). Then you’ll never be able to handle the extra load coming from replication. What is going to happen is simply that the servers will become slow because of overload and they’ll never be able to go beyond the 1000 writes/s mark.
- Scenario #2: a single server could handle 5000 writes/s. Then writing on all servers will indeed allow you to claim that your cluster can absorb 4000 writes/s. But you would achieve the same result by running 4000 writes/s on a single server. This has nothing to do with write scalability.
Conclusion: As all writes are run on all servers, writing on multiple nodes doesn’t magically create extra write capacity. You’re still bound by the capacity of a single server.
Another concern when allowing multiple writers is write conflicts. MySQL doesn’t have any mechanism to detect or solve write conflicts.
So lots of “funny” things can happen when writes are conflicting:
- Duplicate key errors that will cause replication to halt. And no,
auto_increment_offsetcannot resolve all possible situations when duplicate key errors can happen.
- An even funnier situation is when conflicting writes do not generate a replication error, but instead create hidden data inconsistencies. Like you have value=100 in a field, A does value=value+2 and B does value=valuex2. You can end up with one server having value=202 and another server having value=204. Which one is the right value? Impossible to know…
If you’re interested in learning more on the risk of writing on multiple nodes while using regular MySQL replication, you can check out this webinar.
A ring is one the worst MySQL replication topologies as it dramatically increases the complexity of all operations on the ring while providing no benefit.
If you need an HA solution, it is not an easy choice as there are many of them and all have tradeoffs, but a ring is definitely not the right option. This post can help you find the right candidate(s).
If you need write scalability, the options are limited, but again, MySQL ring replication is not a good fit. The main question to answer is how many writes do you want to be able to run? For instance, if you want 10x write scalability but your current workload is 100 writes/s, that’s easy: just make sure you have a decent schema, decent indexes and decent hardware. If you want 10x write scalability but you’re already running 5000 writes/s, it’s probably time to explore sharding.