MySQL Limitations Part 1: Single-Threaded ReplicationBaron Schwartz
I recently mentioned a few of the big “non-starter” limitations Postgres has overcome for specific use cases. I decided to write a series of blog posts on MySQL’s unsolved severe limitations. I mean limitations that really hobble it for major, important needs — not in areas where it isn’t used, but in areas where it is used and thus is used very wastefully compared to its potential.
The first glaring problem is single-threaded replication. It is severe and getting much worse as servers get more and more CPUs and CPU cores. The replication replay process executes in a single thread on the replicas, and thus has no hope of keeping up with even a moderately busy write load on the primary, where many updates are occurring concurrently.
In a lot of Web applications, this isn’t really seen as a huge limitation. That’s because these apps are mostly read traffic, so a single primary can delegate the read workload across several replicas, and the write workload is still a relatively small fraction of the server’s total capacity. But eventually, it does become a problem if the app gets large enough, no matter how large the read-to-write ratio is.
What are some workarounds? Here are a few I’ve seen:
- Use DRBD replication instead of MySQL replication. Problem: you end up with an idle standby server, which can’t be used for serving reads, only for disaster recovery. That’s costly.
- Shard. Write workload is the single most legitimate reason for sharding. It’s too bad that “the replica can’t keep up with the write workload” becomes the reason to shard. I wish we could avoid that.
- Do the replication in the application. I know of applications where they just don’t use built-in replication. When they modify some data, they do it in both places. That’s a headache.
- Try obscure techniques such as external processes to prefetch the data the replica is trying to modify, so it can do it faster. This rarely works.
I’m not criticizing anyone who does these things — there really isn’t much of a good choice. It’s all a matter of picking the solution that’s least evil.
Why isn’t there multi-threaded replication? I think that a) it’s not as easy as it seems, and b) there are a hundred edge cases that make it nearly impossible to choose a solution that works for all situations. For example, mixtures of transactional and non-transactional tables are a nightmare.
Here are a few ideas I’ve either heard, or discussed, or thought of myself:
- One thread per database. If the application is built such that each database is fully independent, then on the replica, we could start up a thread for each database we see in the binary log, and simply pass the replication events to the appropriate thread.
- A pool of threads on the replica, and a coordinating thread that hands work to each of them. The coordinator would read from the relay log until it has a complete transaction’s worth of events (not a single statement), and hand the whole transaction to the worker thread. The worker thread would run up until the COMMIT, but not actually commit, and then report back to the coordinator thread. The coordinator thread would ensure that all of the transactions begin in the same order as defined in the relay log, and commit in the same order. If any error occurred, such as a deadlock or lock wait timeout, then the coordinator would instruct the workers to roll back, and either retry or make them execute the problematic transactions in serial instead of concurrently.
- Multiple binary logs on the primary, one per database; one replication process per binary log on the replica. This would have the advantage that it would allow a replica to subscribe to multiple masters, which is currently impossible.
These solutions represent different types of trade-offs. For example, solution 1) only works for specific uses, and I don’t think it’s general-purpose enough. Solution 2) has potentially complex behavior that might not work well in some cases, such as when deadlocks are common; but it is overall the least disruptive or different from the user’s point of view. Solution 3) requires modifying the binary logging code, which is risky. It also requires maintaining many master.info files on the replica, and new SQL syntax for administering replicas, and is generally not something I personally want to administer (replication is fragile enough… imagine recovering after a crash when you have to fix multiple threads that have forgotten where they should be reading in multiple binlogs?).
Regardless of the solution, it is certain that nothing is going to work in all cases; the most common cases will require use of InnoDB with the proper transaction isolation level, at a minimum. This behavior is going to have to default to single-threaded as replication currently does, and only enable the multi-threaded behavior if the user configures it to do so.
I would be in favor of solution 2) with an option to configure it to behave as solution 1).
The other big question is who wants to sponsor its development? You know who to contact if you do!