The purpose of this blueprint is to describe how we can parallelize replication for simple-case scenarios with some constraints. We have two suggested approaches, each with benefits and drawbacks. We have considered many details, but we omit them for now until we decide on which approach to take.
The primary target for this particular implementation is:
- SaaS application, with allocation one database (schema) per user
- sharding databases
- MySQL Hosting providers
The biggest problem in current master→slave schema is slave is not being able to catch up with master, because of restriction of single execution thread on slave, while on master queries are executed in multi-concurrent way.
The system has the following constraints:
- All tables must be InnoDB.
- Queries are parallelized at the database level. Each transaction goes to a specific thread based on the connection's default database (set by USE).
- Databases are 100% independent of each other. No guarantees are made if there are cross-database foreign keys, views, triggers, stored procedures, or similar. We can optionally check and warn that each statement doesn't access tables outside of its proper database, but that is all.
- The IO thread's behavior is unchanged.
- The dynamic variable SLAVE_SQL_THREADS controls how many SQL threads there are.
- Each database is assigned to a thread using a hash function against the database name. There is no user control over this.
- SHOW SLAVE STATUS has no additional rows. There is full information on all threads in the INFORMATION_SCHEMA.SLAVE_SQL_THREADS table.
- SQL threads can optionally be started or stopped independently.
- Configuration to setup amount of threads
- INFORMATION_SCHEMA table with information about each thread
- SHOW SLAVE STATUS for THREAD <N>
- START / STOP SQL_THREAD <N>
- ability to recovery from error for THREAD <N>
This section explains the approaches we have considered.
This design makes each database's replication fully independent of the others. Each database is replicated as fast as possible. No attempt is made to track the “position” of the server as a whole; instead, each thread keeps track of the updates it has applied to each of the databases for which it is responsible.
- A slow transaction in one database does not slow down anything else in the system.
- The state of the databases is fully independent. Some can be very delayed, others can be up-to-date. The system has no simple state; it is a complex mix depending on how fast things execute.
- For STATEMENT BASED REPLICATION: we store information about each slave thread in InnoDB table. This way it will provide guarantee we have current position even after crash or system failure.
- For ROW BASED REPLICATION: We need to run RBR in IDEMPOTENT mode, in order to recovery from hard-crash or power-off. And it keeps possibility to have database in *undefined* state during re-applying logs, as some databases will be with already applied information, and some others not.
This design forces transactions to begin and commit in the same order they occur in the binary log from the master.
- The system as a whole has a single, well-defined “position” relative to the master, the same as normal replication.
- All the data in the system is self-consistent.
- It is possible to restart after a crash, just as normal replication.
- It is theoretically possible to remove some of the above constraints in the future, and make the system work in more cases.
- Variations in query execution time will force some transactions to wait for others to complete and commit before they can commit. Thus, the workload will be at least partially serialized, and the benefit of parallelization will be reduced. The more variation in execution time, the less advantage there will be.
- Very slow transactions will delay replication for the system as a whole.