In this blog post, I will review some of the MySQL replication concepts that are part of the MySQL environment (and Percona Server for MySQL specifically). I will also try to clarify some of the misconceptions people have about replication.
Since I’ve been working on the Solution Engineering team, I’ve noticed that – although information is plentiful – replication is often misunderstood or incompletely understood.
So What is Replication?
Replication guarantees information gets copied and purposely populated into another environment, instead of only stored in one location (based on the transactions of the source environment).
The idea is to use secondary servers on your infrastructure for either reads or other administrative solutions. The below diagram shows an example of a MySQL replication environment.
Fine, But What Choices Do I Have in MySQL?
You actually have several different choices:
Standard asynchronous replication
Asynchronous replication means that the transaction is completed on the local environment completely, and is not influenced by the replication slaves themselves.
After completion of its changes, the master populates the binary log with the data modification or the actual statement (the difference between row-based replication or statement-based replication – more on this later). This dump thread reads the binary log and sends it to the slave IO thread. The slave places it in its own preprocessing queue (called a relay log) using its IO thread.
The slave executes each change on the slave’s database using the SQL thread.
Semi-synchronous replication means that the slave and the master communicate with each other to guarantee the correct transfer of the transaction. The master only populates the binlog and continues its session if one of the slaves provides confirmation that the transaction was properly placed in one of the slave’s relay log.
Semi-synchronous replication guarantees that a transaction is correctly copied, but it does not guarantee that the commit on the slave actually takes place.
Important to note is that semi-sync replication makes sure that the master waits to continue processing transactions in a specific session until at least one of the slaves has ACKed the reception of the transaction (or reaches a timeout). This differs from asynchronous replication, as semi-sync allows for additional data integrity.
Keep in mind that semi-synchronous replication impacts performance because it needs to wait for the round trip of the actual ACK from the slave.
This is a new concept introduced in the MySQL Community Edition 5.7, and was GA’ed in MySQL 5.7.17. It’s a rather new plugin build for virtual synchronous replication.
Whenever a transaction is executed on a node, the plugin tries to get consensus with the other nodes before returning it completed back to the client. Although the solution is a completely different concept compared to standard MySQL replication, it is based on the generation and handling of log events using the binlog.
Below is an example architecture for Group Replication.
If Group Replication interests you, read the following blog posts:
There will be a tutorial at the Percona Live Open Source Database Conference in Santa Clara in April, 2017.
Percona XtraDB Cluster / Galera Cluster
Another solution that allows you to replicate information to other nodes is Percona XtraDB Cluster. This solution focuses on delivering consistency, and also uses a certification process to guarantee that transactions avoid conflicts and are performed correctly.
In this case, we are talking about a clustered solution. Each environment is subject to the same data, and there is communication in-between nodes to guarantee consistency.
Percona XtraDB Cluster has multiple components:
- Percona Server for MySQL
- Percona XtraBackup for performing snapshots of the running cluster (if recovering or adding a node).
- wsrep patches / Galera Library
This solution is virtually synchronous, which is comparable to Group Replication. However, it also has the capability to use multi-master replication. Solutions like Percona XtraDB Cluster are a component to improve the availability of your database infrastructure.
A tutorial on Percona XtraDB Cluster will be given at the Percona Live Open Source Database Conference in Santa Clara in April 2017.
Row-Based Replication Vs. Statement-Based Replication
With statement-based replication, the SQL query itself is written to the binary log. For example, the exact same INSERT/UPDATE/DELETE statements are executed by the slave.
There are many advantages and disadvantages to this system:
- Auditing the database is much easier as the actual statements are logged in the binary log
- Less data is transfered over the wire
- Non-deterministic queries can create actual havoc in the slave environment
- There might be a performance disadvantage, with some queries using statement-based replication (INSERT based on SELECT)
- Statement-based replication is slower due to SQL optimizing and execution
Row-based replication is the default choice since MySQL 5.7.7, and it has many advantages. The row changes are logged in the binary log, and it does not require context information. This removes the impact of non-deterministic queries.
Some additional advantages are:
- Performance improvements with high concurrency queries containing few row changes
- Significant data-consistency improvement
And, of course, some disadvantages:
- Network traffic can be significantly larger if you have queries that modify a large number of rows
- It’s more difficult to audit the changes on the database
- Row-based replication can be slower than statement-based replication in some cases
Some Misconceptions About Replication
Replication is a cluster.
Standard asynchronous replication is not a synchronous cluster. Keep in mind that standard and semi-synchronous replication do not guarantee that the environments are serving the same dataset. This is different when using Percona XtraDB Cluster, where every server actually needs to process each change. If not, the impacted node is removed from the cluster. Asynchronous replication does not have this fail safe. It still accepts reads while in an inconsistent state.
Replication sounds perfect, I can use this as a manual failover solution.
Theoretically, the environments should be comparable. However, there are many parameters influencing the efficiency and consistency of the data transfer. As long as you use asynchronous replication, there is no guarantee that the transaction correctly took place. You can circumvent this by enhancing the durability of the configuration, but this comes at a performance cost. You can verify the consistency of your master and slaves using the pt-table-checksum tool.
I have replication, so I actually don’t need backups.
Replication is a great solution for having an accessible copy of the dataset (e.g., reporting issues, read queries, generating backups). This is not a backup solution, however. Having an offsite backup provides you with the certainty that you can rebuild your environment in the case of any major disasters, user error or other reasons (remember the Bobby Tables comic). Some people use delayed slaves. However, even delayed slaves are not a replacement for proper disaster recovery procedures.
I have replication, so the environment will now load balance the transactions.
Although you’ve potentially improved the availability of your environment by having a secondary instance running with the same dataset, you still might need to point the read queries towards the slaves and the write queries to the master. You can use proxy tools, or define this functionality in your own application.
Replication will slow down my master significantly.
Replication has only minor performance impacts on your master. Peter Zaitsev has an interesting post on this here, which discusses the potential impact of slaves on the master. Keep in mind that writing to the binary log can potentially impact performance, especially if you have a lot of small transactions that are then dumped and received by multiple slaves.
There are, of course, many other parameters that might impact the performance of the actual master and slave setup.