This blog provides high availability (HA) guidelines using group replication architecture and deployment recommendations in MySQL, based on our best practices.
Every architecture and deployment depends on the customer requirements and application demands for high availability and the estimated level of usage. For example, using high read or high write applications, or both, with a need for 99.999% availability.
Here, we give architecture and deployment recommendations along with a technical overview for a solution that provides a high level of high availability and assumes the usage of high read/write applications (20k or more queries per second).
This architecture is composed of two main layers:
- Connection and distribution layer
- RDBMS (Relational Database Management System) layer
The connection layer is composed of:
- Application to proxy redirection mechanism which can be anything from a Virtual IP managed by Keepalived local service to a DNS resolution service like Amazon Route 53. Its function is to redirect the traffic to the active Proxy node.
- Proxy connection distribution is composed of two or more nodes. Its role is to redirect the traffic to the active nodes of the Group Replication cluster. In cases like ProxySQL where the proxy is a level 7 proxy and is able to perform Read/Write split, this layer is also in charge of redirecting writes to the Primary node and reads to the Replicas, and of HA to prevent a single point of failure
The data layer is composed of:
- Primary node serving writes (or source) – this is the node that will accept writes and DDL modifications. Data will be processed following the ACID paradigm (atomicity, consistency, isolation, durability) and replicated to all other nodes.
- Replica nodes are the elements serving read requests. Some replica nodes can be elected Primary in case of Primary node failure. A replica node should be able to leave and join back a healthy cluster without impacting the service.
- Replication mechanism to distribute changes across nodes and in this solution is done with Group Replication. Group Replication is a tightly coupled solution which means that the database cluster is based on a Datacentric approach (single state of the data, distributed commit). In this case, the data is consistent in time across nodes and replication requires a high performant link. Given that, geographic distribution is strongly discouraged and Disaster Recovery (DR) is not implicitly supported by the main Group Replication mechanism.
The node characteristics (CPU/RAM/Storage) are not relevant to the main solution design. They instead must reflect the estimated workload the solution will have to cover, which is a case-by-case identification.
What is important to keep in mind is that all nodes that are part of the cluster must have the same characteristics. If they don’t, the cluster will be imbalanced and service will be affected.
As a generic indication, we recommend using solutions with at least 8 cores and 16GB RAM when production.
How do we measure availability and at what point does it become “high” availability?
Generally speaking, the measurement of availability is done by establishing a measurement time frame and dividing it by the time that it was available. This ratio will rarely be 1, which is equal to 100% availability. At Percona we don’t consider a solution to be highly available if it is not at least 99% or “two nines” available.
|Availability %||Downtime per year||Downtime per month||Downtime per week||Downtime per day|
|99% (“two nines”)||3.65 days||7.31 hours||1.68 hours||14.40 minutes|
|99.5% (“two nines five”)||1.83 days||3.65 hours||50.40 minutes||7.20 minutes|
|99.9% (“three nines”)||8.77 hours||43.83 minutes||10.08 minutes||1.44 minutes|
|99.95% (“three nines five”)||4.38 hours||21.92 minutes||5.04 minutes||43.20 seconds|
|99.99% (“four nines”)||52.60 minutes||4.38 minutes||1.01 minutes||8.64 seconds|
|99.995% (“four nines five”)||26.30 minutes||2.19 minutes||30.24 seconds||4.32 seconds|
|99.999% (“five nines”)||5.26 minutes||26.30 seconds||6.05 seconds||864.00 milliseconds|
How is High Availability Achieved?
There are three key components to high availability:
- Infrastructure – This is the physical or virtual hardware that database systems rely on to run. Without enough infrastructure (VM’s, networking, etc) there cannot be high availability. The easiest example is: there is no way to make a single server highly available.
- Topology Management – This is the software management related specifically to the database and managing its ability to stay consistent in the event of a failure. Many clustering or synchronous replication solutions offer this capability out of the box. However, for asynchronous replication, this is handled by additional software.
- Connection Management – This is the software management related specifically to the networking and connectivity aspect of the database. Clustering solutions typically bundle with a connection manager, however in asynchronous clusters deploying a connection manager is mandatory for high availability.
This Solution Provides:
The proposed solution, based on a tightly coupled database cluster, offers an HA level of 99.995% when coupled with the Group replication setting group_replication_consistency=AFTER.
If properly planned and architected, a database failure or configuration change that requires a restart shouldn’t affect the stability of the database infrastructure. Failovers are an integral part of a stability strategy and aligning the business requirements for availability and uptime with failover methodologies is critical to achieving those goals. Below are the 3 main types of failovers that can occur in database environments.
- Planned Failover: A planned failover is a failover that has been scheduled in advance or occurs at a regular interval. There can be many reasons for planned failovers including patching, large data operations, retiring existing infrastructure, or simply testing the failover strategy.
- Unplanned Failover: An unplanned failover is what occurs when a database unexpectedly becomes unresponsive or experiences instability. This could also include emergency changes that do not fall under the planned failover cadence or scheduling parameters. Unplanned failovers are generally considered higher-risk operations due to the high stress and high potential for either data corruption or data fragmentation.
- Regional or Disaster Recovery Failover: Unplanned failovers still work with the assumption that additional database infrastructure is immediately available and in a usable state. In a regional or DR failover, we would be making the assumption that there is a large-scale infrastructure outage that requires the business to move its operations away from its current availability zone.
- This solution covers both planned and unplanned failovers.
Major vs Minor Maintenance: Although it may not be obvious at first, not all maintenance activities are created equal and do not have the same dependencies. It is good to separate maintenance that demands downtime or failover from maintenance that can be done without impacting those important stability metrics. When defining these maintenance dependencies there can be a change in the actual maintenance process that allows for a different cadence.
Maintenance Without Service Interruption: With rolling restart and using proper version upgrade it is possible to cover both major and minor maintenance without service interruption.
When referring to database stability, uptime is likely the largest indicator of stability and oftentimes is the most obvious symptom of an unstable database environment. Uptime is composed of 3 key components and, contrary to common perception, is based on what happens when the database software is not able to take incoming requests rather than maintaining the ability to take requests with errors.
Recovery Time Objective (RTO): This can be boiled down to a very simple question “How long can the business sustain a database outage?”. Once the business is aligned with a goal of a minimum viable recovery time objective, it is much more straightforward to plan and invest in the infrastructure required to meet that requirement. It is important to acknowledge that while everyone desires 100% uptime, there is a need for realistic expectations that align with the business needs and not a technical desire.
Recovery Point Objective (RPO): There is a big distinction between the Recovery Point and the Recovery Time for database infrastructure. The database can be available, but not to the exact state that it was when it became unavailable. That is where Recovery Point comes in. The question we ask here is “How much data can the business lose during a database outage?”. All businesses have their own requirements here and it is worthy to note that it is always the goal to never sustain any data loss. But this is framed in a worst-case scenario how much data could be lost and the business maintains the ability to continue.
Disaster Recovery: While RTO and RPO are great for unplanned outages or small-scale hiccups to the infrastructure, when we talk about Disaster Recovery this is a major large-scale outage not strictly for the database infrastructure. How capable is the business of restarting operations with the assumption that all resources are completely unavailable in the main availability zone? The assumption here is that there is no viable restoration point or time that aligns with the business requirements. While each DR scenario is unique based on available infrastructure, backup strategy, and technology stack, there are some common threads for any scenario.
This solution helps improve uptime:
Using this solution will help you to significantly reduce both RPO and RTO. Given the tightly coupled cluster solution approach, the failure of a single node will not result in service interruption.
Increasing the number of nodes will also improve the cluster resilience by the formula:
F = (N -1) / 2
F – Number of admissible failures
N – number of nodes in the cluster
In a cluster of 5 nodes, F = (5 – 1)/2 = 2.
The cluster can support up to 2 failures.
In a cluster of 4 nodes, F = (4 – 1)/2 = 1.
The cluster can support up to 1 failure.
This solution also allows for a more restrictive backup policy, dedicating a node to the backup cycle, which will help in keeping RPO low. As previously mentioned, DR is not covered by default by the solution which will require an additional replication setup and controller.
Measurement and Monitoring
To ensure database infrastructure is performing as intended or at its best, it is necessary to measure specific metrics and alert when some of these metrics are not in line with expectations. Periodic review of these measurements is also encouraged to promote stability and understand potential risks associated with the database infrastructure. Below are the 3 aspects of Database performance measurement and monitoring
Measurement: To understand how a database infrastructure is performing there is a need to measure multiple aspects of the infrastructure. With measurement, it’s important to understand the impact of the sample sizes, sample timing, and sample types.
Metrics: Metrics refer to the actual parts of the database infrastructure being measured. When we discuss metrics, more isn’t always better as it could introduce unintentional noise or allow for troubleshooting to become overly burdensome.
Alerting: When one or many metrics of the database infrastructure is not within a normal or acceptable range, an alert should be generated so that the team responsible for the appropriate portion of the database infrastructure can investigate and remedy it
Monitoring for this solution is covered by:
Percona Monitoring and Management has a specific dashboard to monitor the Group Replication state and cluster status as a whole. (https://www.percona.com/doc/percona-monitoring-and-management/2.x/introduction.html) has a specific dashboard to monitor Group Replication state, and cluster status as a whole.
How to Implement the Infrastructure
In this section, we are providing the step by step instructions on how to implement the above solution.
The following will be used:
- 1 Virtual IP for ProxySQL failover – 192.168.4.194
- 2 ProxySQL nodes
- Proxy1 192.168.4.191
- Proxy2 192.168.4.192
- 4 MySQL nodes in Single Primary mode
- Gr1 192.168.4.81 – Initial Primary
- Gr2 192.168.4.82 – Replica / failover
- Gr3 192.168.4.83 – Replica / failover
- Gr4 192.168.4.84 – Replica / Backup
- Ports. All ports must be open if a firewall is in place or any other restriction like AppArmor or SELinux.
- MySQL – GR
First, you need to install the Percona Distribution for MySQL, the Percona Server for MySQL-based variant, on each node. Follow the instructions at https://www.percona.com/doc/percona-distribution-mysql/8.0/installing.html to install Percona Server for MySQL v8.0.
Configure the Nodes
Before anything, make sure that all the nodes use the same time-zone and time:
[root@gr1 ps8]# date
Tue Aug 18 08:22:12 EDT 2020
Check also for ntpd service to be present and enabled. Be sure that each node resolves the other nodes by name:
root@gr1 ps8]# for i in 1 2 3 4 ; do ping -c 1 gr$i > /dev/null;echo $?; done
If not able to resolve, add the entries in the /etc/hosts file.
Once instances are up and running check the Percona Server for MySQL version on each node:
/opt/mysql_templates/PS-8P/bin/mysql Ver 8.0.20-11 for Linux on x86_64 (Percona Server (GPL), Release 11, Revision 159f0eb)
Create a proper user for administration:
CREATE user dba@localhost identified by 'dbapw';
CREATE user dba@'192.168.%' identified by 'dbapw';
GRANT ALL on *.* to dba@localhost with grant option;
GRANT ALL on *.* to dba@'192.168.%' with grant option;
Exit from the client as user root and login as user dba.
Be sure to have a good and unique SERVER_ID value:
(dba@node1) [(none)]>show global variables like 'server_id';
| Variable_name | Value |
| server_id | 1 | <--- Not good given the same for all nodes
1 row in set (0.01 sec)
It’s now time to add group replication settings to the instances.
Stop all running nodes, then in the my.cnf add:
#Replication + binlog settings
### SLAVE SECTION
slave-parallel-type = LOGICAL_CLOCK
slave-parallel-workers = 4
slave-preserve-commit-order = 1
group_replication_group_name ="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" <-- Not good use something that will help you
to identify the GR transactions and from where they
come from IE "dc1euz1-aaaa-aaaa-aaaa-aaaaaaaaaaaa"