Achieving Consistent Read and High Availability with Percona XtraDB Cluster 8.0

High Availability with Percona XtraDB Cluster 8.0In real life, there are frequent cases where getting a running application to work correctly is strongly dependent on consistent write/read operations. This is no issue when using a single data node as a provider, but it becomes more concerning and challenging when adding additional nodes for high availability and/or read scaling. 

In the MySQL dimension, I have already described it here in my blog Dirty Reads in High Availability Solution.

We go from the most loosely-coupled database clusters with primary-replica async replication, to the fully tightly-coupled database clusters with NDB Cluster (MySQL/Oracle).

Adding components like ProxySQL to the architecture can, from one side, help in improving high availability, and from the other, it can amplify and randomize the negative effect of a stale read. As such it is crucial to know how to correctly set up the environment to reduce the risk of stale reads, without reducing the high availability. 

This article covers a simple HOW-TO for Percona XtraDB Cluster 8.0 (PXC) and ProxySQL, providing an easy to follow guide to obtain no stale reads, without the need to renounce at read, scaling or a high grade of HA thanks to PXC8.

The Architecture

The covered architecture is based on:

  1. PXC8 cluster compose by 3 nodes
  2. ProxySQL v2 node in a cluster to avoid a single point of failure
  3. Virtual IP with KeepAlived see here. If you prefer to use your already-existing load balancer, feel free to do so.
  4. N number of application nodes, referring to VIP

Installation

Install PXC8 

Install ProxySQL

And finally, set the virtual IP as illustrated in the article mentioned above.  It is now the time to do the first step towards the non-stale read solution. 

Covering Stale Reads

With PXC, we can easily prevent stale reads by setting the parameter to one of the following values wsrep-sync-wait = 1 – 3 – 5 or 7 (default = 0). We will see what changes in more detail in part two of the blog to be published soon. For now, just set it to wsrep-sync-wait = 1 ;.

The cluster will ensure consistent reads no matter from which node you will write and read.

This is it. So simple!

ProxySQL Requirements

The second step is to be sure we set up our ProxySQL nodes to use:

  • One writer a time to reduce the certification conflicts and Brutal Force Abort
  • Avoid including the writer in the reader group
  • Respect the order I am setting for failover in case of needs  

Now here we have a problem; ProxySQL v2 comes with very interesting features like SSL Frontend/backend, support for AWS Aurora …and more. But it also comes with a very poor native PXC support. I have already raised this in my old article on February 19, 2019, and raised other issues with discussions and bug reports. 

In short, we cannot trust ProxySQL for a few factors:

  • The way it deals with the nodes failover/failback is not customizable
  • The order of the nodes is not customizable
  • As of this writing, the support to have the writer NOT working as a reader is broken

In the end, the reality is that in order to support PXC/Galera, the use of an external script using the scheduler is more flexible, solid, and trustworthy. As such, the decision is to ignore the native Galera support, and instead focus on the implementation of a more robust script. 

For the scope of this article, I have reviewed, updated, and extended my old script.

Percona had also developed a Galera checker script that was part of the ProxySQL-Admin-Tools suite, but that now has been externalized and available in the PerconaLab GitHub.

Setting All Blocks 

The setup for this specific case will be based on:

  • Rules to perform read-write split.
  • One host group to define the writer HG 200
  • One host group to define the reader HG 201
  • One host group to define candidate writers HG 8200
  • One host group to define candidate readers HG 8201

The final architecture will look like this:

High Availability with Percona XtraDB Cluster

ProxySQL Nodes:

Node1 192.168.4.191 public ip
      10.0.0.191    internal ip
Node1 192.168.4.192 public ip
      10.0.0.192    internal ip
Node1 192.168.4.193 public ip
       10.0.0.193    internal ip  

VIP   192.168.4.194 public ip

PXC8 Nodes:

pxc1  10.0.0.22
pxc2  10.0.0.23
pxc3  10.0.0.33

Let us configure PXC8 first. Operation one is to create the users for ProxySQL and the script to access the PXC cluster for monitoring.

The second step is to configure ProxySQL as a cluster:

Add a user able to connect from remote. This is will require ProxySQL nodes to be restarted.

On rotation, do all ProxySQL nodes. 

The third part is to set the variables below.

Please note that the value for admin-cluster_mysql_servers_diffs_before_sync is not standard and is set to 1

It is now time to define the ProxySQL cluster nodes:

Check the ProxySQL logs and you should see that the nodes are now linked:

Once this is done let us continue the setup, adding the PXC nodes and all the different host groups to manage the architecture:

You can see that as mentioned we have two host groups to manage the cluster 8200 and 8201. Those two host groups work as templates and they will change only by us manually.

The 8200 host group weight defines the order of the writers from higher to lower. Given that node 10.0.0.22 with weight 1000 is the preferred writer. At the moment of writing, I chose to NOT implement automatic fail-back. I will illustrate later how to trigger that manually. 

Once we have all the servers up, lets’ move on and create the users:

And the query rules to have Read/Write split:

The final step is to set the scheduler:

Let analyze the script parameters:

  • The schedule ID.    id: 10
  • As a best practice, always keep the scheduler script not active by default and enable it only when in the need. active: 0  
  • Interval is how often the scheduler should execute the script; it needs to be often enough to reduce the time the service is in a degraded state, but not so often to be noisy. An interval of two seconds is normally a good start.   interval_ms: 2000
  • The location of the script that must be set as executable filename: /var/lib/proxysql/galera_check.pl

Given the scheduler limitation to five arguments, we collapse all the parameters in one and let the script then parse them.   arg1:  -u=cluster1 -p=clusterpass -h=192.168.4.191 -H=200:W,201:R -P=6032 –retry_down=2 –retry_up=1 –main_segment=2 –debug=0  –log=/var/lib/proxysql/galeraLog –active_failover=1 –single_writer=1 –writer_is_also_reader=0

 The parameters we pass here are:

  • The credential to connect to ProxySQL: -u=cluster1 -p=clusterpass -h=192.168.4.191 -P=6032
  • The host group definition: -H=200:W,201:R This setting is necessary because you can have multiple script running serving multiple clusters.
  • The retry settings are to reduce the risk of false positive, say a network hiccup or other momentary events against which you do not want to take action: –retry_down=2 –retry_up=1 
  • Given the script is segment-aware, you need to declare the main segment that is serving the applications: –main_segment=2 
  •  Log location/name the final name will be the combination of this plus the host groups (ie galeraLog_200_W_201_R.log ) : –log=/var/lib/proxysql/galeraLog 
  •  If script should deal with failover or not and what type (read documentation/help for details): –active_failover=1 
  •  If the script should support SINGLE writer (default recommended), or multiple writer nodes: –single_writer=1 
  •  Is (are) the writers also working as readers or fully write dedicated: –writer_is_also_reader=0

Once we are confident our settings are right, let us put the script in production:

 

Warning

One important thing to keep in mind is that ProxySQL scheduler IS NOT part of the cluster synchronization, as such we must manually configure that part on each node. Once the script runs, any change done inside ProxySQL to the mysql_server table will be kept in sync by the ProxySQL cluster. It is strongly recommended to not mix ProxySQL nodes in the cluster and sparse one, as this may cause unexpected behavior. 

At this point, your PXC8 cluster architecture is fully running and will provide you with a very high level of HA and write isolation while preserving the read scaling capabilities.

In part two of this post, we will see the cluster in action and how it behaves in case of standard operations like backup or emergency cases like node crashes.


Percona XtraDB Cluster is a cost-effective and robust clustering solution created to support your business-critical data. It gives you the benefits and features of MySQL along with the added enterprise features of Percona Server for MySQL.

Download Percona XtraDB Cluster Datasheet

Share this post

Comment (1)

  • Sevak Reply

    Hi Marco & thank you for your continuous support of mysql HA solutions. Up until now I have believed, that proxysql is naturally capable of automatic failover in case of backend node crash. I’m still unable to prove, that it can reroute client connection to alive backend node without interrupting the query, instantly and without any downtime. So am I wrong in this and really it can’t do this by default?

    July 29, 2020 at 1:12 pm

Leave a Reply