Support for Percona XtraDB Cluster in ProxySQL (Part One)

Support for Percona XtraDB Cluster in ProxySQL

How native ProxySQL stands in failover support (both v2.0.15 and v2.1.0)

In recent times I have been designing several solutions focused on High Availability and Disaster Recovery. Some of them using Percona Server for MySQL with group replication, some using Percona XtraDB Cluster (PXC). What many of them had in common was the use of ProxySQL for the connection layer. This is because I consider the use of a layer 7 Proxy preferable, given the possible advantages provided in ReadWrite split and SQL filtering. 

The other positive aspect provided by ProxySQL, at least for Group Replication, is the native support which allows us to have a very quick resolution of possible node failures.

ProxySQL has Galera support as well, but in the past, that had shown to be pretty unstable, and the old method to use the scheduler was still the best way to go.

After Percona Live Online 2020 I decided to try it again and to see if at least the basics were now working fine. 

What I Have Tested

I was not looking for complicated tests that would have included different levels of transaction isolation. I was instead interested in the more simple and basic ones. My scenario was:

1 ProxySQL node v2.0.15  (192.168.4.191)
1 ProxySQL node v2.1.0  (192.168.4.108)
3 PXC 8.20 nodes (192.168.4.22/23/233) with internal network (10.0.0.22/23/33) 

ProxySQL was freshly installed. 

All the commands used to modify the configuration are here. Tests were done first using ProxySQL v2.015 then v2.1.0. Only if results diverge I will report the version and results. 

PXC- Failover Scenario

As mentioned above I am going to focus on the fail-over needs, period. I will have two different scenarios:

  • Maintenance
  • Node crash 

From the ProxySQL point of view I will have three scenarios always with a single Primary:

  • Writer is NOT a reader (option 0 and 2)
  • Writer is also a reader

The configuration of the native support will be:

Galera host groups:

  • Writer: 100
  • Reader: 101
  • Backup_writer: 102
  • Offline_hostgroup: 9101

Before going ahead let us analyze the Mysql Servers settings. As you can notice I am using the weight attribute to indicate ProxySQL which is my preferred write. But I also use weight for the READ Host Group to indicate which servers should be used and how.

Given that we have that:

  • Write
    • 192.168.4.22  is the preferred Primary
    • 192.168.4.23  is the first failover 
    • 192.168.4.233 is the last chance 
  • Read
    • 192.168.4.233/23 have the same weight and load should be balanced between the two of them
    • The 192.168.4.22 given is the preferred writer should NOT receive the same load in reads and have a lower weight value.  

The Tests

First Test

The first test is to see how the cluster will behave in the case of 1 Writer and 2 readers, with the option writer_is_also_reader = 0.
To achieve this the settings for proxysql will be:

As soon as I load this to runtime, ProxySQL should move the nodes to the relevant Host Group. But this is not happening, instead, it keeps the readers in the writer HG and SHUN them.

This is, of course, wrong. But why does it happen?

The reason is simple. ProxySQL is expecting to see all nodes in the reader group with READ_ONLY flag set to 1. 

In ProxySQL documentation we can read:

writer_is_also_reader=0: nodes with read_only=0 will be placed either in the writer_hostgroup and in the backup_writer_hostgroup after a topology change, these will be excluded from the reader_hostgroup.

This is conceptually wrong. 

A PXC cluster is a tightly coupled replication cluster, with virtually synchronous replication. One of its benefits is to have the node “virtually” aligned with respect to the data state. 

In this kind of model, the cluster is data-centric, and each node shares the same data view.

What it also means is that if correctly set the nodes will be fully consistent in data READ.

The other characteristic of the cluster is that ANY node can become a writer anytime. 
While best practices indicate that it is better to use one Writer a time as Primary to prevent certification conflicts, this does not mean that the nodes not currently elected as Primary, should be prevented from becoming a writer.

Which is exactly what READ_ONLY flag does if activated.

Not only, the need to have READ_ONLY set means that we must change it BEFORE we have the node able to become a writer in case of fail-over. 

This, in short, means the need to have either a topology manager or a script that will do that with all the relative checks and logic to be safe. Which in time of fail-over means it will add time and complexity when it’s not really needed and that goes against the concept of the tightly-coupled cluster itself.

Given the above, we can say that this ProxySQL method related to writer_is_also_reader =0, as it is implemented today for Galera, is, at the best, useless. 

Why is it working for Group Replication? That is easy; because Group Replication internally uses a mechanism to lock/unlock the nodes when non-primary, when using the cluster in single Primary mode. That internal mechanism was implemented as a security guard to prevent random writes on multiple nodes, and also manage the READ_ONLY flag. 

Second Test

Let us move on and test with writer_is_also_reader = 2. Again from the documentation:

writer_is_also_reader=2 : Only the nodes with read_only=0 which are placed in the backup_writer_hostgroup are also placed in the reader_hostgroup after a topology change i.e. the nodes with read_only=0 exceeding the defined max_writers.

Given the settings as indicated above, my layout before using Galera support is:

After enabling Galera support:

So node ending with 22 (the Primary elected) is not in the reader pool. Which can be ok, I assume. 

But what is not OK at all is that the READERS have now a completely different weight. Nodes x.23 and x.233 are NOT balancing the load any longer, because the weight is not the same or the one I define. It is instead copied over from the WRITER settings. 

Well of course this is wrong and not what I want. Anyhow, let’s test the READ failover.

I will use sysbench read-only:

Test Running