How to Add More Nodes to an Existing ProxySQL Cluster

In my previous post, some time ago, I wrote about the new cluster feature of ProxySQL. For that post, we were working with three nodes, now we’ll work with even more! If you’ve installed one ProxySQL per application instance and would like to work up to more, then this post is for you. If this is new to you, though, read my earlier post first for more context.

Check the image below to understand the structure of “one ProxySQL per application”. This means you have ProxySQL installed, and your application (Java, PHP, Apache server etc) in the same VM (virtual machine).

Schematic of a ProxySQL cluster

Having taken a look at that you probably have a few questions, such as:

  • What happens if you have 20 nodes synced and you now need to add 100 or more nodes?
  • How can I sync the new nodes without introducing errors?

Don’t be scared, it’s a simple process.

Remember there are only four tables which can be synced over the cluster. These tables are:

  • mysql_query_rules
  • mysql_servers
  • mysql_users
  • proxysql_servers

From a new proxysql cluster installation

After a fresh installation all those four tables are empty. That means if we configure it as a new node in a cluster, all rows in those tables will be copied instantly.

Generally the installation is straightforward.

Now ProxySQL is up and all the configuration are in default settings

Connect to the ProxySQL console:

Now there are two steps remaining:

  • Configure global_variables table
  • Configure proxysql_servers table

How to configure the global_variables table

Below is an example showing the minimal parameters to set – you can change the username and password according to your needs.

You can copy and paste the username and passwords from the current cluster and monitoring process by running the next command in a node of the current cluster:

You can update the parameters of the current node by using this as a template:

Configure “proxysql_servers” table

At this point you need keep in mind that you will need to INSERT into this table “all” the IPs from the other ProxySQL nodes.

Why so? Because this table will have a new epoch time and this process will overwrite the rest of the nodes listed by its last table update.

In our example, let’s assume the IP of the new node is 10.0.1.3 (i.e. node 3, below)

If you already have many ProxySQL servers in the cluster, you can run mysqldump as this will help speed up this process.

If that’s the case, you need to find the most up to date node and use mysqldump to export the data from the proxysql_servers table.

How would you find this node? There are a few stats tables in ProxySQL, and in this case we can use two of these to help identify the right node.

Here’s an example output

For each table, we can see different versions, each related to table changes.  In this case, we need to look for the latest epoch time for the table “proxysql_servers“. In this example , above, we can see that the server with the IP address of  10.0.1.1 is the latest version.  We can now run the next command to get a backup of all the IP data from the current cluster

Now, copy the output to the new node and import into proxysql_servers table. Here’s an example: the data has been exported to the file proxysql_servers.sql which we’ll now load to the new node:

You can run the next SELECTs to verify that the new node has the data from the current cluster, and in doing so ensure that the nodes are in sync as expected:

How can we check if there are errors in the synchronization process?

Run the next command to fetch data from the table stats_proxysql_servers_checksums:

Using our example, here’s the data saved in stats_proxysql_servers_checksums for the proxysql_servers table

As we can see in the column “diff” there are some differences in the checksum of the other nodes: our new node (IP 10.0.1.3) has the checksum 0x233638C097DE6190 where as nodes 1 and 2 both show 0xC7D7443B96FC2A94

This indicates that these nodes have different data. Is this correct? Well, yes, in this case, since in our example the nodes 1 and 2 were created on the current ProxySQL cluster

How do you fix this?

Well, you connect to the node 1 and INSERT the data for node 3 (the new node).

This change will propagate the changes over the current cluster – node 2 and node 3 – and will update the table proxysql_servers.

What happens if the new nodes already exist in the proxysql_servers table of the current cluster?

You can add the new ProxySQL node IPs to the current cluster before you start installing and configuring the new nodes.

In fact, this works fine and without any issues. The steps I went through, above, are the same, but when you check for differences in the table stats_proxysql_servers_checksums you should find there are none, and that the “diff” column should be displayed as 0.

How does ProxySQL monitor other ProxySQL nodes to sync locally?

Here’s a link to the explanation from the ProxySQL wiki page https://github.com/sysown/proxysql/wiki/ProxySQL-Cluster

Proxies monitor each other, so they immediately recognize that when a checksum of a configuration changes, the configuration has changed. It’s possible that the remote peer’s configuration and its own configuration were changed at the same time, or within a short period of time. So the proxy must also check its own status.

When proxies find differences:

  • If its own version is 1 , find the peer with version > 1 and with the highest epoch, and sync immediately
  • If its own version is greater than 1, starts counting for how many checks they differ
    • when the number of checks in which they differ is greater than cluster__name___diffs_before_sync and cluster__name__diffs_before_sync itself is greater than 0, find the peer with version > 1 and with the highest epoch, and sync immediately.

Note: it is possible that a difference is detected against one node, but the sync is performed against a different node. Since ProxySQL bases the sync on the node with the highest epoch, it is expected that all the nodes will converge.

To perform the syncing process:

These next steps run automatically when other nodes detect a change: this is an example for the table mysql_users

The same connection that’s used to perform the health check is used to execute a series of SELECTstatements in the form of SELECT _list_of_columns_ FROM runtime_module

In the node where it detected a difference in the checksum, it will run the next statements automatically:

Be careful:

After you’ve added the new ProxySQL node to the cluster, all four tables will sync immediately using the node with the highest epoch time as the basis for sync. Any changes to the tables previously added to the cluster, will propagate and overwrite the cluster

Summary

ProxySQL cluster is a powerful tool, and it’s being used more and more in production environments. If you haven’t tried it before, I’d recommend that you start testing ProxySQL cluster with a view to improving your application’s service. Hope you found this post helpful!

 

Share this post

Leave a Reply