Distributed systems are hard – I just want to echo that. MySQL gives us quite a number of options for running highly available systems. However, truly fault-tolerant systems are difficult to achieve.
Take, for example, a common use case of multi-DC replication where Orchestrator manages the topology while ProxySQL routes and proxies traffic to the correct server, as illustrated below. In rare cases, the primary MySQL node01 in DC1 might have a blip of a couple of seconds. Because Orchestrator uses an adaptive health check, consulting not only the node itself but also its replicas, it can react really fast and promote the node in DC2.
Why is this problematic?
The problem occurs when node01 resolves its temporary issue. A race condition within ProxySQL could mark it as read-write again. You can increase the “offline” period within ProxySQL to make sure Orchestrator rediscovers the node first. Hopefully, Orchestrator will set it to read-only immediately, but what we want is an extra layer of predictable behavior. This normally comes in the form of STONITH (shoot the other node in the head): by taking the other node out of action, we reduce the risk of conflict to close to zero.
Orchestrator supports hooks to do this, but we can also do it easily with ProxySQL using its built-in scheduler. In this case, we create a script that frequently asks Orchestrator for any nodes recently marked as downtimed, and marks them as such in ProxySQL too. The script proxy-oc-tool.sh can be found on GitHub.
What does this script do? In the case of our topology above:
- If for any reason, connections to MySQL on node01 fail, Orchestrator will pick node02 as the new primary.
- Since node01 is unreachable, and Orchestrator can neither modify read_only nor update replication on it, it will be marked as downtimed with lost-in-recovery as the reason.
- If node01 comes back online, and ProxySQL sees it before the next Orchestrator check, it can rejoin the pool. Then it’s possible that you have two writeable nodes in the hostgroup.
- To prevent the condition above, as soon as the node is marked with downtime from Orchestrator, the script proxy-oc-tool.sh will mark it OFFLINE_SOFT so it never rejoins the writer_hostgroup in ProxySQL.
- Once an operator fixes node01, i.e. reattaches it as a replica and removes the downtimed mark, the script proxy-oc-tool.sh will mark it back ONLINE automatically.
- Additionally, if DC1 gets completely disconnected from DC2 and AWS, the script will not be able to reach Orchestrator’s raft-leader and will set all nodes to OFFLINE_SOFT preventing isolated writes on DC1.
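To make the behavior in the list above concrete, here is a minimal sketch of the decision logic only. It is not the code from proxy-oc-tool.sh: the function name plan_status_changes and its argument format are made up for illustration. Fetching the downtimed list from Orchestrator and applying the statements through the ProxySQL admin interface (followed by LOAD MYSQL SERVERS TO RUNTIME) are deliberately left out.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and input format are hypothetical,
# not taken from proxy-oc-tool.sh.

# plan_status_changes REACHABLE DOWNTIMED SERVERS
#   REACHABLE  "yes" if the Orchestrator raft leader answered, anything else if not
#   DOWNTIMED  space-separated hostnames Orchestrator reports as downtimed
#   SERVERS    space-separated hostnames defined in ProxySQL
# Prints one ProxySQL admin UPDATE statement per server.
plan_status_changes() {
  local reachable="$1" downtimed="$2" servers="$3"
  local host status
  for host in $servers; do
    status="ONLINE"
    if [ "$reachable" != "yes" ]; then
      # Cannot reach Orchestrator: assume we are isolated and fence everything.
      status="OFFLINE_SOFT"
    else
      # Pad with spaces so that node1 does not match node10.
      case " $downtimed " in
        *" $host "*) status="OFFLINE_SOFT" ;;
      esac
    fi
    printf "UPDATE mysql_servers SET status='%s' WHERE hostname='%s';\n" "$status" "$host"
  done
}
```

Calling plan_status_changes yes "node01" "node01 node02" prints an OFFLINE_SOFT statement for node01 and an ONLINE statement for node02. Passing anything other than yes as the first argument fences every server. Note that this sketch sets every non-downtimed server back to ONLINE, which would also undo a manual OFFLINE_SOFT, so a real script would want to track which nodes it fenced itself.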
Adding the script to ProxySQL is simple. First, download the script and set its permissions. I placed the script in /usr/bin/, but you can put it anywhere accessible by the ProxySQL process.
chmod 0755 proxy-oc-tool.sh
mv proxy-oc-tool.sh /usr/bin/
Note, you will need to edit some variables in the script, e.g. ORCHESTRATOR_PATH.
Then load into the scheduler:
INSERT INTO scheduler (interval_ms, filename)
VALUES (5000, '/usr/bin/proxy-oc-tool.sh');
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;
I’ve set the interval to five seconds since, inside ProxySQL, a shunned node will need about 10 seconds before the next read-only check is done. This way, the script stays ahead of ProxySQL and is able to mark the dead node as OFFLINE_SOFT.
Because this is the simple version, there are obvious additional improvements to be made in the script, like using scheduler args to specify ORCHESTRATOR_PATH and implementing error checking.
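As a rough idea of what those two improvements could look like, here is a small sketch. ProxySQL's scheduler table has arg1 through arg5 columns, whose values are passed to the script as positional arguments. The helper name resolve_orchestrator_path is hypothetical, and the sketch assumes ORCHESTRATOR_PATH points at an executable file; if the real script uses it as a URL or a directory, the checks would need to change.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_orchestrator_path is a hypothetical helper,
# not part of proxy-oc-tool.sh.

# Take the path from the first positional argument (the scheduler's arg1),
# falling back to the ORCHESTRATOR_PATH environment variable.
# Prints the path on success; prints an error to stderr and returns 1 otherwise.
resolve_orchestrator_path() {
  local candidate="${1:-${ORCHESTRATOR_PATH:-}}"
  if [ -z "$candidate" ]; then
    echo "ORCHESTRATOR_PATH not set and no argument given" >&2
    return 1
  fi
  # Assumption: the value is a path to an executable file.
  if [ ! -f "$candidate" ] || [ ! -x "$candidate" ]; then
    echo "not an executable file: $candidate" >&2
    return 1
  fi
  printf '%s\n' "$candidate"
}
```

Near the top of the script, this could be used as ORCHESTRATOR_PATH="$(resolve_orchestrator_path "$@")" || exit 1, so a misconfigured scheduler entry fails loudly instead of silently doing nothing.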