As we know, Orchestrator is a MySQL high availability and replication management tool that aids in managing farms of MySQL servers. In this blog post, we discuss how to make Orchestrator itself, the tool that manages MySQL, fault-tolerant and highly available.

When considering HA for Orchestrator, one of the popular choices is to use Raft consensus.

What is Raft?

Raft is a consensus protocol/algorithm in which multiple nodes, composed of a Leader and Followers, agree on the same state and values. The Leader is decided by quorum and voting, and it is the responsibility of the Raft Leader to do all the decision-making and apply changes. The other nodes just follow or sync with the Leader without making any direct changes.

When Raft is used with Orchestrator, it provides high availability, handles network partitioning, and ensures fencing of the isolated node.

Deployment

Next, we will see how we can deploy an Orchestrator/Raft-based setup with the below topology.

For demo purposes, I am using the same server for both Orchestrator/Raft and MySQL. 

So, we have the topology below, which we are going to deploy. Each Raft/Orchestrator node has its own separate SQLite database instance.

Orchestrator Raft Topology

Installation

For this demo, I am installing the packages via the Percona distribution. However, we can also install the Orchestrator packages from the Percona or Openark repositories directly.

Note – The Openark project is no longer active; its last update was quite some time ago (2021). Therefore, we can rely on the Percona repositories, where the latest release was pushed in 2024.
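As a rough sketch on a yum-based system, assuming percona-release is already installed (the repository and package names may differ slightly per platform and version):

    # Enable the Percona Distribution for MySQL (Percona Server-based) repository
    sudo percona-release setup pdps-8.0
    # Install Orchestrator and its client
    sudo yum install -y percona-orchestrator percona-orchestrator-client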

Orchestrator/Raft configuration

1) Create database-specific users/tables on the Source database node (Node1).

Note – Orchestrator will fetch the cluster details from this table.

Note – These credentials will be used by the Orchestrator to connect to the MySQL backends.
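For illustration, here is a minimal sketch of what these objects might look like; the meta.cluster table name, the orchestrator user, and the password are demo assumptions, and the grants follow Orchestrator's documented requirements:

    -- Metadata table from which Orchestrator resolves the cluster alias
    CREATE DATABASE IF NOT EXISTS meta;
    CREATE TABLE IF NOT EXISTS meta.cluster (
      anchor TINYINT NOT NULL,
      cluster_name VARCHAR(128) NOT NULL DEFAULT '',
      cluster_domain VARCHAR(128) NOT NULL DEFAULT '',
      PRIMARY KEY (anchor)
    );
    INSERT INTO meta.cluster (anchor, cluster_name, cluster_domain)
    VALUES (1, 'testcluster', 'test.local')
    ON DUPLICATE KEY UPDATE cluster_name = VALUES(cluster_name);

    -- Topology user that Orchestrator uses to connect to the MySQL backends
    CREATE USER IF NOT EXISTS 'orchestrator'@'%' IDENTIFIED BY 'S3cret#123';
    GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orchestrator'@'%';
    GRANT SELECT ON meta.* TO 'orchestrator'@'%';
    GRANT SELECT ON mysql.slave_master_info TO 'orchestrator'@'%';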

2) Then, we need to copy the Orchestrator template file to /etc/orchestrator.conf.json and perform the necessary changes in the sections mentioned below.
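For example, assuming the sample configuration shipped with the package lives under /usr/local/orchestrator/ (the path may differ depending on how Orchestrator was installed):

    # Copy the sample configuration and then edit it in place
    sudo cp /usr/local/orchestrator/orchestrator-sample.conf.json /etc/orchestrator.conf.json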

  • Replace the MySQL topology credentials with the created ones.
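A sketch of the relevant settings, using the demo credentials assumed in step 1:

    "MySQLTopologyUser": "orchestrator",
    "MySQLTopologyPassword": "S3cret#123",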

  • Remove the below options since we are relying on the SQLite3 database to manage the Orchestrator backend.
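These are the MySQL-backend connection settings that can be dropped when SQLite is used (the values shown here are only placeholders):

    "MySQLOrchestratorHost": "127.0.0.1",
    "MySQLOrchestratorPort": 3306,
    "MySQLOrchestratorDatabase": "orchestrator",
    "MySQLOrchestratorUser": "orc_server_user",
    "MySQLOrchestratorPassword": "orc_server_password",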

In case we use MySQL as the Orchestrator backend instead, we need the below two changes:

Create the Orchestrator schema and the related credentials.

Then replace the backend details in the configuration with the information of the MySQL instance that manages Orchestrator.
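A sketch of the backend schema and credentials on that MySQL instance; the names below are assumptions for illustration:

    -- Schema and user that store Orchestrator's own (backend) data
    CREATE DATABASE IF NOT EXISTS orchestrator;
    CREATE USER IF NOT EXISTS 'orc_server_user'@'%' IDENTIFIED BY 'orc_server_password';
    GRANT ALL PRIVILEGES ON orchestrator.* TO 'orc_server_user'@'%';

The MySQLOrchestrator* settings shown above would then point to this instance and user instead of being removed.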

  • Replace the existing value with the below query to fetch the cluster details from the MySQL node directly.
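Assuming the meta.cluster table sketched earlier, the DetectClusterAliasQuery setting might look like this:

    "DetectClusterAliasQuery": "SELECT cluster_name FROM meta.cluster WHERE anchor = 1",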

  • Add the below SQLite3 configuration. This is only applicable when using the SQLite database instead of the MySQL backend.
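A sketch of that configuration; the data file path is an assumption, and each node keeps its own file:

    "BackendDB": "sqlite",
    "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.sqlite3",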

  • Auto-failover settings.

    RecoverMasterClusterFilters => Defines which clusters should be auto-failed-over/recovered.
    RecoverIntermediateMasterClusterFilters => Defines whether recovery is allowed for intermediate masters. Intermediate masters are replica hosts that have replicas of their own.
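A minimal sketch of these settings, assuming we want recovery enabled for all clusters (in production you would usually list specific cluster names or aliases instead of the "*" wildcard):

    "RecoverMasterClusterFilters": ["*"],
    "RecoverIntermediateMasterClusterFilters": ["*"],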

  • Now perform the Raft-related configuration.
Node1:

Node2:

Node3:

Note – Here we mainly replace the RaftAdvertise/RaftBind configuration for each node. We also need to make sure that communication between the nodes is allowed on the given Raft port (10008).
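For reference, a sketch of Node1's Raft settings using this demo's private IPs and Raft port 10008 (the RaftDataDir path is an assumption; Node2 and Node3 differ only in their RaftBind/RaftAdvertise values):

    "RaftEnabled": true,
    "RaftDataDir": "/var/lib/orchestrator",
    "RaftBind": "172.31.20.60",
    "RaftAdvertise": "172.31.20.60",
    "DefaultRaftPort": 10008,
    "RaftNodes": ["172.31.20.60", "172.31.16.8", "172.31.23.135"],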

3) Then, we can create the Raft data directory on each node.
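For example, assuming the RaftDataDir path from the sketch above:

    # Create the directory referenced by RaftDataDir on every node
    sudo mkdir -p /var/lib/orchestrator
    # If the orchestrator service runs as a non-root user, adjust ownership accordingly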

4) Finally, we can start the Orchestrator service on each node.
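Assuming a systemd-based installation where the unit is named orchestrator:

    # Enable and start the service on each node, then verify it is running
    sudo systemctl enable orchestrator
    sudo systemctl start orchestrator
    sudo systemctl status orchestrator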

Node Discovery:

From the Orchestrator UI (http://ec2-54-147-20-38.compute-1.amazonaws.com:3000/web/status), we can perform the initial node discovery directly.
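Alternatively, discovery can be done from the command line with orchestrator-client; the hostname below is a placeholder for any one of the MySQL nodes:

    # Discover a single node; Orchestrator then crawls the rest of the replication topology from it
    orchestrator-client -c discover -i <mysql-node-1>:3306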

Orchestrator Node Discovery

So here is our MySQL topology consisting of all 3 nodes.

MySQL Topology

Accessing the Orchestrator managing database (SQLite3):

As we are using SQLite3, we can access the tables and information inside the database in the below way.
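For example, using the sqlite3 command-line client against the data file assumed in the SQLite3DataFile setting above (table names reflect Orchestrator's backend schema and may vary between versions):

    # Open the Orchestrator backend database
    sudo sqlite3 /var/lib/orchestrator/orchestrator.sqlite3
    -- Then, inside the sqlite3 shell:
    -- .tables
    -- SELECT hostname, port FROM database_instance;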

Output:

Health/Service:

Next, we can check the logs of each node to confirm the status.

We will see some voting and state changes in the below logs. So, Node2 (172.31.16.8) becomes the leader while the other nodes follow it.

We can also use the below curl command to get the status.
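A sketch of such checks against the local HTTP API on port 3000 (endpoint names as documented for Orchestrator's Raft mode):

    # Health of this node within the raft group
    curl -s http://localhost:3000/api/raft-health
    # Whether this node is currently the leader or a follower
    curl -s http://localhost:3000/api/raft-state
    # Returns HTTP 200 only on the current raft leader; handy for proxy health checks
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/api/leader-check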

Output:

In the Orchestrator UI itself we can check the Raft details.

Raft Nodes

Raft Failover/Switchover:

Now, consider the current Raft leader, Node1 (172.31.20.60).
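One way to verify this, assuming your orchestrator-client version supports the raft-* commands:

    # Identity of the current raft leader (can be run from any raft node)
    orchestrator-client -c raft-leader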

Output:

If we stop Node1, we can see that one of the follower nodes (Node2) becomes the new leader.

Node2:

Node3:

So, the new leader is Node2(172.31.16.8) now.

Output:

We can also manually trigger the switchover using the command raft-elect-leader from the current leader node.
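A minimal sketch of the call; depending on the orchestrator-client version, a hint for the preferred new leader can also be supplied (check orchestrator-client's help for the exact flag):

    # Request raft re-elections; run on the current leader node
    orchestrator-client -c raft-elect-leader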

Output:

Basically, the Raft leader node is responsible for making all topology-related changes and recoveries; the other nodes just sync and exchange information.

Once we stop the Source database, Node3 (172.31.23.135), an automatic failover is triggered. These are the logs from the Raft leader node.

Summary

In this blog post, we explored one of the ways of setting up high availability for the Orchestrator tool. The Raft mechanism has the advantage that it provides automatic fencing and fault tolerance through its voting/consensus mechanism. The elected leader is solely responsible for all changes and recoveries. In a production environment, we should have at least three nodes (an odd number) to maintain quorum/voting. There are also other ways to configure HA for Orchestrator (semi-HA, and HA via a shared backend), which we can explore in other blog posts.

Comments
Hari

Hi Anil,
This is really helpful. Just wondering if the setup can be done to handle multiple clusters. Suppose I have 3 different clusters in 3 nodes, can this setup be implemented for all the clusters?