Managing a farm of MySQL servers in a replication environment becomes much easier with the MySQL Orchestrator tool. It ensures a smooth transition of the primary role, whether an ad hoc failover occurs or a planned, graceful switchover comes into action.

Several configuration parameters play a crucial role in controlling and influencing failover behavior. In this blog post, we’ll explore some of these key options and how they can impact the overall failover process.

Let’s discuss each of these settings one by one with some examples.

FailMasterPromotionIfSQLThreadNotUpToDate

By default, this option is disabled. When it is set to “true” and a master failover takes place while the candidate master has not yet applied all of its relay log events, the failover or promotion process is aborted.

If this setting remains “false,” then in a scenario where all the replicas are lagging and the current master goes down, one of the replicas is still chosen as the new master, which can lead to data loss on the new master node. Later, when the old master is re-added as a replica, this can surface as duplicate entry errors.

Considering “FailMasterPromotionIfSQLThreadNotUpToDate” is enabled in the orchestrator configuration file “orchestrator.conf.json”:
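A minimal sketch of the relevant part of the file (only this key is shown; the rest of your configuration is assumed unchanged):

```json
{
  "FailMasterPromotionIfSQLThreadNotUpToDate": true
}
```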

This is the topology managed by the orchestrator:
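The topology can be printed with orchestrator-client. The hosts match our test sandbox; the version string and replication details shown are illustrative:

```shell
$ orchestrator-client -c topology -i 127.0.0.1:22637
127.0.0.1:22637   [0s,ok,5.7.40-log,rw,ROW,>>,GTID]
+ 127.0.0.1:22638 [0s,ok,5.7.40-log,ro,ROW,>>,GTID]
+ 127.0.0.1:22639 [0s,ok,5.7.40-log,ro,ROW,>>,GTID]
```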

Below, we are running some workloads via sysbench, which will help increase the replication lag for our testing purposes. 
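An invocation along these lines can drive the write load (the sbtest schema, credentials, and table sizing are assumptions for this sketch, and the tables are assumed to have been created beforehand with a sysbench prepare run):

```shell
# Write-heavy OLTP workload against the current primary
sysbench oltp_read_write \
  --mysql-host=127.0.0.1 --mysql-port=22637 \
  --mysql-user=sysbench --mysql-password=*** \
  --mysql-db=sbtest --tables=8 --table-size=100000 \
  --threads=8 --time=300 --report-interval=10 \
  run
```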

Output:

After some time, replication lag starts building up on the replicas; at that point, we stopped the primary [127.0.0.1:22637].
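Stopping the primary can be as simple as the following (credentials are placeholders):

```shell
mysqladmin -h 127.0.0.1 -P 22637 -u root -p shutdown
```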

The result was that the master promotion failed as the SQL thread was not up to date. 

Now, if the option “FailMasterPromotionIfSQLThreadNotUpToDate” is left at “false” (the default), the failover proceeds even if the replica is suffering from replication lag.

In the same scenario, with “FailMasterPromotionIfSQLThreadNotUpToDate”: false, the master promotion succeeds.

cat /tmp/recovery.log:
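What ends up in /tmp/recovery.log is entirely defined by the recovery hooks in the configuration. Assuming a PostFailoverProcesses hook such as the one below, every completed failover appends a line recording the failed and promoted endpoints:

```json
{
  "PostFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ]
}
```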

DelayMasterPromotionIfSQLThreadNotUpToDate

This parameter is the opposite of the one discussed above. Instead of aborting the master failover, it delays the promotion until the candidate master has applied all of its relay log events. When it is “true,” the orchestrator process waits for the SQL thread to catch up before promoting the new master.

Considering “DelayMasterPromotionIfSQLThreadNotUpToDate” is enabled in the orchestrator configuration file “orchestrator.conf.json”:
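Again, a minimal sketch of the relevant key (these two options are intended to be used one at a time, so “FailMasterPromotionIfSQLThreadNotUpToDate” stays at its default of false here):

```json
{
  "DelayMasterPromotionIfSQLThreadNotUpToDate": true,
  "FailMasterPromotionIfSQLThreadNotUpToDate": false
}
```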

Some workload was running on the master [127.0.0.1:22637], and within a few seconds, replication lag started emerging. We stopped the master node around this point.

We can see in the log file /tmp/recovery.log that the initial failover process started.

However, due to the replication lag on the candidate master [Anils-MacBook-Pro.local:22638], the promotion was put on hold until the lag was recovered.

Anils-MacBook-Pro.local:22638

Anils-MacBook-Pro.local:22639

There is one more observation here. If we attempt a graceful switchover while replication lag persists on all replicas, we get the message below.

Output:

This is happening because of the condition below, whereby the replication lag must be equal to or less than the defined “ReasonableMaintenanceReplicationLagSeconds”: 20.

https://github.com/openark/orchestrator/blob/730db91f70344e38296dbb0fecdbc0cefd6fca79/go/logic/topology_recovery.go#L2124
https://github.com/openark/orchestrator/blob/730db91f70344e38296dbb0fecdbc0cefd6fca79/go/inst/instance.go#L37
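The threshold is an ordinary configuration key, and 20 seconds is its default value; raising it loosens the check referenced above:

```json
{
  "ReasonableMaintenanceReplicationLagSeconds": 20
}
```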

Once replication lag drops below that threshold, or the threshold itself is raised, the graceful takeover works fine.

Here, the orchestrator service logs show the failover process waiting for all the relay logs to be applied.

Once lag resolves, the takeover process runs successfully.

Output:

FailMasterPromotionOnLagMinutes

This parameter ensures that a master promotion is aborted if the replica is lagging by the configured number of minutes or more. To use this flag, we must also configure “ReplicationLagQuery” and a heartbeat mechanism such as pt-heartbeat to measure the replication lag accurately.

Let’s see how it works.

We have set the value below in the orchestrator configuration “orchestrator.conf.json,” which ensures that if the lag exceeds ~1 minute, the master promotion process will fail.
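A minimal sketch of that setting:

```json
{
  "FailMasterPromotionOnLagMinutes": 1
}
```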

As discussed above, enabling this option requires setting “ReplicationLagQuery,” which fetches the replication lag from the heartbeat mechanism instead of relying on the Seconds_Behind_Master status.

By default, the orchestrator uses the replica status variable “Seconds_Behind_Master” to monitor replication lag. However, in a scenario where replication is already broken and the master has also failed, the value of “Seconds_Behind_Master” would be NULL, which is of no use when an accurate figure is needed to make the decision.

So here we are going to use pt-heartbeat as the source of replication lag. pt-heartbeat is a replication delay monitoring tool that measures delay by looking at actual replicated data. This provides “absolute” lag from the master as well as sub-second resolution.

Below is the “ReplicationLagQuery” configuration, which we will define in the orchestrator configuration file.
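One possible query, assuming pt-heartbeat writes into a percona.heartbeat table (both the schema and table names are choices made when starting pt-heartbeat, shown later):

```json
{
  "ReplicationLagQuery": "SELECT ROUND(UNIX_TIMESTAMP(NOW(6)) - UNIX_TIMESTAMP(MAX(ts))) FROM percona.heartbeat"
}
```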

We also need a separate pt-heartbeat process running on each of the source/replica instances; an example invocation follows the host list below.

Anils-MacBook-Pro.local:22637:

Anils-MacBook-Pro.local:22638:

Anils-MacBook-Pro.local:22639:
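On each node we ran something along these lines (the schema, credentials, and error thresholds are assumptions for this sketch; --check-read-only lets the same command run everywhere, since only the writable node actually updates the table):

```shell
pt-heartbeat \
  --host=127.0.0.1 --port=22637 \
  --user=pt_user --password=*** \
  --database=percona --table=heartbeat --create-table \
  --update --check-read-only --read-only-interval=5 \
  --fail-successive-errors=10 --interval=1 \
  --daemonize
```

The same command was repeated with ports 22638 and 22639 for the replicas.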

 

  • --read-only-interval => When --check-read-only is specified, the interval to sleep while the server is found to be read-only. If unspecified, --interval is used.
  • --fail-successive-errors => If specified, pt-heartbeat will fail after the given number of successive DBI errors (failure to connect to the server or issue a query).
  • --interval => How often to update or check the heartbeat --table. Updates and checks begin on the first whole second, then repeat every --interval seconds for --update and every --interval plus --skew seconds for --monitor.

Reference: https://docs.percona.com/percona-toolkit/pt-heartbeat.html

The delay is calculated on the replicas as the difference between the current system time and the replicated timestamp value from the heartbeat table. On the master node, pt-heartbeat updates the heartbeat table every second with the server ID and the current timestamp, and these updates flow to the replica nodes through the usual asynchronous replication.

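For example, on a replica the lag can be derived directly from the heartbeat table (again assuming percona.heartbeat):

```sql
-- Run on a replica: compares the replicated heartbeat timestamp
-- against the replica's own clock.
SELECT server_id, ts,
       ROUND(UNIX_TIMESTAMP(NOW(6)) - UNIX_TIMESTAMP(ts)) AS lag_seconds
FROM percona.heartbeat;
```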

Let’s see the behaviour of enabling “FailMasterPromotionOnLagMinutes” with a quick scenario.

We were running some workload in the background to build up replication delay/lag.

Then we attempted a graceful master takeover, but it failed due to the replication lag.
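The takeover was initiated through orchestrator-client. The cluster alias below is an assumption, and flag spelling can vary slightly between versions, so consult orchestrator-client -h for your build:

```shell
orchestrator-client -c graceful-master-takeover \
  -a testcluster \
  -d Anils-MacBook-Pro.local:22638
```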

However, as soon as the replication lag dropped below the one minute we specified via [FailMasterPromotionOnLagMinutes], the failover process ran smoothly.

So, the master failed over to “Anils-MacBook-Pro.local:22638”.

Failover logs from “/tmp/recovery.log”:

Conclusion

The options discussed above provide fine-grained control over the MySQL Orchestrator failover process, especially in scenarios where replicas are experiencing replication lag. Essentially, we can either wait for the lag to resolve before triggering a failover, or proceed with an immediate failover despite the lag. Additionally, settings such as [FailMasterPromotionOnLagMinutes, FailMasterPromotionIfSQLThreadNotUpToDate] ensure that the failover fails outright when lag exceeds the configured conditions, favoring maximum consistency.

 
