Flow control is not a new term; we have heard it many times in Percona XtraDB Cluster/Galera-based environments. In very simple terms, it means a cluster node can’t keep up with the cluster’s write pace: the write rate is too high, or the nodes are oversaturated. Flow control helps avoid excessive buffering and keeps group members operating at a similar pace.

Under Group Replication (GR), this depends on how much backlog can accumulate, specifically in the “transactions to certify” and “transactions to apply” queues. Once those limits are exceeded, the flow-control mechanism limits the throughput of writer members to match the capacity of the slowest members of the group.
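As a rough mental model of that trigger (not the plugin’s actual code), the condition can be sketched in Python. The function name and the sample queue sizes are hypothetical; 25000 is the documented default for both thresholds.

```python
def exceeds_flow_control_threshold(
    certifier_queue: int,
    applier_queue: int,
    certifier_threshold: int = 25000,
    applier_threshold: int = 25000,
) -> bool:
    """Return True when either backlog is above its threshold.

    Simplified model: it looks at a single snapshot of one member's
    queues, while the real mechanism re-evaluates every flow-control period.
    """
    return (
        certifier_queue > certifier_threshold
        or applier_queue > applier_threshold
    )


# Hypothetical snapshots of one member's queues.
print(exceeds_flow_control_threshold(certifier_queue=120, applier_queue=900))    # False
print(exceeds_flow_control_threshold(certifier_queue=120, applier_queue=30000))  # True
```

Either queue alone is enough to cross the line, which is why both thresholds matter when tuning.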

The default, and only active, FC mode is “QUOTA” (the other accepted value, “DISABLED,” switches flow control off). It calculates a write quota for the group in terms of “number of commits” per “time interval.” This quota is divided among the members that attempted to commit in the previous period, which determines the maximum number of commits that client threads can make in the next flow-control period.
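To make the division step concrete, here is a deliberately simplified sketch. It assumes an even split and a floor of one commit per writer; both are assumptions of this illustration, and the real plugin also factors in settings such as the hold percentage and the min/max quota. The function name and numbers are hypothetical.

```python
def per_writer_quota(group_quota: int, active_writers: int) -> int:
    """Split a group-wide commit quota evenly among recent writers."""
    # If nobody wrote in the previous period, treat it as a single writer
    # so that we never divide by zero.
    writers = max(active_writers, 1)
    # Assumption of this sketch: keep at least 1 commit per period so a
    # writer is slowed down rather than blocked outright.
    return max(group_quota // writers, 1)


print(per_writer_quota(group_quota=1000, active_writers=2))  # 500
print(per_writer_quota(group_quota=1000, active_writers=3))  # 333
print(per_writer_quota(group_quota=2, active_writers=5))     # 1
```

The takeaway is only the shape of the calculation: the more members that were writing in the previous period, the smaller each member’s share for the next one.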

As per the MySQL (Group Replication) official docs: 

Group Replication ensures that a transaction only commits after a majority of the members in a group have received it and agreed on the relative order between all transactions that were sent concurrently. This approach works well if the total number of writes to the group does not exceed the write capacity of any member in the group. If it does and some of the members have less write throughput than others, particularly less than the writer members, those members can start lagging behind of the writers.

Having some members lagging behind the group brings some problematic consequences, particularly, the reads on such members may externalize very old data. Depending on why the member is lagging behind, other members in the group may have to save more or less replication context to be able to fulfil potential data transfer requests from the slow member.

There is however a mechanism in the replication protocol to avoid having too much distance, in terms of transactions applied, between fast and slow members. This is known as the flow control mechanism.

Now let’s understand how it works practically under group replication.

1.) So, here we have a three-node group replication setup.
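One way to confirm the group layout from a script is to read `performance_schema.replication_group_members`. The sketch below assumes MySQL 8.0 (where the `MEMBER_ROLE` column exists) and a standard DB-API cursor that returns tuples, for example from mysql-connector-python or PyMySQL; the helper name is hypothetical.

```python
MEMBERS_SQL = """
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members
ORDER BY MEMBER_PORT
"""


def list_group_members(cursor):
    """Return one dict per group member, using any DB-API cursor."""
    cursor.execute(MEMBERS_SQL)
    return [
        {"host": host, "port": port, "state": state, "role": role}
        for host, port, state, role in cursor.fetchall()
    ]
```

In a healthy three-node group, this should return three entries in the `ONLINE` state, with one `PRIMARY` in single-primary mode.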

 

2.) Next, we will prepare some data for our test.

 

3.) Now, we will set the conditions that trigger flow control. Basically, flow control depends on the parameters below. If the pending transactions exceed either the applier or the certifier queue threshold, flow control kicks in.

Here, for testing purposes, we set a small threshold value so that it is easy to hit. The default value for both thresholds is much higher (25,000).

OR

 

group_replication_flow_control_applier_threshold – Number of waiting transactions in the applier queue.

group_replication_flow_control_certifier_threshold – Number of waiting transactions in the certifier queue.
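For reference, a small helper can generate the corresponding `SET GLOBAL` statements for both thresholds. The value 10 used below is a hypothetical test value, not necessarily the one used in this walkthrough, and the helper name is an assumption of this sketch.

```python
from typing import List

THRESHOLD_VARIABLES = (
    "group_replication_flow_control_applier_threshold",
    "group_replication_flow_control_certifier_threshold",
)


def threshold_statements(value: int) -> List[str]:
    """Build SET GLOBAL statements applying one value to both thresholds."""
    # Both variables accept integers from 0 to 2147483647.
    if not 0 <= value <= 2147483647:
        raise ValueError("threshold must be between 0 and 2147483647")
    return [f"SET GLOBAL {name} = {int(value)};" for name in THRESHOLD_VARIABLES]


# Hypothetical low value so the limit is easy to hit in a test setup.
for statement in threshold_statements(10):
    print(statement)
```

The printed statements can be run on each member; both variables are dynamic, so no restart of Group Replication should be needed.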

 

4.) Now, it’s time to perform some action.

Session1: 

5.) We will run some workload/threads on [127.0.0.1:23234] [Primary].

Output:

Session2:

6.) We will monitor the status of the remote transactions and verify if flow control is activated or not.

Output:

OR

Output:

 

7.) Flow control does not seem to be activated yet, mainly because we didn’t hit the limit. The values in [COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE/TRX_APPLY_Q] denote the transactions received from other members but not yet applied.
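A minimal way to poll that counter from a script is sketched below. It assumes a DB-API cursor that returns tuples; the helper name is hypothetical, while the table and column names come from `performance_schema.replication_group_member_stats`.

```python
APPLIER_QUEUE_SQL = """
SELECT MEMBER_ID, COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
FROM performance_schema.replication_group_member_stats
"""


def applier_backlog(cursor):
    """Return (member_id, applier_queue) pairs, largest backlog first."""
    cursor.execute(APPLIER_QUEUE_SQL)
    rows = [(member_id, int(queue)) for member_id, queue in cursor.fetchall()]
    # Sort so the slowest member (biggest backlog) is listed first.
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Calling this in a loop, for example once per second, gives a simple view of which member is falling behind and by how many transactions.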

 

8.) Now, let’s use a trick to trigger FC: taking a global read lock with FLUSH TABLES WITH READ LOCK (FTWRL). This piles up the threads, and once we unblock the workload, the long backlog of queries/transactions inflates the applier queue.

 

Output:

9.) After it runs for some time, we unlock the tables.
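The lock-then-unlock trick from steps 8 and 9 can be scripted as a small context manager. This is a sketch under the assumption that `cursor` is a DB-API cursor whose connection stays open for the whole duration, since the global read lock belongs to the session that took it; the helper names are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def global_read_lock(cursor):
    """Hold FTWRL for the duration of the with-block, then release it."""
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        yield
    finally:
        # Always release the lock, even if the block raised an error.
        cursor.execute("UNLOCK TABLES")


def hold_global_read_lock(cursor, seconds, sleep=time.sleep):
    """Block writes on this node for `seconds`, letting a backlog build up."""
    with global_read_lock(cursor):
        sleep(seconds)
```

For example, `hold_global_read_lock(cursor, 60)` would keep the node blocked for a minute while the workload keeps running on the primary.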

 

10.) As soon as we unlock the tables, we see a huge surge in the remote applier queue, as below.

 

 

11.) Finally, we can see flow control (FC) activated.

 

12.) The flow control emissions/iterations depend on group_replication_flow_control_period, which decides how many seconds to wait to send/manage flow control-related events.

As soon as the applier queue returns to normal and drops below the threshold, FC deactivates.
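The activate/deactivate cycle can be pictured with a toy timeline, taking one applier queue sample per flow-control period. This is a simplified model, not the plugin’s logic: the real mechanism releases the quota gradually, and the sample values and threshold below are hypothetical.

```python
def flow_control_transitions(applier_queue_samples, applier_threshold):
    """Return (period, event) pairs for each time the FC state changes."""
    transitions = []
    active = False
    for period, queue_size in enumerate(applier_queue_samples):
        now_active = queue_size > applier_threshold
        if now_active != active:
            event = "activated" if now_active else "deactivated"
            transitions.append((period, event))
            active = now_active
    return transitions


# Hypothetical queue sizes: a surge after the unlock, then a drain.
samples = [0, 4, 250, 180, 60, 9, 0]
print(flow_control_transitions(samples, applier_threshold=10))
# [(2, 'activated'), (5, 'deactivated')]
```

With a longer group_replication_flow_control_period, the samples are further apart in time, so both the reaction to a surge and the release afterwards take longer.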

 

 

13.) If we would rather avoid this throttling and accept that the node may suffer from heavy lag or a long applier queue, we can also disable FC on that node.
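Switching the mode comes down to one dynamic variable, group_replication_flow_control_mode, which accepts `QUOTA` or `DISABLED`. A small guard around it might look like this; the helper name is hypothetical and `cursor` is assumed to be a DB-API cursor connected to the node in question.

```python
VALID_MODES = ("QUOTA", "DISABLED")


def set_flow_control_mode(cursor, mode):
    """Set group_replication_flow_control_mode after validating the value."""
    mode = mode.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Safe to interpolate: mode is restricted to the two literals above.
    cursor.execute(f"SET GLOBAL group_replication_flow_control_mode = '{mode}'")
    return mode
```

Calling `set_flow_control_mode(cursor, "DISABLED")` turns throttling off on that node, and `set_flow_control_mode(cursor, "QUOTA")` restores the default.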

 

There are some other FC (flow control) related options that set the boundaries on the quota and on the time period used for FC messages. Those settings help control/manage the flow control events in the cluster.
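To see which flow-control options a given server exposes, along with their current values, one option is to filter the global variables by prefix. This sketch assumes a DB-API cursor that returns tuples and is executed without query parameters, so the `%` wildcard reaches the server untouched; the helper name is hypothetical.

```python
def flow_control_settings(cursor):
    """Return {variable_name: value} for all flow-control variables."""
    cursor.execute(
        "SHOW GLOBAL VARIABLES LIKE 'group_replication_flow_control%'"
    )
    return {name: value for name, value in cursor.fetchall()}
```

The result only includes variables that the running server version supports, which makes it a safer starting point than a hard-coded list.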

I’ll highlight a few here. We will try to discuss these and some others in more detail in the next series of blog posts. 

 

Conclusion

Basically, flow control is a natural safeguard against too much lag on other nodes of the cluster. However, it also affects cluster writes, and the nodes’ performance will slow down. We can still tune the applier/certifier threshold variables to limit flow control emissions where that is acceptable, or disable FC entirely in some cases.

Stay tuned for the next series of this blog post where we will try to cover some more advanced controlling options of the flow control mechanism.
