Flow control is not a new term; we have heard it many times in Percona XtraDB Cluster/Galera-based environments. In very simple terms, it kicks in when a cluster node cannot keep up with the write pace of the cluster, either because the write rate is too high or because the node is oversaturated. Flow control avoids excessive buffering and keeps the group members operating at a similar pace/speed.
Under GR, flow control depends on how much backlog is allowed to accumulate, specifically in terms of “transactions to certify” and “transactions to apply.” Once either of those limits is exceeded, the flow-control mechanism throttles the writer members to match the capacity of the slowest members of the group.
The default, and only available, FC mode is “QUOTA.” It calculates a write quota for the group in terms of number of commits per time interval and divides that quota among the members that attempted commits in the previous period. The result determines the maximum number of commits that client threads can make during the next flow-control period.
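Before moving on, it helps to know where these knobs live. A quick way to list every flow-control setting on a member is the query below (a reference check, not part of the original walkthrough; the values will simply reflect how your member is configured):

mysql> SHOW GLOBAL VARIABLES LIKE 'group_replication_flow_control%';

Among others, this lists group_replication_flow_control_mode (QUOTA by default) and the applier/certifier thresholds we will tweak shortly.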
As per the MySQL (Group Replication) official docs:
Group Replication ensures that a transaction only commits after a majority of the members in a group have received it and agreed on the relative order between all transactions that were sent concurrently. This approach works well if the total number of writes to the group does not exceed the write capacity of any member in the group. If it does and some of the members have less write throughput than others, particularly less than the writer members, those members can start lagging behind of the writers.
Having some members lagging behind the group brings some problematic consequences, particularly, the reads on such members may externalize very old data. Depending on why the member is lagging behind, other members in the group may have to save more or less replication context to be able to fulfil potential data transfer requests from the slow member.
There is however a mechanism in the replication protocol to avoid having too much distance, in terms of transactions applied, between fast and slow members. This is known as the flow control mechanism.
Now let’s understand how it works practically under group replication.
1.) So, here we have a three-node group replication setup.
node2 [localhost:23234] {msandbox} (performance_schema) > select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | MEMBER_COMMUNICATION_STACK |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
| group_replication_applier | 00023233-1111-1111-1111-111111111111 | 127.0.0.1   | 23233       | ONLINE       | SECONDARY   | 8.0.32         | XCom                       |
| group_replication_applier | 00023234-2222-2222-2222-222222222222 | 127.0.0.1   | 23234       | ONLINE       | PRIMARY     | 8.0.32         | XCom                       |
| group_replication_applier | 00023235-3333-3333-3333-333333333333 | 127.0.0.1   | 23235       | ONLINE       | SECONDARY   | 8.0.32         | XCom                       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+----------------------------+
2.) Next, we will prepare some data for our test.
shell> sysbench --db-driver=mysql \
  --mysql-user=sbtest_user \
  --mysql_password=Sbtest@2022 \
  --mysql-db=sbtest \
  --mysql-host=127.0.0.1 \
  --mysql-port=23234 \
  --tables=2 \
  --table-size=5000000 \
  --create_secondary=off \
  /usr/share/sysbench/oltp_read_write.lua prepare
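As a quick sanity check that the prepare step actually loaded the data (a hypothetical verification step, not part of the original test; it only assumes the sbtest schema and the two tables created by the command above):

mysql> SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema = 'sbtest';

Note that table_rows is an InnoDB estimate, so expect a figure in the neighborhood of five million rows per table rather than an exact count.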
3.) Now, we will set the conditions that trigger the flow control scenario. Flow control depends on the parameters below: if the number of queued transactions exceeds either the applier or the certifier queue threshold, flow control kicks in.
Here, for testing purposes, we are setting a very low threshold value; the default for these thresholds is much higher (25,000).
SET GLOBAL group_replication_flow_control_applier_threshold = 50; |
OR
SET GLOBAL group_replication_flow_control_certifier_threshold = 50; |
group_replication_flow_control_applier_threshold – Number of waiting transactions in the applier queue that triggers flow control.
group_replication_flow_control_certifier_threshold – Number of waiting transactions in the certifier queue that triggers flow control.
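To confirm what we just set on a node, we can read the variables straight back (a quick check using only the system variables above):

mysql> SELECT @@global.group_replication_flow_control_applier_threshold AS applier_threshold,
    -> @@global.group_replication_flow_control_certifier_threshold AS certifier_threshold;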
4.) Now, it’s time to perform some action.
Session1:
5.) We will run some workload/threads on [127.0.0.1:23234] [Primary].
sysbench --db-driver=mysql \
  --mysql-user=sbtest_user \
  --mysql_password=Sbtest@2022 \
  --mysql-db=sbtest \
  --mysql-host=127.0.0.1 \
  --mysql-port=23234 \
  --tables=2 \
  --table-size=5000000 \
  --create_secondary=off \
  --threads=100 \
  --time=0 \
  --events=0 \
  --report-interval=1 \
  --rate=40 \
  /usr/share/sysbench/oltp_read_write.lua run
Output:
[ 85s ] queue length: 0, concurrency: 1
[ 86s ] thds: 100 tps: 45.00 qps: 899.99 (r/w/o: 629.99/180.00/90.00) lat (ms,95%): 27.66 err/s: 0.00 reconn/s: 0.00
[ 86s ] queue length: 0, concurrency: 1
[ 87s ] thds: 100 tps: 34.90 qps: 702.09 (r/w/o: 490.66/139.62/71.80) lat (ms,95%): 134.90 err/s: 0.00 reconn/s: 0.00
[ 87s ] queue length: 0, concurrency: 2
[ 88s ] thds: 100 tps: 34.02 qps: 676.43 (r/w/o: 474.30/136.09/66.04) lat (ms,95%): 21.50 err/s: 0.00 reconn/s: 0.00
[ 88s ] queue length: 0, concurrency: 0
[ 89s ] thds: 100 tps: 39.08 qps: 785.67 (r/w/o: 550.17/156.33/79.17) lat (ms,95%): 59.99 err/s: 0.00 reconn/s: 0.00
[ 89s ] queue length: 0, concurrency: 1
Session2:
6.) We will monitor the status of the remote transactions and verify if flow control is activated or not.
node2 [localhost:23234] {msandbox} ((none)) > SELECT MEMBER_ID,COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE FROM performance_schema.replication_group_member_stats; |
Output:
+--------------------------------------+--------------------------------------------+
| MEMBER_ID                            | COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE |
+--------------------------------------+--------------------------------------------+
| 00023233-1111-1111-1111-111111111111 |                                          0 |
| 00023234-2222-2222-2222-222222222222 |                                          0 |
| 00023235-3333-3333-3333-333333333333 |                                          0 |
+--------------------------------------+--------------------------------------------+
3 rows in set (0.00 sec)
OR
SELECT
    t2.MEMBER_HOST AS MEMBER_HOST,
    t2.MEMBER_ID AS MEMBER_ID,
    t2.MEMBER_PORT AS MEMBER_PORT,
    t1.COUNT_TRANSACTIONS_IN_QUEUE AS TRX_LOCAL_Q,
    t1.COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE AS TRX_APPLY_Q
FROM
    performance_schema.replication_group_member_stats t1
JOIN
    performance_schema.replication_group_members t2
ON
    t2.MEMBER_ID = t1.MEMBER_ID;
Output:
+-------------+--------------------------------------+-------------+-------------+-------------+
| MEMBER_HOST | MEMBER_ID                            | MEMBER_PORT | TRX_LOCAL_Q | TRX_APPLY_Q |
+-------------+--------------------------------------+-------------+-------------+-------------+
| 127.0.0.1   | 00023233-1111-1111-1111-111111111111 |       23233 |           2 |           0 |
| 127.0.0.1   | 00023234-2222-2222-2222-222222222222 |       23234 |           0 |           0 |
| 127.0.0.1   | 00023235-3333-3333-3333-333333333333 |       23235 |           0 |           1 |
+-------------+--------------------------------------+-------------+-------------+-------------+
7.) Flow control is not activated yet, mainly because we haven’t hit the threshold. The values in [COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE/TRX_APPLY_Q] denote the transactions received from other members but not yet applied.
node2 [localhost:23234] {msandbox} ((none)) > show global status like '%group_replication_flow_control_active%';
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| group_replication_flow_control_active | DISABLED |
+---------------------------------------+----------+
1 row in set (0.03 sec)
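While the workload runs, it can be handy to relate the applier queue size directly to the threshold. The sketch below only combines the replication_group_member_stats column and the threshold variable already used in this post, flagging any member whose queue has crossed the limit (above_threshold uses the local node’s threshold, so run it on the member you are checking):

mysql> SELECT MEMBER_ID,
    -> COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE AS trx_apply_q,
    -> @@global.group_replication_flow_control_applier_threshold AS applier_threshold,
    -> COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE > @@global.group_replication_flow_control_applier_threshold AS above_threshold
    -> FROM performance_schema.replication_group_member_stats;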
8.) Now, let’s trigger FC with a small trick: running FLUSH TABLES WITH READ LOCK (FTWRL) on the primary. The sysbench threads pile up behind the lock, and once we unblock the workload, the long backlog of queries/transactions floods the group and inflates the applier queue.
node2 [localhost:23234] {msandbox} ((none)) > flush tables with read lock; |
Output [sysbench threads blocked, waiting for the global read lock]:
499196 | 4225 | 0 | 0 |
| 687 | sbtest_user | localhost:49594 | sbtest | Execute | 6 | Waiting for global read lock | UPDATE sbtest1 SET k=k+1 WHERE id=2515548 | 6356 | 0 | 0 |
| 688 | sbtest_user | localhost:49596 | sbtest | Execute | 4 | Waiting for global read lock | UPDATE sbtest2 SET k=k+1 WHERE id=2518222 | 4143 | 0 | 0 |
| 689 | sbtest_user | localhost:49598 | sbtest | Execute | 7 | Waiting for global read lock | UPDATE sbtest1 SET k=k+1 WHERE id=2507657 | 6516 | 0 | 0 |
| 690 | sbtest_user | localhost:49602 | sbtest | Execute | 5 | Waiting for global read lock | UPDATE sbtest1 SET k=k+1 WHERE id=2497311 | 5070 | 0 | 0 |
| 691 | sbtest_user | localhost:49600 | sbtest | Execute | 5 | Waiting for global read lock | UPDATE sbtest1 SET k=k+1 WHERE id=2498427 | 4914 | 0 | 0 |
| 692 | sbtest_user | localhost:49604 | sbtest | Execute | 7 | Waiting for global read lock | UPDATE sbtest2 SET k=k+1 WHERE id=1572819 | 6509 | 0 | 0 |
| 693 | sbtest_user | localhost:49606 | sbtest | Execute | 4 | Waiting for global read lock | UPDATE sbtest2 SET k=k+1 WHERE id=2502922 | 3897 | 0 | 0 |
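The rows above come from the primary’s processlist (the trailing columns suggest an extended processlist view, so treat the exact column set as an assumption). To pick out just the blocked sysbench threads instead of scanning the full output, a query along these lines works:

mysql> SELECT id, user, db, time, state, info
    -> FROM information_schema.processlist
    -> WHERE state = 'Waiting for global read lock';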
9.) After the lock has been held for some time, we unblock the tables.
node2 [localhost:23234] {msandbox} ((none)) > unlock tables;
Query OK, 0 rows affected (0.01 sec)
10.) As soon as we unlock the tables, we see a huge surge in the remote applier queue, as below.
node3 [localhost:23235] {msandbox} ((none)) > SELECT MEMBER_ID,COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE FROM performance_schema.replication_group_member_stats;
+--------------------------------------+--------------------------------------------+
| MEMBER_ID                            | COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE |
+--------------------------------------+--------------------------------------------+
| 00023233-1111-1111-1111-111111111111 |                                        800 |
| 00023234-2222-2222-2222-222222222222 |                                          0 |
| 00023235-3333-3333-3333-333333333333 |                                        829 |
+--------------------------------------+--------------------------------------------+
3 rows in set (0.00 sec)
node3 [localhost:23235] {msandbox} ((none)) > SELECT
    -> t2.MEMBER_HOST AS MEMBER_HOST,
    -> t2.MEMBER_ID AS MEMBER_ID,
    -> t2.MEMBER_PORT AS MEMBER_PORT,
    -> t1.COUNT_TRANSACTIONS_IN_QUEUE AS TRX_LOCAL_Q,
    -> t1.COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE AS TRX_APPLY_Q
    -> FROM
    -> performance_schema.replication_group_member_stats t1
    -> JOIN
    -> performance_schema.replication_group_members t2
    -> ON
    -> t2.MEMBER_ID = t1.MEMBER_ID;
+-------------+--------------------------------------+-------------+-------------+-------------+
| MEMBER_HOST | MEMBER_ID                            | MEMBER_PORT | TRX_LOCAL_Q | TRX_APPLY_Q |
+-------------+--------------------------------------+-------------+-------------+-------------+
| 127.0.0.1   | 00023233-1111-1111-1111-111111111111 |       23233 |           0 |        1059 |
| 127.0.0.1   | 00023234-2222-2222-2222-222222222222 |       23234 |           0 |           0 |
| 127.0.0.1   | 00023235-3333-3333-3333-333333333333 |       23235 |           0 |        1080 |
+-------------+--------------------------------------+-------------+-------------+-------------+
3 rows in set (0.00 sec)
11.) Finally, we can see that FC [flow control] is activated.
node3 [localhost:23235] {msandbox} ((none)) > show global status like '%group_replication_flow_control_active%';
+---------------------------------------+--------+
| Variable_name                         | Value  |
+---------------------------------------+--------+
| group_replication_flow_control_active | ACTIVE |
+---------------------------------------+--------+
1 row in set (0.00 sec)
12.) How often flow control is evaluated depends on group_replication_flow_control_period, which defines how many seconds to wait between flow control iterations, i.e., between the rounds in which flow control messages are sent and flow control management tasks are run.
As soon as the applier queue drains back under the threshold, FC deactivates.
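If the flow control checks feel too frequent for a given workload, the period can be stretched; for example (an illustrative value, not one used in this test):

mysql> SET GLOBAL group_replication_flow_control_period = 10;

The default is one second, so the example above makes the group recalculate quotas ten times less often.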
node3 [localhost:23235] {msandbox} ((none)) > SELECT MEMBER_ID,COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE FROM performance_schema.replication_group_member_stats;
+--------------------------------------+--------------------------------------------+
| MEMBER_ID                            | COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE |
+--------------------------------------+--------------------------------------------+
| 00023233-1111-1111-1111-111111111111 |                                          0 |
| 00023234-2222-2222-2222-222222222222 |                                          0 |
| 00023235-3333-3333-3333-333333333333 |                                          0 |
+--------------------------------------+--------------------------------------------+
3 rows in set (0.00 sec)
node3 [localhost:23235] {msandbox} ((none)) > show global status like '%group_replication_flow_control_active%';
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| group_replication_flow_control_active | DISABLED |
+---------------------------------------+----------+
1 row in set (0.00 sec)
13.) If we would rather avoid this throttling and are willing to let the node suffer from a large lag or a growing applier queue, we can also disable FC on that node.
mysql> SET GLOBAL group_replication_flow_control_mode=DISABLED; |
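To turn it back on later, switch the mode back to QUOTA (the only FC mode available, as mentioned earlier):

mysql> SET GLOBAL group_replication_flow_control_mode = QUOTA;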
There are some other FC [Flow Control]-related options that control the boundaries of the quota and the period used for FC messages. These settings help manage how flow control behaves across the cluster.
I’ll highlight a few here, with a short usage sketch after the list; we will try to discuss these and some others in more detail in the next posts of this series.
- group_replication_flow_control_hold_percent – Defines the percentage of the group quota that is kept unused so that a cluster under flow control can catch up on its backlog.
- group_replication_flow_control_member_quota_percent – Defines the percentage of the quota that a member should assume is available for itself when calculating the quotas.
- group_replication_flow_control_period – Defines how many seconds to wait between flow control iterations/communication messages.
- group_replication_flow_control_release_percent – Defines how the group quota should be released once flow control no longer needs to throttle the writer members.
- group_replication_flow_control_min_quota – Defines the lowest flow control quota that can be assigned to a group member.
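As promised above, here is a quick sketch of how these options are adjusted at runtime (the values are illustrative only, not the ones used in this test, and should be tuned per workload). They are per-member settings, so apply them on every node where you want them to take effect:

mysql> SET GLOBAL group_replication_flow_control_hold_percent = 10;
mysql> SET GLOBAL group_replication_flow_control_release_percent = 50;
mysql> SET GLOBAL group_replication_flow_control_min_quota = 0;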
Conclusion
Basically, flow control is a natural safeguard that prevents some nodes of the cluster from lagging too far behind the rest. However, it also throttles the cluster writes, so overall performance slows down while it is active. If the extra lag is acceptable, we can tweak the applier/certifier threshold variables to limit how often flow control kicks in, or we can disable FC entirely in some cases.
Stay tuned for the next post in this series, where we will try to cover some of the more advanced options for controlling the flow control mechanism.