BEWARE: Increasing fc_limit can affect SELECT latency

April 23, 2017

Author

Krunal Bauskar

MySQL

Percona Software

In this blog post, we’ll look at how increasing the fc_limit can affect SELECT latency.

Introduction

Recent Percona XtraDB Cluster optimizations have exposed fc_limit contention. It was always there, but it was never exposed because the Commit Monitor contention was more significant. As with any optimization, once we solve the bigger contention issues, smaller ones start popping up. We have seen this pattern in InnoDB, and Percona XtraDB Cluster is no exception. In fact, this is a good sign, because it tells us that we are on the right track.

If you haven’t yet checked the performance blogs, then please visit here and here.

What is FC_LIMIT?

Percona XtraDB Cluster has the concept of Flow Control. If any member of the cluster (other than garbd) cannot apply write-sets as fast as they are replicated to it, its queue of pending write-sets builds up. If this queue crosses a threshold (dictated by gcs.fc_limit), flow control kicks in: the members of the cluster temporarily halt or slow down so that the slower node can catch up.

The user can, of course, disable this by setting wsrep_desync=1 on the slower node, but make sure you understand the effect of doing so. Unless you have a good reason, you should avoid setting it.

mysql> show status like 'wsrep_flow_control_interval';
+-----------------------------+------------+
| Variable_name | Value |
+-----------------------------+------------+
| wsrep_flow_control_interval | [ 16, 16 ] |
+-----------------------------+------------+
1 row in set (0.01 sec)
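To make the mechanism concrete, here is a minimal Python sketch of the flow-control decision on a single node. It assumes a simplified model: replication pauses once the queue grows beyond the upper value of wsrep_flow_control_interval (derived from gcs.fc_limit) and resumes once the queue drains back down to the lower value. The class and method names are hypothetical; this is an illustration, not the Galera implementation.

```python
from dataclasses import dataclass


@dataclass
class FlowControlModel:
    """Simplified, hypothetical model of flow control on one node.

    lower/upper correspond to the two values of wsrep_flow_control_interval.
    """
    lower: int
    upper: int
    paused: bool = False

    def observe(self, queue_len: int) -> bool:
        """Update state for the current queue length; return True while paused."""
        if not self.paused and queue_len > self.upper:
            # Queue crossed the upper limit: ask the cluster to pause replication.
            self.paused = True
        elif self.paused and queue_len <= self.lower:
            # The slow node has caught up enough: let replication continue.
            self.paused = False
        return self.paused


# With the interval [ 16, 16 ] shown above, pausing and resuming share one threshold.
fc = FlowControlModel(lower=16, upper=16)
print([fc.observe(q) for q in [10, 16, 17, 20, 16, 12]])
# prints [False, False, True, True, False, False]
```

In this model, a lower value below the upper one adds hysteresis, so the node does not toggle flow control on every single write-set.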


Increasing fc_limit

Until recently, the default fc_limit was 16 (starting with Percona XtraDB Cluster 5.7.17-29.20, the default is 100). This worked because, until now, Percona XtraDB Cluster did not scale well enough to hit the limit of 16 very often. With the new optimizations, Percona XtraDB Cluster nodes can process more write-sets in a given time period, and can therefore replicate more write-sets (anywhere from three to ten times as many). Of course, the replicating (slave) nodes are also applying write-sets faster, but depending on the number of slave threads, it is easy to start hitting this limit.
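The effect of faster replication can be estimated with simple arithmetic. If a replica receives write-sets faster than it can apply them, its queue grows by the difference every second, so under constant rates it reaches fc_limit after roughly fc_limit / (replication rate - apply rate) seconds. The sketch below assumes constant rates and an initially empty queue; the function name and the example rates are hypothetical, not measurements.

```python
import math


def seconds_until_fc_limit(fc_limit: int, replication_rate: float, apply_rate: float) -> float:
    """Rough time for a replica's queue to fill up to fc_limit.

    Rates are in write-sets per second and assumed constant (hypothetical inputs).
    """
    growth = replication_rate - apply_rate  # write-sets added to the queue per second
    if growth <= 0:
        return math.inf  # the replica keeps up, so the queue never fills
    return fc_limit / growth


# Hypothetical: the replica applies 9,000 write-sets/s while 10,000/s arrive.
print(seconds_until_fc_limit(16, 10_000, 9_000))    # 0.016
print(seconds_until_fc_limit(1600, 10_000, 9_000))  # 1.6
```

With a limit of 16, even a small mismatch triggers flow control almost immediately; raising the limit only postpones it, while the backlog on the replica keeps growing in the meantime.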

So what is the solution?

- Increase fc_limit from 16 to something much bigger, say 1600.

Is this correct?

YES and NO.

Why YES?

- If you don’t care about the freshness of data on the replicated nodes, then increasing the limit to a higher value is not an issue. For example, setting it to 10K means that the replicating node can hold up to 10K write-sets waiting to be applied, and a SELECT fired during this time will not see the changes from those write-sets.

- But if you insist on having fresh data, then Percona XtraDB Cluster has a solution for this (set wsrep_sync_wait=7).

- Setting wsrep_sync_wait places the SELECT request in a queue that is serviced only after all the write-sets that were already replicated when the SELECT was fired have been applied. If the queue has 8K write-sets, the SELECT is placed at position 8K+1, and it is serviced only after all 8K write-sets ahead of it are done. This can increase SELECT latency enormously and set off every monitoring alarm.
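The queueing argument above can be turned into a back-of-the-envelope estimate: a SELECT that arrives behind N queued write-sets waits roughly N divided by the node's apply rate before it runs. This is a minimal sketch assuming a constant apply rate; the function name, the 10,000 write-sets/s apply rate and the 0.03 s query time are hypothetical.

```python
def sync_wait_latency(queued_write_sets: int, apply_rate: float, query_time: float) -> float:
    """Estimated latency of a SELECT issued with wsrep_sync_wait enabled.

    The SELECT is serviced at position queued_write_sets + 1, so it first waits for
    every queued write-set to be applied (apply_rate write-sets/s, assumed constant)
    and then runs for query_time seconds.
    """
    if apply_rate <= 0:
        raise ValueError("apply_rate must be positive")
    return queued_write_sets / apply_rate + query_time


# Hypothetical numbers: 10,000 write-sets/s apply rate, 0.03 s query execution.
for queued in (16, 1_600, 8_000):
    print(queued, round(sync_wait_latency(queued, 10_000, 0.03), 4))
```

Because flow control stops the queue from growing much beyond fc_limit, fc_limit acts as a rough upper bound on N in this model, which is why raising it raises the worst-case wait.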

Why NO?

- For the reason mentioned above, we feel it is not a good idea to increase fc_limit too far, unless you don’t care about data freshness and therefore don’t need to set wsrep_sync_wait.

- We ran a small experiment with the latest Percona XtraDB Cluster release to understand the effects:

- Started a 2-node cluster.
- Ran a 64-thread workload on node-1 of the cluster.
- node-2 acted as a replicating slave without any active workload.
- Set wsrep_sync_wait=7 on node-2 to ensure data freshness.

Using default fc_limit (= 16)
-----------------------------

mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.03 sec)


Increasing it from 16 -> 1600
-----------------------------

mysql> set global wsrep_provider_options="gcs.fc_limit=1600";
Query OK, 0 rows affected (0.00 sec)

mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (0.46 sec)
That is a whopping 15x increase in SELECT latency.


Increasing it even further (1600 -> 25000)
-------------------------------------------

mysql> set global wsrep_provider_options="gcs.fc_limit=25000";
Query OK, 0 rows affected (0.00 sec)

mysql> select sum(k) from sbtest1 where id > 5000 and id < 50000;
+-------------+
| sum(k) |
+-------------+
| 22499552612 |
+-------------+
1 row in set (7.07 sec)
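The ratios can be recomputed directly from the timings the mysql client printed. The snippet below simply divides each measured latency by the fc_limit=16 baseline; the client rounds to hundredths of a second, so the ratios are approximate. The function name is hypothetical.

```python
# SELECT latencies (seconds) reported by the mysql client in the runs above, keyed by gcs.fc_limit.
MEASURED = {16: 0.03, 1600: 0.46, 25000: 7.07}


def latency_ratios(measured: dict[int, float], baseline_limit: int = 16) -> dict[int, float]:
    """Return each latency divided by the latency measured at baseline_limit."""
    baseline = measured[baseline_limit]
    return {limit: seconds / baseline for limit, seconds in measured.items()}


for limit, ratio in latency_ratios(MEASURED).items():
    print(f"fc_limit={limit:>5}: {ratio:6.1f}x the fc_limit=16 latency")
```

By this measure, the 25000 run is roughly 235x slower than the default, and about 15x slower than the 1600 run.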


Note: wsrep_sync_wait=7 enforces the check for SELECT as well as for INSERT, UPDATE and DELETE statements. We highlighted the SELECT example because it is the most immediately concerning, but the latency of the other statements increases for the same reason described above.
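wsrep_sync_wait is a bitmask, which is why the value 7 covers all of these statements: per the Galera documentation, bit 1 covers reads (SELECT, SHOW, BEGIN/START TRANSACTION), bit 2 covers UPDATE and DELETE, and bit 4 covers INSERT and REPLACE. The small decoder below illustrates this; the function name is hypothetical, and any higher bits that newer versions may define are deliberately ignored.

```python
# Bits of wsrep_sync_wait that make up the value 7 (per the Galera documentation).
SYNC_WAIT_BITS = {
    1: "READ (SELECT, SHOW, BEGIN/START TRANSACTION)",
    2: "UPDATE and DELETE",
    4: "INSERT and REPLACE",
}


def decode_sync_wait(value: int) -> list[str]:
    """List the statement classes checked for a given wsrep_sync_wait value.

    Only bits 1, 2 and 4 are modeled; other bits are ignored in this sketch.
    """
    if value < 0:
        raise ValueError("wsrep_sync_wait cannot be negative")
    return [desc for bit, desc in SYNC_WAIT_BITS.items() if value & bit]


print(decode_sync_wait(7))  # all three statement classes
print(decode_sync_wait(1))  # reads only
```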

Conclusion

Let’s conclude with the following observation:

- Avoid increasing fc_limit to an insanely high value as it can affect SELECT latency (if you are running a SELECT session with wsrep_sync_wait=7 for data freshness).