EmergencyEMERGENCY? Get 24/7 Help Now!

BEWARE: Increasing fc_limit can affect SELECT latency

 | April 22, 2017 |  Posted In: MySQL, Percona XtraDB Cluster, XtraDB Cluster

PREVIOUS POST
NEXT POST

SELECT LatencyIn this blog post, we’ll look at how increasing the fc_limit can affect SELECT latency.

Introduction

Recent Percona XtraDB Cluster optimizations have exposed fc_limit contention. It was always there, but was never exposed as the Commit Monitor contention was more significant. As it happens with any optimization, once we solve the bigger contention issues, smaller contention issues start popping up. We have seen this pattern in InnoDB, and Percona XtraDB Cluster is no exception. In fact, it is good because it tells us that we are on the right track.

If you haven’t yet checked the performance blogs, then please visit here and here.

What is FC_LIMIT?

Percona XtraDB Cluster has the concept of Flow Control. If any member of the cluster (not garbd) is unable to match the apply speed with the replicated write-set speed, then the queue builds up. If this queue crosses some threshold (dictated by gcs.fc_limit), then flow control kicks in. Flow control causes members of the cluster to temporary halt/slow-down so that the slower node can catch up.

The user can, of course, disable this by setting wsrep_desync=1 on the slower node, but make sure you understand the effect of doing so. Unless you have a good reason, you should avoid setting it.

Increasing fc_limit

Until recently, the default fc_limit was 16 (starting with Percona XtraDB Cluster 5.7.17-29.20, the default is 100). This worked until now, since Percona XtraDB Cluster failed to scale and rarely hit the limit of 16. With new optimizations, Percona XtraDB Cluster nodes can process more write-sets in a given time period, and thereby can replicate more write-sets (anywhere in the range of three to ten times). Of course, the replicating/slave nodes are also performing at a higher speed. But depending on the slave threads, it is easy to start hitting this limit.

So what is the solution?

  • Increase fc_limit from 16 to something really big. Say 1600.

Is this correct?

YES and NO.

Why YES?

  • If you don’t care about the freshness of data on the replicated nodes, then increasing the limit to a higher value is not an issue. Say setting it to 10K means that the replicating node is holding 10K write-sets to replicate, and a SELECT fired during this time will not view changes from these 10K write-sets.
  • But if you insist on having fresh data, then Percona XtraDB Cluster has a solution for this (set wsrep_sync_wait=7).
  • Setting wsrep_sync_wait places the SELECT request in a queue that is serviced only after existing replicated write-sets (at the point when the SELECT was fired) are done with. If the queue has 8K write-sets, then SELECT is placed at the 8K+1 position. As the queue progresses, SELECT gets serviced only when all those 8K write-sets are done. This insanely increases SELECT latency and can cause all Monitoring ALARM to go ON.

Why NO?

  • For the reason mentioned above, we feel it is not a good idea to increase the fc_limit beyond some value unless you don’t care about data freshness and in turn don’t care to set wsrep_sync_wait.
  • We did a small experiment with the latest Percona XtraDB Cluster release to understand the effects.

Note: wsrep_sync_wait=7 will enforce the check for all DMLs (INSERT/UPDATE/DELETE/SELECT). We highlighted the SELECT example, as that is more concerning at first go. But latency for other DMLs also increases for the same reasons as mentioned above.

Conclusion

Let’s conclude with the following observation:

  • Avoid increasing fc_limit to an insanely high value as it can affect SELECT latency (if you are running a SELECT session with wsrep_sync_wait=7 for data freshness).
PREVIOUS POST
NEXT POST
Krunal Bauskar

Krunal is PXC lead at Percona. He is responsible for day-day PXC development, what goes into PXC, bug fixes, releases, etc.. Before joining Percona he use to work as part of InnoDB team at MySQL/Oracle. He authored most of the temporary table revamp work, undo log truncate, atomic truncate and lot of other features. In past he was associated with Yahoo! Labs researching on bigdata problems and database startup which is now part of Teradata. His interest mainly includes data-management at any scale and has been practicing it for more than decade now.

Leave a Reply