slave lag on XtraDB Cluster node

  • slave lag on XtraDB Cluster node

    Hi all. I am trying to set up a 3-node cluster. So I started with 2 nodes and set one of them up as a slave of a regular Percona Server. One node (N1) is in a west-coast colo, the second (N2) in east-coast AWS. When I bootstrap node 1 (N1) and join it as a slave to the regular Percona Server, everything goes fine. When I connect N2 to the cluster and it starts SST, "show slave status" on N1 starts showing lag, and it keeps increasing. The 35G DB takes about 50 minutes to copy over the network. After it is copied, a lag of ~2000 seconds appears and continues to grow slowly (+1 second of lag every 3 seconds). To get rid of this I have to stop mysql on N2 and start it again once N1 has caught up with all data from the master. But even after that, the lag starts increasing again. Here is what I have:
    Percona master:
    Percona-Server-shared-compat-5.5.29-rel29.4.401.rhel6.x86_64
    Percona-Server-shared-55-5.5.29-rel29.4.401.rhel6.x86_64
    Percona-Server-server-55-5.5.29-rel29.4.401.rhel6.x86_64
    percona-xtrabackup-2.0.5-499.rhel6.x86_64
    Percona-Server-client-55-5.5.29-rel29.4.401.rhel6.x86_64
    N1 and N2:
    percona-release-0.0-1.x86_64
    Percona-XtraDB-Cluster-client-55-5.5.34-25.9.607.rhel5.x86_64
    Percona-XtraDB-Cluster-galera-2-2.8-1.157.rhel5.x86_64
    Percona-XtraDB-Cluster-server-55-5.5.34-25.9.607.rhel5.x86_64
    Percona-Server-shared-compat-5.5.24-rel26.0.256.rhel5.x86_64
    Percona-XtraDB-Cluster-shared-55-5.5.34-25.9.607.rhel5.x86_64
    percona-xtrabackup-2.1.6-702.rhel5.x86_64

    my.cnf on N1 and N2:
    # xtradb cluster settings
    binlog_format = ROW
    wsrep_cluster_name = mycluster
    wsrep_cluster_address = gcomm://N1,N2
    wsrep_node_address = N1
    wsrep_provider = /usr/lib64/libgalera_smm.so
    wsrep_sst_method = xtrabackup
    wsrep_sst_auth = sst:secret
    innodb_locks_unsafe_for_binlog = 1
    innodb_autoinc_lock_mode = 2
    wsrep_provider_options = "gcache.size=5G; gcs.fc_limit=512"
    tmpdir = /var/lib/mysql/tmp/

    Thanks for helping. Zaur.
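To watch the lag grow, a small polling loop works; this is a minimal sketch with placeholder host and credentials, and the awk filter just pulls Seconds_Behind_Master out of SHOW SLAVE STATUS\G output:

```shell
#!/bin/sh
# Poll replication lag on N1 once a minute (host/credentials are placeholders).
# Uncomment the loop to run it against a live server:
# while true; do
#   mysql -h N1 -u root -p'secret' -e 'SHOW SLAVE STATUS\G' \
#     | awk -F': ' '/Seconds_Behind_Master/ { print $2 }'
#   sleep 60
# done

# The filter itself, demonstrated on a captured line of \G output:
echo '        Seconds_Behind_Master: 2000' \
  | awk -F': ' '/Seconds_Behind_Master/ { print $2 }'
# prints: 2000
```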

  • #2
    This behavior happens when the joiner is located in a different colocation than the donor. If the 2 nodes are located in the same colocation, they perform well. Is there any variable I can play with to troubleshoot whether it is related to network latency?
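One concrete thing to look at is whether Galera flow control is kicking in over the WAN link: the wsrep_flow_control_paused status variable reports the fraction of time replication was paused since the last FLUSH STATUS. A minimal sketch (credentials are placeholders):

```shell
#!/bin/sh
# Check how often flow control paused the cluster (placeholder credentials).
# Run against a live node:
# mysql -u root -p'secret' \
#   -e "SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused'"

# Interpreting the value: e.g. 0.25 means the cluster spent 25% of the
# time paused waiting for the slow node.
echo 0.25 | awk '{ printf "paused %.0f%% of the time\n", $1 * 100 }'
# prints: paused 25% of the time
```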



    • #3
      moderators, plz change topic name to "slave lag on xtradb node"



      • #4
        So it seems there is some bottleneck when Galera replication comes into play, and indeed it looks like a network limitation. There are two things that can matter here: latency or a throughput limit.
        For example, if you have many small transactions, the commit latency added by Galera replication (certification must be done on every commit) may be too high to keep up with the transaction rate on the original master server.
        Or, if you have large transactions, the modified ROWs transferred over the network link may saturate the maximum network throughput.
        Check the actual west-east link capacity and compare it to the size of the binary logs (ROW format) produced by the master.
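As a rough sanity check on throughput: the SST figures in this thread (35 GB in ~50 minutes) already imply what the link sustained. A sketch of the arithmetic, plus hedged notes on measuring directly (iperf assumed installed; the binlog path is a typical default, adjust to your datadir):

```shell
#!/bin/sh
# Rough link-capacity estimate from the SST in this thread:
# 35 GB copied in 50 minutes.
awk 'BEGIN { printf "%.0f Mbit/s\n", 35 * 1024 * 8 / (50 * 60) }'
# prints: 96 Mbit/s

# To measure the link directly (run "iperf -s" on N2 first):
# iperf -c N2

# Compare with binlog production on the master over the same period:
# du -ch /var/lib/mysql/mysql-bin.[0-9]* | tail -1
```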
