seconds behind master

    I'm running Percona MySQL 5.1.54, and my systems are configured master/master with slaves coming off each master.

    I have two different databases setup like this across 8 or so EC2 Instances. The EC2 instances are m2.xlarge. The data dir is on a single EBS 400GB disk formatted XFS with defaults and noatime.

    My problem: since getting them into master/master yesterday, 'SHOW SLAVE STATUS\G' shows the Seconds_Behind_Master value fluctuating between 0 and a large value.

    The large value keeps growing. Yesterday it fluctuated between 0 and 500; today it swings between 0 and 60k. This is happening on all of the instances, including the slaves.

    The clocks are in sync: SELECT CURTIME(); returns the same time on all instances.

    Does anyone have any ideas? I see nothing in the error log.

    I searched and found an article saying to use mk-heartbeat instead of relying on Seconds_Behind_Master, but I couldn't find much documentation on mk-heartbeat or an example implementation.
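    Since the documentation is hard to find: the core idea behind mk-heartbeat is simple enough to sketch. A minimal, hypothetical illustration in Python follows (the function name and timestamps are made up for illustration; the real tool periodically writes the current time into a MySQL table on the master and reads the replicated row on each slave):

```python
from datetime import datetime

# Sketch of the heartbeat idea (not the actual mk-heartbeat code):
# the master periodically writes its current wall-clock time into a
# heartbeat row; each slave reads the replicated row and computes
# lag = slave_now - heartbeat_ts. This measures real end-to-end delay
# instead of trusting the server's Seconds_Behind_Master estimate.

def heartbeat_lag(heartbeat_ts: datetime, slave_now: datetime) -> float:
    """Replication lag in seconds, heartbeat-style."""
    return (slave_now - heartbeat_ts).total_seconds()

# Simulated values: the master wrote the heartbeat 3 seconds ago.
master_wrote = datetime(2011, 3, 1, 12, 0, 0)
slave_reads = datetime(2011, 3, 1, 12, 0, 3)
print(heartbeat_lag(master_wrote, slave_reads))  # 3.0
```

    Because the lag is measured against an actual replicated row rather than binlog event timestamps, it stays meaningful even in a master/master chain.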

    The Seconds_Behind_Master value just doesn't make any sense to me, so I'm assuming it's incorrect, but I worry that it may be a symptom of something else.

  • #2
    If you have a master/master replication chain, Seconds_Behind_Master can fluctuate because of how it is calculated: it is the difference between the execution time of the last event applied by the SQL_THREAD and the timestamp of the last event received by the IO_THREAD. Since you are running master/master in a chain, an event may have been relayed anywhere from one to seven times before reaching a given server, which becomes noticeable on slow connections.

    On top of this, the clock difference between master and slave is calculated only once, when the slave connects, and is then assumed to be constant.
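    The calculation described above can be sketched roughly as follows (a simplification with illustrative names, not the server's actual internals):

```python
from datetime import datetime

# Simplified model of Seconds_Behind_Master:
# lag = slave_clock - event_timestamp_from_binlog - clock_diff,
# where clock_diff is measured once at connect time and assumed constant.

def seconds_behind_master(slave_clock, event_ts, clock_diff=0.0):
    lag = (slave_clock - event_ts).total_seconds() - clock_diff
    return max(lag, 0.0)  # the server never reports a negative value

# An event relayed through the master/master ring keeps its *original*
# timestamp, so a statement written long ago (or a long-open transaction)
# inflates the reported value:
now = datetime(2011, 3, 1, 12, 0, 0)
fresh_event = datetime(2011, 3, 1, 11, 59, 58)
stale_event = datetime(2011, 3, 1, 11, 0, 0)  # committed an hour late
print(seconds_behind_master(now, fresh_event))  # 2.0
print(seconds_behind_master(now, stale_event))  # 3600.0
```

    This also shows why the value can jump straight back to 0: as soon as the SQL thread catches up to the last received event, the difference collapses regardless of how stale the previous event's timestamp was.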

    You can read more about the details under Seconds_Behind_Master on the SHOW SLAVE STATUS page of the manual.

    And I don't know if you have found this page about mk-heartbeat:


    • #3
      I know this "behaviour" from replication over a WAN link (Germany -> US West Coast).
      Do you perhaps have bad network latency between the different EC2 availability zones?


      • #4
        We only have the two servers in master/master. We have two slaves coming off each master. It is now fluctuating between 0 and 100k seconds.

        SyncMaster72: Yes, our two masters are in different availability zones, us-east-1a and us-east-1d; however, there is no packet loss and response time is 1 ms.


        • #5
          You have some statements that carry a timestamp from a long time ago. These could be transactions that stayed open a long time before committing, or statements that deliberately set their own timestamp. The most reliable way to debug this is to enable log_slave_updates and log_slow_slave_statements (and set long_query_time=0) on the slave, then compare the resulting logs to find out which statements execute with a long-ago timestamp.
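          A slave-side my.cnf fragment along those lines (a sketch for Percona Server 5.1; verify the exact variable names against your build before applying, and remember long_query_time=0 logs every statement, so only leave it on temporarily):

```ini
[mysqld]
# Write events received from the master into this slave's own binlog,
# so relayed statements can be inspected.
log_slave_updates

# Enable the slow query log.
slow_query_log = 1

# Also log statements executed by the replication SQL thread
# (Percona Server extension).
log_slow_slave_statements

# Temporarily log every statement, regardless of duration.
long_query_time = 0
```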
          You have some statements that have a timestamp a long time ago. This could be transactions that stay open a long time before committing, or maybe they deliberately set their timestamp. The most certain way to debug this is to enable log_slave_updates and log_slave_slow_statements (and set long_query_time=0) on the slave, and compare the resulting logs to find out which statements execute with a long-ago timestamp.