Announcement Module
No announcement yet.

replication lag

Page Title Module
Move Remove Collapse
Conversation Detail Module
  • Filter
  • Time
  • Show
Clear All
new posts

  • replication lag

    Any ideas on how to identify the source of a problem of replication delay?

    Here's the scenario:

    We have a master-slave setup with MySQL 5.0.45 on RHEL64. I am experiencing some serious replication lag that seems to occur for undetermined reasons. There's very little, if any, activity on the slave aside from replication itself.

    InnoDB on the master has 8GB of RAM and the machine itself has dual-processors with quad-cores. On the slave, InnoDB has 12GB of RAM and it also has dual-procs with quad-cores. Both have 15K SCSI drives --- they're beefy machines to say the least.

    A Java application adds a large number of records on the master and it gets replicated to the slave from where another Java application reads.

    Replication lag occurs and it only increases unless I stop the application on the Master that adds data. Replication eventually catches up, but very slowly.

    The main table being updated has around 200M records, so it's not small, but the hardware should be very sufficient.

    innodb_flush_log_at_trx_commit has been changed to various settings but none seems to make a difference.

    I installed the mysqltoolkit for the heartbeat monitor application and it reports some oddities. It goes from being 1200 seconds behind to zero instantly when the Java application that's adding data on the master is stopped.

    Here's an example of the output from the mysql-heartbeat application that measures replication lag:

    1234s [ 1237.02s, 1246.21s, 1258.56s
    1235s [ 1236.93s, 1246.15s, 1258.51s
    1233s [ 1236.80s, 1246.08s, 1258.45s
    1234s [ 1236.67s, 1246.02s, 1258.40s
    1235s [ 1236.60s, 1245.96s, 1258.35s
    1236s [ 1236.53s, 1245.90s, 1258.30s
    0s [ 1215.85s, 1241.73s, 1256.87s
    0s [ 1195.22s, 1237.55s, 1255.45s
    0s [ 1174.57s, 1233.38s, 1254.03s
    0s [ 1153.90s, 1229.21s, 1252.60s
    0s [ 1009.38s, 1199.99s, 1242.64s
    0s [ 968.13s, 1191.64s, 1239.79s
    0s [ 947.48s, 1187.47s, 1238.37s
    0s [ 926.87s, 1183.30s, 1236.94s
    0s [ 906.23s, 1179.13s, 1235.52s

    Any thoughts are sincerely appreciated.

  • #2
    What is the CPU utilization on the master and the slave?

    If you use replication and you have multiprocessor/core machines and you have a very high insert/update rate you could have hit a bottleneck in MySQLs replication.

    The problem is that MySQL's replication is only performed within _one_ thread so it can basically only use the performance of one core.
    While on the master, inserts are usually performed through a lot of threads, so it can use the performance of all the core's and hence achieve a higher performance.

    So as long as the insert/update load is higher than what one core on the slave can handle the replication will fall behind.

    Unfortunately there isn't any configuration change you can do to MySQL to solve this problem.
    The only way to circumvent it is to put the replication into your application so that it is spread out over several threads and uses more than one core.