Making HAProxy 1.5 replication lag aware in MySQL

December 18, 2014 | Posted In: Insight for DBAs, MySQL, Percona Toolkit, Percona XtraDB Cluster

HAProxy is frequently used as a software load balancer in the MySQL world. Peter Boros, in a past post, explained how to set it up with Percona XtraDB Cluster (PXC) so that it only sends queries to available nodes. The same approach can be used in a regular master-slaves setup to spread the read load across multiple slaves. However, with MySQL replication, another factor comes into play: replication lag. In this case, the approach described for Percona XtraDB Cluster does not work as well, because the check presented there only returns ‘up’ or ‘down’. We would like to be able to tune the weight of a replica inside HAProxy depending on its replication lag. This is what we will do in this post using HAProxy 1.5.

Agent checks in HAProxy

HAProxy 1.5 allows us to run an agent check, which is a check that can be added to a regular health check. The benefit of agent checks is that the return value can be ‘up’ or ‘down’, but also a weight.

What is an agent? It is simply a program that HAProxy can reach over a TCP connection on a given port. So if we want to run an agent on a MySQL server that will:

  • Mark the server as down in HAProxy if replication is not working
  • Set the weight to 100% if the replication lag is < 10s
  • Set the weight to 50% if the replication lag is >= 10s and < 60s
  • Set the weight to 5% in all other situations

We can use a script like this:
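As a rough illustration of the four rules above, here is a minimal Python sketch. It is not the agent script from this post (which, judging from the comments, is a PHP script named agent.php). The function name `agent_reply` is hypothetical, and the reply strings are an assumption based on the `up`, `down` and percentage keywords that HAProxy's agent check accepts.

```python
from typing import Optional


def agent_reply(seconds_behind_master: Optional[int]) -> str:
    """Map replication lag (in seconds) to an agent-check reply line.

    None stands for a NULL Seconds_Behind_Master, i.e. replication
    is not working.
    """
    if seconds_behind_master is None:
        return "down\n"
    if seconds_behind_master < 10:
        return "up 100%\n"
    if seconds_behind_master < 60:
        return "up 50%\n"
    # All other situations: keep the server in rotation, but barely.
    return "up 5%\n"
```

The percentage is relative to the weight configured for the server in HAProxy, so with a configured weight of 100 the three tiers translate to weights of 100, 50 and 5.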

If you want the script to be accessible from port 6789 and connect to a MySQL instance running on port 3306, run:
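Judging from the reader-posted version of agent.php in the comments, the script takes the listening port and the MySQL port as its two arguments, in that order. For readers who want to see what the listening side involves, here is a minimal Python sketch of such a TCP agent. The names `ReusableTCPServer` and `make_agent_server` are hypothetical, and the reply is produced by a caller-supplied function rather than by a real replication check.

```python
import socketserver
from typing import Callable


class ReusableTCPServer(socketserver.TCPServer):
    # Allow a quick restart of the agent on the same port.
    allow_reuse_address = True


def make_agent_server(port: int, get_reply: Callable[[], str],
                      host: str = "0.0.0.0") -> ReusableTCPServer:
    """Build a TCP server that answers every connection with get_reply()."""

    class AgentHandler(socketserver.BaseRequestHandler):
        def handle(self) -> None:
            # Send the reply and return; the server then closes the connection.
            self.request.sendall(get_reply().encode("ascii"))

    return ReusableTCPServer((host, port), AgentHandler)
```

Calling `make_agent_server(6789, lambda: "up 100%\n").serve_forever()` would listen on port 6789. The default bind address is `0.0.0.0` because, as the first comment below points out, an agent bound to 127.0.0.1 can only be checked by an HAProxy running on the same node.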

You will also need a dedicated MySQL user:
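The agent only needs to run SHOW SLAVE STATUS, which requires the REPLICATION CLIENT (or SUPER) privilege. The sketch below shows, in Python rather than the post's PHP, how the lag could be read with that account through the `mysql` command-line client. The user name `haproxy` and password `haproxy_pwd` are taken from the reader-posted agent.php in the comments; the function names are hypothetical.

```python
import subprocess
from typing import Optional


def parse_seconds_behind_master(status_output: str) -> Optional[int]:
    """Extract Seconds_Behind_Master from vertical SHOW SLAVE STATUS output.

    Returns None when the value is NULL or the field is missing.
    """
    for line in status_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "Seconds_Behind_Master":
            value = value.strip()
            return int(value) if value.isdigit() else None
    return None


def read_lag(mysql_port: int, user: str = "haproxy",
             password: str = "haproxy_pwd") -> Optional[int]:
    """Run SHOW SLAVE STATUS through the mysql client and return the lag."""
    cmd = ["mysql", "-h127.0.0.1", f"-u{user}", f"-p{password}",
           f"-P{mysql_port}", "-E", "-e", "SHOW SLAVE STATUS"]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=5)
    return parse_seconds_behind_master(result.stdout)
```

Passing the command as a list avoids the shell quoting problems around the query string that the comments discuss.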

When the agent is started, you can check that it is working properly:
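A reader below reports doing this check with telnet. The same test can be scripted; the following Python sketch (with a hypothetical `query_agent` helper) connects to the agent, reads everything it sends until the connection is closed, and returns the reply.

```python
import socket


def query_agent(host: str, port: int, timeout: float = 2.0) -> str:
    """Connect to the agent, read its full reply and return it as text."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(1024)
            if not data:  # the agent closed the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii").strip()
```

For example, `query_agent("127.0.0.1", 6789)` queries an agent listening locally on port 6789. Running it from the HAProxy host rather than from the database server also confirms that the agent port is reachable over the network.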

Assuming it is run locally on the app server, that 2 replicas are available (192.168.10.2 and 192.168.10.3) and that the application will send all reads on port 3307, you will define a frontend and a backend in your HAProxy configuration like this:
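To make the shape of such a configuration concrete, here is a small Python sketch that renders a frontend/backend pair for a set of replicas. The server options mirror the `server` line quoted by a reader in the comments, and the backend name `read_only-back` also comes from there. The frontend name `read_only-front`, the function name and the explicit `:3306` on each server are assumptions, so treat the output as a starting point rather than the exact configuration used in this post.

```python
from typing import Dict


def render_read_backend(replicas: Dict[str, str], listen_port: int = 3307,
                        mysql_port: int = 3306, agent_port: int = 6789) -> str:
    """Render a frontend/backend pair with one server line per replica."""
    lines = [
        "frontend read_only-front",  # hypothetical frontend name
        f"    bind *:{listen_port}",
        "    mode tcp",
        "    default_backend read_only-back",
        "",
        "backend read_only-back",
        "    mode tcp",
        "    balance leastconn",
    ]
    for name, address in replicas.items():
        # 'check' keeps the regular health check; 'agent-check' adds the agent.
        lines.append(
            f"    server {name} {address}:{mysql_port} weight 100 check "
            f"agent-check agent-port {agent_port} inter 1000 rise 1 fall 1 "
            "on-marked-down shutdown-sessions"
        )
    return "\n".join(lines) + "\n"
```

Calling `render_read_backend({"slave1": "192.168.10.2", "slave2": "192.168.10.3"})` produces one `server` line per replica, each with both a regular check and an agent check on port 6789.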

Demo

Now that everything is set up, let’s see how HAProxy can dynamically change the weight of the servers depending on the replication lag.
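One way to watch the weights change is the `show stat` command on the HAProxy stats socket, which one reader runs through socat in the comments and then cuts down to fields 1, 2, 18 and 19 (proxy name, server name, status and weight). The Python sketch below does the same filtering by column name on CSV text you have already collected. The function name `server_weights` is hypothetical.

```python
import csv
import io
from typing import Dict, List


def server_weights(stat_csv: str) -> List[Dict[str, str]]:
    """Keep proxy name, server name, status and weight for each CSV row."""
    # The header line starts with "# "; strip it so the first column
    # is named "pxname".
    reader = csv.DictReader(io.StringIO(stat_csv.lstrip("# ")))
    wanted = ("pxname", "svname", "status", "weight")
    return [{key: row[key] for key in wanted} for row in reader]
```

Selecting columns by name rather than by position keeps the filter working if a different HAProxy version adds or reorders fields.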

No lag

Slave1 lagging

Slave2 down

Conclusion

Agent checks are a nice addition in HAProxy 1.5. The setup presented above is a bit simplistic though: for instance, if HAProxy fails to connect to the agent, it will not mark the corresponding server as down. It is therefore recommended to keep a regular health check along with the agent check.

Astute readers will also notice that in this configuration, if replication is broken on all nodes, HAProxy will stop sending reads. This may not be the best solution. Possible options are to stop the agent and mark the servers as UP using the stats socket, or to add the master as a backup server.
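For the stats socket option, a minimal Python sketch could look like the following. It assumes the stats socket is enabled with admin level in the HAProxy configuration and relies on the `set server <backend>/<server> agent up` command described in the HAProxy documentation. The helper names are hypothetical.

```python
import socket


def haproxy_command(socket_path: str, command: str, timeout: float = 2.0) -> str:
    """Send one command to the HAProxy stats socket and return its output."""
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode("ascii") + b"\n")
        while True:
            data = sock.recv(4096)
            if not data:  # HAProxy closed the connection after answering
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def force_agent_up(socket_path: str, backend: str, server: str) -> str:
    """Ask HAProxy to consider the agent of backend/server as up."""
    return haproxy_command(socket_path, f"set server {backend}/{server} agent up")
```

For instance, `force_agent_up("/run/haproxy/admin.sock", "read_only-back", "slave1")` uses the socket path and backend name that appear in the comments. If the agent is still running, its next reply may change the state again, which is why the agent should be stopped first.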

And as a final note, you can edit the code of the agent so that replication lag is measured with Percona Toolkit’s pt-heartbeat instead of Seconds_Behind_Master.

Stephane Combaudon

Stéphane joined Percona in July 2012, after working as a MySQL DBA for leading French companies such as Dailymotion and France Telecom.

In real life, he lives in Paris with his wife and their twin daughters. When not in front of a computer or not spending time with his family, he likes playing chess and hiking.

11 Comments

  • Hi Stephane,

    thanks for sharing 🙂 I’ve corrected 2 typos and the localhost binding (if you bind the agent socket on 127.0.0.1, the check only works if HAProxy is running on the node itself).

    And a second change for the Debian/Ubuntu Users who do not need the User/Password setting. Here is the gist: https://gist.github.com/jmara/8035c07a86ff111465d9/revisions

  • Hi, thanks for this script. I just get one notice:

    PHP Notice: Undefined offset: 0 in /usr/bin/haproxy_checkgalera on line 33

  • Hi Stéphane,

    I just discovered your post now, it’s very instructive for those who want to learn more about the possibilities of the agent check, so thanks for sharing this. I’ve added a link from the haproxy home page.

  • The script agent.php has a few problems; I’ve corrected it as follows:

    # Script Name: agent.php
    <?php
    // Simple socket server
    // See http://php.net/manual/en/function.stream-socket-server.php
    $port = $argv[1];
    $mysql_port = $argv[2];
    $mysql = "/usr/bin/mysql";
    $user = 'haproxy';
    $password = 'haproxy_pwd';
    $query = "SHOW SLAVE STATUS";
    function set_weight($lag){
    # Write your own rules here
    if ($lag == 'NULL'){
    return "down";
    }
    else if ($lag = 10 && $lag

  • It seems my previous post was truncated by the maximum allowed comment length. Let me point out the parts of agent.php created by Stephane that need changing.

    1. Change
    <!–?php
    to
    <?php

    2. Change
    $cmd = "$mysql -h127.0.0.1 -u$user -p$password -P$mysql_port -Ee "$query" | grep Seconds_Behind_Master | cut -d ':' -f2 | tr -d ' '";
    exec("$cmd",$lag);

    to

    $cmd = "$mysql -h127.0.0.1 -u$user -p$password -P$mysql_port -e $query | grep Seconds_Behind_Master | cut -d ':' -f2 | tr -d ' '";
    exec("$cmd",$lag);

  • What is the recommended way to ensure that the PHP script is running and the socket is open? It would be unfortunate for the agent script to fail for some reason and falsely report the slaves as being down.

  • I am having a heck of a time getting HAProxy to accept this configuration. I keep getting this error:
    server slave1 only supports options ‘backup’, ‘cookie’, ‘redir’, ‘observer’, ‘on-error’, ‘error-limit’, ‘check’, ‘disabled’, ‘track’, ‘id’, ‘inter’, ‘fastinter’, ‘downinter’, ‘rise’, ‘fall’, ‘addr’, ‘port’, ‘source’, ‘minconn’, ‘maxconn’, ‘maxqueue’, ‘slowstart’ and ‘weight’.

    with a line of
    server slave1 xxx.xxx.xxx.xxx weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions

    • Well, that was just it! It was what apt went and got on Ubuntu 14.04 originally. I removed it and installed 1.6.3 for my tests.
      On another note, I want to debug the agent-check, since everything checks out over telnet from the proxy server, but when I stop the slave, nothing changes in the status on the HAProxy side. The telnet request reports down when I stop replication.
      By the way, echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19 returns nothing.
      My config:
      global
          log /dev/log local0
          log 127.0.0.1 local1 notice
          maxconn 4096
          user haproxy
          group haproxy
          daemon

      defaults
          log global
          # mode http
          # option httplog
          option dontlognull
          retries 3
          option redispatch
          maxconn 2000
          timeout connect 5000
          timeout client 50000
          timeout server 50000

      listen stats
          bind *:1936
          mode http
          stats enable
          stats hide-version
          stats realm Haproxy\ Statistics
          stats uri /
          stats auth XXX
          stats admin if TRUE

      listen read_only-back
          bind *:3306
          mode tcp
          option tcplog
          log global
          balance leastconn
          server slave1 xxx.xxx.xxx weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions
          server slave2 xxx.xxx.xxx weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions
          server slave3 xxx.xxx.xxx weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions

      Thanks

  • I can’t get this to work. I was able to set up the agent, but I can’t seem to get HAProxy to listen to the agent port.
    I tried shutting down a slave, but the backend server still shows as up.
    Hopefully more detailed guidelines will be provided.
