Using pt-heartbeat with ProxySQL

ProxySQL and Orchestrator are usually deployed together to achieve high availability with MySQL replication. In a failover (or graceful takeover) scenario, Orchestrator promotes a slave and ProxySQL redirects the traffic. Depending on how your environment is configured, and how long the promotion takes, you could end up in a scenario where manual intervention is needed.

In this post, we are going to talk about some considerations when working with ProxySQL in combination with pt-heartbeat (part of Percona Toolkit), with the goal of making your environment more reliable.

Why Would We Want pt-heartbeat With ProxySQL?

If you have intermediate masters, the seconds_behind_master metric is not good enough. Slave servers that are attached to intermediate masters report seconds_behind_master relative to their own master, not the "real" top-level server receiving the writes. So it is possible ProxySQL will send traffic to second-level slaves that show no latency relative to their own master, but still have stale data. This happens when the intermediate master itself is lagging behind.

Another reason is that the latency value reported by SHOW SLAVE STATUS has a resolution of 1 second. Deploying pt-heartbeat gets us the real latency value in milliseconds, across the entire topology. Unfortunately, ProxySQL rounds the value to seconds, so we cannot fully take advantage of this at the time of this writing.

How Do I Deploy pt-heartbeat for a ProxySQL Environment?

ProxySQL since version 1.4.4 has built-in support to use pt-heartbeat. We only need to specify the heartbeat table as follows:
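A sketch of that configuration, run against the ProxySQL admin interface (port 6032 by default); the percona.heartbeat schema and table name is an assumption, so adjust it to wherever your heartbeat table lives:

```sql
-- Run against the ProxySQL admin interface.
-- 'percona.heartbeat' is an assumed schema.table name; adjust to your setup.
SET mysql-monitor_replication_lag_use_percona_heartbeat = 'percona.heartbeat';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
```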

Now we need to decide how to deploy pt-heartbeat to be able to update the heartbeat table on the master. The easiest solution is to install pt-heartbeat on the ProxySQL server itself. 

We need to create a file to store the pt-heartbeat configuration, e.g. /etc/percona-toolkit/pt-heartbeat-prod.conf:
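A minimal sketch of that file, assuming pt-heartbeat connects through the local ProxySQL instance on port 6033 as the monitor user, with the heartbeat table in the percona schema (all assumed values):

```ini
# /etc/percona-toolkit/pt-heartbeat-prod.conf
# Assumed host/port/credentials; adjust to your environment.
host=127.0.0.1
port=6033
user=monitor
password=monitor_password
database=percona
table=heartbeat
create-table
update
```

Percona Toolkit configuration files take one option per line, in option=value form, without the leading dashes.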

We point pt-heartbeat to go through ProxySQL and route its traffic to the writer hostgroup. In order to do this, we need a query rule:
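A sketch of such a rule, run against the ProxySQL admin interface; the rule_id, the writer hostgroup (10), the monitor username, and the percona.heartbeat table name are assumptions to adjust for your setup:

```sql
-- Route all queries from the pt-heartbeat user against the heartbeat
-- table to the writer hostgroup (assumed to be 10).
INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, destination_hostgroup, apply)
VALUES (50, 1, 'monitor', 'percona\.heartbeat', 10, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```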

In the example above, the writer hostgroup is 10 and the user pt-heartbeat connects as is monitor.

Next, we need to create a systemd service that will be in charge of making sure pt-heartbeat is always running. The caveat here is that when the master is set to read-only for any reason (e.g. a master switch), pt-heartbeat will stop (by design) and exit with a non-zero return code. So, we need to tell systemd to catch this and restart the daemon. The way to accomplish this is the Restart=on-failure directive.

Here’s a sample systemd unit script:
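A sketch of such a unit, assuming pt-heartbeat is installed at /usr/bin/pt-heartbeat and reads the configuration file created earlier (saved e.g. as /etc/systemd/system/pt-heartbeat-prod.service):

```ini
[Unit]
Description=pt-heartbeat (production cluster)
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/pt-heartbeat --config /etc/percona-toolkit/pt-heartbeat-prod.conf
# Restart on any non-zero exit, e.g. when the master goes read-only
# during a master switch and pt-heartbeat stops by design.
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
```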

We also specify RestartSec=5s, as the default of 100 ms sleep before restarting a service is overkill for this use case.

After the unit file is in place, we need a daemon-reload so systemd picks up the new service:
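```shell
systemctl daemon-reload
```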

At this point we can start and stop the service using:
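Assuming the unit file was named pt-heartbeat-prod.service:

```shell
systemctl start pt-heartbeat-prod
systemctl stop pt-heartbeat-prod
# Optionally, have it start on boot:
systemctl enable pt-heartbeat-prod
```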

Dealing With Master Takeover/Failover

Let’s assume slave servers are configured with max_replication_lag of 5 seconds. Usually, the failover process can take a few seconds, where pt-heartbeat might stop updating the heartbeat table. This means that the slave that is picked as the new master might report replication lag (as per pt-heartbeat), even if it was not really behind the master.

Now, we found out that ProxySQL does not automatically clear the max_replication_lag setting for a server when it becomes a master. When configured to use pt-heartbeat, it will (incorrectly) flag the new master as lagging and shun it, causing writes to this cluster to be rejected! This behavior happens ONLY when using pt-heartbeat; when using the show slave status method to detect latency, everything works as expected. You can check the bug report for more information.

For the time being, one way to deal with the above scenario is to write a script that monitors the mysql_servers table, and if it finds a writer node that has max_replication_lag configured, clear that value so it won’t be shunned.

Here’s some sample code:
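A hedged sketch of such a script. The admin credentials, the admin port (6032), and the writer hostgroup (10) are all assumptions; adjust them to your environment:

```shell
#!/bin/bash
# Sketch: clear max_replication_lag on writer nodes so ProxySQL
# will not shun a newly promoted master.
# Assumed values: admin interface on 127.0.0.1:6032, writer hostgroup 10.
ADMIN="mysql -h 127.0.0.1 -P 6032 -u admin -padmin"
WRITER_HG=10

# Count writer entries that still carry a max_replication_lag setting.
count=$($ADMIN -NB -e "SELECT COUNT(*) FROM mysql_servers
                       WHERE hostgroup_id=$WRITER_HG AND max_replication_lag > 0")

if [ "$count" -gt 0 ]; then
    $ADMIN -e "UPDATE mysql_servers SET max_replication_lag=0
               WHERE hostgroup_id=$WRITER_HG;
               LOAD MYSQL SERVERS TO RUNTIME;
               SAVE MYSQL SERVERS TO DISK;"
fi
```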

We can use ProxySQL’s scheduler to run our script every 5 seconds like this:
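Assuming the monitoring script is saved as /usr/local/bin/clear_writer_lag.sh (a hypothetical path), we can register it in the scheduler table via the ProxySQL admin interface:

```sql
-- interval_ms = 5000 runs the script every 5 seconds.
INSERT INTO scheduler (id, active, interval_ms, filename)
VALUES (1, 1, 5000, '/usr/local/bin/clear_writer_lag.sh');
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;
```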

Conclusion

We have seen how pt-heartbeat is useful to monitor the real latency for an environment using ProxySQL. We also discussed how to deal with the edge case of a new master being shunned because of latency, due to how ProxySQL and pt-heartbeat work.
