Where the open source database community meets: Use code PERCONA75 and secure your spot for Percona Live. Register

Downloads

Blog

Three ways to know when a MySQL slave is about to start lagging

October 9, 2008

Author

Baron Schwartz

Insight for DBAs

Share this Post:

The trouble with slave lag is that you often can’t see it coming. Especially if the slave’s load is pretty uniform, a slave that’s at 90% of its capacity to keep up with the master can be indistinguishable from one that’s at 5% of its capacity.

So how can you tell when your slave is nearing its capacity to keep up with the master? Here are three ways:

One: watch for spikes of lag. If you have Cacti (and these Cacti templates for MySQL) you can see this in the graphs. If the graphs start to get a little bumpy, you can assume that the iceberg is floating higher and higher in the water, so to speak. (Hopefully that’s not too strange a metaphor.) As the slave’s routine work gets closer and closer to its capacity, you’ll see these spikes get bigger and “wider”. The front-side of the spike will always be less than a 45-degree angle in ordinary operation[1] but the back-side, when the slave is catching up after lagging behind, will become a gentler and gentler slope.

Two: deliberately make a slave fall behind, then see how fast it can catch up. This is sort of related to Method One. The goal here is to explicitly see how steep the backside of that slope is. If you stop a slave for an hour, then start it again and it catches up in one hour, it is running at 1/2 of its capacity. (In case that’s confusing: if you stop it at noon and restart it at 1:00, and it’s caught up again at 2:00, it played all statements from 12:00 to 2:00 in 1 hour, so it went at 2x speed.)

Three: measure it more scientifically. Use our patched server, which gives you a USER_STATISTICS table.

mysql> SELECT * FROM INFORMATION_SCHEMA.USER_STATISTICS WHERE USER='#mysql_system#'G
*************************** 1. row ***************************
                  USER: #mysql_system#
     TOTAL_CONNECTIONS: 1
CONCURRENT_CONNECTIONS: 2
        CONNECTED_TIME: 46188
             BUSY_TIME: 719
          ROWS_FETCHED: 0
          ROWS_UPDATED: 1882292
       SELECT_COMMANDS: 0
       UPDATE_COMMANDS: 580431
        OTHER_COMMANDS: 338857
   COMMIT_TRANSACTIONS: 1016571
 ROLLBACK_TRANSACTIONS: 0

mysql> SELECT * FROM INFORMATION_SCHEMA.USER_STATISTICS WHERE USER='#mysql_system#'G

*************************** 1. row ***************************

USER: #mysql_system#

TOTAL_CONNECTIONS: 1

CONCURRENT_CONNECTIONS: 2

CONNECTED_TIME: 46188

BUSY_TIME: 719

ROWS_FETCHED: 0

ROWS_UPDATED: 1882292

SELECT_COMMANDS: 0

UPDATE_COMMANDS: 580431

OTHER_COMMANDS: 338857

COMMIT_TRANSACTIONS: 1016571

ROLLBACK_TRANSACTIONS: 0

You can compare the BUSY_TIME to one-half the CONNECTED_TIME (because there are two replication threads on the slave) to see how much of the time the slave thread was actively processing statements. If the slave threads are always running, you can just use the server’s uptime instead.

[1] There are cases where this isn’t true, especially if you’re monitoring Seconds_behind_master instead of using mk-heartbeat, which is immune to this anomaly.

0 0 votes

Article Rating

8 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Rob Wultsch

17 years ago

Any general comments about max advised load on slaves?

Author

Baron Schwartz

17 years ago

Don’t load your slaves so much that they fall behind.

Plan for the slaves to have a hard time keeping up with the master, even with the same or better hardware.

Plan for slave lag and make your application cope with it.

frederic sidler

17 years ago

Assuming that you have no network problem between your nodes. What would you do next. I mean if your slave begins to lag behind your master

Patrick Domack

17 years ago

I have been using the mk-heartbeat type method for a few years now (using my own version). I found this excentual to actually being able to tell many issues with a slave itself. I have also wrapped the main critical mysql connected around checking the lag, and if it’s excessive, to switch to a different slave. It has worked well.

Author

Baron Schwartz

17 years ago

frederic, you have to take some load off the server. We have a whole section in our book about how to help slaves keep up. It’s in the replication chapter. http://www.amazon.com/gp/product/0596101716?tag=perinc-20

Sean

17 years ago

Baron, any future plans to create cacti graphs to log the *_statistics tables and/or include the #mysql_system# user in the replication templates? 🙂

Admin

Peter Zaitsev

17 years ago

Baron,

I would note “Bumpy” Lag graphs can also start happening in case you have some long executing queries being passed to the slave which is bad thing on itself.

I would recommend avoiding queries longer than 1/3rd of accepted lag in the replication. Ie if you want to be within 60sec you should ensure no queries going through replication take more than 20sec to execute.

Speaking about slave load I’d like to see slave to have at least 5x capacity when it is idle and at least 3x when it is operating during normal load this means you can take some spikes without lagging, though it is very application dependent.

Considering how much it takes to do roll forward recovery from backup is another important thing to keep in mind.