Three ways to know when a MySQL slave is about to start laggingBaron Schwartz
The trouble with slave lag is that you often can’t see it coming. Especially if the slave’s load is pretty uniform, a slave that’s at 90% of its capacity to keep up with the master can be indistinguishable from one that’s at 5% of its capacity.
So how can you tell when your slave is nearing its capacity to keep up with the master? Here are three ways:
One: watch for spikes of lag. If you have Cacti (and these Cacti templates for MySQL) you can see this in the graphs. If the graphs start to get a little bumpy, you can assume that the iceberg is floating higher and higher in the water, so to speak. (Hopefully that’s not too strange a metaphor.) As the slave’s routine work gets closer and closer to its capacity, you’ll see these spikes get bigger and “wider”. The front-side of the spike will always be less than a 45-degree angle in ordinary operation but the back-side, when the slave is catching up after lagging behind, will become a gentler and gentler slope.
Two: deliberately make a slave fall behind, then see how fast it can catch up. This is sort of related to Method One. The goal here is to explicitly see how steep the backside of that slope is. If you stop a slave for an hour, then start it again and it catches up in one hour, it is running at 1/2 of its capacity. (In case that’s confusing: if you stop it at noon and restart it at 1:00, and it’s caught up again at 2:00, it played all statements from 12:00 to 2:00 in 1 hour, so it went at 2x speed.)
Three: measure it more scientifically. Use our patched server, which gives you a USER_STATISTICS table.
mysql> SELECT * FROM INFORMATION_SCHEMA.USER_STATISTICS WHERE USER='#mysql_system#'\G
*************************** 1. row ***************************
You can compare the BUSY_TIME to one-half the CONNECTED_TIME (because there are two replication threads on the slave) to see how much of the time the slave thread was actively processing statements. If the slave threads are always running, you can just use the server’s uptime instead.
 There are cases where this isn’t true, especially if you’re monitoring Seconds_behind_master instead of using mk-heartbeat, which is immune to this anomaly.