Often I run into problems when trying to use mk-query-digest with tcpdump on “very” busy hosts. You might be thinking, “very busy is a relative and unquantifiable term,” and you’d be right, so I’ll phrase this differently. Let me give a little background to the problem first. Mk-query-digest tries to handle dropped or missing packets gracefully, but even so they can skew results dramatically. Imagine a situation where a single connection sends two queries and gets two responses, with a response time of R for each request, and a wait time of W between the requests. If the first response and second request are dropped by the kernel, the result – from mk-query-digest’s perspective – is that the database took 2R + W time to respond to the first request.
Back to the question of, “what is a very busy host?” In my experience, if even 5% of tcpdump packets are dropped by the kernel, the results can be skewed enough to cause confusion about which queries are really slow. Recently, I got more than 60% dropped packets on a server with roughly 50MB/s of traffic on port 3306, a system load of about 10, and 8 CPU cores. The resulting mk-query-digest output was obviously bogus when compared to the host’s slow query log (for example, none of the top 5 slow queries reported by mkqd appeared in the actual slow log file). After a little brainstorming, we came up with a few solutions:

1. Have mk-query-digest report only queries whose apparent response time exceeds the server’s long-query-time, masking everything below that threshold.
2. Narrow the tcpdump filter so it captures traffic from only a few client hosts.
3. Narrow the tcpdump filter so it captures only a sample of all connections, about one in eight, by matching on the low-order bits of the client’s port number.
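Before picking a fix, it is worth quantifying the problem on your own host: when tcpdump exits, it prints its own statistics to stderr, including a “packets dropped by kernel” counter. A minimal sketch for measuring your drop rate (the interface and the 30-second window are assumptions; timeout is from GNU coreutils):

# capture 30 seconds of port 3306 traffic, discard the decoded packets,
# and keep only tcpdump's exit statistics from stderr
timeout 30 tcpdump -i eth0 -s 65535 -n -q 'port 3306' \
    >/dev/null 2>/tmp/tcpdump.stats
grep -E 'captured|received|dropped' /tmp/tcpdump.stats

If the dropped count is more than a few percent of the packets received by the filter, don’t trust the digest.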
#1 has an obvious flaw: if your long-query-time is 1 second, and mkqd believes that a query which actually took 10ms instead took 0.9s, the results are still useless. That is to say, this doesn’t actually solve the real problem of dropped packets, it just applies a mask to the output. #2 seems like the simplest good solution, but when I tested it, I still got a very high percentage of dropped packets (around 30% when filtering down to only 4 of the hundreds of active clients). While this is lower than without the filter, it is still unusable. #3 actually worked very well, resulting in about 0.2% packet loss on this host, which is acceptable: at that level the variances are statistically smoothed out and don’t noticeably affect the results. Here is the tcpdump command used:
tcpdump -i eth0 -s 65535 -x -n -q -tttt 'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2'
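The second and third clauses of that filter deserve a word of explanation. In pcap filter syntax, tcp[1] is the low byte of the TCP source port and tcp[3] is the low byte of the destination port. Port 3306 has low byte 0xEA, and 0xEA & 7 == 2, so the MySQL side of every conversation always matches; the client’s ephemeral port is effectively random, so its low three bits equal 2 only about one time in eight. Requiring both conditions therefore samples roughly 1/8 of connections, and, because a connection’s ports never change, every packet of each sampled conversation is captured in both directions, which is exactly what mk-query-digest needs for accurate timings. A quick check of the arithmetic:

# low three bits of 3306's low byte: (3306 & 0xff) & 7 == (0xEA & 7) == 2
printf '%d\n' $(( (3306 & 0xff) & 7 ))    # prints 2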
I also had to forward the tcpdump output to another host for processing because the database host couldn’t handle the additional IO or CPU pressure of either writing it to a file or piping it to mk-query-digest. Here is a draft of a script to automate this; use at your own risk and update to suit your needs.
#!/bin/bash
# Set some defaults
self=$(basename "$0")
d=$(date +%F-%T | tr :- _)
REMOTEHOST='some.other.hostname'        # CHANGEME
TMP=/tmp                                # scratch dir on both hosts
LOCKFILE="$TMP/$self.lock"              # prevents overlapping runs
TMPfile="mkqd-tcpdump-$d.txt"           # raw capture, written on $REMOTEHOST
RESfile="mkqd-report-$d.txt"            # digested report
LIMIT=50                                # limit # of queries in report
SLEEPTIME=1200                          # duration to capture tcpdump data in seconds (1200 = 20 min)

# check lock file
if [ -e "$LOCKFILE" ]; then
    echo "$self: lock file $LOCKFILE already exists, aborting"
    exit 1
fi
touch "$LOCKFILE"

# run the tcpdump, stream it to the remote host, and sleep for a bit
tcpdump -i eth0 -s 65535 -x -n -q -tttt \
    'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2' 2>/dev/null \
    | ssh $REMOTEHOST -- "cat - > $TMP/$TMPfile" &
PID=$!    # PID of the ssh side of the pipe

# set trap to be sure tcpdump doesn't run for ever
# and clean up the temp file too
trap 'rm -f $LOCKFILE; kill $PID; ssh $REMOTEHOST -- "rm -f $TMP/$TMPfile"; exit' INT TERM EXIT

sleep $SLEEPTIME
# stop the capture; tcpdump exits when it next writes to the broken pipe
kill $PID 2>/dev/null
wait $PID 2>/dev/null

# set trap to be sure both remote files are removed
trap 'ssh $REMOTEHOST -- "rm -f $TMP/$TMPfile $TMP/$RESfile"; rm -f $LOCKFILE $RESfile; exit' INT TERM EXIT

# digest the result, copy it to localhost, then email it
ssh $REMOTEHOST -- "mk-query-digest --type tcpdump --limit $LIMIT < $TMP/$TMPfile > $TMP/$RESfile 2>&1"
scp -q $REMOTEHOST:$TMP/$RESfile $RESfile
# email $RESfile using your preferred transport

# clean up remote and local files.
ssh $REMOTEHOST -- "rm -f $TMP/$TMPfile $TMP/$RESfile"
rm -f $RESfile $LOCKFILE
trap - INT TERM EXIT
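A script like this lends itself to being run from cron, so a fresh report lands in your inbox a few times a day. A hypothetical crontab entry (the installed path and schedule are assumptions):

# capture and digest 20 minutes of MySQL traffic every 4 hours
0 */4 * * * /usr/local/bin/mkqd-tcpdump-report.sh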