Downloads

Blog

How to use tcpdump on "Very" Busy Hosts

April 18, 2011

Author

Devananda van der Veen

Insight for DBAs

MySQL

Share this Post:

Tcpdump Packet Loss on Busy Servers

Often I run into problems when trying to use mk-query-digest with tcpdump on “very” busy hosts. You might be thinking, “very busy is a relative and unquantifiable term,” and you’d be right, so I’ll phrase this differently. Let me give a little background to the problem first. Mk-query-digest tries to handle dropped or missing packets gracefully, but even so they can skew results dramatically. Imagine a situation where a single connection sends two queries and gets two responses, with a response time of R for each request, and a wait time of W between the requests. If the first response and second request are dropped by the kernel, the result – from mk-query-digest’s perspective – is that the database took 2R + W time to respond to the first request.

Possible Solutions to Using tcpdump on Busy Servers

Back to the question of, “what is a very busy host?” In my experience, if you are getting even 5% of tcpdump packets dropped by the kernel, the results can be skewed enough to cause confusion about which queries are really slow. Recently, I got more than 60% dropped packets on a server with roughly 50MB/s of traffic on port 3306, system load of about 10, and 8 CPU cores. The resulting mk-query-digest output was obviously bogus when compared to the host’s slow-query-log (for example, none of the top 5 slow queries reported by mkqd appeared in the actual slow log file). After a little brain-storming, we came up with a few solutions:

1. use “mk-query-digest –filter ‘$event->{time} && $event->{time} > 1′” to exclude all queries which it believes took longer than the servers long-query-time of 1 second

1. tcpdump traffic only from a small and representative subset of clients

1. tcpdump a modulo of incoming ports (including port 3306, because we must capture the responses from mysqld)

#1 has an obvious flaw — if your long-query-time is 1 second, and mkqd believes that a query which actually took 10ms instead took 0.9s, the results are still useless. That is to say, this doesn’t actually solve the real problem of dropped packets, it just applies a mask to the output. #2 seems like the simplest good solution, but when I tested this, I still got very high percentage of dropped packets (around 30% when filtering only 4 out of hundreds of active clients). While this is lower than without the filter, it is still unusable. #3 actually worked very well and resulted in about 0.2% packet loss on this host, which is acceptable — the variances are statistically smoothed out and don’t noticeably affect the results. Here is the tcpdump command used.

tcpdump -i eth0 -s 65535 -x -n -q -tttt 'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2'

1	tcpdump -i eth0 -s 65535 -x -n -q -tttt 'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2'

I also had to forward the tcpdump outputÂ to another host for processing because the database host couldn’t handle the additional IO or CPU pressure of either writing it to a file or piping it to mk-query-digest. Here is a draft of a script to automate this; use at your own risk and update to suit your needs.

#!/bin/bash
# Set some defaults
d=$(date +%F-%T | tr :- _)
self=`basename $0`
REMOTEHOST='some.other.hostname'           # CHANGEME
LIMIT=50Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  # limit # of queries in report
SLEEPTIME=1200Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  # duration to capture tcpdump data in seconds (1200 = 20 min)
TMP='/tmp'
LOCKFILE="$self.lockfile"
TMPfile="$self.temp"
RESfile="$self.result"

# check lock file
if [ -e "$TMP/$LOCKFILE" ]; then
 echo "$self: lock file $LOCKFILE already exists, aborting"
 exit 1
fi

# set trap to be sure tcpdump doesn't run for ever
# and clean up the temp file too
trapÂ  'rm -f $LOCKFILE; kill $PID; ssh $REMOTEHOST -- "rm -f $TMP/$TMPfile"; exit' INT TERM EXIT
touch $TMP/$LOCKFILE

# run the tcpdump & write to remote file and sleep for a bit
tcpdump -i eth0 -s 65535 -x -n -q -tttt 'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2' 2>/dev/null 
 | ssh $REMOTEHOST -- "cat - > $TMP/$TMPfile" &

PID=$!
sleep $SLEEPTIME
kill $PID

# set trap to be sure both remote files are removed
trap 'ssh $REMOTEHOST -- "rm -f $TMP/$TMPfile $TMP/$RESfile"; rm -f $LOCKFILE $RESfile; exit' INT TERM EXIT

# digest the result, copy to localhost, then email it
ssh $REMOTEHOST -- "mk-query-digest --type tcpdump --limit $LIMIT < $TMP/$TMPfile 2>&1 > $TMP/$RESfile"
scp -q $REMOTEHOST:$TMP/$RESfile $RESfile

# email $RESfile using your preferred transport

# clean up remote and local files.
ssh $REMOTEHOST -- "rm -f $TMP/$TMPfile $TMP/$RESfile"
rm -f $RESfile $LOCKFILE

trap - INT TERM EXIT
exit 0

#!/bin/bash

# Set some defaults

d=$(date +%F-%T | tr :- _)

self=`basename $0`

REMOTEHOST='some.other.hostname' # CHANGEME

LIMIT=50Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â # limit # of queries in report

SLEEPTIME=1200Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â # duration to capture tcpdump data in seconds (1200 = 20 min)

TMP='/tmp'

LOCKFILE="$self.lockfile"

TMPfile="$self.temp"

RESfile="$self.result"

# check lock file

if [ -e "$TMP/$LOCKFILE" ]; then

echo "$self: lock file $LOCKFILE already exists, aborting"

exit 1

# set trap to be sure tcpdump doesn't run for ever

# and clean up the temp file too

trapÂ 'rm -f $LOCKFILE; kill $PID; ssh $REMOTEHOST -- "rm -f $TMP/$TMPfile"; exit' INT TERM EXIT

touch $TMP/$LOCKFILE

# run the tcpdump & write to remote file and sleep for a bit

tcpdump -i eth0 -s 65535 -x -n -q -tttt 'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2' 2>/dev/null

| ssh $REMOTEHOST -- "cat - > $TMP/$TMPfile" &

PID=$!

sleep $SLEEPTIME

kill $PID

# set trap to be sure both remote files are removed

trap 'ssh $REMOTEHOST -- "rm -f $TMP/$TMPfile $TMP/$RESfile"; rm -f $LOCKFILE $RESfile; exit' INT TERM EXIT

# digest the result, copy to localhost, then email it

ssh $REMOTEHOST -- "mk-query-digest --type tcpdump --limit $LIMIT < $TMP/$TMPfile 2>&1 > $TMP/$RESfile"

scp -q $REMOTEHOST:$TMP/$RESfile $RESfile

# email $RESfile using your preferred transport

# clean up remote and local files.

ssh $REMOTEHOST -- "rm -f $TMP/$TMPfile $TMP/$RESfile"

rm -f $RESfile $LOCKFILE

trap - INT TERM EXIT

exit 0

0 0 votes

Article Rating

20 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Baron Schwartz

15 years ago

I’m not very fresh on my TCP headers and I don’t remember what’s in various positions. What is the net effect of this expression?

tcp[1] & 7 == 2 and tcp[3] & 7 == 2

Author

Devananda van der Veen

15 years ago

tcp[x] indicates the x’th octet in the TCP header, starting from 0. The first two octets are the source port, second two octets are the destination port. So my understanding is that this expression says: the lower octet of each source and dest port, when AND’ed with the bitmask ‘111’ result in a value of ‘010’, or the number 2. Note that “3306 & 7 == 2” is a true statement.

See http://www.tcpdump.org/tcpdump_man.html for more in-depth info on TCP header structure.

erkules

15 years ago

On this example it has no effect, as port 3306 is more precise. tcp[0,1] sourceport and tcp[2,3] dstport (rfc 793).

erkules

15 years ago

Ahh of course no effect is wrong. At least of the ports got to be 3306 (port=3306) the AND check for that dst/src-port is fulfilled anyway (3306) and the other one got to fullfill &7 == 2.

So one could use instead “dst port 3306” to match mysql server and do some portfiltering on tcp[0,1] I would prefer hosts.

Author

Devananda van der Veen

15 years ago

erkules, “dst port 3306” won’t work because you must also capture the reply from mysqld, where the source port is 3306 and the dest port is unknown (matches & 7 == 2).

In some cases, I’m sure that host filtering would work fine, eg when you want to watch a specific host, but in some situations I’ve faced it does not work well. Honestly, I’m not sure why the packet drops were so high when filtering just a few hosts, perhaps the CPU overhead is higher when applying a host IP filter versus a port bitmask filter?

erkules

15 years ago

Devananda .. right I missed that.

Anthony DeRobertis

15 years ago

Another approach, if you have the hardware: enable port mirroring on your switch, and capture the traffic on an otherwise idle host with enough disk bandwidth to do so.

Baron Schwartz

15 years ago

I just used this trick, worked like a charm.

207815 packets captured
214152 packets received by filter
6324 packets dropped by kernel

Baron Schwartz

15 years ago

I just used this trick, worked like a charm.

207815 packets captured
214152 packets received by filter
6324 packets dropped by kernel

Anthony DeRobertis

15 years ago

Another approach, if you have the hardware: enable port mirroring on your switch, and capture the traffic on an otherwise idle host with enough disk bandwidth to do so.

erkules

15 years ago

Devananda .. right I missed that.

Author

Devananda van der Veen

15 years ago

erkules, “dst port 3306″ won’t work because you must also capture the reply from mysqld, where the source port is 3306 and the dest port is unknown (matches & 7 == 2).

erkules

15 years ago

Ahh of course no effect is wrong. At least of the ports got to be 3306 (port=3306) the AND check for that dst/src-port is fulfilled anyway (3306) and the other one got to fullfill &7 == 2.

So one could use instead “dst port 3306″ to match mysql server and do some portfiltering on tcp[0,1] I would prefer hosts.

erkules

15 years ago

On this example it has no effect, as port 3306 is more precise. tcp[0,1] sourceport and tcp[2,3] dstport (rfc 793).

Author

Devananda van der Veen

15 years ago

tcp[x] indicates the x’th octet in the TCP header, starting from 0. The first two octets are the source port, second two octets are the destination port. So my understanding is that this expression says: the lower octet of each source and dest port, when AND’ed with the bitmask ’111′ result in a value of ’010′, or the number 2. Note that “3306 & 7 == 2″ is a true statement.

See http://www.tcpdump.org/tcpdump_man.html for more in-depth info on TCP header structure.

Baron Schwartz

15 years ago

I’m not very fresh on my TCP headers and I don’t remember what’s in various positions. What is the net effect of this expression?

tcp[1] & 7 == 2 and tcp[3] & 7 == 2

Pascal Schmiel

14 years ago

You can improve the rate even further if you first capture the traffic to a file using tcpdump’s “-w” option and parse it later with “tcpdump -r”.

Using /usr/sbin/tcpdump -i eth1 -c 100000 -s0 -x -nn -q -tttt ‘port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2’ | /usr/bin/mk-query-digest –type=tcpdump :

100000 packets captured
195794 packets received by filter
95664 packets dropped by kernel

And now using /usr/sbin/tcpdump -i eth1 -c 100000 -s0 -w /tmp/snoopfile ‘port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2’; /usr/sbin/tcpdump -r /tmp/snoopfile -s0 -x -nn -q -tttt | /usr/bin/mk-query-digest –type=tcpdump :

100000 packets captured
100286 packets received by filter
220 packets dropped by kernel

Erik Brakkee

11 years ago

I have been using the performance optimization with the tcp flags filtering in tcpdump for some time. Yesterday however, I found out that I could not capture any packets anymore on a number of hosts (not all). All of these hosts are running Centos 6 (various versions). I could not figure out what the problem was so had to remove the TCP flags again.