Introducing tcprstat, a TCP response time tool

September 1, 2010

Author

Baron Schwartz

MySQL

Percona Software

Ignacio Nin and I (mostly Ignacio) have worked together to create tcprstat[1], a new tool that times TCP requests and prints out statistics on them. The output looks somewhat like vmstat or iostat, but we’ve chosen the statistics carefully so you can compute meaningful things about your TCP traffic.

What is this good for? In a nutshell, it is a lightweight way to measure response times on a server such as a database, memcached, Apache, and so on. You can use this information for historical metrics, capacity planning, troubleshooting, and monitoring, to name just a few.

The tcprstat tool itself is a means of gathering raw statistics, which are suitable for storing and manipulating with other programs and scripts. By default, tcprstat works just like vmstat: it runs once, prints out a line, and exits. You’ll probably want to tell it to run forever, and continue to print out more lines. Each line contains a timestamp and information about the response time of the requests within that time period. Here “response time” means, for a given TCP connection, the time elapsed from the last inbound packet until the first outbound packet. For many simple protocols such as HTTP and MySQL, this is the moral equivalent of a query’s response time.
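
To make that definition of response time concrete, here is a minimal Python sketch. It is not tcprstat's actual implementation (which is written in C on top of libpcap); it assumes the packets have already been captured, sorted by time, and reduced to simple tuples. The `Packet` type and the `response_times` function are hypothetical names used only for illustration.

```python
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    """Hypothetical, simplified view of a captured packet."""
    ts_us: int      # capture timestamp in microseconds
    conn: str       # connection key, e.g. "10.0.0.5:51234"
    inbound: bool   # True if the packet travels client -> server


def response_times(packets: Iterable[Packet]) -> list[int]:
    """Return one response time (microseconds) per request/response pair.

    For each connection we remember the timestamp of the most recent
    inbound packet. The first outbound packet after it closes the
    measurement; later outbound packets of the same response are ignored.
    Packets are assumed to arrive in timestamp order.
    """
    last_inbound: dict[str, int] = {}
    times: list[int] = []
    for pkt in packets:
        if pkt.inbound:
            # A later inbound packet overwrites an earlier one, so a
            # multi-packet request is timed from its last packet.
            last_inbound[pkt.conn] = pkt.ts_us
        elif pkt.conn in last_inbound:
            times.append(pkt.ts_us - last_inbound.pop(pkt.conn))
    return times


packets = [
    Packet(1_000, "a", True),    # request, part 1
    Packet(1_200, "a", True),    # request, part 2 (last inbound)
    Packet(1_500, "b", True),    # request on another connection
    Packet(1_700, "a", False),   # first outbound on "a": 1700 - 1200
    Packet(1_800, "a", False),   # rest of the response, ignored
    Packet(2_100, "b", False),   # first outbound on "b": 2100 - 1500
]
print(response_times(packets))   # [500, 600]
```

The sketch also shows why the definition suits simple request/response protocols: each connection carries at most one outstanding request, so a single remembered timestamp per connection is enough.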

The statistics we chose to output by default are the count, median, average, min, max, and standard deviation of the response times, in microseconds. The max, average, and standard deviation are repeated for the 95th and 99th percentiles as well. Other metrics are also available. Here’s a sample:

[root@server] # tcprstat -p 3306 -n 0 -t 1
timestamp	count	max	min	avg	med	stddev	95_max	95_avg	95_std	99_max	99_avg	99_std
1276827985	1341	24556	23	149	59	767	310	91	69	1030	107	112
1276827986	1329	12098	28	134	63	461	299	91	65	667	104	93
1276827987	1180	13277	22	202	93	873	439	103	79	1523	131	169
1276827988	1441	15878	27	180	139	672	427	116	79	1045	136	128
1276827989	1432	157198	26	272	138	4165	405	115	80	1092	134	123
1276827990	1835	25198	26	183	124	734	448	115	85	1141	137	141
1276827991	1242	6949	29	129	114	301	233	98	61	686	109	84
1276827992	1480	284181	25	442	127	7432	701	128	114	4157	173	293
1276827993	1448	9339	22	161	88	425	392	104	80	1280	126	140
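
Because the output is plain whitespace-separated text, it is easy to post-process. As one illustration (a sketch, not a script that ships with tcprstat), the Python below parses lines in the format shown above and combines several intervals into a total request count and a count-weighted average response time. The function names `parse_tcprstat` and `summarize` are hypothetical, and the parser assumes the default column layout from the sample.

```python
SAMPLE = """\
timestamp\tcount\tmax\tmin\tavg\tmed\tstddev\t95_max\t95_avg\t95_std\t99_max\t99_avg\t99_std
1276827985\t1341\t24556\t23\t149\t59\t767\t310\t91\t69\t1030\t107\t112
1276827986\t1329\t12098\t28\t134\t63\t461\t299\t91\t65\t667\t104\t93
1276827987\t1180\t13277\t22\t202\t93\t873\t439\t103\t79\t1523\t131\t169
"""


def parse_tcprstat(text: str) -> list[dict[str, int]]:
    """Parse default-format output into one dict per interval.

    Assumes the first non-empty line is the header and that every
    value is an integer, as in the sample above.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, map(int, line.split()))) for line in lines[1:]]


def summarize(rows: list[dict[str, int]]) -> dict[str, float]:
    """Combine intervals into totals and a count-weighted average."""
    total = sum(r["count"] for r in rows)
    # count * avg approximates the total response time in an interval;
    # avg is printed in whole microseconds, so the result is approximate.
    busy_us = sum(r["count"] * r["avg"] for r in rows)
    return {
        "requests": total,
        "avg_us": busy_us / total if total else 0.0,
        "max_us": max((r["max"] for r in rows), default=0),
    }


rows = parse_tcprstat(SAMPLE)
print(summarize(rows))
```

Weighting each interval's average by its count matters: a plain mean of the `avg` column would give a quiet second the same influence as a busy one.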

tcprstat uses libpcap to capture traffic. It’s a threaded application that does the minimum possible work and uses efficient data structures. Your feedback on the kernel/userland exchange overhead caused by the packet sniffing would be much appreciated. Because libpcap lets the user tune this exchange, suggestions on how to improve it are welcome.

We build statically linked binaries with the preferred version of libpcap, which means there are no dependencies. You can just run the tool. In the future, packages in the Percona repositories will provide another means for rapid installation via yum and apt.

tcprstat is beta software. Several C/C++ experts reviewed its code and gave it a thumbs-up, so many eyes have been on the code. We’ve performed tests on servers with high loads and observed minimal resource consumption. I personally have been running it for many weeks on some production servers without stopping it and have seen no problems, so I am pretty sure it has no memory leaks or other problems. Nevertheless, it’s a first prototype release, and we want much more testing. We might also change the functionality; as we build tools around it, we discover new things that might be useful. When we’re happy with it and you’re happy with it, we’ll take the Beta label away and make it GA.

The tcprstat user’s manual and links to downloads are on the Percona wiki. Commercial support and services are provided by Percona. Bug reports, feature requests, etc. should go to the Launchpad project linked from the user’s manual. General discussion is welcome on the Google Group, which is also linked from the user’s manual.

[1] Historical note: we initially called this tool rtime, but did not publicize it. However, some of you might have heard of “rtime” before. This is the same tool.

13 Comments

PowerPaul

15 years ago

You should state the kernel version required.

$ ./tcprstat-static.v0.3.1.x86_64
FATAL: kernel too old
Segmentation fault
$ uname -sr
Linux 2.6.16.46-0.12-smp

Dimitri

15 years ago

That’s great stuff, folks!!! :-))

Do you plan to port it to platforms other than Linux?

And a small feature request so far: it would be great to print stats for several TCP ports at the same time! 😉
For example, you could accept several port numbers or ranges in the -p option and then print multi-line stats separated by an empty line (similar to “iostat”, but with port numbers instead of disk names). That would be even better, especially when you want to monitor several MySQL instances in parallel, or simply combine Apache, MySQL, and others in the same output 🙂

Once it becomes GA I’ll integrate it into dim_STAT with pleasure! 🙂

Rgds,
-Dimitri

Author

Baron Schwartz

15 years ago

Hi Dimitri, I’m glad you like this. What other platforms would you like?

Dan

15 years ago

I just installed Bazaar to download source and see if I can get this up on Solaris 8 (got a legacy MySQL installation in need of love).

Dimitri

15 years ago

Hi Baron,

Now that you’re asking :-)) personally I’d like to have it on Solaris too :-)) But at the same time it’s quite possible that DTrace can print similar information; I will check :-)) Beyond that, I have no idea whether any other UNIXes are as popular as Linux for hosting MySQL. Then again, the tool may be useful for more than just MySQL ;-)) so I think you can see what the demand is and then decide according to your priorities 🙂

Rgds,
-Dimitri

Author

Baron Schwartz

15 years ago

Well, let’s see who wants to pay us to port it 🙂

If someone else wants to port it too, it’s on Launchpad, so I guess that should make it pretty easy, right?

André Ferraz

15 years ago

There is a small typo in one of the source files that prevents compiling on 32-bit systems. I made a patch to fix it: https://bugs.launchpad.net/tcprstat/+bug/628073

Author

Baron Schwartz

15 years ago

Thanks Andre. PowerPaul, I haven’t had that problem. Please file a bug.

Subbu Subramaniam

15 years ago

There is an open source tool called yconalyzer that you should be able to compile for the other platforms.

https://sourceforge.net/projects/yconalyzer/

vbarter

15 years ago

FATAL: kernel too old
Segmentation fault (core dumped)

Linux 2.6.9-52bs

also got this problem

Didier Spezia

15 years ago

Perhaps it is worth mentioning that it does not really represent the performance of the queries (or, more exactly, of the database roundtrips) as experienced by the client application. My understanding is that measurement is done on the server side, so it completely excludes network latency. Another suggestion would be to publish a list of well-known servers/protocols that are suitable for such measurements. The purpose is to keep people from running it against pipelining/multiplexing or non-client/server protocols, which would only produce meaningless data.

Author

Baron Schwartz

15 years ago

If you want true measurement of the query response time, you have to do it in the database server itself (i.e. use the slow query log in MySQL), because even measuring TCP traffic on the server isn’t that precise due to kernel buffering and such. But this is actually the typical place where queries are measured, so it’s not as if server-side measurement is a weakness of tcprstat! Now, it is perfectly legitimate to measure from the client side, although you have to be careful of this, because you no longer know the difference between the network time and the query response time. This is one reason why tcprstat has the -l option.

I think that it’s more usual for protocols to be a simple call-and-response, and it would be better to make a list of protocols that fall outside this category, instead of listing all the ones that are valid to measure with tcprstat.

repls

13 years ago

Hi Baron,
I have some questions about the tcprstat tool.

First, does tcprstat filter out the response times of some packets, such as the packets used to establish a connection with the MySQL server? If tcprstat counts these packets’ response times in its output, the result may not be precise.

Second, maybe you could provide finer-grained percentiles, such as 99.99_max, 99.99_avg, 99.999_max, 99.999_avg, and so on.