Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

Query Response time histogram – new feature in Percona Server

July 12, 2010

Author

Vadim Tkachenko

MySQL

Percona Software

Share this Post:

Recently we had couple posts dedicated to performance monitoring, i.e. Color code your performance numbers, Performance Optimization and Six Sigma, so you may understand we consider stability of performance numbers as one of important area for database management.

That’s why we decided to add histogram of queries response times into Percona Server, and our software engineer Oleg Tsarev implemented this feature. The feature adds new INFORMATION_SCHEMA table QUERY_RESPONSE_TIME
( full docs are on page response_time_distribution), which looks like

     time         count
      0.000001  53
      0.000010  204840
      0.000100  12064767
      0.001000  8095309
      0.010000  2889362
      0.100000  19341
      1.000000  7040

time count

0.000001 53

0.000010 204840

0.000100 12064767

0.001000 8095309

0.010000 2889362

0.100000 19341

1.000000 7040

and which basically means that there were 7040 queries with execution time less than 1 sec and more than 0.1 sec.

The feature will be available in next release of Percona Server 5.1-12.0, but you can try it earlier – we setup our BuildBot farm to generate binary snapshots for each commit into lp:percona-server repository, and now you can take binary version for each particular commit from our TESTING stage

And to demonstrate possible usage of histogram of response time, let me show you
how increasing throughput affects response time. There are histograms and pie charts for sysbench oltp runs
for 1, 32 and 128 concurrent threads

1 thread

time    count
      0.000001  1
      0.000010  102818
      0.000100  2057031
      0.001000  242722
      0.010000  188
      0.100000  3

time count

0.000001 1

0.000010 102818

0.000100 2057031

0.001000 242722

0.010000 188

0.100000 3

so as you see in this case 10% of all queries were executed with execution time in interval (0.1ms, 1 ms] and 86% queries had execution time in interval (0.01ms, 0.1ms]

32 threads

time    count
      0.000001  21
      0.000010  203637
      0.000100  18723323
      0.001000  8366615
      0.010000  177225
      0.100000  5077
      1.000000  2240

time count

0.000001 21

0.000010 203637

0.000100 18723323

0.001000 8366615

0.010000 177225

0.100000 5077

1.000000 2240

With 32 thread percent of queries in interval (0.1ms, 1ms] increased to 30%, and (0.01ms, 0.1ms] dropped to 68%

128 threads

time    count
      0.000001  53
      0.000010  204840
      0.000100  12064767
      0.001000  8095309
      0.010000  2889362
      0.100000  19341
      1.000000  7040

time count

0.000001 53

0.000010 204840

0.000100 12064767

0.001000 8095309

0.010000 2889362

0.100000 19341

1.000000 7040

For 128 threads we see now 12% of queries went into 1 ms < resp time <= 10ms, which we did not see previously,
and (0.1ms, 1ms] increased to 35%, while percentage of queries <=0.1ms is now only 52%.

I expect Baron will add this to Cacti templates, so you can setup monitoring of response times.

0 0 votes

Article Rating

11 Comments

Oldest

Newest Most Voted

Rob Wultsch

16 years ago

This will be very useful. I image that it would be useful if there was some way to tag a query with a keyword so that the information is not skewed by slow or fast queries. Without the tagging I imagine that most db’s will have a somewhat broad curve. With tagging I imagine the curve would be much more narrow and it would be easier to discern behavior changes.

Author

Vadim Tkachenko

16 years ago

Rob,

I agree, tagging of queries would give even more sense for query monitoring.
I think we will add it eventually.

Mark Callaghan

16 years ago

Excellent. It is great to see our tools and processes get better.

matt

16 years ago

thanks, it’s very useful!

Shlomi Noach

16 years ago

Very good stuff! Keep up the good work!

Farid

16 years ago

Great stuff !

Wondering if there is any overhead with this function enabled ?

Admin

Peter Zaitsev

16 years ago

Rob,

This feature acts as a “fingerprint” for a lot of systems – there are going to be some slow queries and some short two, while distribution will stay more or less the same.
Not all the systems have this property of course some may have cron jobs etc which cause significant behavior variance.

For systems with stable workload we can monitory such histogram every 5 minutes or so and track for significant variances.

Periodic spikes in response time is a frequent problem which is often noticed too late – after it significantly impacts production. Though most of such problems start with more subtle changes.