Recently we had couple posts dedicated to performance monitoring, i.e. Color code your performance numbers, Performance Optimization and Six Sigma, so you may understand we consider stability of performance numbers as one of important area for database management.
That’s why we decided to add histogram of queries response times into Percona Server, and our software engineer Oleg Tsarev implemented this feature. The feature adds new INFORMATION_SCHEMA table QUERY_RESPONSE_TIME
( full docs are on page response_time_distribution), which looks like
1 2 3 4 5 6 7 8 |
time count 0.000001 53 0.000010 204840 0.000100 12064767 0.001000 8095309 0.010000 2889362 0.100000 19341 1.000000 7040 |
and which basically means that there were 7040 queries with execution time less than 1 sec and more than 0.1 sec.
The feature will be available in next release of Percona Server 5.1-12.0, but you can try it earlier – we setup our BuildBot farm to generate binary snapshots for each commit into lp:percona-server repository, and now you can take binary version for each particular commit from our TESTING stage
And to demonstrate possible usage of histogram of response time, let me show you
how increasing throughput affects response time. There are histograms and pie charts for sysbench oltp runs
for 1, 32 and 128 concurrent threads
1 thread
1 2 3 4 5 6 7 |
time count 0.000001 1 0.000010 102818 0.000100 2057031 0.001000 242722 0.010000 188 0.100000 3 |
so as you see in this case 10% of all queries were executed with execution time in interval (0.1ms, 1 ms] and 86% queries had execution time in interval (0.01ms, 0.1ms]
32 threads
1 2 3 4 5 6 7 8 |
time count 0.000001 21 0.000010 203637 0.000100 18723323 0.001000 8366615 0.010000 177225 0.100000 5077 1.000000 2240 |
With 32 thread percent of queries in interval (0.1ms, 1ms] increased to 30%, and (0.01ms, 0.1ms] dropped to 68%
128 threads
1 2 3 4 5 6 7 8 |
time count 0.000001 53 0.000010 204840 0.000100 12064767 0.001000 8095309 0.010000 2889362 0.100000 19341 1.000000 7040 |
For 128 threads we see now 12% of queries went into 1 ms < resp time <= 10ms, which we did not see previously,
and (0.1ms, 1ms] increased to 35%, while percentage of queries <=0.1ms is now only 52%.
I expect Baron will add this to Cacti templates, so you can setup monitoring of response times.
This will be very useful. I image that it would be useful if there was some way to tag a query with a keyword so that the information is not skewed by slow or fast queries. Without the tagging I imagine that most db’s will have a somewhat broad curve. With tagging I imagine the curve would be much more narrow and it would be easier to discern behavior changes.
Rob,
I agree, tagging of queries would give even more sense for query monitoring.
I think we will add it eventually.
Excellent. It is great to see our tools and processes get better.
thanks, it’s very useful!
Very good stuff! Keep up the good work!
Great stuff !
Wondering if there is any overhead with this function enabled ?
Rob,
This feature acts as a “fingerprint” for a lot of systems – there are going to be some slow queries and some short two, while distribution will stay more or less the same.
Not all the systems have this property of course some may have cron jobs etc which cause significant behavior variance.
For systems with stable workload we can monitory such histogram every 5 minutes or so and track for significant variances.
Periodic spikes in response time is a frequent problem which is often noticed too late – after it significantly impacts production. Though most of such problems start with more subtle changes.
Great stuff!! – specially when monitoring live the difference between periods you may point exactly to the moment when the deviation is happening!..
Will add it to dim_STAT too 🙂
Rgds,
-Dimitri
Vadim, is there an eta date on the next release (5.1-12.0)?
Sean,
It should come in next two weeks.
There are now graphs for this in the Cacti templates I maintain:
http://code.google.com/p/mysql-cacti-templates/wiki/MySQLTemplates
Scroll down to these graphs:
# MySQL Query Response Time (Microseconds)
# MySQL Query Time Histogram (Count)