As a continuation of my CPU benchmarks, it is interesting to see what limits scalability in MySQL 5.6.2. I am going to check that using PERFORMANCE SCHEMA, but before that, let’s estimate the potential overhead of using PERFORMANCE SCHEMA itself. So I am going to run the same benchmarks (sysbench read-only and read-write) as in the previous post with different Performance Schema options and compare the results.
I am going to use a Cisco UCS C250 with the following settings:
The full results with details are on our Benchmark Wiki.
Here is the graph for the read-only case:
and for the read-write case:
To get a numeric impression, let’s look at the ratio of the result with PS to the result without PS.
Here is the table of ratios for the read-only case:
| Threads | PS on | PS only global | PS all off |
|---|---|---|---|
| 1 | 1.11 | 1.10 | 1.13 |
| 2 | 1.13 | 1.08 | 1.04 |
| 4 | 1.15 | 1.07 | 1.05 |
| 8 | 1.18 | 1.07 | 1.03 |
| 24 | 1.21 | 1.08 | 1.06 |
| 32 | 1.25 | 1.10 | 1.08 |
| 48 | 1.25 | 1.10 | 1.08 |
| 64 | 1.23 | 1.10 | 1.06 |
| 128 | 1.23 | 1.10 | 1.04 |
| 256 | 1.21 | 1.08 | 1.04 |
| 512 | 1.18 | 1.07 | 1.01 |
| 1024 | 1.17 | 1.01 | 0.96 |
Here is the table of ratios for the read-write case:
| Threads | PS on | PS only global | PS all off |
|---|---|---|---|
| 1 | 1.07 | 0.94 | 0.98 |
| 2 | 1.11 | 1.00 | 1.06 |
| 4 | 1.15 | 1.04 | 1.08 |
| 8 | 1.19 | 1.02 | 1.08 |
| 24 | 1.17 | 1.00 | 1.07 |
| 32 | 1.18 | 1.07 | 1.06 |
| 48 | 1.18 | 1.09 | 1.13 |
| 64 | 1.17 | 1.11 | 1.11 |
| 128 | 1.18 | 1.09 | 1.12 |
| 256 | 1.14 | 1.04 | 0.99 |
| 512 | 1.17 | 1.02 | 1.04 |
| 1024 | 1.21 | 1.06 | 1.07 |
This allows us to make the following summary:
In the read-only case, Performance Schema with all consumers enabled gives about 25% overhead,
with “global instrumentation” only about 10%, and with all consumers disabled about 8%.
In the read-write case, Performance Schema with all consumers enabled gives about 19% overhead,
with “global instrumentation” only about 11%, and it is about the same with all consumers disabled.
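To make the arithmetic behind these percentages explicit: a ratio of 1.25 is read here as 25% overhead. Below is a minimal Python sketch that converts one row of the read-only table into percentages. The names `overhead_percent` and `row_overhead` are hypothetical, and picking the 32-thread row is my illustrative assumption, not a statement of how the summary numbers were chosen.

```python
# Ratios copied from the read-only table above, keyed by thread count.
# Only a few rows are included to keep the sketch short.
READ_ONLY_RATIOS = {
    8:  {"PS on": 1.18, "PS only global": 1.07, "PS all off": 1.03},
    32: {"PS on": 1.25, "PS only global": 1.10, "PS all off": 1.08},
    64: {"PS on": 1.23, "PS only global": 1.10, "PS all off": 1.06},
}


def overhead_percent(ratio):
    """Read a ratio such as 1.25 as a whole-number percent overhead (25)."""
    return round((ratio - 1.0) * 100)


def row_overhead(ratios, threads):
    """Convert every ratio in one table row into a percent overhead."""
    return {name: overhead_percent(r) for name, r in ratios[threads].items()}


if __name__ == "__main__":
    print(row_overhead(READ_ONLY_RATIOS, 32))
```

For the 32-thread row this gives 25, 10 and 8 percent, which lines up with the read-only summary. A ratio below 1, such as the 0.96 at 1024 threads, comes out as a negative overhead.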
Is that big or small? I leave it for you to decide; I think it may be acceptable in some cases and not in others.
I only wish that Performance Schema with all consumers disabled had less overhead; 8-11% seems significant.
If nothing helps, I would like to be able to fully disable or enable Performance Schema at run time, not only at startup.
As I understand it, dtrace / systemtap probes can be disabled and enabled at run time, and when they are disabled the overhead is almost 0%. Why can’t Performance Schema do the same?
(Disclaimer: This benchmark is sponsored by Well Known Social Network, and they are generous enough to make it public.)
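For context, the consumers themselves can already be switched at run time by updating the `performance_schema.setup_consumers` table; what cannot be changed without a restart is the `performance_schema` server option. Here is a rough Python sketch that builds the SQL for the three configurations compared above. The helper name `consumer_statements`, the configuration keys, and the mapping of "PS only global" to a consumer row named `global_instrumentation` are my assumptions for illustration, not the exact settings used in these runs.

```python
# Hypothetical helper: returns the SQL statements that would switch the
# Performance Schema consumers into one of the three benchmarked setups.
# The statements are plain strings, to be run with any MySQL client.
TABLE = "performance_schema.setup_consumers"


def consumer_statements(config):
    if config == "on":
        # Assumed meaning of "PS on": every consumer enabled.
        return [f"UPDATE {TABLE} SET ENABLED = 'YES'"]
    if config == "only_global":
        # Assumed meaning of "PS only global": disable everything, then
        # re-enable only the global_instrumentation consumer.
        return [
            f"UPDATE {TABLE} SET ENABLED = 'NO'",
            f"UPDATE {TABLE} SET ENABLED = 'YES' "
            "WHERE NAME = 'global_instrumentation'",
        ]
    if config == "all_off":
        # Assumed meaning of "PS all off": every consumer disabled.
        return [f"UPDATE {TABLE} SET ENABLED = 'NO'"]
    raise ValueError(f"unknown configuration: {config}")


if __name__ == "__main__":
    for name in ("on", "only_global", "all_off"):
        for statement in consumer_statements(name):
            print(statement + ";")
```

Even in the "all off" setup the instrumentation points stay compiled in and active, which is consistent with the remaining overhead measured above.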