Performance Schema overhead

As a continuation of my CPU benchmarks, it is interesting to see what limits scalability in MySQL 5.6.2. I am going to check that using PERFORMANCE SCHEMA, but before that let’s estimate the potential overhead of using PERFORMANCE SCHEMA itself. So I am going to run the same benchmarks (sysbench read-only and read-write) as in the previous post with different Performance Schema options and compare the results.

I am going to use a Cisco UCS C250 with the following settings:

  • PERFORMANCE SCHEMA disabled (NO PS)
  • PERFORMANCE SCHEMA enabled, with all consumers ON (PS on)
  • PERFORMANCE SCHEMA enabled, but with only the global_instrumentation consumer enabled. This allows gathering table and index access statistics (PS only global)
  • PERFORMANCE SCHEMA enabled, but with all consumers OFF (PS all off)
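
As a rough sketch of how the three enabled configurations differ, the Python snippet below only builds the SQL one might run against the performance_schema.setup_consumers table. The helper name consumer_statements and the CONFIGS mapping are made up for illustration, and this is an assumption about how such a setup could be scripted, not a record of how this benchmark was configured. "NO PS" is a start-time setting, so it has no runtime statement here.

```python
# Hypothetical mapping from the labels used in this post to the consumers
# that stay enabled. None means "leave every consumer enabled".
CONFIGS = {
    "PS on": None,
    "PS only global": ["global_instrumentation"],
    "PS all off": [],
}

TABLE = "performance_schema.setup_consumers"


def consumer_statements(enabled_consumers):
    """Return SQL statements that leave only the named consumers enabled."""
    if enabled_consumers is None:
        return ["UPDATE %s SET ENABLED = 'YES'" % TABLE]
    # Switch everything off first, then re-enable the requested consumers.
    statements = ["UPDATE %s SET ENABLED = 'NO'" % TABLE]
    if enabled_consumers:
        names = ", ".join("'%s'" % name for name in sorted(enabled_consumers))
        statements.append(
            "UPDATE %s SET ENABLED = 'YES' WHERE NAME IN (%s)" % (TABLE, names)
        )
    return statements


for label, consumers in CONFIGS.items():
    print(label)
    for sql in consumer_statements(consumers):
        print("  " + sql + ";")
```

The snippet never connects to a server; it just prints the statements so they can be reviewed before being run by hand.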

The full results with details are on our Benchmark Wiki.

Here is the graph for the read-only case:

and for the read-write case:

To get a numeric impression, let’s look at the ratio between the results with and without PS. A value above 1 means the run with PS was slower.

Here is the table of ratios for the read-only case:

Threads  PS on  PS only global  PS all off
1        1.11   1.10            1.13
2        1.13   1.08            1.04
4        1.15   1.07            1.05
8        1.18   1.07            1.03
24       1.21   1.08            1.06
32       1.25   1.10            1.08
48       1.25   1.10            1.08
64       1.23   1.10            1.06
128      1.23   1.10            1.04
256      1.21   1.08            1.04
512      1.18   1.07            1.01
1024     1.17   1.01            0.96

Here is the table of ratios for the read-write case:

Threads  PS on  PS only global  PS all off
1        1.07   0.94            0.98
2        1.11   1.00            1.06
4        1.15   1.04            1.08
8        1.19   1.02            1.08
24       1.17   1.00            1.07
32       1.18   1.07            1.06
48       1.18   1.09            1.13
64       1.17   1.11            1.11
128      1.18   1.09            1.12
256      1.14   1.04            0.99
512      1.17   1.02            1.04
1024     1.21   1.06            1.07

So this allows us to make the following summary:

In the read-only case, Performance Schema with all consumers enabled gives about 25% overhead, with “global instrumentation” only about 10%, and with all consumers disabled about 8%.

For the read-write case, Performance Schema with all consumers enabled gives about 19% overhead, with “global instrumentation” only about 11%, and it is about the same with all consumers disabled.
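
For anyone who wants to redo the arithmetic, here is a small Python sketch that turns the ratios from the two tables into overhead percentages and reports the lowest and highest value per configuration. The function names are made up for illustration, and it treats a ratio of 1.25 as 25% overhead, which is the convention used in the summary. The figures quoted in the summary are approximate, so the computed ranges will not match them digit for digit.

```python
# Ratios copied from the two tables: (PS on, PS only global, PS all off).
COLUMNS = ("PS on", "PS only global", "PS all off")

READ_ONLY = {
    1: (1.11, 1.10, 1.13), 2: (1.13, 1.08, 1.04), 4: (1.15, 1.07, 1.05),
    8: (1.18, 1.07, 1.03), 24: (1.21, 1.08, 1.06), 32: (1.25, 1.10, 1.08),
    48: (1.25, 1.10, 1.08), 64: (1.23, 1.10, 1.06), 128: (1.23, 1.10, 1.04),
    256: (1.21, 1.08, 1.04), 512: (1.18, 1.07, 1.01), 1024: (1.17, 1.01, 0.96),
}

READ_WRITE = {
    1: (1.07, 0.94, 0.98), 2: (1.11, 1.00, 1.06), 4: (1.15, 1.04, 1.08),
    8: (1.19, 1.02, 1.08), 24: (1.17, 1.00, 1.07), 32: (1.18, 1.07, 1.06),
    48: (1.18, 1.09, 1.13), 64: (1.17, 1.11, 1.11), 128: (1.18, 1.09, 1.12),
    256: (1.14, 1.04, 0.99), 512: (1.17, 1.02, 1.04), 1024: (1.21, 1.06, 1.07),
}


def overhead_percent(ratio):
    """Convert a ratio to whole percent; ratios below 1.0 come out negative."""
    return round((ratio - 1.0) * 100)


def overhead_range(table):
    """Return {column: (lowest %, highest %)} across all thread counts."""
    result = {}
    for index, name in enumerate(COLUMNS):
        values = [overhead_percent(row[index]) for row in table.values()]
        result[name] = (min(values), max(values))
    return result


for label, table in (("read-only", READ_ONLY), ("read-write", READ_WRITE)):
    print(label)
    for name, (low, high) in overhead_range(table).items():
        print("  %-15s %3d%% to %3d%%" % (name, low, high))
```

Reporting a range rather than a single number makes it easier to see how much the overhead moves with the thread count.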

Is that big or small? I leave it for you to decide. I think it may be acceptable in some cases and not in others.
I only wish that Performance Schema with all consumers disabled gave less overhead; 8-11% seems significant.
If nothing helps, I would like to be able to fully disable / enable Performance Schema at run-time, not at start-time.

As I understand it, dtrace / systemtap probes can be disabled / enabled at run-time, and when they are disabled the overhead is almost 0%. Why can’t Performance Schema do the same?

(Disclaimer: This benchmark is sponsored by Well Known Social Network, and they are generous enough to make it public)

Comments

  1. Marc Alff says

    Our internal benchmarks for mysql-trunk also show the same problem, this is a performance bug currently being investigated.

    About how the performance schema instrumentation works, a good way to start is to follow the instrumentation APIs. The code is publicly available, and also documented.

    For example, for the mutex instrumentation, see mysql_mutex_lock().

    Regards,
    – Marc

  2. Vadim Tkachenko says

    Davi,

    You explained how dtrace works, but I would be very much interested to hear details how P_S works. In particular, why with enabled P_S, but disabled all consumers, we still have 10% overhead? P_S does not need to collect and aggregate stats in this case, so what is it doing ?
    Can you shed some light on this ?

  3. Davi Arnaut says

    > My point is that ideally we should have near to zero overhead when we do not
    > use dtrace / systemtap / P_S.

    If we are speaking ideally, we should have it improve performance! Seriously though, I explained how dtrace operates on a completely different level than P_S.

  4. Vadim Tkachenko says

    Davi,

    My point is that ideally we should have near to zero overhead when we do not use dtrace / systemtap / P_S.

    E.g. when we disable all consumers, P_S does not need to aggregate and store data, does it?
    Then why do we see about 10% degradation?

  5. Davi Arnaut says

    > As I understand dtrace / systemtap probes can be disabled / enabled at run-time,
    > and when they disabled – it is almost 0% overhead, why Performance Schema
    > can’t do the same ?

    For example, dtrace is able to patch instructions in a binary at runtime in order to add trap instructions — this requires kernel support, etc. Also, differently from dtrace/systemtap, P_S needs to aggregate and store data, etc.

  6. Patrick Casey says

    Interesting benchmark … that’s more overhead than I would have guessed w/o having looked at the code.

    Has anybody looked at how other database vendors (I’m thinking specifically of Oracle here) offer these kind of stats? Are they taking a similar non-trivial performance hit, or is their architecture different enough that they get away with it?

  7. Peter Zaitsev says

    One interesting thing is how overhead changes by workload type.
    Do we have more overhead for short statements (such as sysbench simple tests) or for long queries which crunch many rows? How much overhead do we get when we’re IO bound? Many applications have spare CPU and they just would not like their performance to drop for an IO bound workload due to added contention etc.

    Finally do you remember how Performance Schema in “row access counters only” mode compares to user_statistics patch in its overhead ?

  8. Vadim Tkachenko says

    Peter,

    I used the recommendations from that post.
    The “PS all off” columns are the results when we disable it in the way described in the ‘cheat sheet’.

  9. Gia McNerney says

    Great test and report Vadim! I think the performance schema will be a great addition to MySQL once it’s more established. Just starting with MySQL after retiring from handling oracle databases – now volunteering :) Thanks for sharing.
