Performance Schema overhead


As a continuation of my CPU benchmarks, it is interesting to see where the scalability limits are in MySQL 5.6.2. I am going to check that using PERFORMANCE SCHEMA, but before that let’s estimate the potential overhead of using PERFORMANCE SCHEMA itself. So I am going to run the same benchmarks (sysbench read-only and read-write) as in the previous post with different Performance Schema options and compare the results.

I am going to use a Cisco UCS C250 with the following settings:

  • PERFORMANCE SCHEMA disabled (NO PS)
  • PERFORMANCE SCHEMA enabled, with all consumers ON (PS on)
  • PERFORMANCE SCHEMA enabled, but with only the global_instrumentation consumer enabled. This allows gathering table and index access statistics (PS only global)
  • PERFORMANCE SCHEMA enabled, but with all consumers OFF (PS all off)
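For readers who want to reproduce the four configurations, here is a minimal sketch of how they might map to MySQL settings. The labels come from the list above; the statements are my assumption of a plausible setup (the performance_schema startup option and the performance_schema.setup_consumers table), not the exact commands used for this benchmark. The CONFIGS dictionary and the statements_for helper are hypothetical names for illustration.

```python
# Sketch only: a plausible mapping from the four benchmark labels to MySQL
# settings. The exact commands used in the benchmark are not given in the post.

ALL_OFF = "UPDATE performance_schema.setup_consumers SET ENABLED = 'NO'"
ALL_ON = "UPDATE performance_schema.setup_consumers SET ENABLED = 'YES'"
GLOBAL_ON = (
    "UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' "
    "WHERE NAME = 'global_instrumentation'"
)

CONFIGS = {
    # label: (my.cnf startup option, SQL to run after startup)
    "NO PS": ("performance_schema=OFF", []),
    "PS on": ("performance_schema=ON", [ALL_ON]),
    "PS only global": ("performance_schema=ON", [ALL_OFF, GLOBAL_ON]),
    "PS all off": ("performance_schema=ON", [ALL_OFF]),
}


def statements_for(label):
    """Return the SQL statements assumed for one benchmark configuration."""
    _startup_option, sql = CONFIGS[label]
    return list(sql)


if __name__ == "__main__":
    for label, (startup_option, sql) in CONFIGS.items():
        print(f"{label}: start mysqld with {startup_option}")
        for statement in sql:
            print(f"    {statement};")
```

Note that the startup option cannot be changed while the server is running, which is why only the consumer settings appear as SQL here.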

The full results with details are on our Benchmark Wiki.

Here is the graph for the read-only case:

and for the read-write case:

To get a numeric impression, let’s look at the ratio of the result with PS to the result without PS. In these tables a value above 1.00 corresponds to overhead from PS.

Here is the table with ratios for the read-only case:

Threads   PS on   PS only global   PS all off
1         1.11    1.10             1.13
2         1.13    1.08             1.04
4         1.15    1.07             1.05
8         1.18    1.07             1.03
24        1.21    1.08             1.06
32        1.25    1.10             1.08
48        1.25    1.10             1.08
64        1.23    1.10             1.06
128       1.23    1.10             1.04
256       1.21    1.08             1.04
512       1.18    1.07             1.01
1024      1.17    1.01             0.96

Here is the table with ratios for the read-write case:

Threads   PS on   PS only global   PS all off
1         1.07    0.94             0.98
2         1.11    1.00             1.06
4         1.15    1.04             1.08
8         1.19    1.02             1.08
24        1.17    1.00             1.07
32        1.18    1.07             1.06
48        1.18    1.09             1.13
64        1.17    1.11             1.11
128       1.18    1.09             1.12
256       1.14    1.04             0.99
512       1.17    1.02             1.04
1024      1.21    1.06             1.07

This allows us to make the following summary:

In the read-only case, Performance Schema with all consumers enabled gives about 25% overhead, with “global instrumentation” only about 10%, and with all consumers disabled about 8%.

In the read-write case, Performance Schema with all consumers enabled gives about 19% overhead, with “global instrumentation” only about 11%, and it is about the same with all consumers disabled.
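To make the arithmetic behind these figures easy to check, here is a small sketch that converts the ratios from the two tables into overhead percentages and reports the peak and mean for each configuration. The conversion (ratio minus one, times 100) is my reading of how a ratio such as 1.25 becomes "about 25%"; the summary figures above are rounded impressions and will not match every computed value exactly. The function names are mine.

```python
# Ratio tables copied from the post: threads -> (PS on, PS only global, PS all off).
READ_ONLY = {
    1: (1.11, 1.10, 1.13), 2: (1.13, 1.08, 1.04), 4: (1.15, 1.07, 1.05),
    8: (1.18, 1.07, 1.03), 24: (1.21, 1.08, 1.06), 32: (1.25, 1.10, 1.08),
    48: (1.25, 1.10, 1.08), 64: (1.23, 1.10, 1.06), 128: (1.23, 1.10, 1.04),
    256: (1.21, 1.08, 1.04), 512: (1.18, 1.07, 1.01), 1024: (1.17, 1.01, 0.96),
}
READ_WRITE = {
    1: (1.07, 0.94, 0.98), 2: (1.11, 1.00, 1.06), 4: (1.15, 1.04, 1.08),
    8: (1.19, 1.02, 1.08), 24: (1.17, 1.00, 1.07), 32: (1.18, 1.07, 1.06),
    48: (1.18, 1.09, 1.13), 64: (1.17, 1.11, 1.11), 128: (1.18, 1.09, 1.12),
    256: (1.14, 1.04, 0.99), 512: (1.17, 1.02, 1.04), 1024: (1.21, 1.06, 1.07),
}
COLUMNS = ("PS on", "PS only global", "PS all off")


def overhead_pct(ratio):
    """Assumed conversion: a ratio of 1.25 is read as 25% overhead."""
    return round((ratio - 1.0) * 100, 1)


def summarize(table):
    """Return peak and mean overhead (in percent) for each configuration."""
    summary = {}
    for index, name in enumerate(COLUMNS):
        ratios = [row[index] for row in table.values()]
        summary[name] = {
            "peak": overhead_pct(max(ratios)),
            "mean": overhead_pct(sum(ratios) / len(ratios)),
        }
    return summary


if __name__ == "__main__":
    for label, table in (("read-only", READ_ONLY), ("read-write", READ_WRITE)):
        print(label)
        for name, stats in summarize(table).items():
            print(f"  {name}: peak {stats['peak']}%, mean {stats['mean']}%")
```

Looking at both peak and mean is a deliberate choice: the peak shows the worst case across thread counts, while the mean smooths out single outliers such as the 1-thread rows.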

Is that big or small? I leave it for you to decide; I think it may be acceptable in some cases and not in others.
I only wish that Performance Schema with all consumers disabled gave less overhead; 8-11% seems significant.
If nothing helps, I would like to be able to fully disable or enable Performance Schema at run-time, not only at start-time.

As I understand it, dtrace / systemtap probes can be disabled and enabled at run-time, and when they are disabled the overhead is almost 0%. Why can’t Performance Schema do the same?

(Disclaimer: This benchmark is sponsored by Well Known Social Network, and they are generous enough to make it public)


Comments (22)

  • Mark Callaghan

    What is the bug number for the performance problem being investigated?

    April 25, 2011 at 12:00 am
  • Marc Alff

    Our internal benchmarks for mysql-trunk also show the same problem, this is a performance bug currently being investigated.

    About how the performance schema instrumentation works, a good way to start is to follow the instrumentation APIs. The code is publicly available, and also documented.

    For example, for the mutex instrumentation, see mysql_mutex_lock().

    Regards,
    – Marc

    April 25, 2011 at 12:00 am
  • Peter Laursen

    I think there is a ‘cheat sheet’ here. But I did not try it.
    http://marcalff.blogspot.com/2011/04/performance-schema-faq-1-enable-without.html

    April 26, 2011 at 4:35 am
  • Vadim Tkachenko

    Peter,

    I used the recommendations from that post.
    The “PS all off” columns are the results with P_S disabled in the way described in the ‘cheat sheet’.

    April 26, 2011 at 7:54 am
  • Peter Zaitsev

    One interesting thing is how overhead changes by workload type.
    Do we have more overhead for short statements (such as sysbench simple tests) or for long queries which crunch many rows? How much overhead do we get when we’re IO bound? Many applications have spare CPU and they just would not like their performance to drop for an IO bound workload due to added contention etc.

    Finally do you remember how Performance Schema in “row access counters only” mode compares to user_statistics patch in its overhead ?

    April 26, 2011 at 8:50 am
  • Patrick Casey

    Interesting benchmark … that’s more overhead than I would have guessed w/o having looked at the code.

    Has anybody looked at how other database vendors (I’m thinking specifically of Oracle here) offer these kind of stats? Are they taking a similar non-trivial performance hit, or is their architecture different enough that they get away with it?

    April 26, 2011 at 12:38 pm
  • Davi Arnaut

    > As I understand dtrace / systemtap probes can be disabled / enabled at run-time,
    > and when they disabled – it is almost 0% overhead, why Performance Schema
    > can’t do the same ?

    For example, dtrace is able to patch instructions in a binary at runtime in order to add trap instructions — this requires kernel support, etc. Also, differently from dtrace/systemtap, P_S needs to aggregate and store data, etc.

    April 26, 2011 at 1:24 pm
  • Vadim Tkachenko

    Davi,

    My point is that ideally we should have near to zero overhead when we do not use dtrace / systemtap / P_S.

    E.g. when we disable all consumers, P_S does not need to aggregate and store data, does it?
    Then why do we see about 10% degradation?

    April 26, 2011 at 1:33 pm
  • Davi Arnaut

    > My point is that ideally we should have near to zero overhead when we do not
    > use dtrace / systemtap / P_S.

    If we are speaking ideally, we should have it improve performance! Seriously though, I explained how dtrace operates on a completely different level than P_S.

    April 26, 2011 at 1:46 pm
  • Vadim Tkachenko

    Davi,

    You explained how dtrace works, but I would be very much interested to hear details how P_S works. In particular, why with enabled P_S, but disabled all consumers, we still have 10% overhead? P_S does not need to collect and aggregate stats in this case, so what is it doing ?
    Can you shed some light on this ?

    April 26, 2011 at 3:42 pm
  • Davi Arnaut

    Vadim,

    Please take a look at the high level architecture of WL#2360 (http://forge.mysql.com/worklog/task.php?id=2360). See the overhead heading.

    April 26, 2011 at 4:01 pm
  • Mark Callaghan

    The overhead is still there. I measured 10% for PS=on with default consumers and read-only sysbench – http://bugs.mysql.com/bug.php?id=68413

    February 18, 2013 at 9:32 am
  • Gia McNerney

    Great test and report Vadim! I think the performance schema will be a great addition to MySQL once it’s more established. Just starting with MySQL after retiring from handling Oracle databases, now volunteering 🙂 Thanks for sharing.

    June 12, 2013 at 4:30 pm
