Performance Schema overhead

April 25, 2011 | Posted In: Benchmarks, Hardware and Storage, Insight for DBAs, MySQL

As a continuation of my CPU benchmarks, it is interesting to see what limits scalability in MySQL 5.6.2, and I am going to check that using PERFORMANCE SCHEMA. Before that, though, let’s estimate the potential overhead of using PERFORMANCE SCHEMA itself. So I am going to run the same benchmarks (sysbench read-only and read-write) as in the previous post with different Performance Schema options and compare the results.

I am going to use a Cisco UCS C250 with the following settings:

  • PERFORMANCE SCHEMA disabled (NO PS)
  • PERFORMANCE SCHEMA enabled, with all consumers ON (PS on)
  • PERFORMANCE SCHEMA enabled, but with only the global_instrumentation consumer enabled, which allows gathering table and index access statistics (PS only global)
  • PERFORMANCE SCHEMA enabled, but with all consumers OFF (PS all off)
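
The exact statements used to switch between these configurations are not shown in this post, so the following is only a sketch of how they could be reproduced. It assumes the standard performance_schema.setup_consumers table with its NAME and ENABLED columns, and a server started with the performance_schema option turned on for the three "PS" cases.

```sql
-- "NO PS": start the server with performance_schema=OFF.
-- This is a startup option and cannot be changed at run time.

-- "PS on": enable every consumer.
UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES';

-- "PS only global": keep only the global_instrumentation consumer.
UPDATE performance_schema.setup_consumers
   SET ENABLED = IF(NAME = 'global_instrumentation', 'YES', 'NO');

-- "PS all off": disable every consumer.
UPDATE performance_schema.setup_consumers
   SET ENABLED = 'NO';

-- Check which consumers are currently active.
SELECT NAME, ENABLED
  FROM performance_schema.setup_consumers;
```

Only one of the three UPDATE statements would be run before a given benchmark pass; the final SELECT is just a sanity check of the resulting state.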

The full results with details are on our Benchmark Wiki.

Here is the graph for the read-only case:

and for the read-write case:

To get a numeric impression, let’s look at the ratio of the result with PS to the result without PS.

Here is the table of ratios for the read-only case:

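| Threads | PS on | PS only global | PS all off |
|---------|-------|----------------|------------|
| 1       | 1.11  | 1.10           | 1.13       |
| 2       | 1.13  | 1.08           | 1.04       |
| 4       | 1.15  | 1.07           | 1.05       |
| 8       | 1.18  | 1.07           | 1.03       |
| 24      | 1.21  | 1.08           | 1.06       |
| 32      | 1.25  | 1.10           | 1.08       |
| 48      | 1.25  | 1.10           | 1.08       |
| 64      | 1.23  | 1.10           | 1.06       |
| 128     | 1.23  | 1.10           | 1.04       |
| 256     | 1.21  | 1.08           | 1.04       |
| 512     | 1.18  | 1.07           | 1.01       |
| 1024    | 1.17  | 1.01           | 0.96       |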

Here is the table of ratios for the read-write case:

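| Threads | PS on | PS only global | PS all off |
|---------|-------|----------------|------------|
| 1       | 1.07  | 0.94           | 0.98       |
| 2       | 1.11  | 1.00           | 1.06       |
| 4       | 1.15  | 1.04           | 1.08       |
| 8       | 1.19  | 1.02           | 1.08       |
| 24      | 1.17  | 1.00           | 1.07       |
| 32      | 1.18  | 1.07           | 1.06       |
| 48      | 1.18  | 1.09           | 1.13       |
| 64      | 1.17  | 1.11           | 1.11       |
| 128     | 1.18  | 1.09           | 1.12       |
| 256     | 1.14  | 1.04           | 0.99       |
| 512     | 1.17  | 1.02           | 1.04       |
| 1024    | 1.21  | 1.06           | 1.07       |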

This allows us to make the following summary:

In the read-only case, Performance Schema with all consumers enabled gives about 25% overhead,
with “global instrumentation” only about 10%, and with all consumers disabled about 8%.

In the read-write case, Performance Schema with all consumers enabled gives about 19% overhead,
with “global instrumentation” only about 11%, and about the same with all consumers disabled.

Is that big or small? I leave it to you to decide; I think it may be acceptable in some cases and not in others.
I only wish that Performance Schema with all consumers disabled gave less overhead; 8-11% seems significant.
If nothing else helps, I would like to be able to fully disable or enable Performance Schema at run time, not only at start time.

As I understand it, dtrace / systemtap probes can be disabled / enabled at run time, and when they are disabled the overhead is almost 0%. Why can’t Performance Schema do the same?

(Disclaimer: This benchmark is sponsored by a Well Known Social Network, and they are generous enough to make it public.)

Vadim Tkachenko

22 Comments

  • Our internal benchmarks for mysql-trunk also show the same problem; this is a performance bug currently being investigated.

    About how the performance schema instrumentation works, a good way to start is to follow the instrumentation APIs. The code is publicly available, and also documented.

    For example, for the mutex instrumentation, see mysql_mutex_lock().

    Regards,
    – Marc

  • Davi,

    You explained how dtrace works, but I would be very much interested to hear details of how P_S works. In particular, why do we still have 10% overhead with P_S enabled but all consumers disabled? P_S does not need to collect and aggregate stats in this case, so what is it doing?
    Can you shed some light on this?

  • > My point is that ideally we should have near to zero overhead when we do not
    > use dtrace / systemtap / P_S.

    If we are speaking ideally, we should have it improve performance! Seriously though, I explained how dtrace operates on a completely different level than P_S.

  • Davi,

    My point is that ideally we should have near to zero overhead when we do not use dtrace / systemtap / P_S.

    E.g. when we disable all consumers, P_S does not need to aggregate and store data, does it?
    Then why do we see about 10% degradation?

  • > As I understand dtrace / systemtap probes can be disabled / enabled at run-time,
    > and when they disabled – it is almost 0% overhead, why Performance Schema
    > can’t do the same ?

    For example, dtrace is able to patch instructions in a binary at runtime in order to add trap instructions; this requires kernel support, etc. Also, unlike dtrace/systemtap, P_S needs to aggregate and store data, etc.

  • Interesting benchmark … that’s more overhead than I would have guessed w/o having looked at the code.

    Has anybody looked at how other database vendors (I’m thinking specifically of Oracle here) offer these kinds of stats? Are they taking a similar non-trivial performance hit, or is their architecture different enough that they get away with it?

  • One interesting thing is how overhead changes by workload type.
    Do we have more overhead for short statements (such as the sysbench simple tests) or for long queries which crunch many rows? How much overhead do we get when we’re IO bound? Many applications have spare CPU, and they just would not want their performance to drop for IO-bound workloads due to added contention, etc.

    Finally, do you remember how Performance Schema in “row access counters only” mode compares to the user_statistics patch in its overhead?

  • Peter,

    I used the recommendations from that post.
    The “PS all off” columns are the results when consumers are disabled in the way described in the ‘cheat sheet’.

  • Great test and report Vadim! I think the performance schema will be a great addition to MySQL once it’s more established. Just starting with MySQL after retiring from handling oracle databases – now volunteering 🙂 Thanks for sharing.
