Replication Triggers a Performance Schema Issue on Percona XtraDB Cluster

October 21, 2016

Author

David Ducos

MySQL

Percona Software

In this blog post, we’ll look at how replication triggers a Performance Schema issue on Percona XtraDB Cluster.

During an upgrade to Percona XtraDB Cluster 5.6, I faced an issue that I want to share. In this environment, we set up three Percona XtraDB Cluster nodes (mostly with default settings), copied from a production server. We then configured one member of the cluster as a slave of the production server.

During testing, we found that a full table scan query ran about four times faster on the nodes where replication was not configured. After reviewing almost everything related to the query, we decided to profile mysqld with perf.

We executed:

perf record -a -g -F99 -p $(pidof mysqld) -- sleep 60

While perf was recording, we ran the query a couple of times in another terminal. Then we executed:

perf report > perf.out

In perf.out we found this useful information:

# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 5K of event 'cpu-clock'
# Event count (approx.): 57646464070
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... .................. ............................................................................................................................................................
#
62.03% 62.01% mysqld mysqld [.] my_timer_cycles
|
---my_timer_cycles

4.66% 4.66% mysqld mysqld [.] 0x00000000005425d4
|
---0x5425d4

4.66% 0.00% mysqld mysqld [.] 0x00000000001425d4
|
---0x5425d4

3.31% 3.31% mysqld mysqld [.] 0x00000000005425a7
|
---0x5425a7
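For a long report, it can help to reduce perf.out to a ranked list of symbols. The following is a minimal sketch, assuming the column layout shown above (Children, Self, Command, Shared Object, Symbol) and symbol names without spaces; top_symbols is a hypothetical helper name, not part of perf.

```shell
# Sketch: list the heaviest symbols from a `perf report` text dump.
# Assumes the column layout: Children  Self  Command  Shared Object  [.] Symbol
# top_symbols is a hypothetical helper name, not something perf provides.
top_symbols() {
    # $1: path to the report (for example perf.out)
    # $2: number of rows to print (defaults to 5)
    # Skip header comments, keep only rows that start with a percentage,
    # then print the Self column and the symbol, sorted by Self descending.
    awk '!/^#/ && $1 ~ /^[0-9.]+%$/ { print $2, $NF }' "$1" |
        LC_ALL=C sort -rn |
        head -n "${2:-5}"
}
```

Called as `top_symbols perf.out 3`, this should list my_timer_cycles first for the report above, followed by the unresolved addresses.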

As you can see, the my_timer_cycles function accounted for 62.03% of the samples. Related to this, we found a blog post (http://dtrace.org/blogs/brendan/2011/06/27/viewing-the-invisible/) that explained how performance dropped 10% after enabling the Performance Schema. So, we decided to disable the Performance Schema to see whether our issue was related to the one described in that post. After the restart required to disable the Performance Schema, the query took the expected amount of time.

We also found out that this was triggered by replication, and that nodes rebuilt from the affected member might have the same issue. The same was true when rebuilding from a member that was OK: the new member might still execute the query more slowly.

Finally, you should take into account that my_timer_cycles seems to be called on a per-row basis, so if your dataset is small you will never notice this issue. However, if you are doing a full table scan of a million-row table, you could face this issue.
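To see why the row count matters, a rough estimate multiplies the rows scanned by the timer reads per row and by the cost of one read. The sketch below uses purely hypothetical inputs (nothing here was measured in this environment), and timer_overhead_ms is an illustrative helper name.

```shell
# Back-of-the-envelope estimate of time spent reading the timer during a scan.
# All inputs are hypothetical; timer_overhead_ms is an illustrative helper.
timer_overhead_ms() {
    # $1: rows scanned
    # $2: timer reads per row (assumed, not confirmed by the source)
    # $3: cost of one timer read in nanoseconds (assumed)
    # Result is whole milliseconds (integer division).
    echo $(( $1 * $2 * $3 / 1000000 ))
}
```

With these assumed inputs, `timer_overhead_ms 1000000 2 25` gives 50, while a slow timer read of 2000 ns (`timer_overhead_ms 1000000 2 2000`) gives 4000. The same slow timer over 1,000 rows (`timer_overhead_ms 1000 2 2000`) gives only 4, which is why a small table hides the problem.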

Conclusion

If you are having query performance issues, and you can’t find the root cause, try disabling or debugging instruments from the Performance Schema to see if that is causing the issue.
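Disabling a single instrument at runtime is less disruptive than turning off the whole Performance Schema. As a minimal sketch, the helper below only prints the UPDATE statement so it can be reviewed before being sent to the server; ps_disable_sql is a hypothetical name, and the instrument used in the usage note is the table I/O instrument suggested in the comments.

```shell
# Sketch: emit the SQL that turns off one Performance Schema instrument,
# so the statement can be reviewed before it is piped into a client.
# ps_disable_sql is a hypothetical helper name.
ps_disable_sql() {
    # $1: instrument name as listed in performance_schema.setup_instruments
    printf "UPDATE performance_schema.setup_instruments SET ENABLED = 'NO', TIMED = 'NO' WHERE NAME = '%s';\n" "$1"
}
```

For example, `ps_disable_sql 'wait/io/table/sql/handler' | mysql` would apply it, assuming a local mysql client with suitable credentials already configured. Changes to setup_instruments made this way are not persisted across a server restart.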

12 Comments

Admin

Peter Zaitsev

9 years ago

David,

I wonder if you have more details here. As Brendan Gregg explains, this function should be very fast unless something is stalling it. I wonder what conditions caused it in your case.

Also, did you use Performance Schema with the default settings or some more verbose instrumentation?

Author

David Ducos

9 years ago

Reply to Peter Zaitsev

Hi Peter,

It was a fresh install, everything set at default.

There was nothing in particular stalling it, as the only traffic the node received was from the replication channel.

lefred

9 years ago

Hi David,

I have some questions here.

My first one is related to something I don’t really understand, maybe you could confirm if what I understood is what you meant or if I completely understood it wrong.
So, let’s call the servers M (the production Master), P1 (the PXC node that will act as asynchronous slave from M), P2 and P3 (both PXC nodes), OK?

P1, P2 and P3 are 3 new nodes with data copied from M (or an async slave of M), the same data on all 3 nodes.
P1, P2 and P3 are in the same PXC and finally P1 is configured as slave of M.

If you run your specific query on P1 it’s slow, but not on P2 and P3. Just because P1 replicated from M.
If you add a new node to the cluster (P4) and this node performs SST with P1 as donor, then the query is also slow on P4, but still ok on P2 and P3, right ?!!?

This is strange, if what I understood is indeed what’s happening…

My second question is related to the instrumentation that is specific to PXC. I saw at PLAM that PXC now has performance_schema instruments that are not in MySQL Community Edition, nor in Galera… did you try to disable only these (if they are default or enabled)?

Thank you.

Author

David Ducos

9 years ago

Reply to lefred

Hi Lefred,

About “If you add a new node to the cluster (P4) and this node performs SST with P1 as donor, then the query is also slow on P4, but still ok on P2 and P3, right ?!!?”: in my tests there were times when, after an SST, P4 was not slow, and there were times when it was slow. P2 and P3 remained OK.
I agree with you, it was strange.

I didn’t try to disable instruments, as this platform was well tested with the customer workload, and replication was just a step on the migration path.

Admin

Peter Zaitsev

9 years ago

Fred,

Performance Schema was only added to PXC 5.7. From what I understand, we’re speaking about full table scan queries, which seems to point to the timed table access instrumentation in Performance Schema, which is disabled by default exactly because its cost is high…. It still looks way too high 🙂 This is why I’m very curious what the other details of the configuration are – OS, hardware, etc.

Mark Leith

9 years ago

Issues with table IO instrumentation for large scans is what https://dev.mysql.com/worklog/task/?id=7802 (“PERFORMANCE SCHEMA, BATCH TABLE IO”) was implemented for, within MySQL 5.7.

Do you see these issues with PXC 5.7?

Author

David Ducos

9 years ago

Reply to Mark Leith

Hi Mark,
Sorry, I didn’t test it on PXC 5.7

Mark Leith

9 years ago

And to verify if that is the issue, rather than disabling all of performance schema, you could just try disabling the table IO instrument:

UPDATE performance_schema.setup_instruments SET enabled = 'NO', timed = 'NO' WHERE name = 'wait/io/table/sql/handler';

tecogyan

9 years ago

wonderful

Daniël van Eeden

9 years ago

This reminds me of this bug: https://bugs.mysql.com/bug.php?id=76309 I once noticed the cycle timer taking way too much time. With ‘perf top’ I traced that back to a slow rdtsc instruction. Rebooting the VM fixed it.

Admin

Peter Zaitsev

9 years ago

Daniel, this is very interesting. Are there some known cases where rdtsc would be very slow?

Daniël van Eeden

9 years ago

Reply to Peter Zaitsev

I only know about the case I encountered, which was probably a VMware or firmware bug. However, more people might have had this issue without finding the real root cause. That’s also why I filed the bug; it should make the problem easier to discover.

With different kernel versions, types of hardware and virtualization there are many things that can go wrong.

I found this with lots of info about rdtsc: http://oliveryang.net/2015/09/pitfalls-of-TSC-usage/