Three ways that the poor man’s profiler can hurt MySQL

Over the last few years, Domas’s technique of using GDB as a profiler has become a key tool in helping us analyze MySQL when customers are having trouble. We have our own implementation of it in Percona Toolkit (pt-pmp) and we gather GDB backtraces from pt-stalk and pt-collect.

Although it’s helped us figure out a lot of problems, sometimes it doesn’t go well. Getting GDB backtraces is pretty intrusive. Here are three things that I’ve witnessed:

  1. The server freezes for the duration of the process. This is the most obvious impact on the running server: GDB attaches to the process, pauses every thread while it collects a stack trace from each one, then lets it go on working. But this can take a while. It’s usually a couple of seconds, but on big servers with a lot of memory and many threads, it can take much longer (I’ve seen tens of seconds, and heard reports of minutes).
  2. The server can crash. I haven’t seen this myself, but I’ve heard reports from others.
  3. The server can be left in an unstable state. Sometimes when GDB detaches, the process’s state isn’t quite as it was before. I have rarely seen this, but the other day a customer’s server was running very slowly, burning lots of CPU time (much of it system CPU) and exhibiting the classic signs of InnoDB kernel_mutex contention. I was not able to find anything wrong, and was still trying to determine whether the sudden slowness had some cause such as an increase in traffic, or was just InnoDB having trouble, when the customer Skyped me to say that they’d restarted the server and it resolved the problems. This server had been analyzed with GDB around the time the problems began.
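
For readers who haven’t seen it, the essence of the technique is a short pipeline like the one below. This is only a sketch of the idea (pt-pmp is more careful about the details); the gdb step is shown commented out since it needs a live mysqld, so the aggregation here runs on a saved sample instead.

```shell
# Poor man's profiler, sketched: dump every thread's backtrace with gdb,
# then collapse identical stacks and count how often each one occurs.
#
# gdb -batch -ex 'set pagination 0' -ex 'thread apply all bt' \
#     -p "$(pidof mysqld)" > stacks.txt

# A tiny sample of what 'thread apply all bt' emits, for illustration:
cat > stacks.txt <<'EOF'
Thread 2 (Thread 0x7f01):
#0  pthread_cond_wait () from /lib/libpthread.so.0
#1  os_event_wait_low () at os0sync.c:399

Thread 1 (Thread 0x7f02):
#0  pthread_cond_wait () from /lib/libpthread.so.0
#1  os_event_wait_low () at os0sync.c:399
EOF

# Join each thread's frames into one comma-separated line, then count
# duplicates; the most common stack (the likely bottleneck) sorts to the top.
awk '/^Thread/ { if (s) print s; s = "" }
     /^#/      { s = s ? s "," $2 : $2 }
     END       { if (s) print s }' stacks.txt |
  sort | uniq -c | sort -rn
# prints a count followed by the collapsed stack, e.g.:
#   2 pthread_cond_wait,os_event_wait_low
```

The aggregation is what makes the output useful: hundreds of raw backtraces collapse into a handful of distinct stacks, ranked by how many threads are stuck in each.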

So although it’s extremely useful, it does have risks.

Domas, and others at Facebook and beyond, have developed a variety of tools that can help them get stack traces less intrusively. I think we need to investigate some of those and see whether there is something that would work well in a broad variety of cases for more users, and how much less intrusive they can be.

Of course, I’m really waiting for MySQL 5.6 and the improved Performance Schema that will be included with it. A built-in solution will be much better than a technique such as using GDB. We’ll probably never stop using GDB completely, but hopefully we’ll be able to use such tools much less frequently after 5.6 is released.

Comments (19)

  • Dave Juntgen

    Baron –

    We’ve been searching for answers to what really happened, and we’re being led to ptrace(). We believe that ptrace() set and left the CPU debug registers turned on, or something like that, so that each thread was running in a debug state (still kind of a mystery). I haven’t been able to find anything that can monitor these registers to show their state… This may explain why we saw so much time being spent in kernel_mutex.

    man ptrace and read the first line in the description, “… examine and change its core image and registers” – scary!!

    Other info on ptrace()

    Do you know if DTrace has been ported to Linux?

    December 2, 2011 at 10:41 am
  • Baron Schwartz

    See also, by the way: Vadim and I discussed this and he did some follow-up work. kernel_mutex appears to be the first bottleneck you will hit if you magnify your current load greatly… or turn down CPU speed a lot.

    I will read more about the links you pasted. I confess I don’t know enough about this to make me feel smart at all.

    The closest thing to DTrace on Linux is SystemTap. We have some earlier articles on that. More recently, Brendan Gregg posted an interesting blog article on it, with a lot of good dialogue in the comments:

    December 2, 2011 at 11:13 am
  • Jeff Schroeder

    The real “long term fix” for tracing apps like this on Linux is uprobes[1]. It hasn’t been accepted just yet, but is almost ready for merging upstream. It will take a while after that, but uprobes allows tracing with low or no overhead and doesn’t futz with the parent process like ptrace (used by gdb) does. Uprobes will be functionally equivalent to the userspace tracing aspect of DTrace.


    December 2, 2011 at 11:31 am
  • Raghavendra

    Regarding point #1 in the post, there seems to be a similar bug open — — “attaching to percona-server with gdb disconnects clients”.

    @Dave, if the process ends up in that state, it will show as ‘T’ in ps output IMO; the same thing can happen with strace too.
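
    A quick sketch of how to check for (and clear) that state, using a throwaway sleep process in place of mysqld:

```shell
# A process that a tracer left SIGSTOPped shows state 'T' in ps output;
# SIGCONT resumes it. Demonstrated here on a harmless sleep process.
sleep 60 &
pid=$!
kill -STOP "$pid"
ps -o state= -p "$pid"    # T (stopped)
kill -CONT "$pid"         # back to running/sleeping
kill "$pid" 2>/dev/null   # clean up
```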

    @Jeff, regarding uprobes: there seems to be a long wait until it gets merged, even after a long patch trail.

    However, the good news is (and another reason why uprobes may be getting delayed) that ptrace has been improved a lot, precisely to address the issues mentioned above. The changes seem to have gone into Linux 3.1, but since that kernel is quite new, it will be a while before enterprise kernels get them or Red Hat backports them to RHEL 6 or so.

    Now, ltrace — ltracing mysqld will kill it instantly 😀 .. Well, the fault is with ltrace in this case. Ltracing a multi-threaded process (try it with your Firefox) kills it instantly (with SIGTRAP); the fix for this is still not released for ltrace, AFAIK.

    December 2, 2011 at 12:45 pm
  • Mark Leith

    Actually the closest thing to DTrace on Linux is DTrace on Linux… 🙂

    December 2, 2011 at 1:19 pm
  • Baron Schwartz

    OK, I mean the closest thing that people actually know about and use 🙂 I was aware of this but AFAIK it’s kind of stagnant/abandoned, no? Don’t answer that — I will go find out.

    December 2, 2011 at 1:28 pm
  • Baron Schwartz

    OK, I went and found out. I won’t kill the suspense for other readers.

    December 2, 2011 at 1:30 pm
  • Raghavendra

    @Mark, good to know. However, it looks like it is only for Oracle’s Unbreakable kernel for now. I see public RPMs for the kernel here — but I wonder if RPMs for dtrace-* are available.

    @Baron, thanks for mentioning Performance Schema. I read a bit about it here — . It looks good.

    December 2, 2011 at 2:10 pm
  • Mark Leith

    I admit I haven’t used the DTrace port yet.. I can attest to the virtues of performance schema though (certainly in 5.6)! 🙂

    December 2, 2011 at 2:44 pm
  • Alexey Kopytov

    Adam Leventhal on the current state of the DTrace port to Linux:

    December 2, 2011 at 8:19 pm
  • Kristian Nielsen

    Some time ago I hacked together a proof-of-concept for obtaining pmp stack traces directly using libunwind, without GDB:

    This is _much_ faster than using GDB (an order of magnitude), and hence much less intrusive on the running server. I think it can even be sped up a lot more by using /proc/$pid/mem to access the target process; currently it uses a syscall (ptrace()) for each memory word read.

    I have not had the time to play more with it, but it would be cool if someone could use it as a basis for a more polished tool. I think this is the right way forward; GDB needs to do a lot more when attaching to a process than just obtain the stack traces. Or maybe Facebook already does this?

    Anyway, I agree that there will always be some risk in using something like this on a production server; ptracing a process is never free. Still, with a faster tool the stack traces can be obtained (and the server process kept stopped) for only sub-second periods at a time.

    One reason for ptrace() / PMP to make the server unstable is that they interrupt “slow” system calls such as write(). The program must catch the EINTR error (or partial writes) and retry the system call; otherwise it is a bug that can be triggered in other cases too, and which we should fix. Note, however, that this can only happen in the “slow” cases such as socket operations; “fast” cases such as file I/O are not affected by ptrace() or other signals.

    December 3, 2011 at 4:49 am
  • Baron Schwartz

    Kristian, yes – MarkC pointed me to your tool a while ago. The thing that always blocks me from using it is that it can’t be “yum install”-ed. It seems lame, but it’s actually kind of where I end up with most projects: if it isn’t a script I can download and run with no further ado, or something I can ask the sysadmins to yum/apt install, then the barrier to using it is much higher. What I think I need for situations like this is statically linked binaries that I can wget, chmod, and run.

    December 4, 2011 at 6:37 am
  • Peter Zaitsev

    Regarding the server slowdown mystery, I would point out that I’ve seen a number of cases where MySQL had poor performance/scalability that was fixed by restarting the MySQL server. It looks like it can regress sometimes. It is possible that gdb is at fault in some of those cases, but surely not all of them.

    December 4, 2011 at 9:52 am
  • Baron Schwartz

    Yes. It looked like the server had taken a while to warm up and I didn’t want to just try a restart randomly.

    December 5, 2011 at 4:23 am
  • Kristian Nielsen

    Baron: it would be possible to refurbish my tool as a single Perl script. It would, however, require libunwind and Inline::Perl (as well as binutils): apt-get install libunwind7-dev libinline-perl binutils. Would this be better?

    Or alternatively, a single statically linked C program is a possibility, as you suggest.

    December 5, 2011 at 7:42 am
  • Baron Schwartz

    I think that the statically linked binary approach would be preferable, to make it simpler.

    December 6, 2011 at 6:24 am
  • Frank Ch. Eigler

    For what it’s worth, once you get it going, recent SystemTap’s on-the-fly kernel+userspace backtracing is probably top-notch in terms of performance, undisruptiveness, and correctness. If you haven’t tried it lately, it may be worth your time.

    December 10, 2011 at 9:27 pm
  • Aurimas Mikalauskas

    Here’s what I do when gdb or strace leaves a process in the “T” (stopped) state:

    Works like a charm every time.

    December 22, 2011 at 7:45 am
  • Mathew

    Open PHP-MyProfiler uses the MySQL Query Profiler, and is excellent for people working on the LAMP stack in shared hosting environments, where installation of non-standard software is usually impossible.

    January 3, 2012 at 3:33 am
