October 21, 2014

How to obtain the “LES” (Last Executed Statement) from an Optimized Core Dump?

Ever ran into a situation where you saw “some important variable you really needed to know about=<optimized out>” while debugging? Let’s look at an example:

It happens to all of us. This “issue” is seen when using optimized (release) binaries: debug symbols have been stripped out. In non-optimized binaries, the query would (in most cases) have shown directly in the backtrace. Sidenote: “in most cases”: sometimes you have to swap threads (or use: ‘thread apply all bt’ or ‘thread <nr>’ then ‘bt’) before you can view the actual crashing statement: gdb may have incorrectly analyzed which thread caused the crash. Inspecting the error log (which also contains a stack trace) may also help in such cases.

So… maybe you were testing a C program with three threads executing highly concurrent DML statements (where each thread executes one particular type of statement) and you are stuck as to which statement is causing your program to crash.. Or you just ran your latest and greatest RQG grammar and found this nifty crashing bug, only to find out you’ll be stuck with hours or even days of “grammar simplification”..

Not so! Read on…

If we did know the “LES” – or Last Executed Statement – in our RQG case for example, we could bring up the server with a copy of the test run’s datadir and try to re-execute the crashing statement to see if it crashes again – a plausible thing to happen. So, how do we make this happen if the query is not directly visible from the stack trace?

Let’s fire up gdb and see what we find. (We will assume the source code used for compiling is available at the same location as where it was at compilation time)

First we inspect the mysql_execute_command frame:

Looking at line 3138 (or by inspecting the stack trace above), we may expect the query to be present in the thd variable. Looking further, we can see that thd is being referenced in all frames, and that it is passed to handle_select. Let’s see what it contains:

Ok, so it’s a pointer to an array’s memory address. Using * before the address allows us to see the contents of the thd array:

Great! The query is showing in there: “INSERT INTO testdb_N . t1_temp1_N SELECT * FROM test.table10_int_autoinc”.

Instead of using a numeric memory address, we could also use:

Gdb knows the variable type, so we don’t need the typecast to “(THD *)”.

Or, better yet, now that we know where the query string is hiding (query_string > string > str), we can use:

In the future.

Conclusion: the Last Executed Statement was found in an optimized core dump by checking the contents of the thd array in the mysql_execute_command frame. By simply examining “surrounding” variables we’ve been able to find the information we needed. You can try this technique yourself the next time you need to find out exactly which query is causing you trouble!

About Roel Van de Paar

Roel leads Percona's QA team. Before coming to Percona, he contributed significantly to the QA infrastructure at Oracle. Roel has a varied background in IT, backed up by many industry leading certifications. He also enjoys time with God, his wife and 4 children, or heading into nature.

Comments

  1. Last Executed statement is helpful for many crashes troubleshooting. I would point out though it will not always be enough to repeat the crash. Crashes which are repeatable by running statement alone are “simple” and often caught easily so you do not run into them very often. Many problems in production are either context dependent – depend on what previous queries were executed or concurrency dependent – depend on what is being executed in parallel – it could be heap corruption, wrong mutex usage etc.

  2. Can you expect to get core dumps in production when the InnoDB buffer pool is huge? That will lead to huge core dump files. Perhaps it is time to figure out how to get core dumps that exclude the buffer pool.

  3. Mark,

    Did you look into “Google Breakpad crash reporting system” from
    http://darnaut.blogspot.com/2012/06/changes-in-twitter-mysql-5523t6.html ?
    Is this something of interest?

  4. @ Peter – Agreed, the last executed statement will often not be enough to repeat the crash. It is a good first test though, and it is simple to execute. If it works, one can simply dump or copy the db and use the crashing statement for a bug report. It also may provide developers with a good indication on where to start looking for the root cause. I found that in general many crashes can be reduced to a single statement + corresponding DDL & data setup. If the crash is however caused in production as the result of multiple threads interacting, or is caused by multiple successive queries, then indeed finding the root cause is likely not going to be straightforward. Besides a developer studying the coredump in those cases, it may also pay off to see if any application logs are available.

    @ Mark – Good point (and thanks to Vadim for the link). One idea, at least if one is after the Last Executed Statement, is to also check the error log. Often times it will contain at least part of the crashing query.

  5. Roel,

    Another question by the way… when MySQL crashes it often will print query into error log even when core file is not enabled. Are there any cases when that information would not be available while core file will contain the query ?

  6. Vadim – thanks for pointing that out to me.

  7. @ Peter – I believe so. You may see things like “Query (0x7f90ab293dc5): is an invalid pointer” or “Query (0): is an invalid pointer” in the error log when MySQL was unable to figure out which query caused the crash. However, it would be fair enough to assume that indiviudal thread variables recorded in the coredump still contain the queries which were being executed. I usually tend to check one or the other, depending on whatever comes to mind first. I have also found that in some cases MySQL and gdb disagree on what caused the crash (different main backtrace).

  8. sbester says:

    Peter, we still have cases where the query is not printed in error log. I haven’t investigated why (always works on my machines),. And this is after the fix for:

    http://bugs.mysql.com/bug.php?id=51817
    (incorrect assumption: thd->query at 0x2ab2a8360360 is an invalid pointer)

  9. Roel Van de Paar says:
  10. Roel Van de Paar says:

Speak Your Mind

*