October 20, 2014

Debugging sleeping connections with MySQL

Have you ever seen connection in the SHOW PROCESSLIST output which is in “Sleep” state for a long time and you have no idea why this would happen ?

I see if frequently with web applications and it is often indication of trouble. Not only it means you may run out of MySQL connections quicker than you expected but it also frequently indicates serious problems in the application. If you do not use persistent connections and you have connection in Sleep stage for 600 seconds what could it be ? It may mean some of your pages take that long to generate (or might be the code simply gets into the tight loop and page never gets generated) it also could mean some of external Web Services are slow or not available and you’re not dealing with timeouts properly. Or may be you have several connections to MySQL server and right now running query which takes that long ? In any case it is something frequently worth looking at.

First task is to find to which process the connection belongs. Using different user names for different application is a good practice however it will not tell you which of apache children is handling request in question. If you just want to fix it, ie by restarting apache it is enough but if you want to figure our why it is happening you need more info.

You my notice in the “Host” filed of SHOW PROCESSLIST output not only host but also port is specified, showing you something like
“192.168.1.70:58555″ This port can be used to identify the process which owns connection in question:

As you can see in this case we can find php-cgi is holding connection in question (this is lighttpd based system with fastcgi)

Now you know the process and you can use your favorite tools to check what that process is doing.

Using same netstat command and filtering on the PID we can find which connections does this process have. Here you can see it has couple of memcached connections. Few MySQL connections (to the same host, which if usually bad idea) and connection to some external web server.

You can use strace -p to see what host is doing, it often gives a clue. In this case I for example found the process is stuck in pool() system call reading from network. Using netstat can give you an idea what it can be but if you do not like guessing you can use gdb -p . It will not print you exact line of code in PHP which is running but can give you some good ideas – for example in this case I could find stack trace originated from php stream functions not from libmysql or memcache.so, which means it is not MySQL or memcache connections leaving last candidate as the only choice. I also could see some of the variables in GDB “bt” command output which also hinted what could be the problem.

By the way does anyone know any debugger which can connect to PHP process or apache with mod_php and provide backtrace in PHP terms not the one for zend engine ? That would be pretty cool.

Yet another great tool which you can use is server-status if you’re running apache. This way you will see the URL which that process is processing and so get few more hints on what may be happening or even get repeatable example in some cases.

The tools I mentioned regarding figuring our what is happening with the process are not only helpful to debug sleeping connections with MySQL but many other cases when you see web application locking up or starting to runs in the tight loop consuming too much CPU time.

If you know any other tools which could be helpful in this regard would appreciate your comments. There might be some smarter tools out where for production tracing.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. pgl says:

    Hi,

    Just a quick note to say that this post is a little broken – the closing code tag seems to be using the wrong kind of slash.

    Feel free to remove this comment if you fix it.

    cheers,

    – pgl

  2. Steve says:

    Hey peoples.

    Anyone interested in this feature I have requested?

    http://bugs.mysql.com/bug.php?id=60476

  3. I agree, a zend-free live debugger would really be interesting to have. Although if the bug happens frequently and you don’t necessarily have to attach to a specific already running instance there’s still the possibility to use a very ugly hack:

    Find a way to store an affected pid somewhere, ie memcached. Add an include file that registers a tick function that will be called every couple of ticks (depending on workload values between 100 and 10k seem fine). This function checks memcached if getmypid() is affected and if so unregisters itself and registers another function that regularly dumps debug_backtrace() somewhere. Using timestamps the gained data can really be helpful. The same method could also be used to selectively call apd etc. at any point during the script execution.

    As I said, it’s terribly ugly but helpful if one needs to troubleshoot something right away without means or time to install a proper tool or add debug output everywhere.

  4. Alexey says:

    > If you know any other tools which could be helpful in this regard would appreciate your comments. There might be some smarter tools out where for production tracing.

    A simple and good idea is to hack either Apache request handler or fasctgi server to display request information in process title (via setproctitle call). Sometimes even knowing which script is running may be helpful.

  5. Anonymous Coward says:

    How do you get mysql to show you the port it is connected cause on all my boxes it shows only localhost with no port…

  6. peter says:

    Well, in this case you’re connecting from the same host via unix socket. In this case such information is unavailable.

    If you need to debug something you can use 127.0.0.1 so connection will happen via TCP/IP

  7. Anonymous Coward says:

    nope, im connecting via tcp

    mysqld 22241 mysql 11u IPv4 337587343 TCP *:mysql (LISTEN)

    Even so in phpmyadmin or even in cli i see only localhost with no port…

  8. Anonymous Coward says:

    ok i feel kinda stupid now… it was listening in both socket and tcp so the webapp prefered the socket, ill try to change it and see what’s causing the sleep processes

    Thanks for info and keep up the good work

  9. peter says:

    Ok. Good you’ve figured that you :)

  10. Anonymous Coward says:

    Me again…

    Can you please give more detail on using strace/gdb?
    i have few mysql that sleep for over 1000 and..
    netstat on port gives me

    # netstat -ntp | grep :51646
    tcp 0 0 127.0.0.1:51646 127.0.0.1:3306 ESTABLISHED 32034/apache2
    tcp 0 0 127.0.0.1:3306 127.0.0.1:51646 ESTABLISHED 8512/mysqld

    strace on pid gives me

    # strace -p 32034
    Process 32034 attached – interrupt to quit
    flock(24, LOCK_EX

    gdb shows me that it loads some libraries and that’s about it
    application wasn’t written by me and im no php/mysql guru, just a part-time sysadmin that wants to solve this problem as usually in 1-2 days mysql conn. limit gets touched and i have to restart apache to fix it

  11. peter says:

    Sure. Paste GDB backtrace or you can create forum topic and discuss it where.

    Basically it is waiting on flock() if this is PHP I’d expect it to be session – PHP locks session file for whole script execution duration which may cause problems with a lot of requests from single agent.

    It may be something in your own code as well. you can use “lsof” to find out what is file #24 in this process

  12. Guruprasad says:

    Hi,
    I notice that my apache process is in accept call, i do notice that there are 3 connections to MySQL:3306 and show process list on MySQL indicates they are in sleep state.
    Is there a way to findout if these connections will be reused by this apache child process at all?
    Or are these connections more like resource leak?


    Thanks,
    Gp

  13. Guruprasad says:

    Just in case if some one is tryin this on BSD box,
    netstat -ntp might not work, u can use lsof instead like,

    lsof -i -P | grep OR

  14. peter says:

    Guruprasad,

    I’d check if these connections are associated with open file descriptors. If they are it is likely these are persistent connections which may be reused. If not it is likely these connections are still closing by kernel, sometimes it takes longer time for some reason.

  15. Tom Pittlik says:

    I get this once in a blue moon – instead of tracking down what’s causing the problem (since it’s so rare) i made a simple cronjob that checks if dbconnect() fails for mysql_errno() reason 1040 (too many connections). If it does i run shell_exec(“httpd restart”) and the sleeping connections are gone.

  16. Russell G. says:

    Are you sure that sleeping threads are a problem? I always thought that a thread showing a Command value of “Sleep” just meant that it had finished whatever request it was working on and was sitting idle, waiting for another request to be assigned to it. I believe the Time value for those threads is how long it’s been sitting idle. The MySQL docs seem to say the same thing:

    “Sleep – The thread is waiting for the client to send a new statement to it.”

    From:
    http://dev.mysql.com/doc/refman/5.1/en/thread-commands.html

    Even the program “mytop”, which displays the current commands being processed by the server, ignores threads with the Sleep command, I believe for the same reason.

    I can see how a thread with a Command of “Query” could be a problem if it has a large Time value, but I don’t think Sleep commands are a problem. Or have I just completely misunderstood things?

    Russell G.

  17. Richard says:

    Russell, sleeping threads are a problem. They occur when a connection is requested, but is not closed- the connection is still reserved by that thread and “waiting for the client to send a new statement to it”. If the sleep time of a thread exceeds the wait_timeout variable for mysql, the connection gets closed and recycled. Until then, it is locked.

  18. Mike says:

    A better way than restarting your server or whatever is to set a MySQL value that does not allow connections to last over 30 min, or you can have a cron script that checks the db and kills really old sleeping connections.

    my.cnf:
    wait_timeout=180 # = 30min

    php cron script:

    #!/usr/bin/php -q
    1000 && $row['Command'] == “Sleep”) && ($row['User'] != “Root”))
    {
    $killed = 0;
    mysql_query(“kill “.$row['Id']) or ($killed = 1);
    if ($killed = 1)
    {
    echo
    “Killed process number “.$row['Id'].”, with a User of “.$row['User']
    .” in database “.$row['db'].”, which had a time of “.$row['Time']
    .”, and a command of “.$row['Command'].”\r\n\r\n”;
    }
    }

    }

    @mysql_close($link);

    ?>

  19. Mike says:

    what the hell, here’s the first part of the php script that this thing cut off due to bad html stripping it looks like:

    $link = mysql_connect (“localhost”, “root”, “password”);

    $query = “show processlist;”;

    $result = mysql_query($query) or die (“failed $query”);

    while ($row = mysql_fetch_array($result))
    {
    if (($row['Time'] >

  20. Tom Pittlik says:

    I was seeing this occurring due to every apache connection opening a mysql connection (as is common in a PHP driven website). If there is a bottleneck, a vicious cycle occurs:

    1. apache requests increase
    2. mysql queries increase
    3. mysql slow queries increase
    4. apache connections start to wait on slow mysql queries
    5. apache connections take longer to close
    6. more apache connections are opened
    7. with each apache request a new sleeping mysql connection is made
    8. mysql & apache reach max_connections
    9. server slows to a crawl

    Solution:
    – Reduce the time a mysql connection is open during script execution – don’t open a mysql connection in the header and close it in the footer. One approach is to create a query wrapper function and open a connection to the database only when the first query occurs – e.g:
    function query ($query) {
    if (!mysql_connection) mysql_connect();
    return mysql_query();
    }
    – Reduce slow queries (always a good strategy!)

  21. Tom Pittlik says:

    I should also add – increasing the max_connections (mysql) and MaxClients (apache) variables to high values is not a solution! In fact it can just prolong the bottleneck as the server takes longer to crash.

  22. PaulW says:

    Great info, thanks for the post, I’m off to try and sort our sleeping connections now.

    Not sure if it’s just me but on Mac with Safari and Chrome the code section is a bit screwy. The closing tegs don’t seem to work. After the line “This port can be used to identify the process which owns connection in question:” the rest of the article is in a code block

  23. Steve Jackson says:

    A popular way to show what is running a query is to attach tracking info in a comment in the SQL.

    However, as it is at the moment, tracking info in an SQL comment is useless in sleep state as the INFO column in processlist is NULL when connection is sleeping (understandable as there is actually no query executing)

    Therefore, I would like to see a “Last Query” in the processlist, which simply maintains the last query name that was executed on that connection. Then tracking info in an SQL comment becomes much more useful in these “sleeping thread”, and other scenarios…

    This would be a VERY powerful feature if you ask me.

    I have seen it mentioned before, but it was closed as they did not consider it to be worthy of their attention…

    http://forge.mysql.com/worklog/task.php?id=384

    Cheers

    //Steve

  24. Steve says:

    Hey peoples.

    Anyone interested in this feature I have requested?

    http://bugs.mysql.com/bug.php?id=60476

  25. How long does a sleep have to stay for in order for me to worry?

  26. Naveen says:

    We are having a problem with Sleeping queries slowing down our servers – its become quiet frequent now and we are trying to find out why – the problem in tracking down is that the system is a Social Networking Engine Software (Elgg) – we do not know the very internals of the software. We tried to use the approach above, however our mysql and web servers are on 2 separate servers – this poses additional problems as we cannot exactly locate from which ip the requests are coming to the mysql server – it will always be the web server.
    Also when we try the netstat command on the mysql server to see the process, it just displays mysqld as Established – is there a better way to track where these queries are originating for such a case?
    currently we are waiting for the problem to occur again and then dynamically enable Mysql general query log to see what queries are being thrown to the server – we have isolated this application to a single box
    any suggestions on how to tackle this situation ?

    thank you for your valuable posts

  27. vishnu rao says:

    hello everyone,

    in our application server, after we start a transaction starts – we make sure we execute the following query using the connection object obtained from our pool. [i.e. the following query is the first query to be executed]

    “select CONNECTION_ID() ”
    refer to http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_connection-id

    This gives us the thread id which mysql would assign to this transaction.

    we then log the following details – thread id & name of the application module which wants to start the db transaction.

    when we see threads in sleep state(from show processlist) we note down the id & immediately look at our logs to identify the application module which has the corresponding id & debug the module code to see what it does. in most cases – we found our application thread starts of a database transaction and does a few queries and then makes a call to some other external system like a MQ server and gets stuck, & since the transaction is still active(i.e. not commited/rollbacked) – the thread in mysql goes to sleep state.

    hope this helps

    -vishnu rao
    @flipkart.com

  28. Jible says:

    A popular way to show what is running a query is to attach tracking info in a comment in the SQL.

    However, as it is at the moment, tracking info in an SQL comment is useless in sleep state as the INFO column in processlist is NULL when connection is sleeping (understandable as there is actually no query executing)

    Therefore, I would like to see a “Last Query” in the processlist, which simply maintains the last query name that was executed on that connection. Then tracking info in an SQL comment becomes much more useful in these “sleeping thread”, and other scenarios…

  29. Steve Jackson says:

    Jible. Why did you just re-publish part of my earlier post?

Speak Your Mind

*