Announcement

Announcement Module
Collapse
No announcement yet.

Intermintent slow MySQL Connections

Page Title Module
Move Remove Collapse
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Intermintent slow MySQL Connections

    #1
    Hello,

    We keep getting errors like this on our PHP sites.


    Can't connect to MySQL server on '192.168.100.85' (4)


    web is the web server 192.168.1.116
    mysql is the mysql server 192.168.100.85

    So I built a script on the web server to make 10,000 mysql connections in a row and time them. It will "sometimes" produce the error. Most times everything runs fine and there is an average time of 5-10 ms for the call to mysql_connect

    Some serious Googling showed that the (4) error is due to the connection being cut off by the timeout.


    web# grep mysql.connect_timeout /etc/php.ini mysql.connect_timeout = 1


    So I modified the timeout in the script to 30 to see if it would work. The connection errors went away, but occasionally the connection would take 5 seconds.

    After more Googling and some tcpdump I found that occasionally when the MySQL server is doing its reverse lookup of the IP the DNS server would fail to respond. So after 5 seconds it would give up and allow the connection.

    I have since added skip-name-resolve to the server. But this did not solve the problem.

    Now my test would show slow connections taking 3-4.5 seconds instead of the set 5 with the DNS issue.

    So I ran my tests again with tcpdump running on both ends.


    web# tcpdump -n -s 65535 -w web3-$(date +"%F_%H-%M-%S").pcap host 192.168.100.85 and port 3306 mysql# tcpdump -n -s 65535 -w master1-$(date +"%F_%H-%M-%S").pcap host 192.168.1.116 and port 3306


    Here is the packets from the relevant slow connection.

    Packets from web:


    No. Time Source Destination Protocol Info 13312 2010-10-13 10:01:01.201965 192.168.1.116 192.168.100.85 TCP 41560 > mysql [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=904829062 TSER=0 WS=2 13316 2010-10-13 10:01:04.201577 192.168.1.116 192.168.100.85 TCP 41560 > mysql [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=904832062 TSER=0 WS=2 13317 2010-10-13 10:01:04.204837 192.168.100.85 192.168.1.116 TCP mysql > 41560 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1380 TSV=562240314 TSER=904832062 WS=7 13318 2010-10-13 10:01:04.204853 192.168.1.116 192.168.100.85 TCP 41560 > mysql [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=904832065 TSER=562240314 13319 2010-10-13 10:01:04.205886 192.168.100.85 192.168.1.116 MySQL Server Greeting proto=10 version=5.0.77-log 13320 2010-10-13 10:01:04.205899 192.168.1.116 192.168.100.85 TCP 41560 > mysql [ACK] Seq=1 Ack=61 Win=5840 Len=0 TSV=904832066 TSER=562240316 13321 2010-10-13 10:01:04.205959 192.168.1.116 192.168.100.85 MySQL Login Request userexample 13322 2010-10-13 10:01:04.206800 192.168.100.85 192.168.1.116 TCP mysql > 41560 [ACK] Seq=61 Ack=71 Win=5888 Len=0 TSV=562240317 TSER=904832066 13323 2010-10-13 10:01:04.206874 192.168.100.85 192.168.1.116 MySQL Response OK 13324 2010-10-13 10:01:04.208823 192.168.1.116 192.168.100.85 MySQL Request Quit 13325 2010-10-13 10:01:04.208839 192.168.1.116 192.168.100.85 TCP 41560 > mysql [FIN, ACK] Seq=76 Ack=72 Win=5840 Len=0 TSV=904832069 TSER=562240317 13326 2010-10-13 10:01:04.210422 192.168.100.85 192.168.1.116 TCP mysql > 41560 [FIN, ACK] Seq=72 Ack=76 Win=5888 Len=0 TSV=562240320 TSER=904832069 13327 2010-10-13 10:01:04.210437 192.168.1.116 192.168.100.85 TCP 41560 > mysql [ACK] Seq=77 Ack=73 Win=5840 Len=0 TSV=904832071 TSER=562240320 13328 2010-10-13 10:01:04.210567 192.168.100.85 192.168.1.116 TCP mysql > 41560 [ACK] Seq=73 Ack=77 Win=5888 Len=0 TSV=562240320 TSER=904832069


    Packets from mysql:


    No. Time Source Destination Protocol Info 13315 2010-10-13 10:01:04.204817 192.168.1.116 192.168.100.85 TCP 41560 > mysql [SYN] Seq=0 Win=5840 Len=0 MSS=1380 TSV=904832062 TSER=0 WS=2 13316 2010-10-13 10:01:04.204836 192.168.100.85 192.168.1.116 TCP mysql > 41560 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=562240314 TSER=904832062 WS=7 13317 2010-10-13 10:01:04.206611 192.168.1.116 192.168.100.85 TCP 41560 > mysql [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=904832065 TSER=562240314 13318 2010-10-13 10:01:04.206808 192.168.100.85 192.168.1.116 MySQL Server Greeting proto=10 version=5.0.77-log 13319 2010-10-13 10:01:04.207658 192.168.1.116 192.168.100.85 TCP 41560 > mysql [ACK] Seq=1 Ack=61 Win=5840 Len=0 TSV=904832066 TSER=562240316 13320 2010-10-13 10:01:04.207815 192.168.1.116 192.168.100.85 MySQL Login Request user=example 13321 2010-10-13 10:01:04.207872 192.168.100.85 192.168.1.116 TCP mysql > 41560 [ACK] Seq=61 Ack=71 Win=5888 Len=0 TSV=562240317 TSER=904832066 13322 2010-10-13 10:01:04.207910 192.168.100.85 192.168.1.116 MySQL Response OK 13323 2010-10-13 10:01:04.210817 192.168.1.116 192.168.100.85 MySQL Request Quit 13324 2010-10-13 10:01:04.210849 192.168.100.85 192.168.1.116 TCP mysql > 41560 [FIN, ACK] Seq=72 Ack=76 Win=5888 Len=0 TSV=562240320 TSER=904832069 13325 2010-10-13 10:01:04.211632 192.168.1.116 192.168.100.85 TCP 41560 > mysql [FIN, ACK] Seq=76 Ack=72 Win=5840 Len=0 TSV=904832069 TSER=562240317 13326 2010-10-13 10:01:04.211640 192.168.100.85 192.168.1.116 TCP mysql > 41560 [ACK] Seq=73 Ack=77 Win=5888 Len=0 TSV=562240320 TSER=904832069 13327 2010-10-13 10:01:04.213243 192.168.1.116 192.168.100.85 TCP 41560 > mysql [ACK] Seq=77 Ack=73 Win=5840 Len=0 TSV=904832071 TSER=562240320


    As you can see, web re-sent the initial ACK after 3 seconds of no response. But mysql never even saw that initial packet.

    I also tried to run a ping flood to check for dropped packets. If you leave it running long enough you will get dropped packets.


    web3# ping -f 192.168.100.85PING 192.168.100.85 (192.168.100.85) 56(84) bytes of data.............................................. .......................--- 192.168.100.85 ping statistics ---38253 packets transmitted, 38185 received, 0% packet loss, time 460851msrtt min/avg/max/mdev = 0.880/3.430/66.904/8.015 ms, pipe 7, ipg/ewma 12.047/1.378 ms


    This issue is very intermittent but is happening all the time. I do understand that simply increasing our timeout would help greatly, but I'd rather have the connects always be fast and never have a 3-4 second delay in serving a page.

    I contacted our hosting provider and they say its likely a result of the bad interaction between Nagle's Algorithm and Delayed ACK. <http://www.stuartcheshire.org/papers/NagleDelayedAck/>

    It seems to me like packets are being dropped. Any ideas on a way I could better prove it to the hosting guys? Ping dropping 60 out of 38,000 packets does seem kind of minor. Is this something I should just live with?

    Thank you for your time in looking into this!

  • #2
    Did you check if either the conntrack or kernel cookies are not the issue here ? On mysql server ofc )

    Comment


    • #3
      conntrack is part of iptables correct? If so, iptables is off on both servers and shouldn't be a problem.

      I don't even know where to begin checking if "kernel cookies" is an issue. Can you point me in the right direction?

      Comment


      • #4
        conntrack is a kernel module and can be related to iptables but is not 'part' of it.
        Do lsmod and look if it's loaded - if yes you can try to grow conntrack limits or rmmod it's modules. Google for details )

        If you have redhat or fedora search in /var/log/messages, and if this is debian/ubuntu - try look in /var/log/syslog for string like:
        "kernel: possible SYN flooding on port" and it would be most likely 3306 )
        So if you found some try to do:
        echo "0" > /proc/sys/net/ipv4/tcp_syncookies

        sbarre wrote on Mon, 25 October 2010 19:21

        conntrack is part of iptables correct? If so, iptables is off on both servers and shouldn't be a problem.

        I don't even know where to begin checking if "kernel cookies" is an issue. Can you point me in the right direction?

        Comment


        • #5
          This is a RHEL 5 server

          lsmod does not show conntrack as being loaded.

          My /var/log/messages is almost empty and has no messages about SYN floods.

          So now I can say we checked that conntrack and kernel cookies are not the issues here.

          Thanks for the help. Any other ideas?

          I also added some logging to one of our sites. Now every time it runs mysqli->real_connect I time it and log it.

          in the last 4 days there have been 48,000 connections from this site to the db. 49 of which took longer than 1 second. most are 3-4 seconds but there are two that are 9 seconds (

          Comment


          • #6
            Did you check for any evidence in mysql error log ?

            Than maybe it is some network related problem ? ifconfig RX/TX packets errors, dropped ?
            Also you might need to look at application server for similiar conntrack, cookies, network problems...

            Comment


            • #7
              I got a hold of someone else at the hosting company. They seem to think our firewall it underpowered and is dropping packets during peak times. I will keep working with them to get this resolved.

              Thanks for your help!

              Comment


              • #8
                Sorry i can't help you because i does not understand your problem.But i hope some one will probably help you soon.

                Comment

                Working...
                X