GET 24/7 LIVE HELP NOW

Announcement

Announcement Module
Collapse
No announcement yet.

Issue restarting server under specific conditions

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Issue restarting server under specific conditions

    Hi,

    We've got an old backup script which runs once a day and does a hot backup of one of our production slaves. It could do with an update using LVM snapshots, but for the moment it works like:
    • STOP SLAVE;
    • <wait period>
    • Server shuts down
    • <wait period>
    • Ensure server is shutdown, then copy datadir to backup server
    • <wait until finished>
    • Start Percona
    • <wait period>
    • Ensure Slave has started again and caught up to master
    Basically, I've had an issue approximately once every 1 - 2 months (30 - 60 restart/backup runs) where Percona refuses to start again because it can't bind to the port 3306 ("Address already in use"). I've checked everything I can think of within the system, ensuring nothing is holding 3306 open (netstat/lsof), and that no other mysqld processes start or have been started over the period. But it refuses to start regardless, unless I reboot the machine completely.
    2014-01-19 22:50:39 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    2014-01-19 22:50:39 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
    2014-01-19 22:50:39 10662 [Note] InnoDB: The InnoDB memory heap is disabled
    2014-01-19 22:50:39 10662 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
    2014-01-19 22:50:39 10662 [Note] InnoDB: Compressed tables use zlib 1.2.3
    2014-01-19 22:50:39 10662 [Note] InnoDB: Using Linux native AIO
    2014-01-19 22:50:39 10662 [Note] InnoDB: Using CPU crc32 instructions
    2014-01-19 22:50:39 10662 [Note] InnoDB: Initializing buffer pool, size = 90.0G
    2014-01-19 22:50:44 10662 [Note] InnoDB: Completed initialization of buffer pool
    2014-01-19 22:50:44 10662 [Note] InnoDB: Highest supported file format is Barracuda.
    2014-01-19 22:50:47 10662 [Note] InnoDB: 128 rollback segment(s) are active.
    2014-01-19 22:50:47 10662 [Note] InnoDB: Waiting for purge to start
    2014-01-19 22:50:47 10662 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.15-63.0 started; log sequence number 2854995318219
    2014-01-19 22:50:47 10662 [Note] RSA private key file not found: /var/lib/mysql/private_key.pem. Some authentication plugins will not work.
    2014-01-19 22:50:47 10662 [Note] RSA public key file not found: /var/lib/mysql/public_key.pem. Some authentication plugins will not work.
    2014-01-19 22:50:47 10662 [Note] Server hostname (bind-address): '*'; port: 3306
    2014-01-19 22:50:47 10662 [Note] IPv6 is available.
    2014-01-19 22:50:47 10662 [Note] - '::' resolves to '::';
    2014-01-19 22:50:47 10662 [Note] Server socket created on IP: '::'.
    2014-01-19 22:50:47 10662 [ERROR] Can't start server: Bind on TCP/IP port: Address already in use
    2014-01-19 22:50:47 10662 [ERROR] Do you already have another mysqld server running on port: 3306 ?
    2014-01-19 22:50:47 10662 [ERROR] Aborting
    Once again, it's fine for months in a row, then randomly shows this behaviour after a large number of restart cycles.


    Any idea?

    Thanks in advance!

  • #2
    Hi,

    Yesterday, I also faced the same issue once. I have checked but there is no any known bug for this. Even this thing is not repeatable every time so its quite difficult to submit bug. Is it possible for you to provide test case when next time it happened? I mean to say collecting all information like output of netstat/lsof/ps -aux | grep mysql etc?

    Comment


    • #3
      Ah! Found it, I was simply checking listening ports (lsof -Pi | grep LISTEN, netstat -plunt), not actually existing connections that may have still been open
      [root@patchdb05 ~] $ lsof -Pi | grep LISTEN
      rpcbind 2452 rpc 8u IPv4 15836 0t0 TCP *:111 (LISTEN)
      rpcbind 2452 rpc 11u IPv6 15841 0t0 TCP *:111 (LISTEN)
      rpc.statd 2470 rpcuser 9u IPv4 15926 0t0 TCP *:5943 (LISTEN)
      rpc.statd 2470 rpcuser 11u IPv6 15934 0t0 TCP *:6432 (LISTEN)
      cupsd 7055 root 6u IPv6 19345 0t0 TCP localhost:631 (LISTEN)
      cupsd 7055 root 7u IPv4 19346 0t0 TCP localhost:631 (LISTEN)
      snmpd 7217 root 8u IPv4 20024 0t0 TCP localhost:199 (LISTEN)
      sshd 7229 root 3u IPv4 20061 0t0 TCP *:22 (LISTEN)
      sshd 7229 root 4u IPv6 20063 0t0 TCP *:22 (LISTEN)
      xinetd 7237 root 5u IPv4 20130 0t0 TCP patchdb05.prod.lan:9204 (LISTEN)
      xinetd 7237 root 8u IPv4 20131 0t0 TCP patchdb05.prod.lan:9200 (LISTEN)
      searchd 8916 root 7u IPv4 22983 0t0 TCP *:3312 (LISTEN)

      [root@patchdb05 ~] $ ps aux | grep mysql
      root 27182 0.0 0.0 103252 856 pts/1 S+ 22:07 0:00 grep mysql
      Nothing useful as of yet!

      However, a plain netstat showed (amongst everything else):

      [root@patchdb05 ~] $ netstat
      Active Internet connections (w/o servers)
      Proto Recv-Q Send-Q Local Address Foreign Address State
      ......
      tcp 0 0 patchdb05.prod:appman-server patchdbload01.prod.l:12329 TIME_WAIT
      tcp 0 0 patchdb05.prod.lan:mysql patchdb05.prod.lan:mysql ESTABLISHED
      tcp 0 0 patchdb05.prod.:jboss-iiop patchdb05.prod.lan:gearman TIME_WAIT
      tcp 0 0 patchdb05.prod.lan:ssh patchext01.prod.lan:13740 ESTABLISHED
      .......

      annnnd here it is:

      [root@patchdb05 ~] $ lsof -l -P | grep 3306
      search 9557 495 40u IPv4 38329795 0t0 TCP patchdb05.prod.lan:3288->patchdb05.prof.lan:3306 (ESTABLISHED)
      So, I'm guessing, my search process (an indexer client that runs as part of sphinx search) must have frozen during mysqld restart and didn't kill itself or time out. Killing the sphinx process and waiting a short period for the connection to timeout and Percona started again straight away.

      Hope this helps someone in future! I'll definitely remember it as it wasn't an obvious issue this time around!

      Comment


      • #4
        Thanks for sharing this, It will be helpful for others too.

        Comment

        Working...
        X