InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server

  • InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server

    We have a 3-node Percona XtraDB Cluster, all on Ubuntu 16.04:
    mysql Ver 14.14 Distrib 5.7.19-17, for debian-linux-gnu (x86_64) using 6.3

    Last night one node crashed and refuses to rejoin the cluster. When running wsrep-recover, the log file says the following at the start:

    2017-12-13T09:16:27.759954Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 1043375327479
    2017-12-13T09:16:27.760085Z 0 [Note] InnoDB: Database was not shutdown normally!
    2017-12-13T09:16:27.760090Z 0 [Note] InnoDB: Starting crash recovery.
    2017-12-13T09:16:29.026740Z 0 [Note] InnoDB: Created parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite, size 39321600 bytes
    2017-12-13T09:16:29.854281Z 0 [Note] InnoDB: Transaction 3369902481 was in the XA prepared state.
    2017-12-13T09:16:30.865089Z 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
    2017-12-13T09:16:30.865113Z 0 [Note] InnoDB: Trx id counter is 3369902848
    2017-12-13T09:16:30.876541Z 0 [Note] InnoDB: Starting an apply batch of log records to the database...
    InnoDB: Progress in percent: 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
    2017-12-13T09:16:32.311863Z 0 [Note] InnoDB: Apply batch completed
    2017-12-13T09:16:32.311903Z 0 [Note] InnoDB: Last MySQL binlog file position 0 256986, file name binlog.000038
    2017-12-13T09:25:40.681787Z 0 [Note] InnoDB: Starting in background the rollback of uncommitted transactions
    2017-12-13T09:25:40.681825Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
    2017-12-13T09:25:40.681875Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
    2017-12-13T09:25:40.681849Z 0 [Note] InnoDB: Rollback of non-prepared transactions completed
    2017-12-13T09:25:40.681927Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
    2017-12-13T09:25:40.768967Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
    2017-12-13T09:25:40.774366Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
    2017-12-13T09:25:40.774388Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
    2017-12-13T09:25:40.774468Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 554744ms. The settings might not be optimal. (flushed=0, during the time.)
    2017-12-13T09:25:40.790622Z 0 [Note] InnoDB: Waiting for purge to start
    2017-12-13T09:29:41.799002Z 0 [Warning] InnoDB: A long semaphore wait:
    --Thread 139902404835072 has waited at row0row.cc line 1073 for 241.00 seconds the semaphore:
    S-lock on RW-latch at 0x7f4148f33d70 created in file buf0buf.cc line 1456

    a writer (thread id 139902388049664) has reserved it in mode exclusive
    number of readers 0, waiters flag 1, lock_word: ffffffffdfffffff
    Last time read locked in file row0row.cc line 1073
    Last time write locked in file /mnt/workspace/percona-xtradb-cluster-5.7-debian-binary/label_exp/ubuntu-xenial-64bit/percona-xtradb-cluster-5.7-5.7.19-29.22/storage/innobase/btr/btr0cur.cc line 1911
    2017-12-13T09:29:41.799057Z 0 [Warning] InnoDB: A long semaphore wait:
    --Thread 139902388049664 has waited at btr0cur.cc line 1911 for 241.00 seconds the semaphore:
    X-lock (wait_ex) on RW-latch at 0x7f40c36f8db0 created in file buf0buf.cc line 1456
    a writer (thread id 139902388049664) has reserved it in mode wait exclusive
    number of readers 1, waiters flag 0, lock_word: ffffffffefffffff
    Last time read locked in file row0row.cc line 1073
    Last time write locked in file /mnt/workspace/percona-xtradb-cluster-5.7-debian-binary/label_exp/ubuntu-xenial-64bit/percona-xtradb-cluster-5.7-5.7.19-29.22/storage/innobase/include/mtr0mtr.ic line 117
    2017-12-13T09:29:41.799079Z 0 [Warning] InnoDB: A long semaphore wait:
    --Thread 139925455378240 has waited at fsp0fsp.cc line 1395 for 241.00 seconds the semaphore:
    X-lock on RW-latch at 0x3f3f3f8 created in file fil0fil.cc line 1392
    a writer (thread id 139902388049664) has reserved it in mode exclusive
    number of readers 0, waiters flag 1, lock_word: 0
    Last time read locked in file not yet reserved line 0
    Last time write locked in file /mnt/workspace/percona-xtradb-cluster-5.7-debian-binary/label_exp/ubuntu-xenial-64bit/percona-xtradb-cluster-5.7-5.7.19-29.22/storage/innobase/fsp/fsp0fsp.cc line 3373
    InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
    InnoDB: Pending preads 0, pwrites 0

    There is a lot of debugging output after this... probably not that important, but I can share it if need be.

    And finally, at the bottom, it says:
    [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.


    ----------------------------
    END OF INNODB MONITOR OUTPUT
    ============================
    InnoDB: ###### Diagnostic info printed to the standard error stream
    2017-12-13T09:41:33.810011Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
    2017-12-13 10:41:33 0x7f3d92a4e700 InnoDB: Assertion failure in thread 139902430013184 in file ut0ut.cc line 917
    InnoDB: We intentionally generate a memory trap.
    InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
    InnoDB: If you get repeated assertion failures or crashes, even
    InnoDB: immediately after the mysqld startup, there may be
    InnoDB: corruption in the InnoDB tablespace. Please refer to
    InnoDB: http://dev.mysql.com/doc/refman/5.7/...-recovery.html
    InnoDB: about forcing recovery.
    09:41:33 UTC - mysqld got signal 6 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
    Attempting to collect some information that could help diagnose the problem.
    As this is a crash and something is definitely wrong, the information
    collection process might fail.
    Please help us make Percona XtraDB Cluster better by reporting any
    bugs at https://bugs.launchpad.net/percona-xtradb-cluster

    key_buffer_size=8388608
    read_buffer_size=131072
    max_used_connections=0
    max_threads=501
    thread_count=0
    connection_count=0
    It is possible that mysqld could use up to
    key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 207511 K bytes of memory
    Hope that's ok; if not, decrease some variables in the equation.

    Thread pointer: 0x0
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    stack_bottom = 0 thread_stack 0x40000
    /usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xed221b]
    /usr/sbin/mysqld(handle_fatal_signal+0x499)[0x783ef9]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f42ed7bf390]
    /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f42ecb78428]
    /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f42ecb7a02a]
    /usr/sbin/mysqld[0x753da4]
    /usr/sbin/mysqld(_ZN2ib5fatalD1Ev+0x145)[0x10b54c5]
    /usr/sbin/mysqld(srv_error_monitor_thread+0xe61)[0x104e441]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f42ed7b56ba]
    /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f42ecc4a3dd]
    You may download the Percona XtraDB Cluster operations manual by visiting
    http://www.percona.com/software/percona-xtradb-cluster/. You may find information
    in the manual which will help you identify the cause of the crash.




    It might be that it crashed during the night while doing an ALTER TABLE.
    But why it won't rejoin the cluster is my bigger concern.

    Any idea what is going on? Or how I can remedy it?

    I know I can remove everything in /var/lib/mysql and do a full SST, but I'd like to avoid doing this ...
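
    For what it's worth, the position the node thinks it is at can be checked like this before deciding between IST and a full SST (a sketch; default datadir path assumed):

    # grastate.dat (path assumed) records the cluster uuid and last committed seqno;
    # after an unclean shutdown the seqno is usually -1, which is why the
    # wsrep-recover step above matters for rejoining without a full SST
    cat /var/lib/mysql/grastate.dat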

    Thanks for the help


  • #2
    Can you provide the error logs from the joiner and donor nodes so we can check further? Also, check whether any innobackup-* logs exist in the data directory of any of the nodes; those will give you a hint as to what is stopping the node from rejoining the cluster.
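
    As a rough sketch (assuming the default datadir /var/lib/mysql and an xtrabackup-based SST; adjust the path if yours differs), this is where those logs normally end up and how to scan them:

    # List any SST/xtrabackup logs left behind in the datadir (path assumed)
    ls -l /var/lib/mysql/innobackup.*.log
    # Show the first errors reported during the backup/prepare phase, if any
    grep -in "error" /var/lib/mysql/innobackup.*.log | head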



    • #3
      mirfan
      I haven't got any innobackup-* logs that are of use, I'm afraid. I think those logs get truncated after some time...

      A second node crashed this weekend, with a similar error. I'm down to 1 running node in my cluster. :-(

      2017-12-15T23:25:36.869600Z 0 [Warning] InnoDB: A long semaphore wait:
      --Thread 139768016938752 has waited at dict0stats.cc line 3623 for 241.00 seconds the semaphore:
      X-lock (wait_ex) on RW-latch at 0x3d8d6a8 created in file dict0dict.cc line 1198
      a writer (thread id 139768016938752) has reserved it in mode wait exclusive
      number of readers 2, waiters flag 1, lock_word: fffffffffffffffe
      Last time read locked in file row0purge.cc line 865
      Last time write locked in file /mnt/workspace/percona-xtradb-cluster-5.7-debian-binary/label_exp/ubuntu-xenial-64bit/percona-xtradb-cluster-5.7-5.7.19-29.22/storage/innobase/handler/ha_innodb.cc line 15330
      2017-12-15T23:25:36.869736Z 0 [Warning] InnoDB: A long semaphore wait:
      --Thread 139733382022912 has waited at row0purge.cc line 865 for 241.00 seconds the semaphore:
      S-lock on RW-latch at 0x3d8d6a8 created in file dict0dict.cc line 1198
      a writer (thread id 139768016938752) has reserved it in mode wait exclusive
      number of readers 2, waiters flag 1, lock_word: fffffffffffffffe
      Last time read locked in file row0purge.cc line 865
      Last time write locked in file /mnt/workspace/percona-xtradb-cluster-5.7-debian-binary/label_exp/ubuntu-xenial-64bit/percona-xtradb-cluster-5.7-5.7.19-29.22/storage/innobase/handler/ha_innodb.cc line 15330
      2017-12-15T23:25:36.869780Z 0 [Warning] InnoDB: A long semaphore wait:
      --Thread 139733356844800 has waited at btr0cur.cc line 1911 for 241.00 seconds the semaphore:
      X-lock (wait_ex) on RW-latch at 0x7f1b002ce790 created in file buf0buf.cc line 1456
      a writer (thread id 139733356844800) has reserved it in mode wait exclusive
      number of readers 1, waiters flag 0, lock_word: ffffffffefffffff


      I don't understand what it is trying to tell me. It is waiting for a semaphore, but why? Is it waiting on a long-running query? Is it waiting on I/O?
      I googled quite a bit and I keep finding MySQL bug reports that mention a similar issue: https://bugs.mysql.com/bug.php?id=73890 but I'm not smart enough to understand what the logs are saying (I'm guessing few people are, since that bug report is still open...).
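
      I guess the same SEMAPHORES information can be pulled from a node while it is still running, with something like this (client credentials assumed):

      # The SEMAPHORES section near the top of the monitor output lists which
      # threads are waiting on which latches and for how long
      # (run with whatever client credentials you normally use)
      mysql -e "SHOW ENGINE INNODB STATUS\G"
      # Cross-check against what the server is actually executing at that moment
      mysql -e "SHOW FULL PROCESSLIST"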



      • #4
        Hmmm... the semaphore wait in the first comment is not in the same place as the semaphore wait in the last comment.

        Looking at these lines in the logs:

        Last time read locked in file row0purge.cc line 865
        Last time write locked in file /mnt/workspace/percona-xtradb-cluster-5.7-debian-binary/label_exp/ubuntu-xenial-64bit/percona-xtradb-cluster-5.7-5.7.19-29.22/storage/innobase/handler/ha_innodb.cc line 15330

        The ha_innodb.cc lock is taken in innodb_rename_table(), and it locks the data dictionary (dict_operation_lock).
        The row0purge.cc code is also trying to obtain a lock on the data dictionary (dict_operation_lock).
        The dict0stats.cc code is also trying to obtain dict_operation_lock (and is also involved in renaming a table).

        I'm not familiar with this code so I'm not sure what exactly is happening (maybe a deadlock??). Were you trying to rename a table when the crash occurred?
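
        If you are not sure, a rough way to check would be something like this (the datadir path and the database name "yourdb" are assumptions on my part; the binlog file name is the one from your first log):

        # Look for DDL in the binlog that the recovery log pointed at (path assumed)
        mysqlbinlog /var/lib/mysql/binlog.000038 | grep -iE 'alter table|rename table' | head
        # An interrupted ALTER TABLE can also leave temporary #sql-* files behind
        # ("yourdb" is a placeholder for your schema's directory)
        ls /var/lib/mysql/yourdb/ | grep '#sql'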

        You may want to create a bug (with all the logs) on https://jira.percona.com so that this issue can be tracked.



        • #5
          Based on these lines:
          2017-12-13T09:41:33.810011Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
          2017-12-13 10:41:33 0x7f3d92a4e700 InnoDB: Assertion failure in thread 139902430013184 in file ut0ut.cc line 917

          it is possible that you are hitting the following upstream issue:

          https://bugs.launchpad.net/percona-server/+bug/1653764
