Heavy swap usage MySQL 5.5 following kernel:BUG: soft lockup


  • #1

    We just built a new development server running CentOS 6.3 (kernel 2.6.32-279.5.2.el6.x86_64) with 4 CPUs and 24GB of memory. The server is a VM running on ESXi 5. The MySQL version is 5.5.27, with 74 databases totaling 28,429 tables.

    The import of the databases went without a hitch; this morning, however, the following error appeared at the console:

    Message from syslogd@ksoppiscesdbdev02 at Sep 18 08:37:47 ... kernel:BUG: soft lockup - CPU#1 stuck for 67s! [mysqld:16744]


    This has occurred only once throughout the day, but since then the system has been using 9GB of its 12GB of swap space.

    The system is to replace our dev server running CentOS 5.6 with 4 CPUs and 8GB of memory on the same ESXi 5 host. While that system could use more memory, it does not swap like the new box.

    Any ideas about what could be causing this? Any help would be greatly appreciated.
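
    One way to confirm that it is mysqld itself holding the swap (rather than, say, the vmware_balloon driver squeezing the guest) is to rank processes by the VmSwap field in /proc. A rough sketch, assuming a kernel that exposes VmSwap per process (RHEL/CentOS 6 does); the function and its name are illustrative, not from any particular tool:

```python
import glob
import re

def swap_kb(status_text: str) -> int:
    """Parse the VmSwap line out of a /proc/<pid>/status blob (0 if absent)."""
    m = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else 0

def top_swappers(n: int = 5):
    """Rank running processes by swap usage (requires a Linux /proc)."""
    rows = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            text = open(path).read()
        except OSError:
            continue  # process exited, or we lack permission
        name = re.search(r"^Name:\s+(\S+)", text, re.MULTILINE)
        rows.append((swap_kb(text), name.group(1) if name else "?"))
    return sorted(rows, reverse=True)[:n]
```

    If mysqld shows up at the top with several gigabytes of VmSwap, the pressure is coming from inside the guest rather than from host-side ballooning.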

    Below are the contents of my.cnf and the full output from the syslog.
    my.cnf

    [mysqld]
    ## General
    datadir = /var/lib/mysql
    tmpdir = /var/lib/mysqltmp
    socket = /var/lib/mysql/mysql.sock
    skip-name-resolve
    sql-mode = NO_ENGINE_SUBSTITUTION
    #event-scheduler = 1
    ## Cache
    thread-cache-size = 16
    table-open-cache = 20000
    table-definition-cache = 2048
    query-cache-size = 64M
    query-cache-limit = 10M
    ## Per-thread Buffers
    sort-buffer-size = 1M
    read-buffer-size = 1M
    read-rnd-buffer-size = 1M
    join-buffer-size = 24M
    ## Temp Tables
    tmp-table-size = 512M
    max-heap-table-size = 512M
    ## Networking
    back-log = 100
    max-connections = 300
    max-connect-errors = 10000
    max-allowed-packet = 16M
    interactive-timeout = 600
    wait-timeout = 28800
    ## Storage Engines
    default-storage-engine = InnoDB
    innodb = FORCE
    ## MyISAM
    key-buffer-size = 64M
    myisam-sort-buffer-size = 128M
    ## InnoDB
    innodb-buffer-pool-size = 18G
    innodb-log-file-size = 100M
    innodb-log-buffer-size = 10M
    innodb-flush-method = O_DIRECT
    innodb-file-per-table = 1
    innodb-flush-log-at-trx-commit = 1
    #innodb-open-files = 20000
    ## Replication
    server-id = 1
    #log-bin = /var/lib/mysqllogs/bin-log
    #relay-log = /var/lib/mysqllogs/relay-log
    relay-log-space-limit = 16G
    expire-logs-days = 7
    #read-only = 1
    #sync-binlog = 1
    #log-slave-updates = 1
    #binlog-format = STATEMENT
    #auto-increment-offset = 1
    #auto-increment-increment = 2
    ## Logging
    log-output = FILE
    slow-query-log = 1
    slow-query-log-file = /var/lib/mysqllogs/slow-log
    #log-slow-slave-statements
    long-query-time = 2

    [mysqld_safe]
    log-error = /var/log/mysqld.log
    open-files-limit = 65535

    [mysql]
    no-auto-rehash

    [mysqldump]
    quick
    max_allowed_packet = 16M
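
    For what it's worth, a back-of-the-envelope tally of the settings above already exceeds the machine's 24GB under worst-case load. This is only the crude rule-of-thumb arithmetic (MySQL rarely allocates every per-thread buffer at full size, and implicit temp tables can add up to 512M each on top), but it suggests the box is configured tighter than it looks:

```python
MIB = 1
GIB = 1024 * MIB

# Global buffers from the my.cnf above (MiB):
# innodb-buffer-pool-size + key-buffer-size + query-cache-size + innodb-log-buffer-size
global_buffers = 18 * GIB + 64 + 64 + 10

# Per-connection buffers (MiB): sort + read + read-rnd + join
per_connection = 1 + 1 + 1 + 24
max_connections = 300

worst_case = global_buffers + per_connection * max_connections
print(f"worst case ~ {worst_case / GIB:.1f} GiB on a 24 GiB box")
```

    With join-buffer-size at 24M and 300 connections allowed, the per-thread side alone is roughly 8GB on top of the 18GB buffer pool, which would push the server into swap even before the CPU-ready problem described later in the thread.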


    /var/log/messages

    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [mysqld:16744]
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: Modules linked in: vsock(U) vmci(U) bnx2fc fcoe libfcoe libfc 8021q garp scsi_transport_fc stp scsi_tgt llc sunrpc ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ppdev parport_pc parport vmware_balloon shpchp sg i2c_piix4 i2c_core ext4 mbcache jbd2 e1000 sd_mod crc_t10dif sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: nf_defrag_ipv4]
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: CPU 1
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: Modules linked in: vsock(U) vmci(U) bnx2fc fcoe libfcoe libfc 8021q garp scsi_transport_fc stp scsi_tgt llc sunrpc ipt_REJECT ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ppdev parport_pc parport vmware_balloon shpchp sg i2c_piix4 i2c_core ext4 mbcache jbd2 e1000 sd_mod crc_t10dif sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: nf_defrag_ipv4]
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel:
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: Pid: 16744, comm: mysqld Not tainted 2.6.32-279.5.2.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: RIP: 0010:[] [] _spin_unlock_irqrestore+0x17/0x20
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: RSP: 0000:ffff880551631d88 EFLAGS: 00000282
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: RAX: dead000000200200 RBX: ffff880551631d88 RCX: 0000000000000000
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: RDX: ffff8804e8b79ca8 RSI: 0000000000000282 RDI: 0000000000000282
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: RBP: ffffffff8100bc0e R08: ffff880551631e50 R09: 00000000ffffffff
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: R13: 0000000000000000 R14: ffff880551631db8 R15: ffffffff8100bc0e
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: FS: 00007fb192e7a700(0000) GS:ffff880028280000(0000) knlGS:0000000000000000
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: CR2: 00007ff0f73c7000 CR3: 0000000452a54000 CR4: 00000000000006e0
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: Process mysqld (pid: 16744, threadinfo ffff880551630000, task ffff88062aa18080)
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: Stack:
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: ffff880551631db8 ffffffff8109248c ffffffff81092450 ffff8804e8b79c80
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: 0000000000000001 ffff88062aa18080 ffff880551631f38 ffffffff811c4139
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: 00000000ffffffff ffff880551631e38 0000000000000000 ffff88062aa18080
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: Call Trace:
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? remove_wait_queue+0x3c/0x50
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? remove_wait_queue+0x0/0x50
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? read_events+0x329/0x450
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? default_wake_function+0x0/0x20
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? timeout_func+0x0/0x20
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? sys_io_getevents+0x56/0xb0
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? sys_io_getevents+0x1/0xb0
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? system_call_fastpath+0x16/0x1b
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: Code: 00 00 00 01 74 05 e8 69 e3 d7 ff c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 48 89 fa c7 02 00 00 00 00 48 89 f7 57 9d <0f> 1f 44 00 00 c9 c3 66 90 55 48 89 e5 0f 1f 44 00 00 f0 ff 07
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: Call Trace:
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? remove_wait_queue+0x3c/0x50
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? remove_wait_queue+0x0/0x50
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? read_events+0x329/0x450
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? default_wake_function+0x0/0x20
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? timeout_func+0x0/0x20
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? sys_io_getevents+0x56/0xb0
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? sys_io_getevents+0x1/0xb0
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: [] ? system_call_fastpath+0x16/0x1b
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: INFO: task mysqld:17859 blocked for more than 120 seconds.
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: mysqld D 0000000000000002 0 17859 16104 0x00000080
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: ffff88027df7bdf8 0000000000000082 ffff88060383cde0 0000000000000082
    Sep 18 08:37:47 ksoppiscesdbdev02 kernel: ffff88027df7bd88 ffff880629145e80 0000000000000000 ffff8806297c90d0

  • #2
    Hi

    I suspect you are hitting this CentOS 6.3 bug (http://bugs.centos.org/view.php?id=5823). You said it was working properly with CentOS 5.6 — did the MySQL version change when you migrated from CentOS 5.6 to 6.3? If not, it doesn't look like a MySQL issue. Can you please check and let me know?

    Regards,
    Nil



    • #3
      Thanks for the suggestion, Nil. As part of the migration we went from MySQL 5.1 to 5.5.

      Since my initial post, I have done testing on a staging server with the same OS (CentOS 6.3, same kernel version as dev) and the same MySQL version (5.5): no soft lockups and minimal swapping under load. That box runs on a different ESXi cluster at our co-lo data center. The VMware host software is the same, but the hardware is different: 4 x Dell PowerEdge R910s vs. 4 x HP ProLiant DL580 G7s in our local data center.

      It has us very puzzled. I am rebuilding the local dev box with CentOS 6.2 to see if the problem still occurs.



      • #4
        It looks like we found the problem and resolved it.

        It appears we had an issue with CPU ready time on the VMs. I ran esxtop on our ESXi hosts and found that the VMs we were having issues with (and others, as it turns out) were waiting for CPU time: their %RDY value was anywhere from 60-130%. According to VMware's troubleshooting documentation (quoted below), a normal value should be 5% or lower. The odd thing was that, looking at our overall environment, we were definitely not over-provisioned on CPU. It turns out a resource pool had been set up in ESXi to segregate our dev/QA environments from the production systems we run in-house. This resource pool WAS over-provisioned; we have given it more resources, and the %RDY value in esxtop has returned to normal operating levels. Based on this I believe we will stop seeing soft lockups. Should that change, I will report back with more info.

        Troubleshooting ESX/ESXi virtual machine performance issues:
        Quote:
        CPU constraints

        To determine whether the poor performance is due to a CPU constraint:

        Use the esxtop command to determine if the ESX/ESXi server is being overloaded. For more information about esxtop, see the Resource Management Guide for your version of ESX/ESXi:

        ESXi 5.0
        ESX/ESXi 4.1
        ESX 4.0
        ESX 3.5:
        Update 2 and later
        Initial Release and Update 1
        ESX 3.0
        Examine the load average on the first line of the command output.

        A load average of 1.00 means that the ESX/ESXi Server machine's physical CPUs are fully utilized, and a load average of 0.5 means that they are half utilized. A load average of 2.00 means that the system as a whole is overloaded.

        Examine the %READY field for the percentage of time that the virtual machine was ready but could not be scheduled to run on a physical CPU.

        Under normal operating conditions, this value should remain under 5%. If the ready time values are high on the virtual machines that experience bad performance, then check for CPU limiting:

        Make sure the virtual machine is not constrained by a CPU limit set on itself
        Make sure that the virtual machine is not constrained by its resource pool.

        For more information, see Impact of virtual machine memory and CPU resource limits (1033115).

        If the load average is too high, and the ready time is not caused by CPU limiting, adjust the CPU load on the host. To adjust the CPU load on the host, either:

        Increase the number of physical CPUs on the host

        OR

        Decrease the number of virtual CPUs allocated to the host. To decrease the number of virtual CPUs allocated to the host, either:

        Reduce the total number of CPUs allocated to all of the virtual machines running on the ESX host. For more information, see Determining if multiple virtual CPUs are causing performance issues (1005362).

        OR

        Reduce the number of virtual machines running on the host.

        If you are using ESX 3.5, determine whether IRQ sharing is an issue. For more information, see ESX has performance issues due to IRQ sharing (1003710).
        http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2001003
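
        For anyone reading the raw "CPU Ready" summation counter in the vSphere performance charts instead of esxtop, it can be converted to the %RDY percentage the KB talks about. A small sketch — the 20,000 ms divisor assumes the default 20-second real-time sample interval, which is an assumption on my part for other chart intervals:

```python
def ready_percent(ready_ms: float, interval_ms: float = 20_000) -> float:
    """Percent of the sample interval a vCPU spent ready but not scheduled."""
    return ready_ms / interval_ms * 100

# The KB's 5% guideline corresponds to 1,000 ms of ready time per 20 s sample;
# the 60%+ we were seeing corresponds to 12,000 ms or more.
print(ready_percent(1_000))
print(ready_percent(12_000))
```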



        • #5
          Hi,

          Thanks, I'm glad you found and resolved the problem, and that it wasn't related to MySQL. By the way, please keep us posted on your findings.
