Unexpected error "Too many connections"

  • Unexpected error "Too many connections"

    Hello everyone,

    my name is Roman and I need help.

    In a nutshell: for the last 2 months I've been trying to fix an issue with my application, which uses Percona XtraDB Cluster 5.7. From time to time (every 1 to 10 days) mysql becomes unavailable: all new connections get the error message "Too many connections", yet mysql appears to be doing nothing (low CPU usage and disk I/O). Only "kill -9" followed by a restart of the mysql server makes it available again.

    Below I'll provide more details. I'd be grateful for any ideas about what I can try to fix the issue.

    Some details about my environment. I'm using Ubuntu 14.04 with Percona XtraDB Cluster 5.7:
    Code:
    uname -a
    Linux **** 4.14.90-37 #1 SMP Tue Dec 25 17:20:38 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
    Code:
    lsb_release -a
    Distributor ID:    Ubuntu
    Description:    Ubuntu 14.04.5 LTS
    Release:    14.04
    Codename:    trusty
    Code:
    mysql --version
    mysql  Ver 14.14 Distrib 5.7.23-23, for debian-linux-gnu (x86_64) using readline 6.3
    Code:
    dpkg -l | grep percona
    ii  percona-repo-config                       1.4                                        all          Configures Percona mirror repo
    ii  percona-toolkit                           3.0.12-1.trusty                            amd64        Advanced MySQL and system command-line tools
    ii  percona-xtrabackup-24                     2.4.12-1.trusty                            amd64        Open source backup tool for InnoDB and XtraDB
    ii  percona-xtradb-cluster-client-5.7         5.7.23-31.31-2.trusty                      amd64        Percona XtraDB Cluster database client binaries
    ii  percona-xtradb-cluster-common-5.7         5.7.23-31.31-2.trusty                      amd64        Percona XtraDB Cluster database common files (e.g. /etc/mysql/my.cnf)
    ii  percona-xtradb-cluster-server-5.7         5.7.23-31.31-2.trusty                      amd64        Percona XtraDB Cluster database server binaries
    Code:
    ps uax | grep mysqld
    mysql     861896  894 85.4 338018320 225641360 ? S<l  Feb08 15960:03 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql-3307 --plugin-dir=/usr/lib/mysql/plugin --user=mysql --tmpdir=/var/tmp --lc-messages-dir=/usr/share/mysql --skip-external-locking --bind-address=:: --extra-port=3306 --character-set-server=utf8 --collation-server=utf8_general_ci --explicit-defaults-for-timestamp=1 --innodb-file-per-table=1 --innodb-flush-log-at-trx-commit=2 --innodb-flush-method=O_DIRECT --innodb-log-file-size=10G --slave-sql-verify-checksum=NONE --transaction-isolation=READ-COMMITTED --innodb-buffer-pool-size=192G --innodb-buffer-pool-instances=8 --innodb-checksum-algorithm=crc32 --innodb-io-capacity=5000 --innodb-io-capacity-max=5500 --innodb-log-compressed-pages=OFF --innodb-thread-concurrency=120 --innodb-flush-neighbors=0 --innodb-lru-scan-depth=256 --innodb-purge-threads=8 --innodb-page-cleaners=8 --innodb-buffer-pool-dump-at-shutdown=ON --innodb-buffer-pool-load-at-startup=ON --interactive-timeout=28800 --wait-timeout=28800 --max-allowed-packet=900M --max-connections=1000 --extra-max-connections=1000 --net-read-timeout=3600 --net-write-timeout=3600 --performance-schema-max-digest-length=10240 --thread-cache-size=200 --ft-min-word-len=1 --ft-stopword-file= --read-buffer-size=512K --read-rnd-buffer-size=1M --sort-buffer-size=1M --key-buffer-size=12G --slow-query-log=OFF --slow-query-log-file=/var/log/mysql/mysql-3307-slow.log --long-query-time=15 --binlog-cache-size=4096M --general-log=1 --general-log-file=/var/log/mysql/mysql-3307-general.log --sql-mode=NO_ENGINE_SUBSTITUTION,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ZERO_DATE,NO_ZERO_IN_DATE --replicate-ignore-db=performance_schema --replicate-ignore-db=information_schema --binlog-format=ROW --binlog-ignore-db=mysql --expire-logs-days=1 --enforce-gtid-consistency=ON --gtid-mode=Off --log-bin=/var/log/mysql/mysql-3307-bin.log --max-binlog-files=500 --relay-log=/var/log/mysql/mysqld-3307-relay-bin 
--server-id=307 --log-error=/var/log/mysql/mysql-3307-error.log --pid-file=/var/run/mysqld/mysqld-3307.pid --socket=/var/run/mysqld/mysqld-3307.sock --port=3307 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
    Code:
    SELECT VERSION();
    5.7.23-23-57-log
    A few words about my application and data. I have a PHP-based application that works with 5 mysql shards. The shards are independent of each other. Each shard has master-slave replication to a backup server, so I have 10 servers: 5 masters and 5 replicas. The application works only with the master servers; once a day a cron task takes a backup from each replica using Percona XtraBackup.

    My mysql-servers are very powerful:
    Code:
    lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                56
    ...
    Code:
    cat /proc/meminfo
    MemTotal:       264019652 kB
    MemFree:         2619616 kB
    MemAvailable:   32820228 kB
    Buffers:             920 kB
    Cached:         31100868 kB
    SwapCached:            0 kB
    ...
    Code:
    df -h
    Filesystem      Size  Used Avail Use% Mounted on
    udev            126G  4.0K  126G   1% /dev
    tmpfs            26G   16M   26G   1% /run
    /dev/md0         32G  5.2G   27G  17% /
    none            4.0K     0  4.0K   0% /sys/fs/cgroup
    none            5.0M     0  5.0M   0% /run/lock
    none            126G   28K  126G   1% /run/shm
    none            100M     0  100M   0% /run/user
    /dev/md2        1.8T  1.3T  505G  72% /srv/local
    tmpfs           128M  4.0K  128M   1% /var/tmp/skynet/cqudp
    The data is located on the /srv/local partition, and there is enough room for it there.

    My databases are big: each shard is ~500 GB in size. There are about 50 tables in my DBs, but only a few of them contain almost all of the data. All shards have roughly the same size, the same number of requests, and the same number of clients. One important fact: I have faced this issue on only one shard.

    My application uses a concept of "workers". A worker is a short PHP program that usually works in the following way:
    1. start and establish connection to mysql-server,
    2. get some data from DB,
    3. get some data from external API,
    4. do some calculations and put data back to DB,
    5. close connection to DB.

    Normally I have no more than 200 workers working with each shard, and max_connections on the mysql server is set to 1000, which is normally much more than enough to work without any issues. Workers are usually fast: some finish their job in several seconds, but others can run for minutes or even tens of minutes. Usually hundreds of workers start and finish every minute.

    Most of the queries made by workers are UPDATEs. I have monitoring based on Percona's script https://github.com/percona/percona-m...ysql_stats.php; here is a screenshot from it:



    When I faced the "Too many connections" error I checked that I didn't have more workers than usual. No one besides the workers can connect to the DB. First of all I tried to optimize my application by splitting queries that update a lot of data into smaller queries and spreading them out over time. It didn't help.

    Then I compared settings between all my DB servers and found that the problem server had the default value of binlog_cache_size, 32 KB, whereas on the other servers this variable is set to 100 MB. I also found that binlog cache usage is enormous on the problem server, millions per day, whereas on the other servers it's something like a couple of thousand per day.

    When I set binlog_cache_size to 100 MB on the problem server it worked without problems for 10 days; before that it never worked longer than 2-3 days. But after 10 days it again became unavailable with the message "Too many connections". I also noticed that when the mysql server starts returning "Too many connections" it is using CPU heavily and doing a lot of disk writes. After a couple of minutes CPU usage and I/O drop almost to 0, but mysql keeps returning "Too many connections".
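
    A minimal sketch of how the binlog cache counters mentioned above could be compared between servers. The function name binlog_cache_disk_pct is made up for illustration; the sketch assumes its input is the tab-separated output of SHOW GLOBAL STATUS LIKE 'Binlog_cache%' and that Binlog_cache_disk_use counts a subset of the transactions counted in Binlog_cache_use.

```shell
# Hypothetical helper: reads "name<TAB>value" status lines on stdin and prints
# the percentage of binlog-cache transactions that spilled to a temporary file.
binlog_cache_disk_pct() {
  awk '
    $1 == "Binlog_cache_use"      { used = $2 }
    $1 == "Binlog_cache_disk_use" { disk = $2 }
    END {
      if (used > 0) {
        printf "%.1f\n", 100 * disk / used
      } else {
        print "n/a"   # no transactions recorded yet, ratio is undefined
      }
    }
  '
}
```

    It could be fed from the client like this: mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Binlog_cache%'" | binlog_cache_disk_pct. A high percentage would suggest that binlog_cache_size is too small for the typical transaction on that server.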

    I have logs from the atop utility, which records system state every minute. Below are a couple of screenshots taken at the moment the error started, at 10:33.





    I didn't have the general log enabled on the problem server, so I enabled it (writing to a file) in the hope of finding something interesting in it at the moment the "Too many connections" error begins. I haven't found anything in this log yet, but my mysql server broke down twice during the 2 days the log was enabled. It looks like there are too many disk I/O operations, and this might be a cause of the error. I have these SSDs in my server: https://www.micron.com/products/soli...uct-lines/5100 and I found that they can handle up to 500 MB/s of writes and 43000 IOPS. It looks like I didn't reach these limits in my case.

    Does anyone have any ideas about what I should try in order to find and fix the cause of the issues with my DB server?

  • #2
    Additional information about settings of my server:

    innodb_buffer_pool_size = 206158430208
    innodb_log_file_size = 10737418240
    max_connections = 1000
    innodb_file_per_table = ON
    innodb_flush_log_at_trx_commit = 2
    innodb_flush_method = O_DIRECT
    innodb_log_buffer_size = 16777216
    Innodb_log_waits = 0 -- from SHOW GLOBAL STATUS
    query_cache_size = 1048576
    query_cache_type = OFF
    log_bin = ON
    skip_name_resolve = OFF


    • #3
      I suspect we may need more data, but the first thing to check when you hit the issue is whether you see flow control. Can you share the output of "show processlist" and "show status like 'wsrep%'"?
      It would also be great to run pt-pmp (see Percona Toolkit) to get a trace of what each thread is doing.

      You can also look at pt-stalk, which can help collect most of the needed data: https://www.percona.com/doc/percona-.../pt-stalk.html


      • #4
        Hello, krunalbauskar,

        thank you for your answer. I'll definitely follow your advice and run these 2 queries and the pt-pmp utility the next time I hit the issue, and then I'll publish the results here.

        I'll also set up pt-stalk to collect additional information. I have one more question about the pt-stalk setup. As far as I understand from the docs, in my case I should run it something like this:

        Code:
        pt-stalk --daemonize --ask-pass --collect-tcpdump --sleep=3600 --threshold=1000 --variable=Threads_running --user=myuser --host=myhost --cycles=5
        But the problem is that at the time I get the "Too many connections" error, the values of Threads_running and other related counters are not high. Here is a screenshot from the last outage:
        [Screenshot: Selection_2019_0212_001.png]
        I don't know whether the monitoring tool simply couldn't collect data because of the outage, or whether these are real values. Is it worth setting the trigger on the "Max_used_connections" variable instead? Something like this:
        Code:
        pt-stalk --daemonize --ask-pass --collect-tcpdump --sleep=3600 --threshold=1000 --variable=Max_used_connections --user=myuser --host=myhost --cycles=5
        Many thanks for your help!


        • #5
          Hi Roman,

          You have gaps in your graphs, which likely means the monitoring agent could no longer connect to the database because no connections were available.
          To avoid that, configure an extra TCP port for your database instances and use that one for monitoring as well as for pt-stalk itself.
          Check here: https://www.percona.com/doc/percona-...tml#extra_port

          For pt-stalk in this case, I'd rather use something like --threshold=400 --variable=Threads_connected
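
          A minimal sketch of that setup, assuming the extra port is 3306 (the --extra-port value visible in the mysqld command line from the first post). The helper name stalk_args is made up for illustration; the options it prints (--daemonize, --variable, --threshold, --cycles, --port) are regular pt-stalk options.

```shell
# Hypothetical helper: print pt-stalk arguments, one per line, for watching
# Threads_connected through a dedicated port.
#   $1 = port to connect to (e.g. the extra port)
#   $2 = trigger threshold for Threads_connected
stalk_args() {
  port="$1"
  threshold="$2"
  printf '%s\n' \
    --daemonize \
    --variable=Threads_connected \
    "--threshold=${threshold}" \
    --cycles=5 \
    "--port=${port}"
}
```

          It could then be used as: pt-stalk $(stalk_args 3306 400) --user=myuser --host=myhost --ask-pass. Because the trigger query goes through the extra port, pt-stalk should still be able to sample the server after the main port has run out of connections.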


          • #6
            Hello przemek,

            thank you, you are right, I'm using the same port for application and monitoring. I'll fix it.


            • #7
              Hello everyone,

              today my DB again became unavailable with the "Too many connections" error. What is suspicious is that the last time this problem happened was exactly 10 days ago, and the outage before that (with the same DB settings) was also 10 days earlier.

              Unfortunately pt-stalk didn't collect any data, maybe because I had set the threshold to 900; I have now decreased it to 400:
              Code:
              --threshold=400 --variable=Threads_connected
              But the pt-pmp utility did collect some data. Here is the pt-pmp output when my server is working properly: https://pastebin.com/raw/iA1chWsk

              And here is the pt-pmp output 20 minutes after the problem happened:
              Code:
              root@***:~# pt-pmp
              Wed Feb 20 18:07:46 MSK 2019
                  654 __lll_lock_wait(libpthread.so.0),_L_lock_1081(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),THD::init,THD::THD,Channel_info::create_thd,Channel_info_tcpip_socket::create_thd,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                   80 pthread_cond_wait,Stage_manager::enroll_for,MYSQL_BIN_LOG::change_stage,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit_stmt,mysql_execute_command,mysql_parse,::??,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                   28 pthread_cond_wait,os_event::wait_low,lock_wait_suspend_thread,row_mysql_handle_errors,row_search_mvcc,ha_innobase::index_read,handler::ha_index_read_map,::??,sub_select,JOIN::exec,handle_query,::??,mysql_execute_command,mysql_parse,::??,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                   17 __lll_lock_wait(libpthread.so.0),_L_lock_1081(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),Global_THD_manager::remove_thd,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                   10 libaio::??(libaio.so.1),LinuxAIOHandler::collect,LinuxAIOHandler::poll,os_aio_handler,fil_aio_wait,io_handler_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    9 pthread_cond_wait,Stage_manager::enroll_for,MYSQL_BIN_LOG::change_stage,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit,mysql_execute_command,mysql_parse,::??,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    8 nanosleep(libpthread.so.0),os_thread_sleep,buf_lru_manager,start_thread(libpthread.so.0),clone(libc.so.6)
                    7 pthread_cond_wait,os_event::wait_low,srv_worker_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    7 pthread_cond_wait,os_event::wait_low,buf_flush_page_cleaner_worker,start_thread(libpthread.so.0),clone(libc.so.6)
                    7 poll(libc.so.6),vio_io_wait,vio_socket_io_wait,vio_read,::??,::??,my_net_read,Protocol_classic::read_packet,Protocol_classic::get_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    2 __lll_lock_wait(libpthread.so.0),_L_lock_1081(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),Global_THD_manager::do_for_all_thd_copy,fill_schema_processlist,::??,get_schema_tables_result,JOIN::prepare_result,JOIN::exec,handle_query,::??,mysql_execute_command,mysql_parse,::??,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 sigwait(libpthread.so.0),signal_hand,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 sigwaitinfo(libc.so.6),::??,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 pthread_cond_wait,os_event::wait_low,srv_purge_coordinator_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 pthread_cond_wait,os_event::wait_low,buf_resize_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 pthread_cond_wait,os_event::wait_low,buf_dump_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 pthread_cond_wait,compress_gtid_table,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,srv_monitor_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,srv_error_monitor_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,lock_wait_timeout_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,ib_wqueue_timedwait,fts_optimize_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,dict_stats_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 pthread_cond_timedwait,os_event::timed_wait,os_event::wait_time_low,buf_flush_page_cleaner_coordinator,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 pthread_cond_timedwait,MYSQL_BIN_LOG::wait_for_update_bin_log,Binlog_sender::send_binlog,Binlog_sender::run,mysql_binlog_send,com_binlog_dump,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 poll(libc.so.6),Mysqld_socket_listener::listen_for_connection_event,mysqld_main,__libc_start_main(libc.so.6),_start
                    1 nanosleep(libpthread.so.0),os_thread_sleep,srv_master_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 __lll_lock_wait(libpthread.so.0),_L_lock_909(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),PolyLock_lock_log::rdlock,sys_var::value_ptr,get_one_variable_ext,System_variable::init,PFS_system_variable_cache::do_materialize_all,table_session_variables::rnd_init,ha_perfschema::rnd_init,handler::ha_rnd_init,init_read_record,join_init_read_record,sub_select,JOIN::exec,TABLE_LIST::materialize_derived,join_materialize_derived,QEP_TAB::prepare_scan,sub_select,JOIN::exec,handle_query,::??,mysql_execute_command,mysql_parse,::??,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 __lll_lock_wait(libpthread.so.0),_L_lock_909(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),::??,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit,mysql_execute_command,mysql_parse,::??,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 __lll_lock_wait(libpthread.so.0),_L_lock_909(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),::??,MYSQL_BIN_LOG::new_file_impl,MYSQL_BIN_LOG::rotate,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit_stmt,mysql_execute_command,mysql_parse,::??,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 __lll_lock_wait(libpthread.so.0),_L_lock_909(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),MYSQL_BIN_LOG::change_stage,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit_stmt,mysql_execute_command,mysql_parse,::??,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 __lll_lock_wait(libpthread.so.0),_L_lock_1081(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),THD::init,THD::THD,Channel_info::create_thd,Channel_info_local_socket::create_thd,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 __lll_lock_wait(libpthread.so.0),_L_lock_1081(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),Log_in_use::operator,Global_THD_manager::do_for_all_thd,MYSQL_BIN_LOG::purge_logs_maximum_number,MYSQL_BIN_LOG::purge,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit_stmt,mysql_execute_command,mysql_parse,::??,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
                    1 __lll_lock_wait(libpthread.so.0),_L_lock_1081(libpthread.so.0),pthread_mutex_lock(libpthread.so.0),Fill_process_list::operator,Global_THD_manager::do_for_all_thd_copy,fill_schema_processlist,::??,get_schema_tables_result,JOIN::prepare_result,JOIN::exec,handle_query,::??,mysql_execute_command,mysql_parse,::??,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)
              Does anyone have any ideas about what I can try in order to catch and fix this bug?
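
              A trace like the one above is easier to read when the stacks blocked on a mutex are grouped by the function that asked for the lock. This is a minimal sketch, assuming the pt-pmp format shown above (a thread count followed by a comma-separated stack with no spaces); the function name summarize_mutex_waiters is made up for illustration.

```shell
# Hypothetical helper: reads pt-pmp output on stdin. For every stack that is
# blocked in pthread_mutex_lock, adds its thread count to the frame that
# called pthread_mutex_lock, then prints "count caller", busiest first.
summarize_mutex_waiters() {
  awk '
    /pthread_mutex_lock/ {
      count = $1
      n = split($2, frames, ",")
      for (i = 1; i < n; i++) {
        if (frames[i] ~ /^pthread_mutex_lock/) {
          waiters[frames[i + 1]] += count   # next frame is the lock requester
          break
        }
      }
    }
    END {
      for (f in waiters) printf "%d %s\n", waiters[f], f
    }
  ' | sort -rn
}
```

              Usage: pt-pmp | summarize_mutex_waiters. Callers whose frame could not be resolved are grouped under "::??", exactly as pt-pmp reports them.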


              • #8
                I also collected some data from the atop logs. Here is the atop log at the moment of the outage:
                Code:
                ATOP - ***                                                 2019/02/20  17:48:06                                                 ---------P-/-                                                   1m0s elapsed
                PRC | sys   39.92s  | user  12m29s  |               |               | #proc    692  | #trun     27  | #tslpi   915 |  #tslpu    10 |  #zombie    0 |  clones  3859 |               |               |  #exit   1555 |
                CPU | sys      72%  | user   1251%  | irq      17%  |               |               | idle   4129%  | wait    132% |               |  steal     0% |  guest     0% |               |  curf 2.49GHz |  curscal  77% |
                CPL | avg1   22.59  |               | avg5   18.27  | avg15  15.91  |               |               | csw  5468160 |               |  intr 6275899 |               |               |  numcpu    56 |               |
                MEM | tot   251.8G  | free  978.1M  | cache  28.1G  | dirty 299.6M  | buff    0.8M  | slab    1.0G  | slrec 824.8M |  shmem  16.0M |  shrss   0.0M |  shswp   0.0M |  vmbal   0.0M |  hptot   0.0M |  hpuse   0.0M |
                SWP | tot     4.0G  | free    4.0G  |               |               |               |               |              |               |               |               |  vmcom 545.4G |               |  vmlim 129.9G |
                PAG | scan   32206  | steal  32206  |               | stall      0  |               |               |              |               |               |               |  swin       0 |               |  swout      0 |
                MDD |          md0  | busy      0%  |               | read    1122  | write     43  | KiB/r     69  | KiB/w     22 |               |  MBr/s    1.3 |  MBw/s    0.0 |  avq     0.00 |               |  avio 0.00 ms |
                MDD |          md2  | busy      0%  |               | read    7084  | write 149222  | KiB/r    108  | KiB/w     92 |               |  MBr/s   12.6 |  MBw/s  223.5 |  avq     0.00 |               |  avio 0.00 ms |
                DSK |          sdb  | busy     48%  |               | read    1860  | write  69289  | KiB/r     55  | KiB/w     50 |               |  MBr/s    1.7 |  MBw/s   56.8 |  avq    49.06 |               |  avio 0.41 ms |
                DSK |          sdd  | busy     46%  |               | read    1799  | write  70242  | KiB/r     57  | KiB/w     49 |               |  MBr/s    1.7 |  MBw/s   57.1 |  avq    32.42 |               |  avio 0.38 ms |
                DSK |          sdc  | busy     42%  |               | read    1810  | write  70641  | KiB/r     56  | KiB/w     49 |               |  MBr/s    1.7 |  MBw/s   57.1 |  avq    32.76 |               |  avio 0.35 ms |
                NET | transport     | tcpi  308618  | tcpo  660798  | udpi     318  | udpo     479  | tcpao    290  | tcppo   1551 |               |  tcprs    262 |  tcpie      0 |  tcpor    312 |  udpnp      0 |  udpie      0 |
                NET | network       | ipi   310601  |               | ipo   306791  | ipfrw      0  | deliv 310601  |              |               |               |               |  icmpi    856 |               |  icmpo   1595 |
                NET | eth0      5%  | pcki  325816  | pcko  661878  | sp   10 Gbps  | si   81 Mbps  | so  558 Mbps  |              |  coll       0 |  mlti       6 |  erri       0 |  erro       0 |  drpi       0 |  drpo       0 |
                NET | lo      ----  | pcki    1205  | pcko    1205  | sp    0 Mbps  | si  206 Kbps  | so  206 Kbps  |              |  coll       0 |  mlti       0 |  erri       0 |  erro       0 |  drpi       0 |  drpo       0 |
                
                   PID          TID       RUID           EUID            THR       SYSCPU        USRCPU        VGROW        RGROW        RDDSK        WRDSK        ST       EXC       S       CPUNR        CPU        CMD        1/1
                713711            -       mysql          mysql           267       26.42s        12m11s        -4.0G       60000K       361.0M         6.4G        --         -       S          16       1264%        mysqld
                712628            -       root           root              1        0.00s         0.00s           0K           0K           0K           0K        --         -       S          16         0%        mysqld_safe
                And the atop log 20 minutes after the outage:
                Code:
                ATOP - ***                                                 2019/02/20  18:07:06                                                 ---------P-/-                                                   1m0s elapsed
                PRC | sys    8.60s  | user  13.07s  |               |               | #proc    682  | #trun      1  | #tslpi  1540 |  #tslpu     0 |  #zombie    0 |  clones  3055 |               |               |  #exit    678 |
                CPU | sys      16%  | user     22%  | irq       1%  |               |               | idle   5562%  | wait      0% |               |  steal     0% |  guest     0% |               |  curf 2.59GHz |  curscal  81% |
                CPL | avg1    0.28  |               | avg5    1.56  | avg15   6.91  |               |               | csw  1641770 |               |  intr  123694 |               |               |  numcpu    56 |               |
                MEM | tot   251.8G  | free    1.7G  | cache  18.5G  | dirty   0.3M  | buff    0.8M  | slab  925.6M  | slrec 742.9M |  shmem  16.2M |  shrss   0.0M |  shswp   0.0M |  vmbal   0.0M |  hptot   0.0M |  hpuse   0.0M |
                SWP | tot     4.0G  | free    4.0G  |               |               |               |               |              |               |               |               |  vmcom 621.6G |               |  vmlim 129.9G |
                MDD |          md0  | busy      0%  |               | read       0  | write      2  | KiB/r      0  | KiB/w    256 |               |  MBr/s    0.0 |  MBw/s    0.0 |  avq     0.00 |               |  avio 0.00 ms |
                MDD |          md2  | busy      0%  |               | read       0  | write    170  | KiB/r      0  | KiB/w     12 |               |  MBr/s    0.0 |  MBw/s    0.0 |  avq     0.00 |               |  avio 0.00 ms |
                DSK |          sdd  | busy      0%  |               | read       0  | write    267  | KiB/r      0  | KiB/w      6 |               |  MBr/s    0.0 |  MBw/s    0.0 |  avq     1.00 |               |  avio 0.30 ms |
                DSK |          sdc  | busy      0%  |               | read       0  | write    281  | KiB/r      0  | KiB/w      5 |               |  MBr/s    0.0 |  MBw/s    0.0 |  avq     1.00 |               |  avio 0.09 ms |
                DSK |          sda  | busy      0%  |               | read       6  | write    248  | KiB/r      2  | KiB/w      6 |               |  MBr/s    0.0 |  MBw/s    0.0 |  avq     1.00 |               |  avio 0.08 ms |
                NET | transport     | tcpi    2344  | tcpo    2322  | udpi     307  | udpo     453  | tcpao    454  | tcppo     89 |               |  tcprs    181 |  tcpie      0 |  tcpor    152 |  udpnp      0 |  udpie      0 |
                NET | network       | ipi     4274  |               | ipo     4521  | ipfrw      0  | deliv   4274  |              |               |               |               |  icmpi   1004 |               |  icmpo   1566 |
                NET | eth0      0%  | pcki    2993  | pcko    2862  | sp   10 Gbps  | si 1333 Kbps  | so  813 Kbps  |              |  coll       0 |  mlti       7 |  erri       0 |  erro       0 |  drpi       0 |  drpo       0 |
                NET | lo      ----  | pcki    1296  | pcko    1296  | sp    0 Mbps  | si   28 Kbps  | so   28 Kbps  |              |  coll       0 |  mlti       0 |  erri       0 |  erro       0 |  drpi       0 |  drpo       0 |
                
                   PID          TID       RUID           EUID            THR       SYSCPU        USRCPU        VGROW        RGROW        RDDSK        WRDSK        ST       EXC       S       CPUNR        CPU        CMD        1/1
                713711            -       mysql          mysql           839        0.00s         0.06s        5460K         248K           0K           0K        --         -       S          14         0%        mysqld
                712628            -       root           root              1        0.00s         0.00s           0K           0K           0K           0K        --         -       S          16         0%        mysqld_safe
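
                The THR column in the two samples above (267 at the moment of the outage, 839 twenty minutes later) is the thread count of the mysqld process. A minimal sketch for reading the same number directly between atop samples, assuming Linux with /proc mounted; the function name thread_count is made up for illustration.

```shell
# Hypothetical helper: print the number of threads of the process with the
# given PID, taken from the "Threads:" line of /proc/<pid>/status (Linux only).
thread_count() {
  awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}
```

                For example, thread_count 713711 would report the current thread count of the mysqld process shown above.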


                • #9
                  I don't know whether it matters, but my libc version is:
                  Code:
                  dpkg -l | grep libc-bin
                  ii  libc-bin                                  2.19-0ubuntu6.13                            amd64        Embedded GNU C Library: Binaries


                  • #10
                    Looks like I faced this bug: https://jira.percona.com/browse/PS-4716. I'll try to upgrade mysql to 5.7.25.


                    • #11
                      Originally posted by romka
                      Looks like I faced this bug: https://jira.percona.com/browse/PS-4716. I'll try to upgrade mysql to 5.7.25.
                      Did upgrading resolve your issue?


                      • #12
                        Originally posted by аman
                        Did upgrading resolve your issue?

                        I updated my DB only 2 days ago and now need to wait at least 3-4 weeks before drawing any conclusions, because even the previous version of the DB could run for up to 10 days.
                        Last edited by romka; 02-27-2019, 04:25 AM.


                        • #13
                          Does anyone know whether the "Affects Version/s:" field in Percona's Jira contains an exact list of affected versions or an approximate one?

                          In the issue https://jira.percona.com/browse/PS-4716 this field contains the values "5.7.22-22, 5.7.23-23". Before my DB was updated to 5.7.23 it was running 5.7.21, and I want to know whether that version also had this bug or whether it was introduced in 5.7.22.
                          Last edited by romka; 02-27-2019, 04:27 AM.


                          • #14
                            Hello everyone,

                            more than 2 weeks have passed since I upgraded my mysql server from 5.7.23 to 5.7.25 (I did it on 25.02.2019). Since then the server hasn't had the "Too many connections" error, so it looks like the upgrade fixed the issue. Unfortunately, the upgrade also dramatically decreased performance: the InnoDB row lock time increased significantly:
                            [Screenshot: Selection_2019_0314_001.png]
                            Does anyone have any ideas about what I can check to fix this performance issue?

                            Some of my variables:

                            innodb_buffer_pool_chunk_size=134217728
                            innodb_buffer_pool_dump_at_shutdown=ON
                            innodb_buffer_pool_dump_now=OFF
                            innodb_buffer_pool_dump_pct=25
                            innodb_buffer_pool_filename=ib_buffer_pool
                            innodb_buffer_pool_instances=64
                            innodb_buffer_pool_load_abort=OFF
                            innodb_buffer_pool_load_at_startup=ON
                            innodb_buffer_pool_load_now=OFF
                            innodb_buffer_pool_size=206158430208
                            thread_cache_size=200
                            thread_handling=pool-of-threads
                            thread_pool_high_prio_mode=transactions
                            thread_pool_high_prio_tickets=4294967295
                            thread_pool_idle_timeout=60
                            thread_pool_max_threads=100000
                            thread_pool_oversubscribe=3
                            thread_pool_size=36
                            thread_pool_stall_limit=500
                            thread_stack=262144
                            thread_statistics=OFF

