Percona cluster failed to connect backend connection

  • Percona cluster failed to connect backend connection

    Hi, I just installed my Percona cluster today with 2 nodes and load balancing. It seemed fast and stable after I tuned it, but there is a problem: after I restarted MySQL, it would not start. Here is my log file:


    2016-10-13T07:44:43.129771Z 0 [Note] WSREP: view((empty))
    2016-10-13T07:44:43.129980Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
    at gcomm/src/pc.cpp:connect():162
    2016-10-13T07:44:43.130481Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
    2016-10-13T07:44:43.130609Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1407: Failed to open channel 'testcluster' at 'gcomm://192.168.70.47,192.168.70.48': -110 (Connection timed out)
    2016-10-13T07:44:43.130631Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
    2016-10-13T07:44:43.130643Z 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.70.47,192.168.70.48) failed: 7
    2016-10-13T07:44:43.130650Z 0 [ERROR] Aborting

    2016-10-13T07:44:43.130660Z 0 [Note] Forcefully disconnecting 0 remaining clients
    2016-10-13T07:44:43.130668Z 0 [Note] WSREP: Service disconnected.
    2016-10-13T07:44:44.130794Z 0 [Note] WSREP: Some threads may fail to exit.
    2016-10-13T07:44:44.130823Z 0 [Note] Binlog end
    2016-10-13T07:44:44.130956Z 0 [Note] /usr/sbin/mysqld: Shutdown complete

    2016-10-13T07:44:44.142021Z mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
    2016-10-13T07:46:31.000344Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
    2016-10-13T07:46:31.010145Z mysqld_safe Skipping wsrep-recover for 2bddc3e4-90f3-11e6-b012-66a4be778af1:425 pair
    2016-10-13T07:46:31.011705Z mysqld_safe Assigning 2bddc3e4-90f3-11e6-b012-66a4be778af1:425 to wsrep_start_position
    2016-10-13T07:46:31.230036Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
    2016-10-13T07:46:31.232209Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.14-8-57-log) starting as process 22401 ...
    2016-10-13T07:46:31.235301Z 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
    2016-10-13T07:46:31.235325Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
    2016-10-13T07:46:31.240039Z 0 [Note] WSREP: wsrep_load(): Galera 3.17(r447d194) by Codership Oy <info@codership.com> loaded successfully.
    2016-10-13T07:46:31.240111Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
    2016-10-13T07:46:31.240562Z 0 [Note] WSREP: Found saved state: 2bddc3e4-90f3-11e6-b012-66a4be778af1:425
    2016-10-13T07:46:31.241261Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.70.47; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 2147483647; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.i
    2016-10-13T07:46:31.260034Z 0 [Note] WSREP: Service thread queue flushed.
    2016-10-13T07:46:31.260149Z 0 [Note] WSREP: Assign initial position for certification: 425, protocol version: -1
    2016-10-13T07:46:31.260201Z 0 [Note] WSREP: wsrep_sst_grab()
    2016-10-13T07:46:31.260210Z 0 [Note] WSREP: Start replication
    2016-10-13T07:46:31.260226Z 0 [Note] WSREP: Setting initial position to 2bddc3e4-90f3-11e6-b012-66a4be778af1:425
    2016-10-13T07:46:31.260342Z 0 [Note] WSREP: protonet asio version 0
    2016-10-13T07:46:31.260501Z 0 [Note] WSREP: Using CRC-32C for message checksums.
    2016-10-13T07:46:31.260554Z 0 [Note] WSREP: backend: asio
    2016-10-13T07:46:31.260643Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
    2016-10-13T07:46:31.260780Z 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
    2016-10-13T07:46:31.260797Z 0 [Note] WSREP: restore pc from disk failed
    2016-10-13T07:46:31.261654Z 0 [Note] WSREP: GMCast version 0
    2016-10-13T07:46:31.261990Z 0 [Note] WSREP: (27fef24a, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
    2016-10-13T07:46:31.262008Z 0 [Note] WSREP: (27fef24a, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
    2016-10-13T07:46:31.262597Z 0 [Note] WSREP: EVS version 0
    2016-10-13T07:46:31.262759Z 0 [Note] WSREP: gcomm: connecting to group 'testcluster', peer '192.168.70.47:,192.168.70.48:'
    2016-10-13T07:46:31.264461Z 0 [Note] WSREP: (27fef24a, 'tcp://0.0.0.0:4567') connection established to 27fef24a tcp://192.168.70.47:4567
    2016-10-13T07:46:31.264488Z 0 [Warning] WSREP: (27fef24a, 'tcp://0.0.0.0:4567') address 'tcp://192.168.70.47:4567' points to own listening address, blacklisting
    2016-10-13T07:46:31.264552Z 0 [Note] WSREP: (27fef24a, 'tcp://0.0.0.0:4567') connection established to 27fef24a tcp://192.168.70.47:4567
    2016-10-13T07:46:34.265129Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
    2016-10-13T07:46:34.265218Z 0 [Note] WSREP: view(view_id(NON_PRIM,27fef24a,1) memb {
    27fef24a,0
    } joined {
    } left {
    } partitioned {
    })
    2016-10-13T07:46:34.765464Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50289S), skipping check
    2016-10-13T07:47:04.280984Z 0 [Note] WSREP: view((empty))
    2016-10-13T07:47:04.281243Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
    at gcomm/src/pc.cpp:connect():162
    2016-10-13T07:47:04.281281Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
    2016-10-13T07:47:04.281398Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1407: Failed to open channel 'testcluster' at 'gcomm://192.168.70.47,192.168.70.48': -110 (Connection timed out)
    2016-10-13T07:47:04.281433Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
    2016-10-13T07:47:04.281453Z 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.70.47,192.168.70.48) failed: 7
    2016-10-13T07:47:04.281466Z 0 [ERROR] Aborting

    2016-10-13T07:47:04.281481Z 0 [Note] Forcefully disconnecting 0 remaining clients
    2016-10-13T07:47:04.281494Z 0 [Note] WSREP: Service disconnected.
    2016-10-13T07:47:05.281674Z 0 [Note] WSREP: Some threads may fail to exit.
    2016-10-13T07:47:05.281718Z 0 [Note] Binlog end
    2016-10-13T07:47:05.281914Z 0 [Note] /usr/sbin/mysqld: Shutdown complete

    2016-10-13T07:47:05.298227Z mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended



    And here is my my.cnf configuration on node 1 (node 2 is not much different):


    [client]
    port = 3306
    socket = /var/run/mysqld/mysqld.sock

    # Here is entries for some specific programs
    # The following values assume you have at least 32M ram

    # This was formally known as [safe_mysqld]. Both versions are currently parsed.
    [mysqld_safe]
    socket = /var/run/mysqld/mysqld.sock
    nice = 0

    [mysqld]

    user = mysql
    pid-file = /var/run/mysqld/mysqld.pid
    socket = /var/run/mysqld/mysqld.sock
    port = 3306
    basedir = /usr
    datadir = /var/lib/mysql
    tmpdir = /tmp
    lc-messages-dir = /usr/share/mysql
    skip-external-locking

    bind-address = 192.168.70.47

    #key_buffer = 16M
    max_allowed_packet = 16M
    thread_stack = 192K
    thread_cache_size = 8
    #myisam-recover = BACKUP
    max_connections = 100
    query_cache_limit = 1M
    log_error = /var/log/mysql/error.log
    slow_query_log_file = /var/log/mysql/mysql-slow.log
    slow_query_log = 1
    long_query_time = 2
    log_queries_not_using_indexes

    server-id = 1
    log_bin = /var/log/mysql/mysql-bin.log
    expire_logs_days = 10
    max_binlog_size = 100M
    #binlog_do_db = include_database_name
    #binlog_ignore_db = include_database_name

    [mysqldump]
    quick
    quote-names
    max_allowed_packet = 16M

    [mysql]
    #no-auto-rehash # faster start of mysql but no tab completition

    [isamchk]
    #key_buffer = 16M


    !includedir /etc/mysql/conf.d/

    [mysqld]

    datadir=/var/lib/mysql

    wsrep_provider=/usr/lib/libgalera_smm.so
    wsrep_cluster_address=gcomm://192.168.70.47,192.168.70.48


    binlog_format=ROW

    #thread-cache-size=150M

    #query_cache_limit=1M

    # MyISAM storage engine has only experimental support
    default_storage_engine=InnoDB
    innodb-strict-mode = 1
    # This changes how InnoDB autoincrement locks are managed and is a requirement for Galera
    innodb_autoinc_lock_mode=2
    innodb_buffer_pool_size=10G
    innodb_log_file_size=512M
    innodb_log_buffer_size=1M
    innodb_flush_method = O_DIRECT
    innodb_log_files_in_group = 2
    innodb_flush_log_at_trx_commit = 1
    innodb_file_per_table = 1


    # Node #1 address
    wsrep_node_address=192.168.70.47

    # SST method
    wsrep_sst_method=xtrabackup

    # Cluster name
    wsrep_slave_threads=2
    wsrep_cluster_name=testcluster
    wsrep_sst_method=rsync
    wsrep_node_name=dbnode1


    # Authentication for SST method
    wsrep_sst_auth="secret:secret"



    I hope someone can help, thanks.



  • #2
    Moved this post as it was in the wrong category.
    After the restart, the node could not reach the other cluster nodes; probably none of the nodes in the wsrep_cluster_address list was up. If this was the first node to restart, it should be bootstrapped.
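To confirm that no other node is up before bootstrapping, one option is to test whether each peer in wsrep_cluster_address accepts TCP connections on the Galera port (4567, per base_port in the log above). The following is only a rough sketch: the helper names parse_cluster_address and peer_reachable are made up for illustration, and the parser assumes plain IPv4 addresses or hostnames with an optional :port.

```python
import socket

DEFAULT_GALERA_PORT = 4567  # matches base_port in the posted log


def parse_cluster_address(address):
    """Split a gcomm:// address list into (host, port) tuples."""
    prefix = "gcomm://"
    if not address.startswith(prefix):
        raise ValueError("expected an address starting with gcomm://")
    # Drop any provider options that follow a '?' in the URL.
    host_list = address[len(prefix):].split("?", 1)[0]
    peers = []
    for entry in host_list.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        peers.append((host, int(port) if port else DEFAULT_GALERA_PORT))
    return peers


def peer_reachable(host, port=DEFAULT_GALERA_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Calling peer_reachable on every entry returned by parse_cluster_address("gcomm://192.168.70.47,192.168.70.48"), except the node's own address, shows whether any peer is listening. If none answers and this node is known to hold the most recent data, bootstrapping it is the usual next step. On PXC 5.7 that is typically `/etc/init.d/mysql bootstrap-pxc`, or `systemctl start mysql@bootstrap.service` on systemd hosts; check which one your packages provide.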

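When both nodes are down, the node to bootstrap is normally the one with the most advanced saved state, i.e. the highest seqno in its grastate.dat (the log above reports 2bddc3e4-90f3-11e6-b012-66a4be778af1:425 for node 1). The snippet below is a sketch for comparing the files from each node. The function names are hypothetical, and it assumes the usual "key: value" layout of grastate.dat.

```python
def parse_grastate(text):
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def pick_bootstrap_node(states):
    """Return the node name with the highest seqno, or None if undecidable.

    `states` maps a node name to the text of that node's grastate.dat.
    None is returned when the cluster UUIDs differ or when any seqno is
    negative (position unknown, e.g. after an unclean shutdown).
    """
    parsed = {name: parse_grastate(text) for name, text in states.items()}
    uuids = {p.get("uuid") for p in parsed.values()}
    seqnos = {name: int(p.get("seqno", "-1")) for name, p in parsed.items()}
    if len(uuids) != 1 or any(s < 0 for s in seqnos.values()):
        return None
    # On a tie, the first node with the maximum seqno is returned.
    return max(seqnos, key=seqnos.get)
```

A None result means the files alone are not enough to decide. In that case the position usually has to be recovered first (for example with mysqld's --wsrep-recover option) before choosing a node.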