Percona Cluster node goes down.

  • Percona Cluster node goes down.

    Hi there,

I have a PXC setup in an Amazon VPC. All nodes are in the same region, but one of the three is in a different availability zone. At some point one node fails without any meaningful output in the logs:

    Code:
    2014-04-03 01:30:38 8514 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.68236S), skipping check
    2014-04-03 01:30:40 8514 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.53889S), skipping check
    140403 03:30:39 mysqld_safe Number of processes running now: 0
    140403 03:30:39 mysqld_safe WSREP: not restarting wsrep node automatically
    140403 03:30:39 mysqld_safe mysqld from pid file /var/lib/mysql/ip-10-1-7-180.pid ended
This node is the one in the other availability zone.

This is my my.cnf file:

    Code:
    [mysqld]
    datadir=/var/lib/mysql
    user=mysql
    wsrep_provider=/usr/lib64/libgalera_smm.so
    wsrep_cluster_address=gcomm://10.1.7.180,10.1.8.159,10.1.8.16
    binlog_format=ROW
    default_storage_engine=InnoDB
    innodb_locks_unsafe_for_binlog=1
    innodb_buffer_pool_size = 5632M
    innodb_log_buffer_size = 4M
    max_connect_errors = 10000
    key_buffer_size = 2048M
    max_allowed_packet = 50M
    table_open_cache = 1024
    sort_buffer_size = 2M
    read_buffer_size = 2M
    read_rnd_buffer_size = 80M
    myisam_sort_buffer_size = 64M
    thread_cache_size = 32
    query_cache_size = 32M
    innodb_thread_concurrency = 8
    innodb_flush_method=O_DIRECT
    innodb_log_file_size=1G
    innodb_autoinc_lock_mode=2
    wsrep_node_address=10.1.7.180
    wsrep_sst_method=xtrabackup
    wsrep_cluster_name=my_centos_cluster
    wsrep_sst_auth="sstuser:s3cret"
    max_connections = 4000
    [mysql]
    prompt=\\u@\\h [\\d]>\\_
My question is: how can I investigate the root cause of the failure? And a second question: what would happen if an UPDATE query arrived on a node that is in the "Joining: receiving State Transfer" state?

    Thank you in advance.

  • #2
A message like "140403 03:30:39 mysqld_safe Number of processes running now: 0" without anything logged by mysqld prior to it means your mysqld process was killed, most likely by the OOM killer. Check the system log.
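
    For example, something along these lines will usually surface an OOM kill (a quick sketch; the exact log path depends on your distro):

    Code:
    # kernel ring buffer (works on most systems)
    dmesg | grep -i -E 'out of memory|oom|killed process'
    # persistent syslog on RHEL/CentOS
    grep -i -E 'out of memory|killed process' /var/log/messages
    # persistent syslog on Debian/Ubuntu
    grep -i -E 'out of memory|killed process' /var/log/syslog

    Your my.cnf also makes an OOM kill plausible: innodb_buffer_pool_size = 5632M plus key_buffer_size = 2048M is already large, and with max_connections = 4000 each connection can allocate its own per-session buffers (read_rnd_buffer_size = 80M per connection is especially big), so peak memory use under load can easily exceed the instance's RAM.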
A joining node will refuse to accept connections until it synchronizes with the cluster.
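
    You can watch this on the joining node itself (assuming you can run a local client; these are standard Galera status variables):

    Code:
    # current node state, e.g. "Joining: receiving State Transfer", "Joined", "Synced"
    mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"
    # whether the node accepts queries at all (ON/OFF)
    mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_ready';"

    Until wsrep_ready is ON, any query (including an UPDATE) is rejected with an error instead of being applied.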
