Lost server synchronisation in a three node cluster

  • #1

    Hi,

    I have a three node cluster, and I regularly lose synchronisation with one of the servers. Now I have corruption in the InnoDB tablespace, and I am uncomfortable running innodb_force_recovery=6 because I have 2 important production databases on these servers. I retired the failed server from the cluster and reinstalled a new one, and all the symptoms reappeared: synchronisation lost, and corruption once again.
    Servers are Ubuntu 10.04.4 LTS, and I use percona-xtradb-cluster-server-5.5 (version 5.5.31-23.7.5-438).

    You can find the error log in the attached file, and the my.cnf file below:

    Code:
    [client]
    password        = 'xxxxx'
    port            = 3306
    socket          = /var/run/mysqld/mysqld.sock
    
    [mysqld_safe]
    wsrep_urls=gcomm://192.168.183.40:4567,gcomm://192.168.183.41:4567,gcomm://192.168.183.42:4567
    
    [mysqld]
    datadir=/var/lib/mysql
    user=mysql
    
    binlog_format=ROW
    
    wsrep_provider=/usr/lib64/libgalera_smm.so
    
    wsrep_slave_threads=2
    wsrep_cluster_name=prod_pa
    wsrep_sst_method=rsync
    wsrep_node_name=lxpadb03
    
    default_storage_engine=InnoDB
    innodb_locks_unsafe_for_binlog=1
    innodb_autoinc_lock_mode=2
    
    #tuning
    max_allowed_packet             = 16M
    max_connect_errors             = 1000000
    skip_name_resolve
    query_cache_size=0
    query_cache_type=0
    tmp_table_size                 = 32M
    max_heap_table_size            = 32M
    max_connections                = 500
    thread_cache_size              = 50
    open_files_limit               = 65535
    table_definition_cache         = 4096
    table_open_cache               = 4096
    # INNODB #
    innodb_flush_method            = O_DIRECT
    innodb_log_files_in_group      = 2
    innodb_log_file_size           = 256M
    innodb_flush_log_at_trx_commit = 1
    innodb_file_per_table          = 1
    innodb_buffer_pool_size        = 3072M
    Thanks in advance.
    Laeti
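
    A minimal way to watch whether a node is dropping out of sync is to poll the Galera status variables on every node; the credentials below are placeholders for this setup:

    Code:
    # Run on each of the three nodes; user and password are placeholders.
    mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"
    mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'"
    mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_ready'"
    # A healthy node in this cluster should report a cluster size of 3,
    # a state of 'Synced', and wsrep_ready = ON.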

  • #2
    So this InnoDB corruption happened only on this single node? And by reinstalling a node, do you mean it was a new install on the same machine or on a completely different machine?
    If only a single machine shows data corruption, I would check dmesg & /var/log/syslog for any signs of disk or memory errors. Running a memory test would be good too.
    If all nodes are experiencing data corruption, I would try taking them off the cluster and running mysqldump on all the data if possible, probably in one of the innodb_force_recovery modes.
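
    A rough sketch of those checks, with placeholder credentials and the default syslog path:

    Code:
    # Look for disk or memory errors on the suspect node.
    dmesg | grep -iE 'error|i/o'
    grep -iE 'error|i/o' /var/log/syslog

    # To dump the data, take the node out of the cluster and set
    # innodb_force_recovery in my.cnf (start at 1, raise only if needed):
    #   [mysqld]
    #   innodb_force_recovery = 1
    mysqldump -uroot -p --all-databases > all_databases.sql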


    • #3
      Yes, the InnoDB corruption happens only on this node. When I said reinstalling, I created a new virtual machine and only kept the same hostname and IP address. The three servers are virtual machines.
      There is nothing about disk or memory errors in dmesg or /var/log/syslog.
      What I did notice in /var/log/syslog is that the synchronisation is never done for the second database. I see lines like rsync to rsync_sst/./mysqlslap, rsync to rsync_sst/./performance_schema, or rsync to rsync_sst/./private_prod, but never for the toplink_prod database.

      Thanks.
      Laeti
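
      One way to list which schemas the rsync SST actually transferred, assuming those lines are in /var/log/syslog and the datadir is the one from the my.cnf above:

      Code:
      # Schemas mentioned in the SST rsync lines.
      grep -o 'rsync_sst/\./[^ ]*' /var/log/syslog | sort -u
      # Schemas present in the datadir on the donor node, for comparison.
      ls -d /var/lib/mysql/*/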


      • #4
        Are there any other differences between this failing node and the two other nodes? Are they all living on the same host server?
        There is also a chance the data is corrupted on the source node from which the SST was performed. I would suggest taking a Percona XtraBackup from the lxpadb01 node and checking how preparing and using this backup on another test host works.
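
        A rough sketch of that check, with placeholder credentials and backup directory:

        Code:
        # On lxpadb01: take a backup (target directory and credentials are placeholders).
        innobackupex --user=root --password=xxxxx /backups/
        # Prepare the backup so it is consistent, then copy it to a test host
        # and try starting mysqld from it.
        innobackupex --apply-log /backups/<timestamp_dir>/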
