2 nodes are going out of sync in a 3-node PXC setup at the time of resyncing

  • #1

    2 nodes are going out of sync in a 3-node Percona XtraDB Cluster setup at the time of resyncing:

    Requesting your help in resyncing all 3 nodes.
    Thank you.
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    Used RPMS:
    Percona-XtraDB-Cluster-shared-5.5.27-23.6.356.rhel6.x86_64
    Percona-XtraDB-Cluster-server-5.5.27-23.6.356.rhel6.x86_64
    percona-release-0.0-1.x86_64
    Percona-XtraDB-Cluster-client-5.5.27-23.6.356.rhel6.x86_64
    Percona-XtraDB-Cluster-galera-2.0-1.114.rhel6.x86_64
    percona-xtrabackup-2.0.3-470.rhel6.x86_64

    OS: CentOS release 6.3 (Final)
    Environment: Virtual Systems.
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------

    Here are the MySQL error logs from all 3 nodes:
    Node 2 (which is up):
    WSREP: FK key len exceeded 0 4294967295 3500
    131227 2:58:46 [ERROR] WSREP: FK key set failed: 11
    WSREP: FK key append failed

    Node 3 (which is down):
    131227 5:00:11 [Note] WSREP: sst_donor_thread signaled with 0
    131227 5:00:11 [Note] WSREP: Flushing tables for SST...
    131227 5:00:11 [Note] WSREP: Provider paused at cf67b4da-6ea7-11e3-0800-7176739bc3d8:261
    131227 5:00:11 [Note] WSREP: Tables flushed.
    InnoDB: Warning: a long semaphore wait:
    --Thread 139738020943616 has waited at trx0rseg.ic line 46 for 241.00 seconds the semaphore:
    X-lock (wait_ex) on RW-latch at 0x7f177f07a6b8 '&block->lock'
    a writer (thread id 139738020943616) has reserved it in mode wait exclusive
    number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
    Last time read locked in file buf0flu.c line 1319
    Last time write locked in file /home/jenkins/workspace/percona-xtradb-cluster-rpms/label_exp/centos6-64/target/BUILD/Percona-XtraDB-Cluster-5.5.27/Percona-XtraDB-Cluster-5.5.27/storage/innobase/include/trx0rseg.ic line 46
    InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:

    ----------
    SEMAPHORES
    ----------
    OS WAIT ARRAY INFO: reservation count 46, signal count 44
    --Thread 139738020943616 has waited at trx0rseg.ic line 46 for 271.00 seconds the semaphore:
    X-lock (wait_ex) on RW-latch at 0x7f177f07a6b8 '&block->lock'
    a writer (thread id 139738020943616) has reserved it in mode wait exclusive
    number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
    Last time read locked in file buf0flu.c line 1319
    Last time write locked in file /home/jenkins/workspace/percona-xtradb-cluster-rpms/label_exp/centos6-64/target/BUILD/Percona-XtraDB-Cluster-5.5.27/Percona-XtraDB-Cluster-5.5.27/storage/innobase/include/trx0rseg.ic line 46
    Mutex spin waits 38, rounds 925, OS waits 30
    RW-shared spins 15, rounds 432, OS waits 14
    RW-excl spins 1, rounds 60, OS waits 2
    Spin rounds per wait: 24.34 mutex, 28.80 RW-shared, 60.00 RW-excl

    ------------
    TRANSACTIONS
    ------------
    Trx id counter A0E406071
    Purge done for trx's n < A0E40606E undo n < 0
    History list length 618
    LIST OF TRANSACTIONS FOR EACH SESSION:
    ---TRANSACTION A0E40606E, not started
    MySQL thread id 3, OS thread handle 0x7f174a757700, query id 2974 committed 260
    ---TRANSACTION A0E406070, not started
    MySQL thread id 1, OS thread handle 0x7f1b16edb700, query id 2976 committed 261
    ----------------------------
    END OF INNODB MONITOR OUTPUT
    ============================
    InnoDB: ###### Diagnostic info printed to the standard error stream
    InnoDB: Warning: a long semaphore wait:
    --Thread 139738020943616 has waited at trx0rseg.ic line 46 for 303.00 seconds the semaphore:
    X-lock (wait_ex) on RW-latch at 0x7f177f07a6b8 '&block->lock'
    a writer (thread id 139738020943616) has reserved it in mode wait exclusive
    number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
    Last time read locked in file buf0flu.c line 1319
    Last time write locked in file /home/jenkins/workspace/percona-xtradb-cluster-rpms/label_exp/centos6-64/target/BUILD/Percona-XtraDB-Cluster-5.5.27/Percona-XtraDB-Cluster-5.5.27/storage/innobase/include/trx0rseg.ic line 46
    InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
    InnoDB: Pending preads 0, pwrites 0

    Node 1 (which is down):
    131227 4:49:46 [Note] WSREP: 1 (Node3): State transfer from 0 (Node1) complete.
    131227 4:49:46 [Note] WSREP: Member 1 (Node3) synced with group.
    05:00:03 UTC - mysqld got signal 11 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
    We will try our best to scrape up some info that will hopefully help
    diagnose the problem, but since we have already crashed,
    something is definitely wrong and this may fail.
    Please help us make Percona Server better by reporting any
    bugs at http://bugs.percona.com/

    131227 5:11:34 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
    131227 5:11:34 [Note] WSREP: forgetting 49cd72df-6eb2-11e3-0800-3db8fd926ddb (tcp://XXX.XXX.XXX.53-Node3:4567)
    131227 5:11:34 [Note] WSREP: (bf5de37d-6eb3-11e3-0800-1b8b698cefc9, 'tcp://0.0.0.0:4567') turning message relay requesting off
    131227 5:11:34 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 5ac327af-6eb5-11e3-0800-8a7f196d2532
    131227 5:11:34 [Note] WSREP: STATE EXCHANGE: sent state msg: 5ac327af-6eb5-11e3-0800-8a7f196d2532
    131227 5:11:34 [Note] WSREP: STATE EXCHANGE: got state msg: 5ac327af-6eb5-11e3-0800-8a7f196d2532 from 0 (Node1)
    131227 5:11:34 [Note] WSREP: STATE EXCHANGE: got state msg: 5ac327af-6eb5-11e3-0800-8a7f196d2532 from 1 (Node2)
    131227 5:11:34 [Note] WSREP: Quorum results:
    version = 2,
    component = PRIMARY,
    conf_id = 4,
    members = 1/2 (joined/total),
    act_id = 864,
    last_appl. = 835,
    protocols = 0/4/2 (gcs/repl/appl),
    group UUID = cf67b4da-6ea7-11e3-0800-7176739bc3d8
    131227 5:11:34 [Warning] WSREP: Donor 49cd72df-6eb2-11e3-0800-3db8fd926ddb is no longer in the group. State transfer cannot be completed, need to abort. Aborting...
    131227 5:11:34 [Note] WSREP: /usr/sbin/mysqld: Terminated.
    131227 05:11:34 mysqld_safe mysqld from pid file /mnt/data//Node1.pid ended

    131227 5:24:10 [Note] WSREP: Assign initial position for certification: 960, protocol version: 2
    131227 5:24:10 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (cf67b4da-6ea7-11e3-

    131227 5:25:53 [Note] WSREP: Quorum results:
    version = 2,
    component = NON-PRIMARY,
    conf_id = -1,
    members = 1/1 (joined/total),
    act_id = -1,
    last_appl. = -1,
    protocols = -1/-1/-1 (gcs/repl/appl),
    group UUID = 00000000-0000-0000-0000-000000000000
    131227 5:25:53 [Note] WSREP: Flow-control interval: [8, 16]
    131227 5:25:53 [Note] WSREP: Received NON-PRIMARY.
    131227 5:25:53 [Note] WSREP: Shifting JOINER -> OPEN (TO: 961)
    131227 5:25:59 [Note] WSREP: cleaning up f9d65922-6eb6-11e3-0800-4de8ca27dd9e (tcp://XXX.XXX.XXX.52-Node2:4567)
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------


  • #2
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    Node 2 my.cnf, [mysqld] section (the node that is up):
    [mysqld]
    # GENERAL #
    user = mysql
    default_storage_engine = InnoDB

    server_id=1
    wsrep_cluster_address=gcomm://
    wsrep_provider=/usr/lib64/libgalera_smm.so
    wsrep_slave_threads=2
    wsrep_cluster_name= ecomm
    wsrep_sst_method=rsync
    wsrep_node_name=Node2
    wsrep_sst_receive_address=XXX.XXX.XXX.52-Node2

    # MyISAM #
    key_buffer_size = 32M
    myisam_recover = FORCE,BACKUP

    # SAFETY #
    max_allowed_packet = 64M
    max_connect_errors = 1000000
    skip_name_resolve
    sql_mode = STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_AUTO_VALUE_ON_ZERO,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY
    sysdate_is_now = 1
    innodb = FORCE
    innodb_strict_mode = 1

    # DATA STORAGE #
    datadir = /mnt/data/

    # BINARY LOGGING #
    log_bin = /mnt/data/mysql-bin
    expire_logs_days = 14
    sync_binlog = 1
    binlog_format = ROW

    # CACHES AND LIMITS #
    tmp_table_size = 128M
    max_heap_table_size = 128M
    query_cache_type = 0
    query_cache_size = 8
    max_connections = 2010
    thread_cache_size = 50
    open_files_limit = 65535
    table_definition_cache = 4096
    table_open_cache = 12000

    # INNODB #
    innodb_flush_method = O_DIRECT
    innodb_log_files_in_group = 2
    innodb_log_file_size = 256M
    innodb_flush_log_at_trx_commit = 1
    innodb_file_per_table = 1
    innodb_buffer_pool_size = 14G
    innodb_locks_unsafe_for_binlog = 1
    innodb_autoinc_lock_mode = 2
    wait_timeout = 1500
    interactive_timeout = 1500

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    Node 1 my.cnf, [mysqld] section (a node that is down):
    [mysqld]

    # GENERAL #
    user = mysql
    default_storage_engine = InnoDB

    server_id=2
    wsrep_cluster_address=gcomm://XXX.XXX.XXX.52-Node2
    wsrep_provider=/usr/lib64/libgalera_smm.so
    wsrep_slave_threads=2
    wsrep_cluster_name= ecomm
    wsrep_sst_method=rsync
    wsrep_node_name=Node1
    wsrep_sst_receive_address=XXX.XXX.XXX.51-Node1

    # MyISAM #
    key_buffer_size = 32M
    myisam_recover = FORCE,BACKUP

    # SAFETY #
    max_allowed_packet = 64M
    max_connect_errors = 1000000
    skip_name_resolve
    sql_mode = STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_AUTO_VALUE_ON_ZERO,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY
    sysdate_is_now = 1
    innodb = FORCE
    innodb_strict_mode = 1

    # DATA STORAGE #
    datadir = /mnt/data/

    # BINARY LOGGING #
    log_bin = /mnt/data/mysql-bin
    expire_logs_days = 14
    sync_binlog = 1
    binlog_format = ROW

    # CACHES AND LIMITS #
    tmp_table_size = 128M
    max_heap_table_size = 128M
    query_cache_type = 0
    query_cache_size = 8
    max_connections = 2010
    thread_cache_size = 50
    open_files_limit = 65535
    table_definition_cache = 4096
    table_open_cache = 12000

    # INNODB #
    innodb_flush_method = O_DIRECT
    innodb_log_files_in_group = 2
    innodb_log_file_size = 256M
    innodb_flush_log_at_trx_commit = 1
    innodb_file_per_table = 1
    innodb_buffer_pool_size = 14G
    innodb_locks_unsafe_for_binlog = 1
    innodb_autoinc_lock_mode = 2
    wait_timeout = 1500
    interactive_timeout = 1500
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------



    • #3
      Try changing two things: a PXC version newer than 5.5.27 (the latest if possible), and wsrep_sst_method=xtrabackup (much less locking than rsync).
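
      For illustration, the SST change is a one-line edit in each node's [mysqld] section. This is only a sketch following the config blocks quoted above in #2; the wsrep_sst_auth credentials are placeholders, not values from this thread:

      [mysqld]
      # xtrabackup takes a mostly non-blocking backup on the donor, unlike
      # rsync, which blocks the donor for the duration of the SST
      wsrep_sst_method = xtrabackup
      # the xtrabackup SST runs against the donor's mysqld, so it needs a
      # MySQL user with sufficient privileges (placeholder credentials)
      wsrep_sst_auth = sstuser:sstpassword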



      • #4
        Hi Przemek, thanks for the suggestion.
        Actually, all these days the 3 nodes had been working quite fine. Now and then a node went out of sync; we would take downtime and resync it, after which the cluster returned to normal.
        But in the latest scenario, 2 nodes went out of sync. The actions taken were:
        1) Took downtime.
        2) Resynced the nodes from the surviving Node 2; the resync completed successfully.
        3) Created a blank schema on one node and verified it on the other nodes; the blank schema was synchronized to the other nodes as well, i.e. OK.
        4) After 15 minutes, the nodes went out of sync again. The errors are posted above.
        The errors are different on each of the out-of-sync nodes (Node 1 and Node 3).
        Could you please check these errors and suggest what can be done to fix them and bring the nodes back in sync, up and running? Our problem is that the nodes go out of sync very frequently.

        As a long-term action, we can upgrade PXC and Galera to the latest versions.
        But for immediate action, any suggestions so that the 3-node cluster comes back to the working state it was in previously?
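
        For what it's worth, the usual full-resync sequence on a PXC 5.5 setup like this one looks roughly like the sketch below. This is an assumption-laden outline, not a procedure from this thread: it assumes the RPM init script and the /mnt/data datadir from the configs above, and that Node 2's empty wsrep_cluster_address=gcomm:// makes it the bootstrap node:

        # on the surviving Node 2 (wsrep_cluster_address=gcomm:// in its
        # my.cnf), start first so it bootstraps a new primary component
        /etc/init.d/mysql start

        # on each out-of-sync node, remove the saved Galera state to force
        # a full SST from the donor, then start normally
        rm -f /mnt/data/grastate.dat
        /etc/init.d/mysql start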



        • #5
          Also, to give info on the cluster type: this is a multi-master 3-node cluster (all nodes are masters).
          Please suggest.



          • #6
            Thanks Przemek.

            Just to check: Galera 2.0 does not support wsrep_sst_method=xtrabackup (please correct me if I'm wrong).

            Also, we are using 5 HAProxy clients on 5 app systems for the application connections to the cluster database.

            Also, writes always go to only 1 node from all 5 apps (HAProxy), to avoid deadlocks; a rough sketch of that setup follows below.
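
            For reference, the single-writer pattern described above can be expressed in haproxy.cfg roughly as in this sketch; the listener name and ports are made up, and the masked IPs stand in for the real ones:

            listen pxc-writes
                bind 0.0.0.0:3306
                mode tcp
                # send all writes to Node2; Node1 and Node3 are standbys
                # that only take traffic if Node2's health check fails
                server Node2 XXX.XXX.XXX.52:3306 check
                server Node1 XXX.XXX.XXX.51:3306 check backup
                server Node3 XXX.XXX.XXX.53:3306 check backup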



            • #7
              Poorna PC,

              wsrep_sst_method=xtrabackup is fine with Galera 2.0.
              You can read about its pros and cons on the Codership site:
              http://www.codership.com/wiki/doku.php?id=sst_mysql

              One of the errors is similar to the error described in this bug:
              https://bugs.launchpad.net/codership-mysql/+bug/1057910

              That issue has not been reproduced so far. Code analysis shows that the foreign key check will fail if one of the parts in the key has a NULL value.
              The bug is fixed in version 5.5.28, so I'd suggest upgrading.
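
              To illustrate the condition that bug describes (a made-up example, not a query from this thread): a row whose composite foreign key has a NULL in one component. InnoDB accepts such a row without requiring a parent, but per the code analysis quoted above the wsrep foreign key handling can fail on it, which matches the "FK key set failed" / "FK key append failed" errors on Node 2:

              CREATE TABLE parent (a INT, b INT, PRIMARY KEY (a, b)) ENGINE=InnoDB;
              CREATE TABLE child (
                  id INT PRIMARY KEY,
                  a INT,
                  b INT,
                  FOREIGN KEY (a, b) REFERENCES parent (a, b)
              ) ENGINE=InnoDB;
              -- one part of the key is NULL: the FK check is skipped and no
              -- parent row is required, but the wsrep key append may fail
              INSERT INTO child VALUES (1, 1, NULL);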



              • #8
                Thanks, Mixa.
