Occassional node failure (FK related)

  • Filter
  • Time
  • Show
Clear All
new posts

  • Occassional node failure (FK related)

    Hi all,

    We have a 3 node cluster set up, with one of the three nodes acting as a replication slave to a standard Percona Server. There is no other traffic on the server at this time.

    Strangely, we're finding that every now and then one node (sometimes 2 nodes) will die due to node consistency being compromised . The transactions in question are being committed without problem on the replication master and the node in the cluster that is acting as replication slave.

    We're currently running the latest version of Percona Cluster as installed from the Percona repository (Percona-XtraDB-Cluster-server-5.5.28-23.7.369.rhel6.x86_64 and Percona-XtraDB-Cluster-galera-2.0-1.117.rhel6.x86_64) and have the following configuration applied on all three nodes (with appropriate address changes for various wsrep settings):

    [mysqld_safe]wsrep_urls=gcomm://,gcomm://[mysqld]datadir=/var/lib/mysqluser=mysqlmax_connections=4000server-id=40binlog_format=ROWdefault_storage_engine=InnoD Blong_query_time=10log_error=/var/log/mysql/error_loglog_slow_queries=/var/log/mysql/slow_query_loglog_slow_verbosity=fullsync_binlog=0 log_bin = bin-loglog_slave_updates = 1wsrep_node_address= _address= reentcewsrep_provider=/usr/lib64/libgalera_smm.sowsrep_slave_threads=2wsrep_sst_met hod=xtrabackupwsrep_cluster_name=percona_clusterws rep_node_name=node1wsrep_replicate_myisam=1innodb_ buffer_pool_instances=1innodb_data_home_dir=/data/mysqlinnodb_log_group_home_dir=/data/mysqlinnodb_data_file_path=ibdata1:1G:autoextendin nodb_buffer_pool_size=96Ginnodb_flush_method=ALL_O _DIRECTinnodb_additional_mem_pool_size=256Minnodb_ log_buffer_size=256Minnodb_log_file_size=256Minnod b_flush_log_at_trx_commit=0innodb_read_io_threads= 8innodb_autoinc_lock_mode=2innodb_locks_unsafe_for _binlog=1

    When the error does occur, the following error occurs (always involving the same tables):

    121123 15:03:32 [ERROR] Slave SQL: Could not execute Write_rows event on table nablive.t_schedule_payment; Cannot add or update a child row: a foreign key constraint fails (`ntproddb`.`t_schedule_payment`, CONSTRAINT `t_schedule_payment_ibfk_1` FOREIGN KEY (`sched_id`) REFERENCES `t_schedule` (`id`) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 232, Error_code: 1452121123 15:03:32 [Warning] WSREP: RBR event 2 Write_rows apply warning: 151, 1663670121123 15:03:32 [ERROR] WSREP: Failed to apply trx: source: eed70aa6-346b-11e2-0800-24241b746e7d version: 2 local: 0 state: APPLYING flags: 1 conn_id: 4 trx_id: 676177914 seqnos (l: 59090, g: 1663670, s: 1663669, d: 1663365, ts: 1353643412747819106)121123 15:03:32 [ERROR] WSREP: Failed to apply app buffer: ���P�, seqno: 1663670, status: WSREP_FATAL at galera/src/replicator_smm.cpp:apply_wscoll():49 at galera/src/replicator_smm.cpp:apply_trx_ws():120121123 15:03:32 [ERROR] WSREP: Node consistency compromized, aborting...121123 15:03:32 [Note] WSREP: Closing send monitor...121123 15:03:32 [Note] WSREP: Closed send monitor.121123 15:03:32 [Note] WSREP: gcomm: terminating thread121123 15:03:32 [Note] WSREP: gcomm: joining thread121123 15:03:32 [Note] WSREP: gcomm: closing backend121123 15:03:32 [Note] WSREP: view(view_id(NON_PRIM,05aece68-3511-11e2-0800-5e30f373f04f,9) memb { 05aece68-3511-11e2-0800-5e30f373f04f,} joined {} left {} partitioned { eed70aa6-346b-11e2-0800-24241b746e7d, fbc2a20c-3503-11e2-0800-2365c6b05d93,})121123 15:03:32 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1121123 15:03:32 [Note] WSREP: view((empty))121123 15:03:32 [Note] WSREP: gcomm: closed121123 15:03:32 [Note] WSREP: Flow-control interval: [16, 16]121123 15:03:32 [Note] WSREP: Received NON-PRIMARY.121123 15:03:32 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 1663674)121123 15:03:32 [Note] WSREP: Received self-leave message.121123 15:03:32 [Note] WSREP: Flow-control interval: [0, 0]121123 15:03:32 [Note] WSREP: Received SELF-LEAVE. Closing connection.121123 15:03:32 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 1663674)121123 15:03:32 [Note] WSREP: RECV thread exiting 0: Success121123 15:03:32 [Note] WSREP: recv_thread() joined.121123 15:03:32 [Note] WSREP: Closing slave action queue.121123 15:03:32 [Note] WSREP: /usr/sbin/mysqld: Terminated.121123 15:03:33 mysqld_safe Number of processes running now: 0121123 15:03:33 mysqld_safe WSREP: not restarting wsrep node automatically

    The create table statements for the two tables in the FK relationship that are at the centre of this error are:

    CREATE TABLE `t_schedule` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(50) NOT NULL, `version` int(11) NOT NULL, `client_ref` varchar(50) NOT NULL, `freq` int(11) NOT NULL, `start_dt` datetime NOT NULL, `to_dt` datetime DEFAULT NULL, `exec_cnt` int(11) DEFAULT NULL, `created_by` int(11) NOT NULL, `created_on` datetime NOT NULL, `payment_type` int(11) NOT NULL, `lastmod_by` int(11) DEFAULT NULL, `lastmod_on` datetime DEFAULT NULL, `merchant_id` varchar(6) NOT NULL, `type_id` smallint(6) NOT NULL DEFAULT '0', KEY `merchant_id_fk` (`merchant_id`), KEY `id` (`id`), KEY `fk_schedule_isof_type` (`type_id`), CONSTRAINT `merchant_id_fk` FOREIGN KEY (`merchant_id`) REFERENCES `t_merchants` (`Merchant_ID`), CONSTRAINT `t_schedule_ibfk_1` FOREIGN KEY (`type_id`) REFERENCES `t_schedule_type` (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=71507 DEFAULT CHARSET=latin1 CREATE TABLE `t_schedule_payment` ( `id` int(11) NOT NULL AUTO_INCREMENT, `version` int(11) NOT NULL, `processed` bit(1) NOT NULL, `cc_holder` varchar(50) DEFAULT NULL, `cc_exp_yy` varchar(2) DEFAULT NULL, `cc_exp_mm` varchar(2) DEFAULT NULL, `created_by` int(11) NOT NULL, `created_on` datetime NOT NULL, `cc_no` varchar(60) DEFAULT NULL, `transfer_dt` datetime NOT NULL, `lastmod_by` int(11) DEFAULT NULL, `lastmod_on` datetime DEFAULT NULL, `amt` int(11) DEFAULT NULL, `canceled` bit(1) NOT NULL, `pan` varchar(12) DEFAULT NULL, `sched_id` int(11) NOT NULL, `currency` char(3) NOT NULL DEFAULT 'AUD', `card_seq_no` char(3) DEFAULT NULL, `crn` varchar(20) DEFAULT NULL, `dd_bsb` varchar(20) DEFAULT NULL, `dd_acc_no` varchar(50) DEFAULT NULL, `dd_acc_name` varchar(255) DEFAULT NULL, `batch_id` int(11) DEFAULT NULL, `batch_trxid` int(11) DEFAULT NULL, KEY `id` (`id`), KEY `sched_id_fk` (`sched_id`), CONSTRAINT `t_schedule_payment_ibfk_1` FOREIGN KEY (`sched_id`) REFERENCES `t_schedule` (`id`) ON DELETE CASCADE) ENGINE=InnoDB AUTO_INCREMENT=3964335 DEFAULT CHARSET=latin1

    To rule out data corruption / referential integrity issues, we've run the following query on all the available nodes and gotten the same results:

    mysql> select distinct sched_id from t_schedule_payment where sched_id not in (select id from t_schedule);Empty set (2.47 sec)

    I've seen a few mentions of FK strangeness in this forum recently but it appeared that various patches had solved this issue. Is there something obvious (or not obvious, or undocumented etc) that we're missing here?

    If there's any further information we can provide, we definitely will.

  • #2
    I have the same issues but with primary keys and duplicate keys that in theory should not even happen.


    • #3
      anyone figure this out?


      • #4
        Foreign keys are pretty tricky it seems in this kind of clustering solution, plus parallel applying makes it even more complicated. However many FK-related bugs were fixed till now.
        This thread is pretty old, but Shockwavecs - do you experience anything similar?