Emergency

Percona xtradb cluster crash: HA_ERR_ROW_IS_REFERENCED

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Percona xtradb cluster crash: HA_ERR_ROW_IS_REFERENCED

    Hi,

    We migrated a series of 100G Databases into different Galera Instances.Its running fine so far. I do have 5-6 years experience with Galera.
    The cluster has 5 Nodes and the Loadbalancers always targets 1 Node for Read/Write and only takes the second(third...) when the primary node is down (or reports wsrep_ready off).

    Version is the same on all servers:
    Server version: 5.6.37-82.2-56-log Percona XtraDB Cluster (GPL), Release rel82.2, Revision 114f2f2, WSREP version 26.21, wsrep_26.21


    In the night, always roughly at the same time, all 4 non-active nodes drop-out of the cluster at the same time with this error:
    2018-01-29 01:26:56 33304 [ERROR] Slave SQL: Could not execute Delete_rows event on table storedb.object; Cannot delete or update a parent row: a foreign key constraint fails (`storedb`.`object`, CONSTRAINT `fk_object_object` FOREIGN KEY (`parentid`) REFERENCES `object` (`objectid`)), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 250, Error_code: 1451
    2018-01-29 01:26:56 33304 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 3097067
    2018-01-29 01:26:56 33304 [Warning] WSREP: Failed to apply app buffer: seqno: 3097067, status: 1


    The table is looking like this:
    | object | CREATE TABLE `object` (
    `objectid` int(11) NOT NULL AUTO_INCREMENT,
    `classid` int(11) NOT NULL,
    `parentid` int(11) DEFAULT NULL,
    `alias` varchar(255) COLLATE utf8_bin NOT NULL,
    `siteid` int(11) NOT NULL,
    PRIMARY KEY (`objectid`),
    UNIQUE KEY `u_object_guid` (`guid`),
    UNIQUE KEY `u_object_alias` (`parentid`,`alias`),
    KEY `i_object_class` (`classid`),
    KEY `i_object_siteid_classid_alias` (`siteid`,`classid`,`alias`),
    CONSTRAINT `fk_object_object` FOREIGN KEY (`parentid`) REFERENCES `object` (`objectid`),
    CONSTRAINT `fk_object_object_1` FOREIGN KEY (`siteid`) REFERENCES `object` (`objectid`)
    ) ENGINE=InnoDB AUTO_INCREMENT=<verylargenumber> DEFAULT CHARSET=utf8 COLLATE=utf8_bin |


    The Query that is generating the error is *really* easy:
    DELETE FROM `storedb`.`object` WHERE objectid='highnumber';


    I saved the datadir from the moment before the cluster crashed (just kept one node out of the cluster after the crash and started it without wsrep provider to have a state to play with).
    I can issue the query above with no problem. It deletes the row.

    A corresponding GRA.log file is generated but mysqbinlog complains that it cant read the event:
    ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 105, event_type: 32


    I am running weekly consistency checks (percona pt-table-checksum) and they report no inconsistency.

    Any idea what is happening here?

    Greetings
    julian

  • #2
    Hey,

    as an update: It is NOT happening when wsrep_slave_threads are set to 1.
    But this is not a solution as it slows down the Cluster.

    Comment


    • #3
      Hi,

      Do you have reproducible test-case? I tried looking at the output but don't foresee any immediate clue.

      Comment


      • #4
        Hi,

        I am still working on that. It seems quite hard to trigger (with my limited skills in programming). With bash, mysql binary and mysqlimport, even in parrallel i can't trigger it. I *do* have code inside a big application i can run and it crashes in a few seconds but that is nothing i can provide so far.

        Greetings

        Comment

        Working...
        X