Catastrophic failure with Percona XtraDB Cluster


  • Catastrophic failure with Percona XtraDB Cluster

    Hi folks,

    I've been racking my brain over this for the last two days, to no avail.

    I'm the victim of several problems that compounded into one massive failure, which has left our entire database cluster completely dead.
    What happened was that an operating system update tripped over a sysctl flag that apparently no longer exists (it worked perfectly fine before the update). This broke procps and left it in an unrecoverable state until the flag was removed. At that point communication between our nodes seems to have broken down, and when the other nodes were also updated automatically, instead of shutting down gracefully, they died a horrible death.
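    One way to catch this class of problem before the next update is to check every key in the sysctl configuration against /proc/sys. The Python function below is a minimal sketch of that idea, not a packaged tool: it assumes Linux, simple "key = value" lines, and the usual mapping of dots to directories (net.ipv4.ip_forward lives at /proc/sys/net/ipv4/ip_forward), and it ignores corner cases such as interface names that themselves contain dots. The function name is hypothetical.

```python
from pathlib import Path


def missing_sysctl_keys(conf_text: str, proc_root: str = "/proc/sys") -> list[str]:
    """Return keys set in conf_text that have no matching entry under proc_root."""
    missing = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments ('#' or ';') and anything that is not key = value.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        # Accept both separators: net.ipv4.ip_forward and net/ipv4/ip_forward.
        parts = key.replace("/", ".").split(".")
        if not Path(proc_root, *parts).exists():
            missing.append(key)
    return missing
```

    For example, missing_sysctl_keys(Path("/etc/sysctl.conf").read_text()) would list any keys the running kernel no longer knows about.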

    The only properly working backup we have is from two weeks ago; however, I'd prefer to recover as much data as I can, for obvious reasons.

    The problem I'm seeing is this: MySQL starts perfectly fine, but the error log shows a slew of InnoDB errors indicating that the InnoDB LSNs are badly out of sync. I've tried every one of the innodb_force_recovery levels, and I've also tried updating the LSN with gdb on a copy of the filesystem; none of it worked.
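    For reference, the innodb_force_recovery levels as described in the MySQL 5.5 manual are summarised below. A larger value includes all the precautions of the smaller ones, so the usual approach is to start low and step upward, and to work only on a copy of the datadir. The helper that renders a [mysqld] fragment is a hypothetical convenience, not part of any MySQL or Percona tool.

```python
# innodb_force_recovery levels as described in the MySQL 5.5 reference manual.
# A larger value includes all of the precautions of the smaller ones.
FORCE_RECOVERY_LEVELS = {
    1: "SRV_FORCE_IGNORE_CORRUPT: keep running even if a corrupt page is detected",
    2: "SRV_FORCE_NO_BACKGROUND: do not run the master thread (and purge)",
    3: "SRV_FORCE_NO_TRX_UNDO: do not roll back transactions after crash recovery",
    4: "SRV_FORCE_NO_IBUF_MERGE: skip insert buffer merges and table statistics",
    5: "SRV_FORCE_NO_UNDO_LOG_SCAN: ignore undo logs, treat incomplete transactions as committed",
    6: "SRV_FORCE_NO_LOG_REDO: do not roll forward the redo log",
}


def recovery_fragment(level: int) -> str:
    """Render a hypothetical my.cnf fragment for one recovery attempt."""
    if level not in FORCE_RECOVERY_LEVELS:
        raise ValueError(f"innodb_force_recovery must be 1-6, got {level}")
    return (
        "[mysqld]\n"
        f"# {FORCE_RECOVERY_LEVELS[level]}\n"
        f"innodb_force_recovery = {level}\n"
    )
```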

    innochecksum reports that all of the individual table files are valid. Yet when I log in to MySQL, I can list the databases, use a database and list its tables, but I cannot query or describe any of the tables: the server throws a "database.table does not exist" error. At the same time, the error log says it apparently can't access table.frm or table.ibd, even though both files exist and are accessible with the correct permissions.
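    Since the error log claims the files cannot be accessed, it may help to confirm that from the server's own point of view. This is a minimal sketch, assuming file-per-table storage under datadir/database/table.frm and table.ibd with plain ASCII names (MySQL encodes special characters in file names). os.access() checks permissions for whichever user runs the script, so run it as the mysql user, for example via sudo -u mysql. The function name is hypothetical.

```python
import os
from pathlib import Path


def check_table_files(datadir: str, database: str, table: str) -> dict[str, str]:
    """Report on table.frm and table.ibd as seen by the user running this script."""
    report = {}
    for ext in (".frm", ".ibd"):
        path = Path(datadir, database, table + ext)
        try:
            present = path.is_file()
        except PermissionError:
            # The database directory itself is not traversable by this user.
            report[ext] = "cannot stat (check directory permissions)"
            continue
        if not present:
            report[ext] = "missing"
        elif not os.access(path, os.R_OK | os.W_OK):
            report[ext] = "no read/write access for this user"
        else:
            report[ext] = "ok"
    return report
```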

    I've tried absolutely everything that I can think of, and am somewhat at the end of my rope.

    Any assistance would be most appreciated.

    My currently running my.cnf is attached, with the wsrep statements removed to prevent replication of the failed databases during recovery.
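    Removing the wsrep settings by hand on several nodes is easy to get wrong. The sketch below (a hypothetical helper, not part of any Percona tool) comments out every line whose option name starts with wsrep, in either the underscore or the dash spelling. With wsrep_provider unset, it falls back to its default of none and the node starts as a standalone server.

```python
def disable_wsrep(cnf_text: str) -> str:
    """Comment out every wsrep_* / wsrep-* option line in a my.cnf text."""
    out = []
    for line in cnf_text.splitlines(keepends=True):
        # Leading whitespace is allowed before an option name in my.cnf.
        if line.lstrip().startswith("wsrep"):
            out.append("# " + line)
        else:
            out.append(line)
    return "".join(out)
```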

    On one of the servers we get this on start-up:

    InnoDB: space id 1636 did not exist in memory. Retrying an open.
    InnoDB: error: space object of table 'database/omments',
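    The first message appears to mean that the data dictionary expects tablespace id 1636 but found no matching open tablespace. Every .ibd file records its space id in page 0: in the FIL header at byte 34, and again at byte 38 at the start of the FSP header, both stored big-endian. Reading it directly lets you compare the file with the id the server is looking for. A minimal sketch, assuming the standard InnoDB page layout; the function name is hypothetical.

```python
import struct

FIL_PAGE_SPACE_ID = 34  # 4-byte space id in the FIL header of every page
FSP_SPACE_ID = 38       # space id repeated in the FSP header on page 0


def ibd_space_id(path: str) -> int:
    """Return the tablespace id stored in page 0 of an .ibd file."""
    with open(path, "rb") as f:
        page0 = f.read(64)
    if len(page0) < FSP_SPACE_ID + 4:
        raise ValueError(f"{path}: too short to hold an InnoDB page header")
    (fil_id,) = struct.unpack_from(">I", page0, FIL_PAGE_SPACE_ID)
    (fsp_id,) = struct.unpack_from(">I", page0, FSP_SPACE_ID)
    if fil_id != fsp_id:
        raise ValueError(f"{path}: header space ids disagree ({fil_id} vs {fsp_id})")
    return fil_id
```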

    On the other, we get messages like this:

    InnoDB: Error: page xx log sequence number yyyy
    InnoDB: is in the future! Current system log sequence number zzzzz.
    InnoDB: Your database may be corrupt or you may have copied the InnoDB
    InnoDB: tablespace but not the InnoDB log files.
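    This message means a page carries a higher LSN than the redo log, which is consistent with the data files and the ib_logfile* files being out of step. To see how far ahead a tablespace is (and therefore how far any LSN bump would have to go), you can scan it and list the offending pages. A minimal sketch, assuming the default 16 KiB page size and the standard FIL header layout (page number at byte 4, 8-byte page LSN at byte 16, both big-endian); the value zzzzz from the log would be passed in as current_lsn. The function name is hypothetical.

```python
import struct
from collections.abc import Iterator

PAGE_SIZE = 16 * 1024  # assumption: default innodb_page_size (16 KiB)
FIL_PAGE_OFFSET = 4    # 4-byte page number
FIL_PAGE_LSN = 16      # 8-byte LSN of the newest change written to the page


def pages_in_future(path: str, current_lsn: int) -> Iterator[tuple[int, int]]:
    """Yield (page_number, page_lsn) for every page whose LSN exceeds current_lsn."""
    with open(path, "rb") as f:
        while True:
            page = f.read(PAGE_SIZE)
            if len(page) < PAGE_SIZE:
                break  # end of file, or a truncated final page
            (page_no,) = struct.unpack_from(">I", page, FIL_PAGE_OFFSET)
            (page_lsn,) = struct.unpack_from(">Q", page, FIL_PAGE_LSN)
            if page_lsn > current_lsn:
                yield page_no, page_lsn
```

    The largest page LSN, max(lsn for _, lsn in pages_in_future(path, current_lsn)), gives a lower bound for where the log sequence number would need to be.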

    Error when attempting to access a table:

    [ERROR] Cannot find or open table database/user from
    the internal data dictionary of InnoDB though the .frm file for the
    table exists. Maybe you have deleted and recreated InnoDB data
    files but have forgotten to delete the corresponding .frm files
    of InnoDB tables, or you have moved .frm files to another database?
    or, the table contains indexes that this version of the engine
    doesn't support.
    See http://dev.mysql.com/doc/refman/5.5/...eshooting.html
    how you can resolve the problem.

    Kind Regards,


  • #2
    Matt, I had a brief chat with you on Skype, where I described how to recover an .ibd file when you know the table structure.
    I wrote a blog post about that:

    You can try to recover the data first (that's your top priority) and then fix the cluster.
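    One common way to put a salvaged .ibd file back behind a table of known structure is the DISCARD/IMPORT TABLESPACE sequence; the blog post mentioned above may describe a different technique, so treat this only as a sketch. It assumes innodb_file_per_table is enabled and that you have the exact CREATE TABLE statement. Note that stock MySQL 5.5 normally refuses to import an .ibd whose space id or metadata does not match the current data dictionary, so on a cluster in this state the import may still need extra tooling. The helpers below are hypothetical and only generate the statements; copying the file is a manual step between DISCARD and IMPORT.

```python
def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def import_tablespace_sql(database: str, table: str, create_ddl: str) -> list[str]:
    """Return, in order, the statements for re-attaching a salvaged .ibd file
    to a freshly created table with the same structure (hypothetical helper)."""
    qualified = f"{quote_ident(database)}.{quote_ident(table)}"
    return [
        f"CREATE DATABASE IF NOT EXISTS {quote_ident(database)}",
        # Must match the original definition exactly (columns, indexes, row format).
        create_ddl.strip().rstrip(";"),
        # Detaches and deletes the empty .ibd that CREATE TABLE just created.
        f"ALTER TABLE {qualified} DISCARD TABLESPACE",
        # Manual step before this one: copy the salvaged .ibd into the database
        # directory and make sure it is owned by the mysql user.
        f"ALTER TABLE {qualified} IMPORT TABLESPACE",
    ]
```

    Each statement would be run through the mysql client one at a time, with the file copy done between the DISCARD and IMPORT steps.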