How We Spent a Tuesday Fixing a MySQL Replication Bug

We found a simple XA transaction that crashes MySQL 5.5 replication. The transaction inserts one row into an InnoDB table and one row into a TokuDB table. The crash comes from a flaw in the logging code that is exposed when a transaction uses two XA storage engines (here, TokuDB and InnoDB). The bug was fixed in the TokuDB 6.0.1 release.

Here are some details. Suppose that a database contains the following tables.

create table t1 (a int) engine=InnoDB;
create table t2 (a int) engine=TokuDB;

The following transaction

begin;
insert into t1 values (1);
insert into t2 values (2);
commit;

causes the replication slave to crash.

The crash occurs when mysqld tries to dereference a NULL pointer.

#4  0x000000000088e203 in MYSQL_BIN_LOG::log_and_order (this=0x14b8640, thd=0x7f7758000af0, xid=161, all=true, need_prepare_ordered=false, need_commit_ordered=true) at /home/mariadb-5.5.25/sql/
7491      cache_mngr->using_xa= TRUE;
(gdb) p cache_mngr
$1 = (binlog_cache_mngr *) 0x0

We posted a description of the problem to the MySQL and MariaDB internals developer mailing lists and received some very helpful feedback. The fix is to create the binlog_cache_mngr object, if it has not yet been created, in the log_and_order method and in other similar places in the logging code. Our MariaDB 5.5 patch can be found on Launchpad in the lp:~prohaska7/5.5-xa-rpl-crash-fix branch.
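
To illustrate the shape of the fix, here is a minimal, self-contained sketch of the "create it if it does not exist yet" pattern. It is not the actual patch: Session, CacheManager, ensure_cache_manager, and log_and_order_sketch are hypothetical stand-ins for the server's THD, binlog_cache_mngr, and log_and_order, and the real code in the Launchpad branch is organized differently.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the per-connection binlog cache manager.
struct CacheManager {
    bool using_xa;
    CacheManager() : using_xa(false) {}
};

// Hypothetical stand-in for the session object that owns the cache manager.
struct Session {
    std::unique_ptr<CacheManager> cache_mngr;  // stays null until first needed
};

// Return the session's cache manager, creating it on first use.
CacheManager& ensure_cache_manager(Session& s) {
    if (!s.cache_mngr) {
        s.cache_mngr.reset(new CacheManager());
    }
    return *s.cache_mngr;
}

// Simplified analogue of the crashing statement. In the trace above the
// pointer was NULL when it was dereferenced; here it is created first.
void log_and_order_sketch(Session& s) {
    CacheManager& mngr = ensure_cache_manager(s);
    assert(s.cache_mngr);  // the manager exists from this point on
    mngr.using_xa = true;
}
```

Calling log_and_order_sketch on a fresh Session now sets using_xa instead of dereferencing a null pointer, regardless of whether an earlier code path happened to create the manager.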

Comments (4)

  • Justin Swanhart

    That is great, but XA still isn’t replication safe. It says so right in the manual.

    July 16, 2012 at 4:20 pm
  • Daniël van Eeden

    This seems to be a normal transaction (BEGIN instead of XA START). It is probably an internal XA transaction between the storage engine and the binlog.

    Does it only happen with replication or also with binlogs (e.g. for point-in-time recovery)?

    If it’s a real XA transaction, does XA RECOVER work after a restart? (The transaction must be in the prepared state, of course.)

    August 23, 2012 at 7:32 pm
    • Rich Prohaska

      When MySQL commits a transaction that involves more than one XA storage engine, it uses the two-phase commit protocol for the commit. MySQL refers to this as an internal XA transaction. So, there are prepares to all of the storage engines, followed by commits to all of the storage engines. If the MySQL binary log is enabled, transactions involving at least one XA storage engine also use a two-phase commit protocol: the transaction is prepared in the storage engines, then it is logged in the binlog, and finally it is committed in the storage engines.

      August 23, 2012 at 9:21 pm
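
The ordering described in the reply above can be sketched as a small model. This is only an illustration of the sequence of steps: Engine and commit_with_binlog are hypothetical names rather than the MySQL storage engine API, and the sketch leaves out failure handling and recovery entirely.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an XA-capable storage engine.
struct Engine {
    std::string name;
    bool prepared;
    bool committed;
    explicit Engine(const std::string& n)
        : name(n), prepared(false), committed(false) {}
};

// Run the two-phase sequence and return the steps in the order they happen:
// prepare every engine, write the binlog if it is enabled, commit every engine.
std::vector<std::string> commit_with_binlog(std::vector<Engine>& engines,
                                            bool binlog_enabled) {
    std::vector<std::string> steps;
    for (Engine& e : engines) {  // phase 1: prepare in all engines
        e.prepared = true;
        steps.push_back("prepare " + e.name);
    }
    if (binlog_enabled) {        // the binlog write sits between the phases
        steps.push_back("write binlog");
    }
    for (Engine& e : engines) {  // phase 2: commit in all engines
        e.committed = true;
        steps.push_back("commit " + e.name);
    }
    return steps;
}
```

For the InnoDB plus TokuDB transaction from the post, with the binlog enabled, the model yields two prepares, one binlog write, and two commits, in that order.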
