October 24, 2014

Can MySQL temporary tables be made safe for statement-based replication?

A while ago I wrote about how to make MySQL replication reliable, part of which is to eliminate temporary tables. The idea is this: if a slave is stopped (or crashed) while a temporary table is open and is then restarted, the temporary table doesn’t exist anymore, and the slave will have problems trying to replay any further statements that refer to these tables. Thus, I claimed, there’s no alternative but to eliminate temporary tables. This problem may not exist for row-based replication in MySQL 5.1 and later, but most installations I know of are using statement-based replication, even on MySQL 5.1

This is a contentious topic. People love their temporary tables and will ask hopefully “are you sure this isn’t safe?” They’ll propose all sorts of ways to mitigate the danger, and I’ve heard many of them. But I recently heard an angle on this I had not heard before.

The argument is this: “you can create an InnoDB temporary table and use it only within one transaction, and then if the slave crashes and restarts, it’ll roll back the transaction to the beginning.” In other words, in theory if the temporary table exists only within that one transaction, and if your transaction accesses only InnoDB tables, it’s safe.

My first thought was, you can’t do that. CREATE TABLE commits the transaction, so there’s implicitly more than one transaction. However, as the person pointed out, that isn’t true with CREATE TEMPORARY TABLE. I tested this (sometimes the manual is wrong!) and found that indeed, you can open a transaction, make some changes, create a temporary table with ENGINE=InnoDB, and the InnoDB transaction ID does not change in SHOW INNODB STATUS. The statements are all within one transaction. (However, if you type ROLLBACK the temporary table doesn’t get dropped. It’s not really transactional — it just doesn’t auto-commit the transaction. The ROLLBACK will produce a warning that says “Some non-transactional changed tables couldn’t be rolled back”, which is interesting.)

But does that mean it’s safe for replication?

There is one good way to find out: test it. I fired up my master-and-two-slaves replication sandbox, flushed all the logs, and set out to get to the bottom of the matter.

First, I stopped the slave threads so I could choose which statements to replay on the slave and pick the “crash point” as I wished. (I didn’t shut down the slave, I just stopped the replication processes. This is safe to do even when temporary tables are open.) Then I created a temporary table on the master, inserted some rows into it, and dropped it:

In theory, that’s all in one transaction. Since I flushed the logs before I did this, everything in the binary log so far comes from these statements. Let’s look at the binary logs:

Very interesting! The order of statements is not the same in the binlog as I typed into the console. If you replay the binary log, you’ll get two transactions here.

This shows us something interesting that isn’t considered in the “all inside one transaction” argument: transactions aren’t the only thing that matters. How the server logs events to the binary log is equally important. It appears that we can break replication on the slave by killing the slave after event 98 executes and before event 207 executes. But let’s not draw any conclusions yet. The only way to tell for sure is to really test it.

Since I’d stopped the slave, I could easily test my theory. Let’s let the slave replay events up until position 207, kill it, and restart it:

The slave is now “vulnerable,” in theory. To test my theory, I’ll shut down and restart the slave gracefully, rather than simulating a crash with kill -9, and see what happens.

That’s the error I thought I’d see. Even though it was used entirely within one transaction on the master, the temporary table was not safe for replication.

I’m pretty sure this is a bug. The temporary table shouldn’t be logged out-of-order on the master like this (I suspect it’s logged out-of-order because CREATE TEMPORARY TABLE can’t be rolled back). But bug or no, it is what it is.

There’s one more angle to the email thread that inspired this article: what if the whole transaction is inside a stored procedure? Whether this works or not depends, again, on how the stored procedure call is logged to the binary log. Let’s create a stored procedure to hold the transaction, which this time will insert data from the temporary table into a non-temporary InnoDB table:

Now calling the stored procedure should put a row into the test.ins table. Let’s see:

Good. Let’s see what’s in the binary log:

What you see depends on your version of MySQL, because the logging of stored procedures has changed over time. If just the CALL statement had been logged, I think we might have been safe using a stored procedure. However, since all the statements went into the binlog individually, there’s clearly an opportunity to break replication here. It looks like this doesn’t avoid the problem either.

Interestingly, I also created a version of the stored procedure that doesn’t begin and commit a transaction. After calling it, the CREATE TEMPORARY TABLE statement is logged into the binlog; after then typing COMMIT, the rest of the statements go into the binlog. It appears to me that there’s no way to get the CREATE TEMPORARY TABLE statement to be logged inside the transaction. And when it comes to a replication slave, what’s logged — not what executes on the master — is what’s important.

In summary, I still don’t see any way to use temporary tables with MySQL statement-based replication without some risk of breaking slaves. At some point I may test how it works with row-based replication; I believe even row-based logging format is going to have some problems, because the CREATE TABLE is logged in statement format. But that’s a topic for another post.

About Baron Schwartz

Baron is the lead author of High Performance MySQL.
He is a former Percona employee.

Comments

  1. Michael, actually I knew about all these things already, and that’s not what’s happening. There is NOT an implicit commit, which I proved to myself by examining SHOW INNODB STATUS carefuly. But commit or no commit, the CREATE TEMPORARY TABLE statement is logged right away to the binlog. This is not the same thing as implicit commit.

  2. I thought I had figured this one out, so I was pretty shocked to see your post that it doesn’t work. I was about to go back to the drawing board, but I realized that you were looking directly at the binlogs and I was actually crash-testing my slaves. So I took a look at my bin-logs and I see a very surprising difference.

    Note that I am using a stored procedure and that I have my code surrounded by START TRANSACTION; COMMIT;

    mysql> SHOW binlog events IN ‘bin-logs.000440’\G
    *************************** 1. row ***************************
    Log_name: bin-logs.000440
    Pos: 4
    Event_type: Format_desc
    Server_id: 1
    End_log_pos: 98
    Info: Server ver: 5.0.27-standard-log, Binlog ver: 4
    *************************** 2. row ***************************
    Log_name: bin-logs.000440
    Pos: 98
    Event_type: Query
    Server_id: 1
    End_log_pos: 224
    Info: use test; DROP TEMPORARY TABLE IF EXISTS test.temporary_table
    *************************** 3. row ***************************
    Log_name: bin-logs.000440
    Pos: 224
    Event_type: Query
    Server_id: 1
    End_log_pos: 300
    Info: use test; BEGIN
    *************************** 4. row ***************************
    Log_name: bin-logs.000440
    Pos: 300
    Event_type: Query
    Server_id: 1
    End_log_pos: 272
    Info: use test; CREATE TEMPORARY TABLE test.temporary_table
    ENGINE = InnoDB
    AS SELECT id,
    value_1,
    value_2,
    value_3
    FROM test.permanent_table

    I was including a drop temporary table statement because if this worked, I was going to create a process that could potentially run this multiple times per session. Interestingly enough, we don’t care if the drop table statement isn’t rolled back in a crash – because a crash would drop the tmp table anyway.

    Then I decided to check a new version, so I ran this in 5.1.24-rc and I got the same results.

    So at first glance, we suspect that its a bug where the very first temporary table transaction is not included. I ran several additional tests, and I can’t really find any consistency. If I create two temp tables, the BEGIN occurs between the two, however if I do create temp table, drop temp table, create temp table, the BEGIN occurs before the last create.

  3. Ooops… I spoke too soon!

    You can completely disregard the previous post because I just figured out that the real difference is the “CREATE TEMPORARY TABLE … AS SELECT” syntax. If I take that out and try to make a normal temporary table, it puts the table creation _before_ the transaction starts.

    The plot thickens… you can do the following, and everything will be inside the transaction:

    START TRANSACTION;
    CREATE TEMPORARY TABLE test.test2 AS SELECT 1;
    CREATE TEMPORARY TABLE test.temporary_table
    ( id int(11) NOT NULL,
    value_1 varchar(10) default NULL,
    value_2 varchar(10) default NULL,
    value_3 varchar(10) default NULL
    ) ENGINE=InnoDB;

  4. So I dug even deeper, and I found out that you basically need to perform some kind of action to initiate the transaction before creating your temp table. It seems that merely issuing a “START TRANSACTION” is not enough.

    For example, if you create a table with 1 column:
    CREATE TABLE test.ints (id int(1));

    And your procedure looks like this:
    START TRANSACTION;
    INSERT INTO test.ints VALUES (1);
    CREATE TEMPORARY TABLE…

    COMMIT;

    … then the temporary table will be created inside the transaction.

    I’ve never looked at the internals, but if you just issue START TRANSACTION; COMMIT; with nothing in between, nothing shows up in the binlog. I’m assuming then there is a list of “events” that can occur that cause a transaction to start in the binlog. It seems to me right now that this is simply a case of “create temporary table” is left out of this set of events. I suspect that the word temporary has nothing to do with it… that a simple “CREATE TABLE” does not cause a transaction to start, but that “CREATE TABLE AS SELECT” does.

    I logged a bug – #37011

  5. Parvesh says:

    Baron,

    1. Against creating *scratch tables*: Running DDL statements in production can screw you up a big time. And anyhow my doing a create table won’t be transaction safe if it is MyISAM and if InnoDB, I will be messing up with the common tablespace which I don’t want to do in production. This puts me into a trade-off of saving data in replication and risk of losing everything one bad morning.

    2. Against row based replication: My last resort is row based replication that goes against the very first reason of using temporary tables. Most of us use temporary tables (there are more reasons also) because we want to replace multiple inserts with one bulk in the main table. Now if I enable RBR, I go back to stone age as far as my slaves are concerned.

    I’m in a complete deadlock situation of Reliability vs Performance. And to top it all, I want both.

  6. Gregory, thanks for writing in with these interesting findings. I think a good bug report is lurking in here, and you seem qualified to submit it :-)

  7. Actually, I did submit a bug: Bug #37011: creating temporary table inside a transaction may result in incorrect binlog.

Speak Your Mind

*