Where the open source community meets: Secure your spot for Percona Live Amsterdam! - Register

Downloads

Blog

No more MySQL Crash Safe Replication in 5.0 ?

January 29, 2008

Author

Peter Zaitsev

Insight for DBAs

Share this Post:

As you might know even if you’re only using Innodb tables your replication is not completely crash safe – if Slave MySQL Server crashes/power goes down it is likely for relay logs to run out of sync (they are not synced to the disk) plus position on the master which slave remembers becomes stale.

During MySQL 4.0 and 4.1 series there was a great workaround if you’re using only Innodb tables – Innodb when Innodb does crash recovery it would print position in master log files up to which replication was done:

InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 115, file name portland-bin.001717

1 2	InnoDB: In a MySQL replication slave the last master binlog file InnoDB: position 0 115, file name portland-bin.001717

All you needed to do is to use –skip-slave-start on the slave server and have a little script which will do CHANGE MASTER TO to specified location to restore replication in case of crash (assuming you’re only using Innodb tables of course)

Another way this functionality was usable is cloning Slave->Slave by use of LVM without pausing replication (so you can get consistent master position).

It is all great but it does not work any more in MySQL 5.0 Baron has spotted it by incident when we were verifying some of examples for High Performance MySQL book.

In the bug Heikki explains the code was probably removed in MySQL 5.0 during XA implementation though the code which prints the data back on recovery was not, so it prints you some log file name and position but they have nothing to do with real position on the master anymore.

I hope Innodb team will find a way to restore this functionality or at least remove confusing message which leaves impression this thing works.

Until this issue is fixed getting Crash Safe replication with MySQL is not impossible but surely more complicated and has much higher performance overhead – you can run slave with –sync-binlog and –log-slave-updates so you can see what last statement was executed before the crash and then find matching position in the master logs.

Interesting enough similar functionality is implemented in Mark Callaghan’s patches if you use rpl_transaction_enabled=1

Note even though this functionality is currently broken other somewhat similar functionality works as expected.
Innodb during recovery also prints position in the MySQL binary log:

InnoDB: Last MySQL binlog file position 0 589600615, file name ./galax-bin.001376

1	InnoDB: Last MySQL binlog file position 0 589600615, file name ./galax-bin.001376

This one is helpful in other cases – for example when you’re taking LVM snapshot (for backup or to clone slave from the master) and can’t do traditional and recommended way with FLUSH TABLES WITH READ LOCK for snapshot creation. This happens when you have large amount of tables or your load pattern is to cause unacceptable stall if this lock is used. You can just take LVM Snapshot (assuming you’re only using Innodb tables and not touching your MyISAM system tables) and use this position to point to proper location on the master, or later use for point in time recovery using binary log.

0 0 votes

Article Rating

7 Comments

Oldest

Newest Most Voted

mike503

18 years ago

What I’d like to see is auto-healing replication. Instead of people having to create different custom scripts to do CHANGE MASTER and all that, why can’t something simple be added into the replication protocol itself. Basically you tell a master to replicate to a slave, and the slave marks itself as offline/unusable until it catches up.

I *believe* ibpool does this already (from what the docs say) – it does require some sort of proxy/middle layer to be able to mark servers offline/online, but it would allow a bunch of slaves to basically “subscribe” to the master at any point in time and “catch up” as fast as it can, and once it is caught up, mark itself as open for business.

I think it is close to being there, with tools such as maatkit, ibpool, people’s custom scripts, and replication enhancements. It would be nice to see MySQL actually just build this in so we don’t need those custom things anymore and can rely on MySQL itself to sync itself… that’s been my biggest fear and the reason I have not adopted it yet. Once a slave is out of sync, I have to take the master down, or put it in read-only until the slave catches up…

Help me Obiwan, you’re my only hope!

Author

Peter Zaitsev

18 years ago

Mike,

Absolutely. It is best if MySQL Replication would not break in case of power failure and have single command ways to resync slave from the master (or from the slave) if it out of sync for some reason. Until that is done however we have to relay on various scripts and tools and sometimes these tools require certain underlying functionality to do their job.

Google is working a lot with replication safety and deal between Google and MySQL may mean we will see results of those efforts in MySQL mainline in a few years.

mike503

18 years ago

I hope it is not a “few years”!

That would be a Microsoft-style release cycle… not an open source one!

In a few years I would expect MySQL to be fully enterprise-ready, with clustering/replication/etc, proxying, etc. all being easy to setup, configure and automate.

I think MySQL is so close on so many fronts already…

Konstantin Osipov

18 years ago

File a bug?

Author

Peter Zaitsev

18 years ago

Baron did:
http://bugs.mysql.com/bug.php?id=34058 and it is mentioned in the post 🙂

17 years ago

Mysql doesn’t have synchronous replication and this sux a lot, since postgres has at least 2 projects with this.

Ignacio Nin

15 years ago

An update on this issue.

As reported in http://bugs.mysql.com/bug.php?id=34058, this has since been fixed (this bug was marked as duplicate of http://bugs.mysql.com/bug.php?id=22540) at version 5.0.46.

From two tests conducted on 5.0.91 (latest in 5.0 series):

error log (snipped):
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 517, file name mysql-bin.000002
InnoDB: Last MySQL binlog file position 0 98, file name ./mysql-bin.000002

error log (snipped):
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 1604, file name mysql-bin.000001
InnoDB: Last MySQL binlog file position 0 98, file name ./mysql-bin.000001

The difference is consistently 27, which corresponds to unimportant binlog events:

/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 517
#100810 18:54:31 server id 1 end_log_pos 544 Xid = 18
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

Notice that previous value was an unrelated integer.